Original author(s) | Leonard Richardson |
---|---|
Initial release | 2004 |
Stable release | 4.12.2[1]
/ 7 April 2023 |
Repository | |
Written in | Python |
Platform | Python |
Type | HTML parser library, Web scraping |
License | Python Software Foundation License (Beautiful Soup 3 - an older version) MIT License (versions 4 and up)[2] |
Website | www |
Beautiful Soup is a Python package for parsing HTML and XML documents (including having malformed markup, i.e. non-closed tags, so named after tag soup). It creates a parse tree for parsed pages that can be used to extract data from HTML,[3] which is useful for web scraping.[2][4]
Beautiful Soup was started by Leonard Richardson, who continues to contribute to the project,[5] and is additionally supported by Tidelift, a paid subscription to open-source maintenance.[6]
Code example
Beautiful Soup represents parsed data as a tree which can be searched and iterated over with ordinary Python loops.[7] The example below uses the Python standard library's urllib[8] to load Wikipedia's main page, then uses Beautiful Soup to parse the document and search for all links within.
#!/usr/bin/env python3
# Anchor extraction from HTML document
from bs4 import BeautifulSoup
from urllib.request import urlopen
with urlopen('https://en.wikipedia.org/wiki/Main_Page') as response:
soup = BeautifulSoup(response, 'html.parser')
for anchor in soup.find_all('a'):
print(anchor.get('href', '/'))
Release
Beautiful Soup 3 was the official release line of Beautiful Soup from May 2006 to March 2012. The current release is Beautiful Soup 4.x. Beautiful Soup 4 can be installed with pip install beautifulsoup4
.
In 2021, Python 2.7 support was retired and the release 4.9.3 was the last to support Python 2.7.[9]
See also
References
- ↑ Error: Unable to display the reference properly. See the documentation for details.
- 1 2 "Beautiful Soup website". Retrieved 18 April 2012.
Beautiful Soup is licensed under the same terms as Python itself
- ↑ Hajba, Gábor László (2018), Hajba, Gábor László (ed.), "Using Beautiful Soup", Website Scraping with Python: Using BeautifulSoup and Scrapy, Apress, pp. 41–96, doi:10.1007/978-1-4842-3925-4_3, ISBN 978-1-4842-3925-4
- ↑ Python, Real. "Beautiful Soup: Build a Web Scraper With Python – Real Python". realpython.com. Retrieved 2023-06-01.
- ↑ "Code : Leonard Richardson". Launchpad. Retrieved 2020-09-19.
- ↑ Tidelift. "beautifulsoup4 | pypi via the Tidelift Subscription". tidelift.com. Retrieved 2020-09-19.
- ↑ "How To Scrape Web Pages with Beautiful Soup and Python 3 | DigitalOcean". www.digitalocean.com. Retrieved 2023-06-01.
- ↑ Python, Real. "Python's urllib.request for HTTP Requests – Real Python". realpython.com. Retrieved 2023-06-01.
- ↑ Richardson, Leonard (7 Sep 2021). "Beautiful Soup 4.10.0". beautifulsoup. Google Groups. Retrieved 27 September 2022.