{"id":6543,"library":"beautifulsoup","title":"Beautiful Soup","description":"Beautiful Soup (version 3.x) is a Python 2 library for parsing HTML and XML documents, including those with malformed markup. It creates a parse tree that can be used to extract data from web pages, making it useful for screen-scraping tasks. Version 3.2.2 is the final release in this series. Development on the 3.x series ended in 2011, and it has been largely superseded by `beautifulsoup4` for Python 3.","status":"maintenance","version":"3.2.2","language":"en","source_language":"en","source_url":"https://code.launchpad.net/beautifulsoup/","tags":["web scraping","html parsing","xml parsing","python2"],"install":[{"cmd":"pip install beautifulsoup","lang":"bash","label":"Install Beautiful Soup 3.x"}],"dependencies":[{"reason":"Recommended for better automatic character encoding detection.","package":"chardet","optional":true},{"reason":"Recommended for additional character encodings for CJK languages.","package":"cjkcodecs","optional":true},{"reason":"Recommended for additional character encodings.","package":"iconv_codec","optional":true}],"imports":[{"note":"This import path is specifically for Beautiful Soup 3.x. 
For Beautiful Soup 4.x (the current, actively maintained version), the correct import is `from bs4 import BeautifulSoup`.","wrong":"from bs4 import BeautifulSoup","symbol":"BeautifulSoup","correct":"from BeautifulSoup import BeautifulSoup"}],"quickstart":{"code":"# Beautiful Soup 3.x runs on Python 2; this makes print() a function there.\nfrom __future__ import print_function\n\nfrom BeautifulSoup import BeautifulSoup\n\nhtml_doc = \"\"\"\n<html><head><title>The Dormouse's story</title></head>\n<body>\n<p class=\"title\"><b>The Dormouse's story</b></p>\n\n<p class=\"story\">Once upon a time there were three little sisters; and their names were\n<a href=\"http://example.com/elsie\" class=\"sister\" id=\"link1\">Elsie</a>,\n<a href=\"http://example.com/lacie\" class=\"sister\" id=\"link2\">Lacie</a> and\n<a href=\"http://example.com/tillie\" class=\"sister\" id=\"link3\">Tillie</a>;\nand they lived at the bottom of a well.</p>\n\n<p class=\"story\">...</p>\n</body></html>\n\"\"\"\n\n# For Beautiful Soup 3.x, you pass the HTML string directly.\n# It defaults to Python's SGMLParser.\nsoup = BeautifulSoup(html_doc)\n\nprint(\"Document Title:\", soup.title.string)\nprint(\"First paragraph's class attribute:\", soup.p['class'])\nprint(\"First anchor tag (link):\", soup.a)\nprint(\"Text of the first link:\", soup.a.string)","lang":"python","description":"This quickstart demonstrates basic HTML parsing and element extraction using Beautiful Soup 3.x. It creates a `BeautifulSoup` object from an HTML string and then accesses elements by tag name and attributes. Note that Beautiful Soup 3.x does not take an explicit parser argument such as `html.parser`, as is common in Beautiful Soup 4.x."},"warnings":[{"fix":"Migrate to `beautifulsoup4`. Install with `pip install beautifulsoup4` and update imports from `BeautifulSoup` to `bs4`. Be aware of API changes.","message":"Beautiful Soup 3.x is no longer actively developed or maintained. The current, actively developed version is Beautiful Soup 4.x (package name `beautifulsoup4`). 
New projects should use `beautifulsoup4`, which offers improved parsing and Python 3 support and is actively maintained.","severity":"deprecated","affected_versions":"3.x"},{"fix":"Use Python 2.x for Beautiful Soup 3.x, or migrate your codebase to Beautiful Soup 4.x if using Python 3.","message":"Beautiful Soup 3.x is primarily a Python 2 library and is largely incompatible with Python 3. It relies on `SGMLParser` from the `sgmllib` module, which was deprecated in Python 2.6 and removed in Python 3.0. Running BS3 code directly in Python 3 typically fails with a `SyntaxError` on Python 2-only syntax or with an `ImportError` for a Python 2-only module such as `sgmllib`.","severity":"breaking","affected_versions":"3.x"},{"fix":"When migrating from BS3 to BS4, consult the porting guide and update method/attribute names to their BS4 equivalents (e.g., `findAll` to `find_all`, `nextSibling` to `next_sibling`). Note that `contents` was not renamed; `children` in BS4 is a separate iterator over the same child nodes.","message":"Between Beautiful Soup 3.x and 4.x, many method and attribute names were renamed for PEP 8 compliance (e.g., `findAll` and `nextSibling` in BS3 became `find_all` and `next_sibling` in BS4). BS4 still accepts many of the old names as deprecated aliases, so ported code may keep working, but the aliases should not be relied on.","severity":"breaking","affected_versions":"3.x to 4.x migration"},{"fix":"Always inspect the HTML you are scraping. If you encounter issues, try a more robust parser (e.g., `lxml` or `html5lib` with BS4) and compare the parse trees using `prettify()` or `diagnose()` (BS4 only).","message":"Different parsers (like Python's built-in `SGMLParser` in BS3, or `html.parser`, `lxml`, `html5lib` in BS4) can produce different parse trees for malformed HTML. 
This might lead to unexpected results or missing elements if the parser interprets the markup differently than expected.","severity":"gotcha","affected_versions":"All"},{"fix":"Always check that the result of `find()` is not `None` before accessing its attributes or children, e.g., assign `tag = soup.find('mytag')`, then guard the access with `if tag is not None: print(tag.text)`.","message":"Accessing an attribute on a `NoneType` object (e.g., `soup.find('nonexistent_tag').text`) will raise an `AttributeError`. This typically happens when a `find()` call doesn't locate any matching tag and returns `None`.","severity":"gotcha","affected_versions":"All"},{"fix":"Use the `tag.get('attribute_name')` method instead, which returns `None` if the attribute does not exist, preventing a `KeyError`. Example: `href = tag.get('href')`.","message":"Using dictionary-style attribute access (e.g., `tag['href']`) will raise a `KeyError` if the specified attribute does not exist on the tag.","severity":"gotcha","affected_versions":"All"},{"fix":"Be mindful of the return type of `prettify()`. If you expect Unicode in Python 2 or are dealing with bytestrings in Python 3, explicit encoding/decoding might be necessary during migration.","message":"The `prettify()` method in Beautiful Soup 3.x returns a bytestring, while in Beautiful Soup 4.x it returns a Unicode string. This can cause encoding issues if not handled carefully during migration or when mixing code.","severity":"gotcha","affected_versions":"3.x to 4.x migration"}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z","problems":[]}