Beautiful Soup 4
4.14.3 · verified Tue May 12 · auth: no · python · install: verified · quickstart: stale
Beautiful Soup 4 (often imported as `bs4`) is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree, commonly saving programmers hours or days of work in web scraping and data extraction. The library is actively maintained with an irregular release cadence, focusing on Python 3 development. The `bs4` package on PyPI is a dummy package, and the actual library to install is `beautifulsoup4`.
pip install beautifulsoup4
Common errors
error ModuleNotFoundError: No module named 'bs4' ↓
cause The beautifulsoup4 library is not installed in the Python environment being used, or there's a mismatch between the installed package name and the import statement.
fix Install the library with pip: `pip install beautifulsoup4` (note that the package name is `beautifulsoup4`, not `bs4`). If it is already installed, make sure your IDE or script runs the Python interpreter it was installed into.
error AttributeError: 'NoneType' object has no attribute 'get_text' (or 'find', 'contents', etc.) ↓
cause This error occurs when `find()` or `select_one()` methods in Beautiful Soup do not find any matching element and return `None`. Subsequent attempts to call a method (like `get_text()`) on this `None` object lead to the `AttributeError`.
fix Always check that the result of `find()` or `select_one()` is not `None` before accessing its attributes or methods. For example, assign `element = soup.find('div', class_='my-class')`, then guard the access with `if element is not None: print(element.get_text())`.
error TypeError: 'module' object is not callable ↓
cause This typically happens when you import the `bs4` module itself and then try to call `bs4()` as if it were the `BeautifulSoup` class, instead of importing the `BeautifulSoup` class specifically from `bs4`.
fix Import the `BeautifulSoup` class directly: `from bs4 import BeautifulSoup`. Then create the soup object with `soup = BeautifulSoup(markup, 'html.parser')`.
error KeyError: 'href' (or 'class', etc.) ↓
cause This error occurs when you try to access an attribute using dictionary-style lookup (`tag['attribute']`) on a tag that does not possess that specific attribute.
fix Use the `.get()` method to read attributes safely; it returns `None` when the attribute does not exist instead of raising `KeyError`. For example: `link = tag.get('href')`.
error TypeError: Incoming markup is of an invalid type: <Response [200]> (or expected string or buffer) ↓
cause You are passing a `requests.Response` object directly to the `BeautifulSoup` constructor instead of its text or content.
fix Extract the HTML from the `requests.Response` object with `.text` (decoded string) or `.content` (raw bytes) before passing it to `BeautifulSoup`. For example: `soup = BeautifulSoup(response.text, 'html.parser')`.
Warnings
breaking Beautiful Soup 4 discontinued official support for Python 2 on December 31, 2020. The last version to support Python 2 was 4.9.3. New development targets Python 3.7+ (current versions require Python >=3.7.0). Running BS4 code on Python 2, or Python 2 BS3 code on Python 3, will result in `ImportError` or unexpected behavior. ↓
fix Ensure your project uses Python 3.7 or newer. If migrating from Beautiful Soup 3, review the porting guide for significant API changes.
breaking When migrating from Beautiful Soup 3 to Beautiful Soup 4, many methods were renamed for PEP 8 compliance (for example, `findAll` became `find_all` and `replaceWith` became `replace_with`). A few attributes were also renamed to avoid words with special meaning in Python: `Tag.next` became `Tag.next_element`, and `Tag.previous` became `Tag.previous_element`. The primary import changed from `from BeautifulSoup import BeautifulSoup` to `from bs4 import BeautifulSoup`. ↓
fix Consult the 'Porting code to BS4' section in the official documentation for a comprehensive list of changes. Update import statements and attribute/method calls accordingly.
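A minimal sketch of the renamed navigation attributes, assuming Beautiful Soup 4 with the built-in `html.parser`. The markup and variable names are invented for illustration.

```python
# BS4 import; Beautiful Soup 3 used `from BeautifulSoup import BeautifulSoup`.
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p><b>one</b>two</p>", "html.parser")
bold = soup.b

# BS3's `Tag.next` / `Tag.previous` are spelled `next_element` /
# `previous_element` in BS4.
after_bold = bold.next_element       # the string parsed right after the <b> tag opens
before_bold = bold.previous_element  # the element parsed right before <b>

print(repr(after_bold))
print(before_bold.name)
```

Both attributes walk the document in parse order, so they can cross parent and child boundaries, unlike `next_sibling` and `previous_sibling`.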
gotcha Beautiful Soup relies on an underlying HTML/XML parser. If you do not name one, it picks the best parser installed (`lxml`, then `html5lib`, then Python's built-in `html.parser`) and emits a `GuessedAtParserWarning`, so the same code can build different parse trees in different environments. The parsers also differ in speed and leniency: `lxml` is the fastest, `html5lib` parses pages the way a browser does but is the slowest, and `html.parser` needs no extra dependency but is slower than `lxml` and less lenient than `html5lib`. ↓
fix Install `lxml` (for speed) and/or `html5lib` (for browser-like handling of malformed HTML) via `pip install lxml html5lib`. Always specify the parser explicitly (e.g., `BeautifulSoup(markup, 'lxml')` or `BeautifulSoup(markup, 'html5lib')`).
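One way to keep the parser choice explicit while still running where the optional parsers are missing is to try them in a fixed order. The sketch below is an illustration, not part of the library: `make_soup` and its `preferred` parameter are hypothetical names, and it relies on `bs4.FeatureNotFound`, which Beautiful Soup raises when a requested parser is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred=("lxml", "html5lib", "html.parser")):
    """Parse markup with the first parser in `preferred` that is installed.

    Returns a (soup, parser_name) pair so callers can log which parser ran.
    """
    for parser in preferred:
        try:
            return BeautifulSoup(markup, parser), parser
        except FeatureNotFound:
            # This parser is not installed in the environment; try the next one.
            continue
    raise RuntimeError("none of the requested parsers is installed")


soup, parser_used = make_soup("<p>hi")
print(parser_used)
print(soup.p.get_text())
```

Because the parsers repair broken markup differently, logging `parser_used` makes it easier to explain why two environments produce different trees.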
deprecated Starting with Beautiful Soup 4.13.0, many names that were previously only documented as deprecated now issue a `DeprecationWarning` when used. These include the `BeautifulStoneSoup` class and methods such as `parentGenerator`, and they are scheduled for removal in a future version (e.g., 4.15.0). ↓
fix Update your code to use the recommended, non-deprecated alternatives. Review the `DeprecationWarning` messages for specific guidance or consult the latest Beautiful Soup documentation.
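To find deprecated calls before they are removed, the warnings can be collected with the standard `warnings` module. This is a rough sketch: `deprecation_messages` is a hypothetical helper, and whether the legacy camelCase alias `findAll` exists or warns depends on the installed Beautiful Soup version, so the code checks for it instead of assuming.

```python
import warnings

from bs4 import BeautifulSoup


def deprecation_messages(func):
    """Call func() and return the text of every DeprecationWarning it raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record warnings even if seen before
        func()
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, DeprecationWarning)
    ]


soup = BeautifulSoup("<p><b>x</b></p>", "html.parser")

# The snake_case API is the supported spelling.
modern = deprecation_messages(lambda: soup.find_all("b"))

# The camelCase alias may be missing on future versions, so look it up defensively.
legacy_find_all = getattr(soup, "findAll", None)
legacy = (
    deprecation_messages(lambda: legacy_find_all("b"))
    if callable(legacy_find_all)
    else []
)

print(modern)
print(legacy)
```

In a test suite, running Python with `-W error::DeprecationWarning` turns the same warnings into failures.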
gotcha For Beautiful Soup versions 4.13.0 and newer, type annotations are included directly within the `beautifulsoup4` package. If you were previously using the `types-beautifulsoup4` stub package for type checking, it can lead to conflicts or incorrect type resolution. ↓
fix If using Beautiful Soup 4.13.0 or newer, uninstall the `types-beautifulsoup4` package (`pip uninstall types-beautifulsoup4`). The built-in type hints are now sufficient.
gotcha The `requests` library is commonly used alongside Beautiful Soup to fetch web pages, but it is not a dependency of the `beautifulsoup4` package. A script that imports `requests` in an environment where only `beautifulsoup4` was installed fails with `ModuleNotFoundError`. ↓
fix Install the `requests` library using pip: `pip install requests`.
Install
pip install beautifulsoup4 lxml html5lib
Install compatibility verified last tested: 2026-05-12
python os / libc status wheel install import disk
3.10 alpine (musl) - - 0.35s 19.1M
3.10 alpine (musl) - - 0.58s 32.4M
3.10 slim (glibc) - - 0.26s 20M
3.10 slim (glibc) - - 0.45s 33M
3.11 alpine (musl) - - 0.85s 21.2M
3.11 alpine (musl) - - 1.14s 34.6M
3.11 slim (glibc) - - 0.69s 22M
3.11 slim (glibc) - - 0.92s 35M
3.12 alpine (musl) - - 0.54s 13.0M
3.12 alpine (musl) - - 0.82s 26.5M
3.12 slim (glibc) - - 0.58s 13M
3.12 slim (glibc) - - 0.88s 27M
3.13 alpine (musl) - - 0.49s 12.6M
3.13 alpine (musl) - - 0.74s 26.2M
3.13 slim (glibc) - - 0.54s 13M
3.13 slim (glibc) - - 0.80s 27M
3.9 alpine (musl) - - 0.30s 18.6M
3.9 alpine (musl) - - 0.52s 31.8M
3.9 slim (glibc) - - 0.25s 19M
3.9 slim (glibc) - - 0.47s 32M
Imports
- BeautifulSoup
wrong: from BeautifulSoup import BeautifulSoup
correct: from bs4 import BeautifulSoup
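A short sketch of why the import form matters, assuming Beautiful Soup 4 is installed. It shows that `bs4` is the module and `BeautifulSoup` is the class inside it, and reproduces the "module is not callable" error on purpose; the variable names are illustrative.

```python
import bs4
from bs4 import BeautifulSoup

# The class lives inside the module; both spellings refer to the same object.
same_class = bs4.BeautifulSoup is BeautifulSoup

soup = BeautifulSoup("<p>hi</p>", "html.parser")

# Calling the module itself is the mistake behind
# "TypeError: 'module' object is not callable".
try:
    bs4("<p>hi</p>", "html.parser")
except TypeError as exc:
    message = str(exc)

print(same_class)
print(message)
```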
Quickstart stale last tested: 2026-04-23
from bs4 import BeautifulSoup
# Example HTML content
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""
# Create a BeautifulSoup object
soup = BeautifulSoup(html_doc, 'html.parser')
# Pretty-print the HTML
print("\n--- Pretty Printed HTML ---")
print(soup.prettify())
# Accessing tags
print("\n--- Page Title ---")
print(soup.title.string)
# Finding all links
print("\n--- All Links ---")
for link in soup.find_all('a'):
    print(link.get('href'))
# Finding an element by ID
print("\n--- Link with ID 'link3' ---")
link3 = soup.find(id="link3")
if link3:  # Check if link3 was found before accessing attributes
    print(link3.get_text())
# Using CSS selectors (requires soupsieve, which is a dependency)
print("\n--- Paragraphs with class 'story' ---")
for p_tag in soup.select('p.story'):
    print(p_tag.get_text(strip=True))
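The guards described under Common errors (checking for `None` after `find()` or `select_one()`, and reading attributes with `.get()`) can be wrapped in small helpers. The sketch below is one possible shape. `text_or_default` and `hrefs` are hypothetical names, and the markup is invented so that one selector misses and one link has no `href`.

```python
from bs4 import BeautifulSoup

html = '<div><a id="ok" href="/a">A</a><a id="bare">B</a></div>'
soup = BeautifulSoup(html, "html.parser")


def text_or_default(soup, selector, default=""):
    """Return the stripped text of the first CSS match, or `default` if none."""
    element = soup.select_one(selector)
    # select_one() returns None when nothing matches, so guard before get_text().
    return element.get_text(strip=True) if element is not None else default


def hrefs(soup):
    """Collect href values; links without the attribute yield None, not KeyError."""
    return [a.get("href") for a in soup.find_all("a")]


print(text_or_default(soup, "a#ok"))
print(text_or_default(soup, "p.missing", default="n/a"))
print(hrefs(soup))
```

Returning a default keeps scraping loops running when a page is missing an element, at the cost of hiding the miss; log it if the absence matters.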