Beautiful Soup 4

4.14.3 · active · verified Fri Mar 27

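HTML and XML parsing library. Current version is 4.14.3. The install name is beautifulsoup4 (pip install beautifulsoup4); the import name is bs4 (from bs4 import BeautifulSoup). Always specify a parser explicitly: omitting it raises a GuessedAtParserWarning (a UserWarning subclass), and Beautiful Soup then picks the best parser that happens to be installed, so the same code can build different trees on different machines.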
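
The parser warning can be observed directly. The following is a minimal sketch, assuming bs4 is installed; the sample markup and the variable names guessed and explicit are illustrative only. It records warnings for one call without a parser name and one call with it.

```python
import warnings

from bs4 import BeautifulSoup, GuessedAtParserWarning

html = "<p>Hello</p>"  # illustrative markup, not from any real page

# Record warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    BeautifulSoup(html)  # no parser named: bs4 guesses one and warns
    guessed = [w for w in caught
               if issubclass(w.category, GuessedAtParserWarning)]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    BeautifulSoup(html, "html.parser")  # parser named: nothing to guess
    explicit = [w for w in caught
                if issubclass(w.category, GuessedAtParserWarning)]

print(len(guessed), len(explicit))
```

Naming the parser is the intended fix; filtering the warning out would hide the symptom while leaving the parser choice dependent on the environment.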

Warnings

Install

Imports

Quickstart

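Basic parsing. Pass the parser name as the second argument: html.parser (built into Python, no extra install), lxml (fastest; requires pip install lxml), or html5lib (most lenient, parses the way a browser does; requires pip install html5lib).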
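
Because lxml and html5lib are optional installs, a script may want to prefer one of them and still run where only the standard library is available. The sketch below is one possible approach, not a bs4 feature: make_soup and its preferred argument are hypothetical names. It relies on bs4 raising FeatureNotFound when a requested parser is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred=("lxml", "html5lib", "html.parser")):
    """Parse markup with the first parser in `preferred` that is installed.

    Returns (soup, parser_name) so the caller can log which parser ran.
    """
    for parser in preferred:
        try:
            return BeautifulSoup(markup, parser), parser
        except FeatureNotFound:
            continue  # this parser is not installed; try the next one
    raise RuntimeError("none of the requested parsers is installed")


# Deliberately malformed markup: neither <p> is closed.
soup, used = make_soup("<p>Unclosed paragraph<p>Second")
print(used, len(soup.find_all("p")))
```

The trade-off is that different parsers can repair broken markup differently, so a fallback chain gives up some reproducibility. Returning the parser name makes that visible; pinning a single parser is the stricter alternative.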

from bs4 import BeautifulSoup
import requests

response = requests.get('https://example.com', timeout=10)
response.raise_for_status()                 # fail fast on HTTP errors
soup = BeautifulSoup(response.text, 'html.parser')

# Find elements
title_tag = soup.find('title')              # first match, or None if absent
title = title_tag.get_text() if title_tag else None
links = soup.find_all('a', href=True)       # all <a> with href
paragraphs = soup.select('div.content > p') # CSS selectors via soupsieve

# Navigate (attribute access returns None if the tag is missing)
body = soup.body
first_p = soup.body.p
parent = first_p.parent

# Text extraction
text = soup.get_text(separator=' ', strip=True)

# Find with attributes
button = soup.find('button', {'class': 'submit', 'type': 'submit'})

# Find by string content (use string=, not the deprecated text=)
heading = soup.find('h1', string='Welcome')
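
The same calls can be tried without network access. The following is a small sketch that runs them against an inline document; the HTML string and the variable names are made up for illustration. It also shows what find returns when nothing matches.

```python
from bs4 import BeautifulSoup

# Made-up page used only for this example.
HTML = """
<html><head><title>Demo page</title></head>
<body>
  <h1>Welcome</h1>
  <div class="content">
    <p>First <a href="/a">link</a></p>
    <p>Second <a>no href</a></p>
  </div>
  <button class="submit" type="submit">Send</button>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")

title = soup.find("title").get_text()                       # 'Demo page'
hrefs = [a["href"] for a in soup.find_all("a", href=True)]  # ['/a']
paragraphs = soup.select("div.content > p")                 # both <p> tags
button = soup.find("button", {"class": "submit", "type": "submit"})
heading = soup.find("h1", string="Welcome")  # matches on the tag's exact text
missing = soup.find("h2")                    # None: there is no <h2>
first_p_text = soup.body.p.get_text()        # 'First link'

print(title, hrefs, len(paragraphs), button.get_text(), heading.name, missing)
```

Since find returns None on a miss, chaining a method call straight onto it (as in soup.find('h2').get_text()) raises AttributeError; checking the result first avoids that on pages whose structure is not guaranteed.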
