PyQuery
PyQuery is a Python library that provides a jQuery-like API for parsing and manipulating XML and HTML documents. It allows users to select, filter, and manipulate HTML elements using CSS selectors, simplifying web data extraction. As of April 2026, the current stable version is 2.0.1, with development actively continuing on GitHub.
Warnings
- breaking Since PyQuery 2.0.0, a string argument that starts with `http://` or `https://`, such as `PyQuery('http://example.com')`, is no longer fetched as a URL. Fetch the content yourself and pass it in, or use the keyword form `PyQuery(url='http://example.com')`.
- breaking As of PyQuery 2.0.1, using the HTML parser on an XML file reportedly no longer works, and that combination is no longer tested. The same release drops support for Python 3.7.
- gotcha `PyQuery.remove()` no longer inserts a space in place of the removed element in versions 2.0.0 and above.
- gotcha PyQuery 2.0.0 fixed how `.html()` escapes the text of the top-level element. If your code relied on the earlier escaping behavior, review it.
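Since the warnings above all depend on which release is installed, a quick version check can help when auditing older code. The following is a minimal sketch using only the standard library; the helper name `pyquery_major_version` is hypothetical and not part of PyQuery.

```python
from importlib.metadata import PackageNotFoundError, version


def pyquery_major_version():
    """Return the installed PyQuery major version, or None if it is not installed.

    Hypothetical helper for illustration; not part of the PyQuery API.
    """
    try:
        installed = version("pyquery")
    except PackageNotFoundError:
        return None
    # "2.0.1" -> 2
    return int(installed.split(".")[0])


major = pyquery_major_version()
if major is None:
    print("pyquery is not installed")
elif major >= 2:
    print("PyQuery 2.x: URL-like strings are not fetched automatically")
else:
    print("PyQuery 1.x: pre-2.0 behavior applies")
```

A check like this is only a diagnostic aid; pinning the version in your dependency file is the more reliable way to keep behavior stable.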
Install
pip install pyquery
Imports
- PyQuery
from pyquery import PyQuery as pq
Quickstart
from pyquery import PyQuery as pq
import requests

# Load from a string
doc_string = pq('<html><body><div id="container"><p class="item">Hello</p><p class="item">World</p></div></body></html>')
print(f"From string: {doc_string('p.item:first').text()}")

# Load from a URL (using requests and explicitly passing content)
def fetch_url_content(url):
    response = requests.get(url)
    response.raise_for_status()  # Raise an exception for HTTP errors
    return response.content

try:
    # Use a well-known public URL for demonstration
    html_content = fetch_url_content("https://example.com")
    doc_url = pq(html_content)
    print(f"From URL title: {doc_url('title').text()}")
    # Select and iterate elements
    for p_tag in doc_url('p'):
        print(f"Paragraph text: {pq(p_tag).text()}")
except requests.exceptions.RequestException as e:
    print(f"Error fetching URL: {e}")

# Manipulate elements
html_to_manipulate = pq('<div><span class="foo"></span></div>')
html_to_manipulate('.foo').text('New Text')
print(f"Manipulated HTML: {html_to_manipulate.html()}")