PyQuery

2.0.1 · active · verified Sat Apr 11

PyQuery is a Python library that provides a jQuery-like API for parsing and manipulating XML and HTML documents. It allows users to select, filter, and manipulate HTML elements using CSS selectors, simplifying web data extraction. As of April 2026, the current stable version is 2.0.1, with development actively continuing on GitHub.

Warnings

breaking Since PyQuery 2.0.0, a positional string that starts with `http://` or `https://`, such as `PyQuery('http://example.com')`, is no longer treated as a URL to fetch. Code written for earlier versions that relied on this implicit fetch must be updated.
Fix: Explicitly pass the `url` keyword argument, e.g., `PyQuery(url='http://example.com')`, or fetch content using a library like `requests` and pass the HTML string: `PyQuery(requests.get('http://example.com').content)`.
breaking As of PyQuery 2.0.1, it is reportedly no longer possible to use the HTML parser with an XML file, and this functionality is no longer tested. Additionally, support for Python 3.7 has been dropped.
Fix: When loading XML, select the XML parser explicitly, e.g., `PyQuery(xml_string, parser='xml')`, rather than relying on the HTML parser, and upgrade Python to 3.8 or newer.
gotcha `PyQuery.remove()` no longer inserts a space in place of the removed element in versions 2.0.0 and above.
Fix: If preserving spacing is critical after removal, manual string manipulation or alternative DOM modification might be necessary.
gotcha The behavior of `.html()` output regarding escaping of top-level element text was fixed in PyQuery 2.0.0. If you relied on previous escaping behavior, review your code.
Fix: No direct fix needed, but be aware of the corrected behavior. Test any code that relies on the exact HTML output after this version.

Install

Install stable version
```
pip install pyquery
```

Imports

PyQuery
```
from pyquery import PyQuery as pq
```

Quickstart

This quickstart demonstrates how to initialize PyQuery from a string and a URL (using the recommended `requests` library for fetching content), select elements using CSS selectors, and manipulate their text content.

```
from pyquery import PyQuery as pq
import requests

# Load from a string
doc_string = pq('<html><body><div id="container"><p class="item">Hello</p><p class="item">World</p></div></body></html>')
print(f"From string: {doc_string('p.item:first').text()}")

# Load from a URL (using requests and explicitly passing content)
def fetch_url_content(url):
    # A timeout keeps the call from hanging indefinitely on a stalled connection
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # Raise an exception for HTTP errors
    return response.content

try:
    # Use a well-known public URL for demonstration
    html_content = fetch_url_content("https://example.com")
    doc_url = pq(html_content)
    print(f"From URL title: {doc_url('title').text()}")

    # Select and iterate elements; iteration yields lxml elements,
    # so each one is wrapped in pq() to use the PyQuery API
    for p_tag in doc_url('p'):
        print(f"Paragraph text: {pq(p_tag).text()}")
except requests.exceptions.RequestException as e:
    print(f"Error fetching URL: {e}")

# Manipulate elements
html_to_manipulate = pq('<div><span class="foo"></span></div>')
html_to_manipulate('.foo').text('New Text')
print(f"Manipulated HTML: {html_to_manipulate.html()}")
```
