PyQuery

2.0.1 · active · verified Sat Apr 11

PyQuery is a Python library that provides a jQuery-like API for parsing and manipulating XML and HTML documents. It allows users to select, filter, and manipulate HTML elements using CSS selectors, simplifying web data extraction. As of April 2026, the current stable version is 2.0.1, with development actively continuing on GitHub.

Warnings

Install

Imports

Quickstart

This quickstart shows how to build a PyQuery document from an HTML string and from page content fetched with the recommended `requests` library, select elements with CSS selectors, and change their text content.

from pyquery import PyQuery as pq
import requests

# Load from a string
doc_string = pq('<html><body><div id="container"><p class="item">Hello</p><p class="item">World</p></div></body></html>')
print(f"From string: {doc_string('p.item:first').text()}")

# Load from a URL (using requests and explicitly passing content)
def fetch_url_content(url):
    response = requests.get(url, timeout=10) # Time out instead of hanging on an unresponsive server
    response.raise_for_status() # Raise an exception for HTTP errors
    return response.content

try:
    # Use a well-known public URL for demonstration
    html_content = fetch_url_content("https://example.com")
    doc_url = pq(html_content)
    print(f"From URL title: {doc_url('title').text()}")

    # Select and iterate elements
    for p_tag in doc_url('p'):
        print(f"Paragraph text: {pq(p_tag).text()}")
except requests.exceptions.RequestException as e:
    print(f"Error fetching URL: {e}")

# Manipulate elements
html_to_manipulate = pq('<div><span class="foo"></span></div>')
html_to_manipulate('.foo').text('New Text')
print(f"Manipulated HTML: {html_to_manipulate.html()}")
