{"id":3233,"library":"pyquery","title":"PyQuery","description":"PyQuery is a Python library that provides a jQuery-like API for parsing and manipulating XML and HTML documents. It allows users to select, filter, and manipulate HTML elements using CSS selectors, simplifying web data extraction. As of April 2026, the current stable version is 2.0.1, with development actively continuing on GitHub.","status":"active","version":"2.0.1","language":"en","source_language":"en","source_url":"https://github.com/gawel/pyquery","tags":["web scraping","html parsing","xml parsing","jquery-like","css selectors"],"install":[{"cmd":"pip install pyquery","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Used for fast XML and HTML manipulation.","package":"lxml","optional":false},{"reason":"Required for CSS selector support.","package":"cssselect","optional":false}],"imports":[{"symbol":"PyQuery","correct":"from pyquery import PyQuery as pq"}],"quickstart":{"code":"from pyquery import PyQuery as pq\nimport requests\n\n# Load from a string\ndoc_string = pq('<html><body><div id=\"container\"><p class=\"item\">Hello</p><p class=\"item\">World</p></div></body></html>')\nprint(f\"From string: {doc_string('p.item:first').text()}\")\n\n# Load from a URL (using requests and explicitly passing content)\ndef fetch_url_content(url):\n    response = requests.get(url)\n    response.raise_for_status() # Raise an exception for HTTP errors\n    return response.content\n\ntry:\n    # Use a well-known public URL for demonstration\n    html_content = fetch_url_content(\"https://example.com\")\n    doc_url = pq(html_content)\n    print(f\"From URL title: {doc_url('title').text()}\")\n\n    # Select and iterate elements\n    for p_tag in doc_url('p'):\n        print(f\"Paragraph text: {pq(p_tag).text()}\")\nexcept requests.exceptions.RequestException as e:\n    print(f\"Error fetching URL: {e}\")\n\n# Manipulate elements\nhtml_to_manipulate = pq('<div><span 
class=\"foo\"></span></div>')\nhtml_to_manipulate('.foo').text('New Text')\nprint(f\"Manipulated HTML: {html_to_manipulate.html()}\")","lang":"python","description":"This quickstart initializes PyQuery from a string and from a URL (fetching the content with the `requests` library), selects elements with CSS selectors, and sets their text content."},"warnings":[{"fix":"Pass the `url` keyword argument explicitly, e.g., `PyQuery(url='http://example.com')`, or fetch the page with a library such as `requests` and pass the response body: `PyQuery(requests.get('http://example.com').content)`.","message":"Since PyQuery 2.0.0, passing a URL positionally, as in `PyQuery('http://example.com')`, no longer fetches the URL's content. Earlier versions fetched it, so this is a breaking change.","severity":"breaking","affected_versions":">=2.0.0"},{"fix":"Parse XML documents with an XML parser, e.g., by passing `parser='xml'` to `PyQuery`, and upgrade to Python 3.8 or newer.","message":"As of PyQuery 2.0.1, it is reportedly no longer possible to use the HTML parser with an XML file, and this combination is no longer tested. Support for Python 3.7 has also been dropped.","severity":"breaking","affected_versions":">=2.0.1"},{"fix":"If your output depended on that space, add the whitespace yourself, for example through manual string manipulation or another DOM modification.","message":"In versions 2.0.0 and above, `PyQuery.remove()` no longer inserts a space in place of the removed element.","severity":"gotcha","affected_versions":">=2.0.0"},{"fix":"No direct fix is needed. Re-test any code that depends on the exact `.html()` output after upgrading to this version.","message":"PyQuery 2.0.0 fixed the escaping of top-level element text in `.html()` output. 
If you relied on previous escaping behavior, review your code.","severity":"gotcha","affected_versions":">=2.0.0"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}