{"id":5456,"library":"requests-html","title":"requests-html","description":"requests-html is a Python library designed for web scraping and HTML parsing, extending the capabilities of the popular `requests` library. It provides HTML parsing with CSS selectors (jQuery-style) and XPath, automatic encoding detection, mocked user-agents, and crucially, full JavaScript support via Headless Chromium (Pyppeteer). The current version is 0.10.0, with its last PyPI release in February 2019, suggesting a slower release cadence, though the underlying `requests` library is actively maintained.","status":"active","version":"0.10.0","language":"en","source_language":"en","source_url":"https://github.com/kennethreitz/requests-html","tags":["web scraping","html parsing","requests","css selectors","xpath","javascript rendering","async"],"install":[{"cmd":"pip install requests-html","lang":"bash","label":"Standard install"},{"cmd":"pip install requests-html[async]","lang":"bash","label":"Install with async support"}],"dependencies":[{"reason":"Core HTTP request functionality.","package":"requests","optional":false},{"reason":"CSS selector support (jQuery-style).","package":"pyquery","optional":false},{"reason":"Efficient HTML/XML parsing backend.","package":"lxml","optional":false},{"reason":"Required for JavaScript rendering (downloads Chromium on first use).","package":"pyppeteer","optional":true},{"reason":"URL parsing and encoding detection (integrated for robustness).","package":"w3lib","optional":false}],"imports":[{"symbol":"HTMLSession","correct":"from requests_html import HTMLSession"},{"note":"Common mistake to over-specify the module path.","wrong":"from requests_html.requests_html import AsyncHTMLSession","symbol":"AsyncHTMLSession","correct":"from requests_html import AsyncHTMLSession"}],"quickstart":{"code":"from requests_html import HTMLSession\n\nsession = HTMLSession()\nr = session.get('https://www.python.org/')\n\n# Extract title using CSS selector\ntitle = 
r.html.find('title', first=True).text\nprint(f\"Page title: {title}\")\n\n# Print the absolute links that contain 'docs'\nprint(\"Absolute links:\")\nfor link in r.html.absolute_links:\n    if 'docs' in link:\n        print(link)\n\n# Example for JavaScript rendering (requires pyppeteer and Chromium)\n# To run this, ensure pyppeteer is installed and Chromium is downloaded.\n# r_js = session.get('https://pyppeteer.github.io/')\n# r_js.html.render(sleep=1)\n# js_content = r_js.html.find('#example-id', first=True).text\n# print(f\"JS rendered content: {js_content}\")\n\nsession.close()","lang":"python","description":"Demonstrates basic synchronous HTML retrieval and parsing using CSS selectors, and extraction of absolute links. A commented-out example shows how to use JavaScript rendering, which requires Pyppeteer and will download Chromium on its first invocation."},"warnings":[{"fix":"Be aware of the initial setup time and disk usage. Ensure sufficient network access for the Chromium download.","message":"JavaScript rendering (using `r.html.render()`) requires `pyppeteer` and will automatically download a Chromium browser into your home directory the first time it's invoked. This can take some time and consume disk space.","severity":"gotcha","affected_versions":"0.10.0 and earlier"},{"fix":"Check the GitHub repository for the latest development if you encounter issues not resolved in the PyPI version. For cutting-edge web scraping needs, alternatives might offer more active development.","message":"The latest `requests-html` release on PyPI is from February 2019. While functional, the library is updated far less often than its core dependency `requests`. Community contributions via GitHub are ongoing, but new features or critical bug fixes may not be immediately released to PyPI.","severity":"gotcha","affected_versions":"0.10.0 and earlier"},{"fix":"Install with `pip install requests-html[async]`. 
When using `asession.run()`, do not assume the order of results matches the input order of coroutines.","message":"Asynchronous support (`AsyncHTMLSession`) requires Python 3.6+ and the `requests-html[async]` installation. The `.run()` method of `AsyncHTMLSession` executes the coroutines and returns a list of results whose order is not guaranteed to match the order in which the coroutines were passed.","severity":"gotcha","affected_versions":"0.10.0 and earlier"},{"fix":"Ensure `requests-html` is installed in a Python 3.6+ environment. Report any compatibility issues with newer Python versions to the project's GitHub repository.","message":"Older documentation stated support for Python 3.6 specifically. `requests-html` generally works on newer Python 3 versions (e.g., 3.7+), but compatibility was historically guaranteed only for 3.6, so test thoroughly with your specific Python version.","severity":"breaking","affected_versions":"Potentially older sub-versions (pre-0.10.0) or specific environments"}],"env_vars":null,"last_verified":"2026-04-13T00:00:00.000Z","next_check":"2026-07-12T00:00:00.000Z"}