requests-html

0.10.0 · active · verified Mon Apr 13

requests-html is a Python library for web scraping and HTML parsing that builds on the popular `requests` library. It provides HTML parsing with CSS (jQuery-style) selectors and XPath, automatic encoding detection, mocked user-agents, and, crucially, full JavaScript support via headless Chromium (through Pyppeteer). The current version, 0.10.0, was released on PyPI in February 2019 and has not been followed by a newer release, although the underlying `requests` library remains actively maintained.


Install

pip install requests-html

Imports

from requests_html import HTMLSession

Quickstart

Demonstrates basic synchronous HTML retrieval, parsing with a CSS selector, and filtering of absolute links. A commented-out example shows how to use JavaScript rendering, which relies on Pyppeteer and downloads Chromium the first time `render()` is called.

from requests_html import HTMLSession

session = HTMLSession()
r = session.get('https://www.python.org/')

# Extract title using CSS selector
title = r.html.find('title', first=True).text
print(f"Page title: {title}")

# Print only the absolute links whose URL contains 'docs'
print("Absolute links containing 'docs':")
for link in r.html.absolute_links:
    if 'docs' in link:
        print(link)

# Example for JavaScript rendering (requires pyppeteer and Chromium).
# The first call to render() downloads Chromium if it is not already present.
# r_js = session.get('https://pyppeteer.github.io/')
# r_js.html.render(sleep=1)  # sleep: seconds to wait after the initial render
# '#example-id' is a placeholder selector; replace it with one from the page.
# js_content = r_js.html.find('#example-id', first=True).text
# print(f"JS rendered content: {js_content}")

session.close()
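The quickstart keeps any absolute link whose text contains 'docs', a check that also matches unrelated URLs, for example ones that mention "docs" only in a query string. A stricter filter compares the parsed host name instead. The sketch below uses only the standard library; the helper name `docs_links`, the default host `docs.python.org`, and the sample URLs are hypothetical stand-ins for the set that `r.html.absolute_links` returns.

```python
from urllib.parse import urlparse


def docs_links(links, host="docs.python.org"):
    """Return the links whose host name equals `host`, sorted.

    `links` is any iterable of absolute URL strings, such as the set
    returned by r.html.absolute_links. `host` is an assumed example value.
    """
    return sorted(link for link in links if urlparse(link).hostname == host)


# Hypothetical input standing in for r.html.absolute_links.
sample = {
    "https://docs.python.org/3/",
    "https://www.python.org/doc/",
    "https://example.com/?ref=docs",
}

# Substring test as in the quickstart: also catches the unrelated example.com link.
substring_matches = sorted(link for link in sample if "docs" in link)

# Host-based test: keeps only links served from the docs host.
host_matches = docs_links(sample)

print(substring_matches)
print(host_matches)
```

Matching on `urlparse(...).hostname` avoids false positives from paths and query strings; adjust the host, or add a path-prefix check, to suit the site being scraped.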
