Pyppeteer
Pyppeteer is an unofficial Python port of Puppeteer, the Node.js library for controlling headless Chrome/Chromium. It provides a high-level API for browser automation tasks such as web scraping, automated testing, PDF generation, and screenshot capture, and it is built on Python's asyncio. The latest release is 2.0.0; updates are infrequent and the project is no longer actively developed.
Warnings
- deprecated: The Pyppeteer project is officially unmaintained. Its PyPI page recommends `playwright-python` as an alternative for new projects and for those that need active development or broader browser support.
- gotcha: Pyppeteer relies on Python's `asyncio`. A call made without `await` returns a coroutine object and does not run the operation. This can lead to `Protocol error: Target closed`, `Execution context was destroyed`, non-deterministic behavior, or silent failures, especially during navigation or element interactions.
- gotcha: On first execution, Pyppeteer automatically downloads a compatible Chromium binary (approx. 100-150MB). This can cause delays, consume bandwidth, and fail in environments without internet access or behind strict firewalls. The downloaded Chromium version can also become outdated. Passing `executablePath` to `launch()` uses an existing browser binary instead.
- gotcha: Pyppeteer's `Page.evaluate()` takes JavaScript source as a Python string, whereas the original Puppeteer accepts JavaScript functions directly. Pyppeteer tries to detect whether the string is a function or an expression, but detection can fail for some expressions and lead to errors. Passing `force_expr=True` forces evaluation as an expression.
- gotcha: The default navigation timeout for `page.goto()` and other navigation-related methods is 30 seconds. Pages with heavy JavaScript, slow network conditions, or complex rendering can exceed this, resulting in a `Navigation Timeout Exceeded` error. The limit can be raised per call with the `timeout` option (in milliseconds) or for the whole page with `page.setDefaultNavigationTimeout()`.
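The `evaluate()` and timeout gotchas above can be sketched as two small helpers. This is a minimal sketch, not part of Pyppeteer: the names `goto_with_retry` and `eval_expression` are hypothetical, the 60-second limit and retry count are arbitrary, and the code assumes that Pyppeteer's timeout error is a subclass of `asyncio.TimeoutError`.

```python
import asyncio


async def goto_with_retry(page, url, timeout_ms=60_000, attempts=2):
    """Navigate to url with a longer timeout, retrying if the timeout is hit.

    Hypothetical helper; `page` is expected to be a pyppeteer Page.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            # 'timeout' is given in milliseconds.
            return await page.goto(url, {'timeout': timeout_ms})
        except asyncio.TimeoutError as exc:
            # Assumption: pyppeteer.errors.TimeoutError subclasses
            # asyncio.TimeoutError, so it is caught here.
            last_error = exc
    raise last_error


async def eval_expression(page, expression):
    """Evaluate a JavaScript expression string, skipping auto-detection."""
    # force_expr=True tells pyppeteer to treat the string as an expression
    # rather than guessing whether it is a function.
    return await page.evaluate(expression, force_expr=True)
```

Both helpers must themselves be awaited, for example `title = await eval_expression(page, 'document.title')`.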
Install
- pip install pyppeteer
- pip install -U git+https://github.com/pyppeteer/pyppeteer@dev
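Where the automatic Chromium download is a problem, one option is to point Pyppeteer at a browser that is already installed. The sketch below assumes a hypothetical environment variable named `CHROME_PATH`; the helper names are illustrative and not part of the library. Only `launch()` and its `headless` and `executablePath` options come from Pyppeteer itself.

```python
import os


def build_launch_options(env=None):
    """Build keyword arguments for pyppeteer.launch().

    CHROME_PATH is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    options = {'headless': True}
    chrome_path = env.get('CHROME_PATH')
    if chrome_path:
        # With executablePath set, pyppeteer launches this binary
        # instead of its own downloaded Chromium.
        options['executablePath'] = chrome_path
    return options


async def launch_browser():
    """Launch a browser using the options above."""
    # Imported lazily so build_launch_options() has no dependency on pyppeteer.
    from pyppeteer import launch
    return await launch(**build_launch_options())
```

If `CHROME_PATH` is unset, the options fall back to the default behavior, including the first-run download.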
Imports
- launch
from pyppeteer import launch
- $ (Puppeteer's `page.$()`; in Pyppeteer use `querySelector()` or its alias `J()`)
await page.querySelector('.selector')
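As an illustration of how `querySelector()` combines with `evaluate()`, the following sketch reads the text of the first matching element. The helper name `text_of` is hypothetical; the JavaScript is passed as a string, as Pyppeteer requires.

```python
import asyncio


async def text_of(page, selector):
    """Return the text content of the first match, or None if nothing matches."""
    element = await page.querySelector(selector)  # Puppeteer: page.$(selector)
    if element is None:
        # querySelector resolves to None when no element matches.
        return None
    # The function is a string; the element handle is passed as its argument.
    return await page.evaluate('(el) => el.textContent', element)
```

A call might look like `heading = await text_of(page, 'h1')`.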
Quickstart
import asyncio
from pyppeteer import launch

async def main():
    # Launch the browser in headless mode (the default).
    # Set headless=False to see the browser UI.
    browser = await launch(headless=True)
    page = await browser.newPage()
    await page.goto('https://example.com')
    print(f"Page title: {await page.title()}")
    await page.screenshot({'path': 'example.png'})
    await browser.close()

if __name__ == '__main__':
    asyncio.run(main())
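In the quickstart, `browser.close()` is skipped if an earlier step raises, which can leave a Chromium process running. One possible variant wraps the browser in an async context manager. This is a sketch under assumptions: `open_browser`, `page_title`, and the `launcher` parameter are hypothetical names introduced here, and `launcher` exists only so that a stand-in can replace `pyppeteer.launch`.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def open_browser(launcher=None, **launch_kwargs):
    """Yield a browser and close it on exit, even if the body raises."""
    if launcher is None:
        # Imported lazily so a stand-in launcher can be used without pyppeteer.
        from pyppeteer import launch as launcher
    browser = await launcher(**launch_kwargs)
    try:
        yield browser
    finally:
        await browser.close()


async def page_title(url, launcher=None):
    """Open url in a fresh page and return its title."""
    async with open_browser(launcher, headless=True) as browser:
        page = await browser.newPage()
        await page.goto(url)
        return await page.title()


if __name__ == '__main__':
    print(asyncio.run(page_title('https://example.com')))
```

The `finally` clause runs whether navigation succeeds or fails, so the browser is closed in both cases.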