ZenRows Python SDK
The ZenRows Python SDK is a client for the ZenRows API. It simplifies web scraping by automatically handling proxy rotation, CAPTCHA solving, and JavaScript rendering, so you can extract data from complex websites behind anti-bot protection. The library is actively maintained, currently at version 1.4.0, with regular updates to the API and SDKs.
Common errors
- coroutine 'main' was never awaited
  cause Attempting to call an asynchronous method (e.g., `client.get_async`) without properly awaiting it or running it within an `asyncio` event loop.
  fix If using `async` methods, define an `async def main():` function containing your async calls and run it with `asyncio.run(main())`.
- HTTP Error 429: Too Many Requests
  cause You have exceeded your account's concurrent request limit or rate limit.
  fix Implement retry logic with exponential backoff (using the `retries` parameter) or reduce your concurrency. Check your ZenRows dashboard for plan limits.
- HTTP Error 422: Unprocessable Entity
  cause The anti-bot protection on the target website is blocking your request. This often indicates ZenRows couldn't bypass the protection with the current parameters.
  fix Enable `js_render: true`, `premium_proxy: true`, or add a `referer` header. You might also need to use `wait_for` or `wait` parameters for dynamically loaded content.
- Incorrect Credentials / Connection Refused (when using direct proxy setup)
  cause Using an incorrect API key, username, password, or proxy host/port when configuring proxies directly (not using the SDK client).
  fix Verify your ZenRows API key and proxy credentials from your dashboard. Ensure the proxy URL format is `http://username:password@proxy-host:port` and the protocol/port match (e.g., 1337 for HTTP, 1338 for HTTPS).
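The fix for the "never awaited" error can be sketched with the standard library alone. In this minimal example, `fetch_stub` is a hypothetical stand-in for an awaitable SDK call such as `client.get_async(url)`; it is not part of the ZenRows SDK and makes no network request. Only the `async def main()` plus `asyncio.run(main())` pattern is the point here.

```python
import asyncio


async def fetch_stub(url):
    """Hypothetical stand-in for an awaitable call such as client.get_async(url)."""
    await asyncio.sleep(0)  # yield to the event loop, as a real network call would
    return f"fetched {url}"


async def main():
    urls = ["https://www.example.com/a", "https://www.example.com/b"]
    # Await the coroutines; calling fetch_stub(u) without await only creates
    # a coroutine object and triggers the "never awaited" warning.
    return await asyncio.gather(*(fetch_stub(u) for u in urls))


# asyncio.run() creates the event loop, runs main() to completion, and closes the loop.
results = asyncio.run(main())
print(results)
```

Calling `main()` directly, without `asyncio.run()` or `await`, returns a coroutine object and produces the error described above.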
Warnings
- gotcha Retries are not active by default. You must explicitly specify the `retries` parameter when initializing `ZenRowsClient` if you want automatic retries for failed requests (e.g., 429, 5xx errors).
- gotcha When performing asynchronous requests, use `client.get_async` (or `post_async`, etc.) and run your async function with `asyncio.run()`; otherwise, coroutine errors will occur.
- gotcha Sending custom headers to the target URL might overwrite ZenRows' default headers, potentially leading to increased detection or blocking.
- gotcha Using the `css_extractor` parameter is often more efficient than fetching the entire HTML and then parsing it with libraries like BeautifulSoup, especially for large responses or limited bandwidth.
- gotcha The ZenRows API key is essential for authentication and usage tracking. Exposing it directly in code or public repositories is a security risk.
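As a rough illustration of the `css_extractor` warning, the sketch below builds a request `params` dictionary without calling the API. It assumes that `css_extractor` accepts a JSON-encoded mapping of output names to CSS selectors, and that an `@href` suffix selects an attribute; both details are assumptions to verify against the ZenRows API documentation. `build_extractor_params` is a hypothetical helper, not an SDK function.

```python
import json


def build_extractor_params(selectors):
    """Return request params with the selectors serialized as a JSON string.

    Assumption: the API expects css_extractor as a JSON-encoded object.
    """
    return {"css_extractor": json.dumps(selectors)}


# Hypothetical selectors: the page title and every link target.
params = build_extractor_params({"title": "h1", "links": "a @href"})
print(params)
```

The resulting dictionary could then be passed as `client.get(target_url, params=params)`, so that only the extracted fields come back instead of the full HTML.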
Install
- pip install zenrows
Imports
- ZenRowsClient
import zenrows
from zenrows import ZenRowsClient
Quickstart
import os
from zenrows import ZenRowsClient

# Get your API key from environment variable for security
ZENROWS_API_KEY = os.environ.get('ZENROWS_API_KEY', 'YOUR_ZENROWS_API_KEY')
if not ZENROWS_API_KEY or ZENROWS_API_KEY == 'YOUR_ZENROWS_API_KEY':
    print("Warning: ZENROWS_API_KEY environment variable not set or is default. Please replace with your actual API key.")
    # Exit or raise error in production code

client = ZenRowsClient(ZENROWS_API_KEY)

# Make a GET request to a target URL with JavaScript rendering enabled
target_url = 'https://www.example.com'
response = client.get(target_url, params={'js_render': 'true', 'premium_proxy': 'true'})

if response.status_code == 200:
    print(f"Successfully scraped {target_url}. Partial content:\n{response.text[:500]}...")
else:
    print(f"Failed to scrape {target_url}. Status code: {response.status_code}, Error: {response.text}")
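The failure branch of the quickstart could be made more informative by mapping status codes to the fixes listed under Common errors. The sketch below is one possible way to do that; `STATUS_HINTS` and `explain_status` are hypothetical names introduced for illustration, and the hint texts paraphrase this page rather than any official error catalogue.

```python
# Hints paraphrased from the Common errors section; the mapping is illustrative only.
STATUS_HINTS = {
    429: "Rate or concurrency limit exceeded: set `retries` on the client or reduce concurrency.",
    422: "Target site blocked the request: try js_render, premium_proxy, a referer header, or wait/wait_for.",
}


def explain_status(status_code):
    """Return a short troubleshooting hint for an HTTP status code."""
    if status_code == 200:
        return "OK"
    return STATUS_HINTS.get(
        status_code,
        f"Unexpected status {status_code}: inspect response.text for details.",
    )


print(explain_status(429))
```

In the quickstart, the `else` branch might print `explain_status(response.status_code)` alongside the raw response text.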