Scrapfly Python SDK
The Scrapfly Python SDK (current version 0.10.0) provides a robust interface to the Scrapfly API for web scraping, screenshot capture, AI-powered data extraction, and website crawling. It helps developers bypass anti-bot measures, manage proxies, and render JavaScript, and it integrates with frameworks such as Scrapy, LlamaIndex, and LangChain. The library is actively developed with a regular release cadence.
Common errors
- `AttributeError: 'ScrapeApiResponse' object has no attribute 'selector'`
  - cause: You are trying to use the `.selector` property on a `ScrapeApiResponse` object for HTML parsing, but the necessary optional dependencies (`parsel` or `scrapy`) are not installed.
  - fix: Install the `parser` extra: `pip install "scrapfly-sdk[parser]"`. If you are using Scrapy, install `pip install "scrapfly-sdk[scrapy]"` instead.
- `scrapfly.errors.ScrapflyError: Invalid API key` (HTTP 401 Unauthorized)
  - cause: The Scrapfly API key provided to `ScrapflyClient` is missing, incorrect, expired, or has insufficient permissions.
  - fix: Verify your API key in the Scrapfly dashboard (https://scrapfly.io/dashboard) and ensure it is passed correctly during client initialization, ideally from an environment variable.
- `scrapfly.errors.ScrapflyError: Too Many Requests` (HTTP 429)
  - cause: Your Scrapfly account has exceeded its request rate limit or concurrent request limit for the given time period.
  - fix: Implement exponential backoff and retry logic in your scraping code. Review your Scrapfly plan limits, or upgrade your plan if sustained higher rates are needed.
- Scraped content is empty, incomplete, or shows an anti-bot page (no SDK error, but unexpected content)
  - cause: The target website's anti-bot protection identified and blocked the request, or the page content requires JavaScript rendering.
  - fix: In your `ScrapeConfig`, enable `render_js=True` and `asp=True` (Anti-Scraping Protection), and potentially specify `proxy_pool='public_residential_pool'` and `country='us'` (or the relevant target country).
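For the 429 case above, a minimal retry-with-exponential-backoff sketch. This is a generic helper, not part of the Scrapfly SDK; the `flaky` function stands in for a rate-limited scrape call, and in real code you would catch the SDK's specific rate-limit exception rather than bare `Exception`:

```python
import random
import time


def retry_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=60.0):
    """Call fn(), retrying on failure with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay + random.uniform(0, delay * 0.1))


# Demo: a fake scrape call that fails twice before succeeding.
calls = {'n': 0}

def flaky():
    calls['n'] += 1
    if calls['n'] < 3:
        raise RuntimeError('429 Too Many Requests')
    return 'ok'

print(retry_with_backoff(flaky, base_delay=0.01))  # → ok
```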
Warnings
- gotcha: Accessing the `ScrapeApiResponse.selector` property for built-in HTML parsing requires installing either `parsel` or `scrapy` as an optional dependency (e.g., `pip install "scrapfly-sdk[parser]"`). Without these, attempting to use `.selector` will result in an `AttributeError`.
- gotcha: Hardcoding your Scrapfly API key directly in your code is insecure and inflexible. It's best practice to retrieve it from environment variables or a secure configuration system.
- gotcha: Scrapfly API errors (e.g., HTTP 400, 401, 429, 5xx) are encapsulated by `scrapfly.errors.ScrapflyError` subclasses. Incorrect handling or misinterpretation of these can lead to brittle scrapers. Consult the official Scrapfly error documentation for detailed explanations and suggested remedies.
- breaking: While not explicitly documented as a breaking change for Python SDK `0.10.0` specifically, other Scrapfly SDKs (e.g., TypeScript SDK v0.6.9) have undergone parameter renames (e.g., `ephemeral_template` to `extraction_ephemeral_template` in the Extraction API). Always review the official changelog or release notes for potential API parameter changes when upgrading, especially across minor or major versions.
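To avoid hardcoding the key (second gotcha above), one way to load it from the environment with a fail-fast check. `get_api_key` is an illustrative helper, not an SDK function:

```python
import os


def get_api_key(env_var='SCRAPFLY_API_KEY'):
    """Read the API key from the environment, failing fast if it is absent."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f'{env_var} is not set; export it instead of hardcoding the key'
        )
    return key


# Demo only: seed a placeholder value so the example runs standalone.
os.environ.setdefault('SCRAPFLY_API_KEY', 'demo-key-for-illustration')
print(get_api_key())  # prints whatever SCRAPFLY_API_KEY is set to
```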
Install
- `pip install scrapfly-sdk`
- `pip install "scrapfly-sdk[all]"`
Imports
- `ScrapflyClient`: `from scrapfly import ScrapflyClient`
- `ScrapeConfig`: `from scrapfly import ScrapeConfig`
Quickstart
```python
import os

from scrapfly import ScrapflyClient, ScrapeConfig
from scrapfly.errors import ScrapflyError

SCRAPFLY_API_KEY = os.environ.get('SCRAPFLY_API_KEY', 'YOUR_SCRAPFLY_API_KEY')

client = ScrapflyClient(key=SCRAPFLY_API_KEY)
try:
    result = client.scrape(ScrapeConfig(
        url='https://web-scraping.dev/product/1',
        render_js=True,
        country='us',
    ))
    print(f"Content length: {len(result.content)} bytes")
    # If 'parsel' or 'scrapy' is installed, you can use .selector:
    # print(f"Product title: {result.selector.css('h3::text').get()}")
except ScrapflyError as e:
    print(f"Scrapfly error: {e}")
```

Note that `ScrapflyClient.scrape()` is synchronous; in asyncio code, use the client's async variant (`async_scrape`) rather than awaiting `scrape()` directly.
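If you prefer not to install the optional `parser` extra, a stdlib-only fallback for simple extractions is possible with `html.parser`. This is a sketch, not an SDK feature; for real pages, `parsel`'s CSS selectors (as in the commented quickstart line) are far more convenient:

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collect the text of the first <h3> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self._in_h3 = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == 'h3' and self.title is None:
            self._in_h3 = True

    def handle_data(self, data):
        if self._in_h3 and self.title is None:
            self.title = data.strip()

    def handle_endtag(self, tag):
        if tag == 'h3':
            self._in_h3 = False


# Demo on a static snippet; in practice you would feed result.content.
html = '<html><body><h3>Box of Chocolate Candy</h3></body></html>'
parser = TitleExtractor()
parser.feed(html)
print(parser.title)  # → Box of Chocolate Candy
```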