Scrapfly Python SDK

0.10.0 · active · verified Thu Apr 16

The Scrapfly Python SDK (current version 0.10.0) provides a robust interface to the Scrapfly API for web scraping, screenshot capture, AI-powered data extraction, and website crawling. It helps developers bypass anti-bot measures, manage proxies, and render JavaScript, and it integrates with frameworks such as Scrapy, LlamaIndex, and LangChain. The library maintains an active development and release cadence.

Install
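The SDK is published on PyPI as `scrapfly-sdk`. A minimal install sketch (the `parsel` extra name is an assumption based on the `.selector` dependency mentioned in the quickstart; verify it against the package metadata):

```shell
# Install the SDK from PyPI
pip install scrapfly-sdk

# Optionally include the HTML-parsing dependency used by .selector
# (extra name assumed; plain `pip install parsel` also works)
pip install "scrapfly-sdk[parsel]"
```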

Imports
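The two entry points used throughout this document are imported from the top-level package, as the quickstart below shows:

```python
# Client and request-configuration objects used in the quickstart
from scrapfly import ScrapflyClient, ScrapeConfig
```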

Quickstart

This quickstart demonstrates how to initialize the Scrapfly client and perform a basic scrape request to a test page. It shows how to enable JavaScript rendering and specify a proxy country. Remember to replace 'YOUR_SCRAPFLY_API_KEY' with your actual key or set the SCRAPFLY_API_KEY environment variable. For HTML parsing with `.selector`, ensure `parsel` or `scrapy` is installed as an extra dependency.

import os
from scrapfly import ScrapflyClient, ScrapeConfig

SCRAPFLY_API_KEY = os.environ.get('SCRAPFLY_API_KEY', 'YOUR_SCRAPFLY_API_KEY')

def main():
    client = ScrapflyClient(key=SCRAPFLY_API_KEY)
    try:
        # scrape() is synchronous; render_js=True enables headless-browser
        # rendering and country='us' pins the proxy location.
        result = client.scrape(ScrapeConfig(
            url='https://web-scraping.dev/product/1',
            render_js=True,
            country='us',
        ))
        print(f"Status: {result.status_code}")
        print(f"Content length: {len(result.content)} bytes")
        # If 'parsel' or 'scrapy' is installed, you can use .selector
        # print(f"Product Title: {result.selector.css('h3::text').get()}")
    except Exception as e:
        print(f"An error occurred: {e}")

if __name__ == '__main__':
    main()
