ScrapingBee Python SDK

2.0.2 · active · verified Wed Apr 15

ScrapingBee is a web scraping API that runs headless browsers and rotates proxies on your behalf. The Python SDK wraps this API and exposes JavaScript rendering, proxy rotation, AI-powered data extraction, and screenshots. The current version is 2.0.2; the SDK is actively maintained, with updates focused on reliability and new API features.

Warnings

Install

Imports

Quickstart

This quickstart initializes the ScrapingBee client with an API key (read from an environment variable where possible) and sends a GET request to a URL. It uses `extract_rules` to pull specific data (title, subtitle, article headings) out of the page and return it as JSON.

import os
from scrapingbee import ScrapingBeeClient

# It's highly recommended to store your API key in an environment variable
api_key = os.environ.get('SCRAPINGBEE_API_KEY', 'YOUR_API_KEY')

if api_key == 'YOUR_API_KEY':
    print("Warning: Replace 'YOUR_API_KEY' or set the SCRAPINGBEE_API_KEY environment variable.")

client = ScrapingBeeClient(api_key=api_key)

url_to_scrape = 'https://www.scrapingbee.com/blog/'

try:
    response = client.get(
        url_to_scrape,
        params={
            'render_js': True, # Set to False to save credits if JavaScript rendering is not needed
            'extract_rules': {
                'title': 'h1',
                'subtitle': '#subtitle',
                'articles': {'selector': 'article h2 a', 'type': 'list', 'output': 'text'}
            }
        }
    )

    if response.ok:
        # With extract_rules, the body is normally JSON. The header may carry a
        # suffix such as "; charset=utf-8", so test for a substring, not equality.
        if 'application/json' in response.headers.get('Content-Type', ''):
            import json
            data = response.json()
            print(json.dumps(data, indent=2))
        else:
            # Otherwise, it's the raw HTML
            print(response.text[:500]) # Print first 500 characters of HTML
    else:
        print(f"Failed to scrape {url_to_scrape}: Status {response.status_code}, Content: {response.text[:200]}")
except Exception as e:
    print(f"An error occurred: {e}")
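The content-type branching in the quickstart can be factored into a small helper that is testable without a network call. The following is a minimal sketch using only the standard library; `decode_body` is a hypothetical name and not part of the ScrapingBee SDK, and the sample payload is invented to mirror the shape of the quickstart's `extract_rules`, not real API output.

```python
import json
from typing import Any, Mapping, Union


def decode_body(headers: Mapping[str, str], body: bytes) -> Union[Any, str]:
    """Return parsed JSON when the Content-Type says JSON, else decoded text.

    Hypothetical helper for illustration; not part of the ScrapingBee SDK.
    """
    # Header names are case-insensitive, but plain dicts are not, so normalise.
    content_type = ""
    for name, value in headers.items():
        if name.lower() == "content-type":
            content_type = value
            break
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/json":
        return json.loads(body)
    return body.decode("utf-8", errors="replace")


# Invented payload shaped like the quickstart's extract_rules (assumption).
sample = b'{"title": "Blog", "subtitle": null, "articles": ["A", "B"]}'
parsed = decode_body({"Content-Type": "application/json; charset=utf-8"}, sample)
html = decode_body({"content-type": "text/html"}, b"<h1>Blog</h1>")
```

With a real response, the call would presumably look like `decode_body(response.headers, response.content)`, since the quickstart already reads both attributes from the returned object.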
