{"id":6872,"library":"scrapingbee","title":"ScrapingBee Python SDK","description":"ScrapingBee is a web scraping API that handles headless browsers and rotates proxies for you. The Python SDK simplifies interaction with this API, offering features like JavaScript rendering, proxy rotation, AI-powered data extraction, and screenshot capabilities. It is currently at version 2.0.2 and receives regular updates, focusing on reliability and new API features.","status":"active","version":"2.0.2","language":"en","source_language":"en","source_url":"https://github.com/scrapingbee/scrapingbee-python","tags":["web scraping","proxy","headless browser","api client","data extraction","javascript rendering"],"install":[{"cmd":"pip install scrapingbee","lang":"bash","label":"Install with pip"}],"dependencies":[{"reason":"The ScrapingBee Python SDK is a wrapper around the requests library, using it for HTTP communication.","package":"requests"}],"imports":[{"symbol":"ScrapingBeeClient","correct":"from scrapingbee import ScrapingBeeClient"}],"quickstart":{"code":"import os\nimport json\n\nfrom scrapingbee import ScrapingBeeClient\n\n# It's highly recommended to store your API key in an environment variable\napi_key = os.environ.get('SCRAPINGBEE_API_KEY', 'YOUR_API_KEY')\n\nif api_key == 'YOUR_API_KEY':\n    print(\"Warning: Replace 'YOUR_API_KEY' or set the SCRAPINGBEE_API_KEY environment variable.\")\n\nclient = ScrapingBeeClient(api_key=api_key)\n\nurl_to_scrape = 'https://www.scrapingbee.com/blog/'\n\ntry:\n    response = client.get(\n        url_to_scrape,\n        params={\n            'render_js': True,  # Set to False to save credits if JavaScript rendering is not needed\n            'extract_rules': {\n                'title': 'h1',\n                'subtitle': '#subtitle',\n                'articles': {'selector': 'article h2 a', 'type': 'list', 'output': 'text'}\n            }\n        }\n    )\n\n    if response.ok:\n        # If extract_rules are used, the response body is usually JSON\n        if 'application/json' in response.headers.get('content-type', ''):\n            data = json.loads(response.content)\n            print(json.dumps(data, indent=2))\n        else:\n            # Otherwise, it's the raw HTML\n            print(response.text[:500])  # Print the first 500 characters of HTML\n    else:\n        print(f\"Failed to scrape {url_to_scrape}: status {response.status_code}, content: {response.text[:200]}\")\nexcept Exception as e:\n    print(f\"An error occurred: {e}\")","lang":"python","description":"This quickstart initializes the ScrapingBee client with an API key (preferably from an environment variable) and sends a GET request to a URL. It demonstrates using `extract_rules` to automatically parse specific data (title, subtitle, article headings) from the page into JSON."},"warnings":[{"fix":"Review your `params` dictionary and the target URLs to ensure they are correctly interpreted by the API's new encoding logic. Test existing scraping jobs thoroughly.","message":"Version 2.0.0 fixed the URL encoding of parameters. While intended as a correction, it may change behavior for users who implicitly relied on, or worked around, the previous (incorrect) encoding, producing different request URLs or parameter interpretation. [cite: GitHub release v2.0.0]","severity":"breaking","affected_versions":">=2.0.0"},{"fix":"Upgrade your Python environment to version 3.8 or newer. Python 3.6 has reached its end of life and is no longer supported by the Python core team.","message":"Python 3.6 support was officially dropped in version 1.2.0. Users running Python 3.6 will encounter issues or be unable to upgrade past v1.1.8. [cite: GitHub release v1.2.0]","severity":"breaking","affected_versions":">=1.2.0"},{"fix":"Retrieve your API key from an environment variable (e.g., `os.environ.get('SCRAPINGBEE_API_KEY')`) rather than embedding it directly in your code. Ensure this variable is set in your deployment environment.","message":"Hardcoding your ScrapingBee API key directly into your scripts is a security risk. Store it securely, ideally in an environment variable.","severity":"gotcha","affected_versions":"All"},{"fix":"For pages that do not require JavaScript execution, set `render_js` to `False` in your `params` dictionary to save credits, e.g., `client.get(url, params={'render_js': False})`.","message":"By default, `render_js` is `True` for `client.get()` requests, which means JavaScript is executed and each request consumes 5 credits. For simple static HTML pages, this can unnecessarily increase credit usage.","severity":"gotcha","affected_versions":"All"},{"fix":"Implement proper concurrency management in your scraping logic (e.g., using `concurrent.futures.ThreadPoolExecutor` in Python) to respect your plan's concurrent request limit.","message":"ScrapingBee plans limit the number of concurrent requests. Exceeding this limit can lead to requests being queued or failing.","severity":"gotcha","affected_versions":"All"},{"fix":"Regularly monitor your scrapers and adjust `extract_rules` as needed when website layouts change. Consider using AI-powered extraction or a combination of robust selectors and post-processing for critical data.","message":"While `extract_rules` are powerful, like any CSS/XPath selectors they can break if the target website's HTML structure changes. The v2.0.2 release specifically fixed handling of AI extract rules, indicating this is an area where issues can arise. [cite: GitHub release v2.0.2]","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z","problems":[]}