{"id":8626,"library":"scrapegraph-py","title":"ScrapeGraph Python SDK","description":"ScrapeGraph Python SDK (version 1.46.0) is the official client for the ScrapeGraphAI API. It enables AI-powered web scraping, search, crawling, and structured data extraction using natural language prompts. The library focuses on abstracting away complexities like proxy management and JavaScript rendering, offering both synchronous and asynchronous clients. It maintains an active and iterative development cadence with frequent updates.","status":"active","version":"1.46.0","language":"en","source_language":"en","source_url":"https://github.com/ScrapeGraphAI/scrapegraph-py","tags":["web scraping","AI","LLM","data extraction","API client","structured data","pydantic"],"install":[{"cmd":"pip install scrapegraph-py","lang":"bash","label":"Core Installation"},{"cmd":"pip install scrapegraph-py[html]","lang":"bash","label":"With HTML Validation (for website_html parameter)"},{"cmd":"pip install scrapegraph-py[langchain]","lang":"bash","label":"With Langchain Integration"}],"dependencies":[{"reason":"Used for structured output with schemas, ensuring type safety.","package":"pydantic","optional":false},{"reason":"Required for token calculation when using local LLMs; missing can lead to 'Model not found' errors.","package":"transformers","optional":true},{"reason":"Recommended for handling dynamic, JavaScript-heavy websites if not using the API's built-in rendering.","package":"playwright","optional":true}],"imports":[{"note":"The primary client class for interacting with the ScrapeGraphAI API. 
'ScrapeGraphClient' might be from an older API version or another related library in the ecosystem.","wrong":"from scrapegraph_py import ScrapeGraphClient","symbol":"Client","correct":"from scrapegraph_py import Client"},{"note":"Used for filtering search results by date range in SearchScraper.","symbol":"TimeRange","correct":"from scrapegraph_py.models import TimeRange"}],"quickstart":{"code":"import os\nfrom scrapegraph_py import Client\nfrom pydantic import BaseModel, Field\n\n# Read the ScrapeGraphAI API key from the SGAI_API_KEY environment variable.\n# The placeholder default only exists so that a missing key is caught below.\napi_key = os.environ.get('SGAI_API_KEY', 'your_scrapegraph_api_key_here')\n\nif not api_key or api_key == 'your_scrapegraph_api_key_here':\n    # Stop with a non-zero exit status; SystemExit needs no extra import.\n    raise SystemExit(\"Please set your SGAI_API_KEY environment variable or replace 'your_scrapegraph_api_key_here' with your actual API key.\")\n\nclient = Client(api_key=api_key)\n\n# Schema describing the fields to extract\nclass ArticleData(BaseModel):\n    title: str = Field(description=\"The article title\")\n    author: str = Field(description=\"The author's name\")\n    publish_date: str = Field(description=\"Article publication date\")\n    content: str = Field(description=\"Main article content\")\n\ntry:\n    # Use SmartScraper to extract structured data from a webpage\n    response = client.smartscraper(\n        website_url=\"https://example.com/blog/article-example\",\n        user_prompt=\"Extract the article information\",\n        output_schema=ArticleData\n    )\n\n    print(f\"Title: {response.title}\")\n    print(f\"Author: {response.author}\")\n    print(f\"Published: {response.publish_date}\")\n    print(f\"Content snippet: {response.content[:100]}...\")\n\nfinally:\n    # Always close the client connection\n    client.close()\n","lang":"python","description":"This quickstart demonstrates how to initialize the ScrapeGraph client using an API key (preferably from an environment variable) 
and then use the `smartscraper` service to extract structured data from a webpage. It uses a Pydantic `BaseModel` to define the desired output schema for data validation and type safety. The client should always be closed after use."},"warnings":[{"fix":"Migrate usage from standalone functions like `smart_scraper(client, ...)` to methods on the `Client` instance, e.g., `client.smartscraper(...)`. Refer to the latest official documentation for current API patterns.","message":"The ScrapeGraph ecosystem has seen API changes, particularly with the transition to the 'v2 API surface' in related projects like Scrapegraph-ai. While `scrapegraph-py` (the SDK) strives for stability, older code that uses `ScrapeGraphClient` with standalone functions (e.g., `smart_scraper(client, url, prompt)`) may need to be updated to the `Client` class methods (e.g., `client.smartscraper(website_url, user_prompt)`).","severity":"breaking","affected_versions":"<1.4.x (potentially)"},{"fix":"Ensure `SGAI_API_KEY` is set in your environment variables or passed directly to the `Client` constructor: `client = Client(api_key=\"your-api-key-here\")`. Verify the API key's validity on the ScrapeGraphAI Dashboard.","message":"A missing `SGAI_API_KEY` environment variable or an invalid API key results in authentication errors (HTTP 401 Unauthorized); the API may also report 'Insufficient credits' errors. In either case the client cannot perform API calls.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure `transformers` is installed (`pip install transformers`). If configuring local LLMs, verify that `model_tokens` in your configuration dictionary is set correctly, or remove it to use the defaults. Consult the ScrapeGraphAI documentation for specific local LLM configurations.","message":"When local LLMs are used with certain functionalities, a 'Model not found, using default token size (8192)' error or an `ImportError: Could not import transformers python package` might occur. 
This indicates a problem with the LLM configuration or a missing dependency.","severity":"gotcha","affected_versions":"All versions when using local LLMs"},{"fix":"Implement rate limiting and delays between requests. Always check `robots.txt` and a website's terms of service before scraping. The SDK offers automatic retries for certain errors, but manual retry logic with backoff can also be implemented.","message":"Running web scraping operations too frequently or without adhering to website policies (e.g., `robots.txt`, terms of service) can lead to rate limiting (HTTP 429 Too Many Requests), IP blocking, or other service unavailability errors (HTTP 500, 503).","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Rename your Python script so that it does not share the package's name, e.g., `my_scraper.py`.","cause":"The Python file containing your code has the same name as the package (`scrapegraph_py.py` or `scrapegraphai.py`), causing Python to import your local file instead of the installed library.","error":"ModuleNotFoundError: No module named 'scrapegraph_py'"},{"fix":"Avoid direct imports of specific LLM classes like `OpenAI` from `scrapegraphai.models`. Instead, configure the desired LLM (e.g., OpenAI, Gemini) through the ScrapeGraphAI platform's API key. The `scrapegraph-py` SDK interacts with the ScrapeGraphAI API, which then manages the underlying LLMs.","cause":"Attempting to import `OpenAI` directly from `scrapegraphai.models` when it is not exposed or has been moved or renamed in the current SDK version. The SDK typically handles LLM integration internally via the API key.","error":"ImportError: cannot import name 'OpenAI' from 'scrapegraphai.models'"},{"fix":"Double-check your API key for typos. Ensure the `SGAI_API_KEY` environment variable is correctly set or that `api_key` is passed with a valid key to `Client(api_key='...')`. 
Obtain a new key from the ScrapeGraphAI Dashboard if necessary.","cause":"The provided API key is incorrect, expired, or missing. The `Client` could not authenticate with the ScrapeGraphAI service.","error":"scrapegraph_py.exceptions.APIError: Invalid API key"},{"fix":"Verify that the `website_url` string is a complete and correctly formatted URL, including the scheme (e.g., 'https://example.com').","cause":"The `website_url` parameter passed to a scraping method (e.g., `smartscraper`) is malformed or not a valid URL.","error":"scrapegraph_py.exceptions.APIError: Invalid URL format"}]}