{"id":7745,"library":"spider-client","title":"Spider Cloud Python SDK","description":"The `spider-client` package is a Python SDK for the Spider Cloud API, providing tools for web scraping, large-scale crawling, link extraction, and screenshots. It is designed to collect data efficiently, often in formats suited to large language models (LLMs). The API is backed by a Rust-based engine optimized for AI workloads that supports concurrent operations, streaming, and headless Chrome rendering. The library is actively maintained and frequently updated; the current version is 0.1.88.","status":"active","version":"0.1.88","language":"en","source_language":"en","source_url":"https://github.com/spider-rs/spider-clients/tree/main/python","tags":["web scraping","crawling","api client","ai","llm","data extraction","streaming"],"install":[{"cmd":"pip install spider-client","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Used for iterative parsing of large JSON streams; added in v0.1.37 to handle large data payloads more efficiently.","package":"ijson","optional":false}],"imports":[{"symbol":"Spider","correct":"from spider_client import Spider"}],"quickstart":{"code":"import os\nfrom spider_client import Spider\n\n# Retrieve the API key from an environment variable, or replace the placeholder with your actual key\n# Get an API key from https://spider.cloud\napi_key = os.environ.get('SPIDER_API_KEY', 'YOUR_SPIDER_API_KEY')\n\nif not api_key or api_key == 'YOUR_SPIDER_API_KEY':\n    print(\"WARNING: SPIDER_API_KEY not set. Please set it as an environment variable or pass it to Spider(api_key=...).\\nSkipping API call.\")\nelse:\n    app = Spider(api_key=api_key)\n\n    url_to_scrape = 'https://example.com'\n    try:\n        scraped_data = app.scrape_url(url_to_scrape)\n        print(f\"Successfully scraped data from {url_to_scrape}:\")\n        print(scraped_data)\n    except Exception as e:\n        print(f\"An error occurred during scraping: {e}\")","lang":"python","description":"This quickstart initializes the Spider client and scrapes a single URL. It shows how to supply the API key, either through the `SPIDER_API_KEY` environment variable or by passing `api_key` to the `Spider` constructor, and wraps the scrape in basic error handling. Obtain your API key from spider.cloud."},"warnings":[{"fix":"If you run Python 3.9 or earlier, upgrade to `spider-client` 0.1.37 or later (`pip install --upgrade spider-client`) or move to Python 3.10+, where the `X | Y` union syntax is supported natively. Also review any custom type hints that might conflict with the change. If issues persist, check the official GitHub repository for release notes.","message":"The `v0.1.37` release included a fix described as 'removed pipe operator' (`fix(python): removed pipe operator`). This most likely refers to PEP 604 union type hints (`X | Y`), which need Python 3.10 or later when evaluated at runtime. Earlier releases may therefore be incompatible with older Python versions or with code relying on the previous internal implementation.","severity":"breaking","affected_versions":"<0.1.37"},{"fix":"Obtain an API key from spider.cloud and either set it as the `SPIDER_API_KEY` environment variable or pass it directly to the `Spider` constructor: `app = Spider(api_key='YOUR_API_KEY')`.","message":"The library requires an API key to authenticate with the Spider Cloud API. Requests without a valid API key fail with authentication errors.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Pass `stream=True` to methods such as `crawl_url` and `scrape_url` and process the response incrementally; the library uses `ijson` internally for efficient streaming. Follow the repository examples for processing streamed chunks.","message":"With large JSON responses or streaming data, calling `json.loads()` on the entire response can raise `json.JSONDecodeError` when the data is incomplete, or consume excessive memory.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Install the package with pip: `pip install spider-client`. Note that the package is installed as `spider-client` but imported as `spider_client`.","cause":"The `spider-client` library is not installed in the current Python environment.","error":"ModuleNotFoundError: No module named 'spider_client'"},{"fix":"Verify that the `SPIDER_API_KEY` environment variable is set correctly, or pass a valid `api_key` string to the `Spider` constructor: `app = Spider(api_key='YOUR_API_KEY')`. Also check that the name you are calling was actually assigned a client or method and is not `None`.","cause":"This error means the code called something whose value is `None`. It often occurs when the `Spider` client was not initialized correctly, frequently because of a missing or invalid API key, so a variable expected to hold the client or one of its methods is `None` or in an uninitialized state.","error":"TypeError: 'NoneType' object is not callable"},{"fix":"When streaming, make sure your processing logic handles partial chunks and non-JSON error responses. Use `stream=True`, process the content incrementally, and catch `json.JSONDecodeError` for incomplete buffers. For example, with a raw `requests` response, iterate over `response.iter_content(chunk_size=...)` and accumulate the chunks before calling `json.loads`.","cause":"The received data is not a complete, valid JSON document. This can happen when parsing a partial stream or a malformed response, especially during streaming, or when the API returns an error message that is not JSON.","error":"json.JSONDecodeError: Expecting value: line X column Y (char Z)"},{"fix":"Consult the official `spider-client` documentation or GitHub repository to confirm the correct method names and signatures (e.g., `scrape_url`, `crawl_url`, `links`), and check method calls for typos.","cause":"You are calling a method that does not exist on the `Spider` client instance, or its name is misspelled.","error":"AttributeError: 'Spider' object has no attribute 'some_method_name'"}]}