Spider Cloud Python SDK
`spider-client` is the Python SDK for the Spider Cloud API. It provides tools for web scraping, large-scale crawling, link extraction, and taking screenshots, and it often returns data in formats suited to large language models (LLMs). Requests are served by a Rust-based engine optimized for AI that supports concurrent operations, streaming, and headless Chrome rendering. The library is actively maintained with frequent updates; the current version is 0.1.88.
Common errors
- ModuleNotFoundError: No module named 'spider_client'
  cause: The `spider-client` package is not installed in the current Python environment.
  fix: Install the package using pip: `pip install spider-client`.
- TypeError: 'NoneType' object is not callable
  cause: This often occurs when the `Spider` object was not correctly initialized, frequently because of a missing or invalid API key, leaving `app` as `None` or otherwise uninitialized.
  fix: Verify that your `SPIDER_API_KEY` environment variable is correctly set, or that you are passing a valid `api_key` string to the `Spider` constructor: `app = Spider(api_key='YOUR_API_KEY')`.
- json.JSONDecodeError: Expecting value: line X column Y (char Z)
  cause: The received data is not a valid, complete JSON document. This can happen when parsing a partial stream or a malformed response, especially during streaming operations or when the API returns an error message that is not JSON.
  fix: If streaming, make sure your processing logic handles chunk boundaries and non-JSON error responses. Use `stream=True` and iterate over the content, catching `json.JSONDecodeError` while the buffer is still incomplete. For example, iterate over `response.iter_content(chunk_size=...)` and accumulate the chunks before calling `json.loads`.
- AttributeError: 'Spider' object has no attribute 'some_method_name'
  cause: You are calling a method that does not exist, or whose name is misspelled, on the `Spider` client instance.
  fix: Consult the official `spider-client` documentation or GitHub repository to confirm the correct method names and their signatures (e.g., `scrape_url`, `crawl_url`, `links`), and check your method calls for typos.
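Several of the errors above can be caught before the first API call. The following is a minimal sketch, not part of `spider-client`: the helper names `preflight_problems` and `missing_methods` and the `PLACEHOLDER_KEY` constant are hypothetical, and only the standard library is used, so it runs even when the package is not installed.

```python
import importlib.util
import os
from typing import Iterable, List, Mapping, Optional

# Hypothetical placeholder value; matches the one used in the Quickstart.
PLACEHOLDER_KEY = "YOUR_SPIDER_API_KEY"


def preflight_problems(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return human-readable problems found before creating a client."""
    env = os.environ if env is None else env
    problems: List[str] = []
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec("spider_client") is None:
        problems.append("spider_client is not installed; run: pip install spider-client")
    key = env.get("SPIDER_API_KEY", "").strip()
    if not key or key == PLACEHOLDER_KEY:
        problems.append("SPIDER_API_KEY is not set to a real key")
    return problems


def missing_methods(client: object, names: Iterable[str]) -> List[str]:
    """Return the names that are not callable attributes of `client`."""
    return [name for name in names if not callable(getattr(client, name, None))]
```

Calling `preflight_problems()` with no argument reads `os.environ`. Calling `missing_methods(app, ["scrape_url", "crawl_url", "links"])` on a constructed client gives an early hint about the `AttributeError` case, without replacing a check against the official documentation.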
Warnings
- breaking: The `v0.1.37` release included the fix `fix(python): removed pipe operator`. This likely refers to the `|` union syntax in type hints, which requires Python 3.10 or later, so the change may affect compatibility with older Python versions or with code that relied on the previous internal implementation.
- gotcha: The library requires an API key to authenticate with the Spider Cloud API. Requests without a valid API key fail with authentication errors.
- gotcha: With large JSON responses or streamed data, calling `json.loads()` on the entire response can raise `json.JSONDecodeError` if the data is incomplete, and can use excessive memory.
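The streaming gotcha can be handled by buffering input and parsing only what is complete. Below is a minimal sketch using just the standard library. It assumes the stream is a sequence of UTF-8 byte chunks carrying concatenated or newline-delimited JSON objects or arrays, which is an assumption about the payload rather than something the Spider API is documented here to guarantee. The function name `iter_json_values` is hypothetical.

```python
import codecs
import json
from typing import Any, Iterable, Iterator


def iter_json_values(chunks: Iterable[bytes]) -> Iterator[Any]:
    """Yield each complete JSON object/array found in a stream of byte chunks."""
    decoder = json.JSONDecoder()
    # Incremental decoding copes with multi-byte characters split across chunks.
    to_text = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    for chunk in chunks:
        buffer += to_text.decode(chunk)
        while True:
            buffer = buffer.lstrip()  # raw_decode does not skip leading whitespace
            if not buffer:
                break
            try:
                value, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # incomplete so far: wait for the next chunk
            yield value
            buffer = buffer[end:]
    # Flush the byte decoder and report anything that never became valid JSON.
    leftover = buffer + to_text.decode(b"", final=True)
    if leftover.strip():
        raise ValueError(f"stream ended with unparsed data: {leftover[:80]!r}")
```

Only one unparsed fragment is held in memory at a time, and a truncated or non-JSON response surfaces as a single `ValueError` at the end of the stream instead of a `json.JSONDecodeError` in the middle of processing.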
Install
- `pip install spider-client`
Imports
- Spider: `from spider_client import Spider`
Quickstart
import os

from spider_client import Spider

# Retrieve the API key from an environment variable, or replace the
# placeholder with your actual key. Get an API key from https://spider.cloud
api_key = os.environ.get('SPIDER_API_KEY', 'YOUR_SPIDER_API_KEY')

if not api_key or api_key == 'YOUR_SPIDER_API_KEY':
    print("WARNING: SPIDER_API_KEY not set. Please set it as an environment variable or pass to Spider(api_key=...).\nSkipping API call.")
else:
    app = Spider(api_key=api_key)
    url_to_scrape = 'https://example.com'
    try:
        # Scrape a single page and print whatever the API returns
        scraped_data = app.scrape_url(url_to_scrape)
        print(f"Successfully scraped data from {url_to_scrape}:")
        print(scraped_data)
    except Exception as e:
        print(f"An error occurred during scraping: {e}")
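The Quickstart pattern extends naturally to several URLs. The sketch below is illustrative only: `scrape_many` and the `SupportsScrape` protocol are hypothetical names, and the code relies solely on the single-argument `scrape_url(url)` call shown in the Quickstart. It collects a per-URL error message so that one failure does not abort the rest.

```python
from typing import Any, Dict, Iterable, Protocol, Tuple


class SupportsScrape(Protocol):
    """Anything with a scrape_url(url) method, such as the Quickstart's `app`."""

    def scrape_url(self, url: str) -> Any: ...


def scrape_many(
    client: SupportsScrape, urls: Iterable[str]
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Scrape each URL in turn; return (results by URL, error messages by URL)."""
    results: Dict[str, Any] = {}
    errors: Dict[str, str] = {}
    for url in urls:
        try:
            results[url] = client.scrape_url(url)
        except Exception as exc:  # broad on purpose, mirroring the Quickstart
            errors[url] = f"{type(exc).__name__}: {exc}"
    return results, errors
```

With a real client this would be called as `results, errors = scrape_many(app, ['https://example.com', 'https://example.org'])`. The URLs are processed sequentially; any concurrency offered by the service itself is outside the scope of this sketch.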