Spider Cloud Python SDK

0.1.88 · active · verified Thu Apr 16

The `spider-client` package is a Python SDK for the Spider Cloud API. It provides web scraping, large-scale crawling, link extraction, and screenshot capture. Results are often formatted for use with large language models (LLMs), and the service runs on a Rust-based engine built for AI workloads that supports concurrent operations, streaming, and headless Chrome rendering. The library is actively maintained with frequent updates; the current version is 0.1.88.
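As a rough illustration of the four capabilities listed above, the sketch below calls each one on an already-constructed client. Only `scrape_url` appears in this page's Quickstart; the method names `crawl_url`, `links`, and `screenshot`, and the `limit` crawl parameter, are assumptions based on the SDK's public README and should be checked against the installed version. The helper `collect_page_data` is hypothetical and not part of the SDK.

```python
from typing import Any, Dict


def collect_page_data(client: Any, url: str, limit: int = 5) -> Dict[str, Any]:
    """Run one call per capability and gather the results in a dict.

    `client` is expected to be a Spider instance. The methods other than
    scrape_url are assumed names (see the note above), not confirmed here.
    """
    return {
        # Single-page scrape, as used in the Quickstart.
        "scrape": client.scrape_url(url),
        # Multi-page crawl, capped by an assumed `limit` parameter.
        "crawl": client.crawl_url(url, params={"limit": limit}),
        # Link extraction for the page.
        "links": client.links(url),
        # Screenshot capture of the page.
        "screenshot": client.screenshot(url),
    }
```

A call such as `collect_page_data(app, 'https://example.com', limit=2)` would take the `app` object created in the Quickstart. Each entry corresponds to a separate API request, so trimming the dict to the calls you need keeps usage down.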

Common errors

Warnings

Install

pip install spider-client

Imports

Quickstart

This quickstart initializes the Spider client and scrapes a single URL. It reads the API key from the `SPIDER_API_KEY` environment variable, falls back to a placeholder that you can replace with your own key, and skips the API call when no real key is configured. Obtain an API key from spider.cloud.

import os
from spider import Spider  # the spider-client distribution installs its module as "spider"

# Retrieve API key from environment variable or replace with your actual key
# Get an API key from https://spider.cloud
api_key = os.environ.get('SPIDER_API_KEY', 'YOUR_SPIDER_API_KEY')

if not api_key or api_key == 'YOUR_SPIDER_API_KEY':
    print("WARNING: SPIDER_API_KEY not set. Please set it as an environment variable or pass to Spider(api_key=...).\nSkipping API call.")
else:
    app = Spider(api_key=api_key)

    url_to_scrape = 'https://example.com'
    try:
        scraped_data = app.scrape_url(url_to_scrape)
        print(f"Successfully scraped data from {url_to_scrape}:")
        print(scraped_data)
    except Exception as e:
        print(f"An error occurred during scraping: {e}")
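The key-selection logic in the quickstart can be pulled into a small, testable helper. The sketch below is one possible arrangement, not part of the SDK: the function name `resolve_api_key` and its `environ` parameter are hypothetical, and the placeholder string simply mirrors the one used in the quickstart.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "SPIDER_API_KEY"
PLACEHOLDER = "YOUR_SPIDER_API_KEY"  # same placeholder as in the quickstart


def resolve_api_key(
    explicit: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the first usable key: the explicit argument, then the environment.

    Empty strings, whitespace-only values, and the placeholder are treated
    as "not configured". Returns None when no usable key is found.
    """
    env = os.environ if environ is None else environ
    for candidate in (explicit, env.get(ENV_VAR)):
        key = (candidate or "").strip()
        if key and key != PLACEHOLDER:
            return key
    return None
```

With this helper, the quickstart's guard reduces to checking whether `resolve_api_key()` returned `None` before constructing the client. Passing a plain dict as `environ` makes the behavior easy to unit test without touching the real environment.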
