Scrapinghub Python Client
2.7.0 verified Mon Apr 27 auth: no python
Client interface for the Scrapinghub API, used to manage and monitor scraping jobs, collections, items, and more. The current version is 2.7.0 and requires Python >=3.10. Release cadence is irregular.
pip install scrapinghub
Common errors
error ModuleNotFoundError: No module named 'hubstorage'
cause The hubstorage package is deprecated and merged into scrapinghub.
fix
Run 'pip install scrapinghub' and change imports to 'from scrapinghub import ScrapinghubClient'.
error ImportError: cannot import name 'ScrapinghubClient' from 'scrapinghub.client'
cause Incorrect import path; ScrapinghubClient is at the top-level package.
fix
from scrapinghub import ScrapinghubClient
error ScrapinghubError: ('Connection error: ...')
cause Often caused by an invalid API key or by network issues.
fix
Verify that the SH_APIKEY environment variable is set correctly and that you have internet access.
Warnings
deprecated The 'hubstorage' package is deprecated and has been merged into scrapinghub as of version 1.9.0. Use 'scrapinghub' instead.
fix Replace 'from hubstorage import HubstorageClient' with 'from scrapinghub import ScrapinghubClient'.
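Before editing imports, it can help to confirm which of the two packages is actually installed in the active environment. The following is a minimal sketch using only the standard library; the helper names is_installed and migration_hint are hypothetical and are not part of scrapinghub.

```python
from importlib.util import find_spec


def is_installed(name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    return find_spec(name) is not None


def migration_hint(has_hubstorage: bool, has_scrapinghub: bool) -> str:
    """Suggest the next step based on which packages are present (hypothetical helper)."""
    if has_scrapinghub:
        return "scrapinghub is installed: use 'from scrapinghub import ScrapinghubClient'"
    if has_hubstorage:
        return "only hubstorage found: run 'pip install scrapinghub' and update imports"
    return "neither package found: run 'pip install scrapinghub'"


if __name__ == "__main__":
    print(migration_hint(is_installed("hubstorage"), is_installed("scrapinghub")))
```

The check only looks for importable modules; it does not inspect package versions.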
breaking Support for Python 3.3 and 3.4 was dropped in versions 2.3.1 and 2.0.0, respectively.
fix Upgrade to Python 3.10+.
gotcha The API key can be provided via the environment variable 'SH_APIKEY' or 'SHUB_JOBAUTH'. If both are missing and no key is passed to the client explicitly, client initialization will fail.
fix Set either the SH_APIKEY or the SHUB_JOBAUTH environment variable before creating a client.
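A pre-flight check can surface a missing key before the client is constructed. The sketch below is an illustration only: resolve_api_key is a hypothetical helper, and the lookup order (SH_APIKEY first, then SHUB_JOBAUTH) is an assumption made for this example, not a statement about how the library resolves credentials internally.

```python
import os


def resolve_api_key(environ=None):
    """Return (variable_name, value) for the first credential variable that is set.

    Hypothetical helper; the precedence used here is an assumption.
    """
    env = os.environ if environ is None else environ
    for name in ("SH_APIKEY", "SHUB_JOBAUTH"):
        value = env.get(name, "").strip()
        if value:
            return name, value
    raise RuntimeError("Set SH_APIKEY or SHUB_JOBAUTH before creating a client")


if __name__ == "__main__":
    try:
        source, _ = resolve_api_key()
        print(f"credentials found in {source}")
    except RuntimeError as exc:
        print(f"configuration problem: {exc}")
```

Passing a plain dict as environ makes the helper easy to exercise without touching the real environment.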
gotcha Job IDs are strings in the format 'project_id/spider_id/job_id', not bare numeric IDs.
fix Use full job key strings when referencing jobs in API calls.
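To illustrate the key format described above, here is a small sketch that validates and splits a job key into its three parts. JobKey and parse_job_key are hypothetical names introduced for this example; they are not part of the scrapinghub package, and the example key '123456/1/2' is made up.

```python
from typing import NamedTuple


class JobKey(NamedTuple):
    project_id: int
    spider_id: int
    job_id: int


def parse_job_key(key: str) -> JobKey:
    """Split 'project_id/spider_id/job_id' into integers, rejecting other shapes."""
    parts = key.split("/")
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"expected 'project_id/spider_id/job_id', got {key!r}")
    return JobKey(*(int(p) for p in parts))


if __name__ == "__main__":
    print(parse_job_key("123456/1/2"))
```

Validating the key early turns a bare numeric ID such as '2' into an immediate ValueError rather than a failed API call later.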
Imports
- ScrapinghubClient
wrong: from scrapinghub.client import ScrapinghubClient
correct: from scrapinghub import ScrapinghubClient
Quickstart
import os

from scrapinghub import ScrapinghubClient

# Read the API key from the SH_APIKEY environment variable
api_key = os.environ.get('SH_APIKEY', '')
client = ScrapinghubClient(api_key)
# Access a project
project = client.get_project(123456)
print(project.key)
# List jobs in the project
for job in project.jobs.iter():
    print(job['key'])