Scrapinghub Python Client

2.7.0 verified Mon Apr 27 auth: no python

A client interface for the Scrapinghub API, used to manage and monitor scraping jobs, collections, items, and more. The current version is 2.7.0 and requires Python >=3.10. Release cadence is irregular.

pip install scrapinghub
error ModuleNotFoundError: No module named 'hubstorage'
cause The hubstorage package is deprecated and merged into scrapinghub.
fix
Run 'pip install scrapinghub' and change imports to 'from scrapinghub import ScrapinghubClient'.
error ImportError: cannot import name 'ScrapinghubClient' from 'scrapinghub.client'
cause Incorrect import path; ScrapinghubClient is at the top-level package.
fix
from scrapinghub import ScrapinghubClient
error ScrapinghubError: ('Connection error: ...')
cause Often caused by an invalid API key or network issues.
fix
Verify that the SH_APIKEY environment variable is set correctly and that you have network access.
deprecated The 'hubstorage' package is deprecated and merged into scrapinghub since version 1.9.0. Use 'scrapinghub' instead.
fix Replace 'from hubstorage import HubstorageClient' with 'from scrapinghub import ScrapinghubClient'.
breaking Support for Python 3.3 and 3.4 was dropped in versions 2.3.1 and 2.0.0, respectively.
fix Upgrade to Python 3.10+.
gotcha The API key can be provided via environment variable 'SH_APIKEY' or 'SHUB_JOBAUTH'. If both are missing, client initialization will fail.
fix Set either the SH_APIKEY or the SHUB_JOBAUTH environment variable before creating a client.
gotcha Job IDs are strings in the format 'project_id/spider_id/job_id', not just numeric.
fix Use full job key strings when referencing jobs in API calls.
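
The two gotchas above (auth environment variables and the job key format) can be checked before any API call is made. The following is a minimal sketch using only the standard library; check_auth_env and parse_job_key are hypothetical helpers written for illustration, not part of the scrapinghub package, and the job key shown is a placeholder.

```python
import os


def check_auth_env(environ=os.environ):
    """Return the name of the first auth variable that is set.

    Hypothetical helper: it only checks that a variable is present and
    non-empty, it does not validate the credential itself.
    """
    for name in ("SH_APIKEY", "SHUB_JOBAUTH"):
        if environ.get(name):
            return name
    raise RuntimeError("Set SH_APIKEY or SHUB_JOBAUTH before creating a client")


def parse_job_key(job_key):
    """Split 'project_id/spider_id/job_id' into three integers.

    Hypothetical helper: raises ValueError for anything that is not
    three decimal fields separated by slashes.
    """
    parts = job_key.split("/")
    if len(parts) != 3 or not all(p.isdecimal() for p in parts):
        raise ValueError(f"expected 'project_id/spider_id/job_id', got {job_key!r}")
    project_id, spider_id, job_id = (int(p) for p in parts)
    return project_id, spider_id, job_id


print(parse_job_key("123456/1/2"))  # prints (123456, 1, 2)
```

Failing early with a clear message is a design choice here: it separates a missing credential or a malformed job key from a genuine connection error.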

Initialize client with API key, fetch a project, and list jobs.

import os

from scrapinghub import ScrapinghubClient

# Read the API key from the SH_APIKEY environment variable
api_key = os.environ.get('SH_APIKEY', '')
client = ScrapinghubClient(api_key)

# Access a project
project = client.get_project(123456)
print(project.key)

# List jobs in a project
for job in project.jobs.iter():
    print(job['key'])
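
Once a job key is known, the same client can be used to read that job's scraped items. The sketch below assumes the client exposes get_job(job_key) and that the returned job offers items.iter(), as described in the client's documentation. first_items is a hypothetical helper, and it accepts any object with that shape so it can be tried without network access.

```python
from itertools import islice


def first_items(client, job_key, limit=10):
    """Return up to `limit` items scraped by the job with the given key.

    Hypothetical helper: `client` is expected to behave like a
    ScrapinghubClient, and `job_key` is a full
    'project_id/spider_id/job_id' string.
    """
    job = client.get_job(job_key)
    # islice stops after `limit` items instead of reading the whole job
    return list(islice(job.items.iter(), limit))
```

With the client created above, a call might look like first_items(client, '123456/1/2', limit=5), where the job key is a placeholder.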