Sickle

Version 0.7.0 | verified Fri May 01 | auth: no | language: Python

A lightweight OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) client library for Python, designed to harvest metadata from OAI-PMH-compliant repositories. The current version is 0.7.0; maintenance releases are issued as needed.
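Sickle hides the protocol details, but each harvest step is a plain HTTP request (typically GET) to the repository's base URL, carrying a `verb` argument (`Identify`, `ListMetadataFormats`, `ListSets`, `ListIdentifiers`, `ListRecords`, or `GetRecord`) plus verb-specific arguments. The standard-library sketch below builds such a URL. `build_oai_url` is a hypothetical helper for illustration, not part of Sickle, and it assumes the base URL has no query string of its own.

```python
from urllib.parse import urlencode


def build_oai_url(base_url: str, verb: str, **params: str) -> str:
    """Return an OAI-PMH GET URL: the base URL plus 'verb' and any extra arguments."""
    # Dict insertion order is preserved, so 'verb' always comes first
    query = {"verb": verb, **params}
    return f"{base_url}?{urlencode(query)}"


# First request of a Dublin Core harvest
print(build_oai_url("https://api.example.com/oai", "ListRecords", metadataPrefix="oai_dc"))
# https://api.example.com/oai?verb=ListRecords&metadataPrefix=oai_dc
```

Later pages of the same harvest repeat the verb with a `resumptionToken` argument instead of `metadataPrefix`.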

pip install sickle
error AttributeError: module 'sickle' has no attribute 'OAIResponse'
cause Trying to use 'sickle.OAIResponse' directly, but it's not a top-level attribute unless explicitly imported.
fix
Use 'from sickle import OAIResponse' or access it as 'sickle.oaipmh.OAIResponse' (deprecated).
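If the same code has to run against more than one installed version, a small standard-library helper can try each candidate import location in turn. `resolve_attr` is hypothetical, not a Sickle API; the module paths in the commented usage line are the ones named in this entry.

```python
import importlib
from typing import Any, Sequence


def resolve_attr(module_names: Sequence[str], attr: str) -> Any:
    """Return `attr` from the first listed module that imports cleanly and defines it."""
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:  # also covers ModuleNotFoundError
            continue  # this module path does not exist in the installed version
        if hasattr(module, attr):
            return getattr(module, attr)
    raise ImportError(f"{attr!r} not found in any of: {', '.join(module_names)}")


# Hypothetical usage against the import paths described above:
# OAIResponse = resolve_attr(["sickle", "sickle.oaipmh"], "OAIResponse")
```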
error requests.exceptions.SSLError: HTTPSConnectionPool(host='...', port=443): Max retries exceeded with url: ... (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed')))
cause The OAI endpoint uses a self-signed or invalid SSL certificate.
fix
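If you trust the endpoint, pass 'verify=False' to the Sickle constructor, which forwards extra keyword arguments to requests. This disables certificate checking, so avoid it in production: fix the server's certificate, or pass 'verify' the path to a CA bundle that trusts it.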
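One way to keep the insecure option explicit is to build the `verify` keyword in a single place. In this minimal sketch, `tls_request_args` is a hypothetical helper and the CA bundle path in the usage comment is a placeholder; it relies only on requests accepting `True`, `False`, or a CA bundle path for `verify`.

```python
import warnings
from typing import Optional


def tls_request_args(ca_bundle: Optional[str] = None, insecure: bool = False) -> dict:
    """Build the 'verify' keyword to pass through Sickle to requests."""
    if insecure:
        # Make the risky choice visible in logs instead of silent
        warnings.warn("TLS certificate verification disabled", stacklevel=2)
        return {"verify": False}
    # A path makes requests trust that CA bundle; True uses the default bundle
    return {"verify": ca_bundle if ca_bundle else True}


# Hypothetical usage (requires sickle; the bundle path is a placeholder):
# Sickle("https://api.example.com/oai", **tls_request_args(ca_bundle="/etc/ssl/certs/internal-ca.pem"))
```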
breaking In v0.7.0, the 'OAIResponse' class moved from 'sickle.oaipmh' to top-level 'sickle'. Old imports will break.
fix Change imports from 'from sickle.oaipmh import OAIResponse' to 'from sickle import OAIResponse'.
gotcha The 'ListRecords' iterator follows resumption tokens automatically. However, if you break out of the loop early, the underlying HTTP session may not be closed cleanly. Either use a context manager or iterate fully.
fix Wrap usage in a 'with Sickle(...) as app:' context manager to ensure cleanup.
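The pagination that such an iterator performs can be sketched with the standard library alone. This is not Sickle's implementation: `iter_records` and its `fetch` callback are hypothetical, and `fetch` stands in for the HTTP request (called with `None` first, then with each resumption token). The namespace is the standard OAI-PMH 2.0 one.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def iter_records(fetch: Callable[[Optional[str]], str]) -> Iterator[ET.Element]:
    """Yield <record> elements page by page, following resumption tokens.

    fetch(token) must return the XML of one ListRecords response;
    token is None for the first request.
    """
    token: Optional[str] = None
    while True:
        root = ET.fromstring(fetch(token))
        yield from root.iter(f"{OAI_NS}record")
        token_el = root.find(f".//{OAI_NS}resumptionToken")
        # An absent or empty resumptionToken marks the last page
        if token_el is None or not (token_el.text or "").strip():
            return
        token = token_el.text.strip()
```

Because this sketch is a generator, a consumer that stops early never triggers the next page request.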
deprecated The 'max_retries' parameter is deprecated in favor of 'retry_status_codes' and 'retry_backoff_factor' via requests adapter.
fix Use 'from requests.adapters import HTTPAdapter' to configure retries.
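Whatever mechanism ends up configuring retries, the policy itself is a loop that re-issues a request on selected status codes with growing delays. A standard-library sketch of that policy: `fetch_with_retries`, its parameter names, and the default status codes are hypothetical choices, not Sickle or requests APIs; `fetch` stands in for one HTTP request returning `(status_code, body)`, and any `Retry-After` header is ignored.

```python
import time
from typing import Callable, Iterable, Tuple


def fetch_with_retries(
    fetch: Callable[[], Tuple[int, str]],
    retry_status_codes: Iterable[int] = (429, 500, 502, 503, 504),
    max_retries: int = 3,
    backoff_factor: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call fetch(); on a retryable status, wait backoff_factor * 2**attempt and retry."""
    codes = set(retry_status_codes)
    status, body = fetch()
    for attempt in range(max_retries):
        if status not in codes:
            break
        sleep(backoff_factor * 2 ** attempt)  # 0.5 s, 1 s, 2 s with the defaults
        status, body = fetch()
    # After max_retries the last response is returned, even if still retryable
    return status, body
```

Injecting `sleep` keeps the delays testable without actually waiting.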

Basic setup: create a Sickle instance pointing to an OAI-PMH base URL, then iterate over records. Use 'verify=False' for self-signed certs (not recommended).

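from sickle import Sickle

# verify=True (certificate checking) is the requests default; shown explicitly here
harvester = Sickle('https://api.example.com/oai', verify=True)
records = harvester.ListRecords(metadataPrefix='oai_dc')
for record in records:
    if record.deleted:  # deleted records have a header but no metadata
        continue
    # metadata maps each Dublin Core element to a list of values
    titles = record.metadata.get('title', [''])
    print(record.header.identifier, titles[0])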
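In Sickle, `record.metadata` maps each element name to a list of values (a record can carry several titles or creators), so `record.metadata.get('title')` returns a list rather than a string. To see that shape without a live endpoint, here is a standard-library sketch that parses one `oai_dc` record into an `(identifier, {name: [values]})` pair. `parse_dc_record` is hypothetical and only approximates what Sickle builds; the namespaces are the standard OAI-PMH and `oai_dc` ones.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Tuple

OAI = "{http://www.openarchives.org/OAI/2.0/}"
OAI_DC = "{http://www.openarchives.org/OAI/2.0/oai_dc/}"


def parse_dc_record(record_xml: str) -> Tuple[str, Dict[str, List[str]]]:
    """Return (identifier, {element name: [values]}) for one oai_dc <record>."""
    record = ET.fromstring(record_xml)
    identifier = record.findtext(f"{OAI}header/{OAI}identifier", default="")
    metadata: Dict[str, List[str]] = defaultdict(list)
    dc = record.find(f"{OAI}metadata/{OAI_DC}dc")
    if dc is not None:  # deleted records have no <metadata> block
        for element in dc:
            # Strip the namespace: '{http://purl.org/dc/elements/1.1/}title' -> 'title'
            name = element.tag.rsplit("}", 1)[-1]
            metadata[name].append((element.text or "").strip())
    return identifier, dict(metadata)
```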