python-documentcloud

4.5.0 · active · verified Thu Apr 16

python-documentcloud is a simple Python wrapper for the DocumentCloud API (current version 4.5.0). It provides methods to retrieve and edit both public and private documents and projects on documentcloud.org. You can upload PDFs into your DocumentCloud account, organize them into projects, and download their extracted text and page images. The library is actively maintained by MuckRock, with new releases roughly every one to three months.
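Uploading a PDF and then reading its extracted text usually means waiting for DocumentCloud to finish processing. The sketch below shows one way to do that by polling. It is an illustration, not part of the library: `upload_and_wait_for_text` is a hypothetical helper, and it assumes that `client.documents.upload(path, title=...)` returns a document whose `status` eventually becomes `"success"` (or `"error"` / `"nofile"` on failure), that `client.documents.get(id)` re-fetches the current status, and that a processed document exposes its text as `full_text`.

```python
import time


def upload_and_wait_for_text(client, pdf_path, title, timeout=300.0, poll_interval=5.0):
    """Upload a PDF, poll until processing finishes, and return its text.

    Hypothetical helper. Assumes client.documents.upload() returns a document
    whose `status` becomes "success" once processing is done, and that the
    processed document exposes its extracted text as `full_text`.
    """
    doc = client.documents.upload(pdf_path, title=title)
    deadline = time.monotonic() + timeout
    while doc.status != "success":
        # Assumed terminal failure statuses: stop instead of waiting forever.
        if doc.status in ("error", "nofile"):
            raise RuntimeError(f"document {doc.id} ended with status {doc.status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"document {doc.id} still {doc.status!r} after {timeout}s")
        time.sleep(poll_interval)
        # Re-fetch the document to pick up its latest processing status.
        doc = client.documents.get(doc.id)
    return doc.full_text
```

With a client authenticated as in the Quickstart, a call such as `upload_and_wait_for_text(client, "report.pdf", "Annual report")` would block until processing completes or the timeout expires; `"report.pdf"` is a placeholder path. Polling keeps the helper simple; a long-running application might prefer to record the document ID and check back later instead of blocking.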

Common errors

Warnings

Install

pip install python-documentcloud

Imports

from documentcloud import DocumentCloud

Quickstart

This quickstart initializes the DocumentCloud client, runs a basic document search, prints each result, and then fetches the first result again by its ID. Credentials are read from the `DC_USERNAME` and `DC_PASSWORD` environment variables rather than hard-coded in the script; they are only needed for private documents or actions on your account.

import os
from documentcloud import DocumentCloud

# Read credentials from environment variables instead of hard-coding them.
# Unset variables become None, so the client is created without credentials.
USERNAME = os.environ.get('DC_USERNAME')
PASSWORD = os.environ.get('DC_PASSWORD')

try:
    # Initialize the client. For private documents/actions, provide credentials.
    # For public documents, no credentials are required.
    client = DocumentCloud(USERNAME, PASSWORD)

    # Search for documents
    query = 'MuckRock'
    print(f"Searching for documents with query: '{query}'")
    documents = client.documents.search(query)

    if documents:
        print(f"Found {len(documents)} documents:")
        for doc in documents:
            print(f"  - ID: {doc.id}, Title: {doc.title}, Status: {doc.status}")

        # Re-fetch the first result by its ID (any known document ID works here)
        first_doc_id = documents[0].id
        doc = client.documents.get(first_doc_id)
        print(f"\nRetrieved document ID {doc.id}: '{doc.title}'")
        print(f"  Source: {doc.source}")
    else:
        print("No documents found for the given query.")

except Exception as e:
    print(f"An error occurred: {e}")
    print("Ensure DC_USERNAME and DC_PASSWORD environment variables are set if accessing private data.")
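The quickstart loop walks every item in the search results. If iterating over the returned results can request further pages from the API, a broad query may trigger many requests. A minimal sketch for capping that, assuming the object returned by `client.documents.search()` is an iterable of documents with the `id`, `title`, and `status` attributes the quickstart already uses; `summarize_search` and its `limit` and `status` parameters are hypothetical names, not part of python-documentcloud.

```python
from itertools import islice


def summarize_search(client, query, limit=10, status=None):
    """Return up to `limit` (id, title, status) tuples for a search query.

    Hypothetical helper. `status`, if given, keeps only documents whose
    status string matches it (for example "success").
    """
    results = client.documents.search(query)
    if status is not None:
        # Lazy filter: documents are only pulled from `results` as needed.
        results = (doc for doc in results if doc.status == status)
    # islice stops after `limit` items without pulling any further ones.
    return [(doc.id, doc.title, doc.status) for doc in islice(results, limit)]
```

For example, `summarize_search(DocumentCloud(), "MuckRock", limit=5, status="success")` would list up to five matching public documents whose status is `"success"`; creating the client without credentials follows the quickstart's note that public documents need none. Using `islice` rather than slicing keeps the helper independent of whether the results object supports indexing.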
