{"id":9255,"library":"python-documentcloud","title":"python-documentcloud","description":"python-documentcloud is a simple Python wrapper for the DocumentCloud API (current version 4.5.0). It provides convenient methods to retrieve and edit documents and projects, both public and private, directly from documentcloud.org. Users can upload PDFs into their DocumentCloud account, organize them into projects, and download extracted text and images. The library is actively maintained by MuckRock and sees a monthly to quarterly release cadence for updates and new features.","status":"active","version":"4.5.0","language":"en","source_language":"en","source_url":"https://github.com/muckrock/python-documentcloud","tags":["documentcloud","api-wrapper","journalism","data-journalism","pdf","documents"],"install":[{"cmd":"pip install python-documentcloud","lang":"bash","label":"Install latest version"}],"dependencies":[],"imports":[{"note":"The primary `DocumentCloud` client class is imported directly from the `documentcloud` package. The top-level `documentcloud` module itself is not typically imported as a whole.","wrong":"import documentcloud","symbol":"DocumentCloud","correct":"from documentcloud import DocumentCloud"},{"note":"Common exceptions for API interactions are found within the `documentcloud.exceptions` submodule.","symbol":"APIError","correct":"from documentcloud.exceptions import APIError"}],"quickstart":{"code":"import os\nfrom documentcloud import DocumentCloud\n\n# Authenticate using environment variables for security\nUSERNAME = os.environ.get('DC_USERNAME', '')\nPASSWORD = os.environ.get('DC_PASSWORD', '')\n\ntry:\n    # Initialize the client. 
For private documents/actions, provide credentials.\n    # For public documents, no credentials are required.\n    client = DocumentCloud(USERNAME, PASSWORD)\n\n    # Search for documents\n    query = 'MuckRock'\n    print(f\"Searching for documents with query: '{query}'\")\n    documents = client.documents.search(query)\n\n    # Results are cursor-paginated, so iterate rather than calling len()\n    # or indexing; stop after a few results to avoid fetching every page.\n    first_doc_id = None\n    for count, doc in enumerate(documents, start=1):\n        print(f\"  - ID: {doc.id}, Title: {doc.title}, Status: {doc.status}\")\n        if first_doc_id is None:\n            first_doc_id = doc.id\n        if count >= 5:\n            break\n\n    if first_doc_id is not None:\n        # Access a specific document by ID\n        doc = client.documents.get(first_doc_id)\n        print(f\"\\nRetrieved document ID {doc.id}: '{doc.title}'\")\n        print(f\"  Source: {doc.source}\")\n    else:\n        print(\"No documents found for the given query.\")\n\nexcept Exception as e:\n    print(f\"An error occurred: {e}\")\n    print(\"Ensure DC_USERNAME and DC_PASSWORD environment variables are set if accessing private data.\")\n","lang":"python","description":"This quickstart demonstrates how to initialize the DocumentCloud client and perform a basic search for documents. Credentials are read from environment variables (`DC_USERNAME`, `DC_PASSWORD`) so they stay out of the source code. It then iterates through the first few search results and fetches one document by ID."},"warnings":[{"fix":"Ensure your project runs on Python 3.8 or newer. Upgrade your Python environment if necessary.","message":"Python 2 support was dropped starting with version 4.0.0. Earlier versions (3.x and below) supported Python 2 and 3.","severity":"breaking","affected_versions":">=4.0.0"},{"fix":"Iterate through API results directly or use cursor parameters for paging. Avoid relying on `len()` for result sets or direct page number access.","message":"The API pagination mechanism changed from page number-based to cursor-based in version 3.0.0. 
This means the `__len__` method is no longer implemented for `APIResults`, and you cannot jump to an arbitrary page by number. Iteration is the primary method for processing results.","severity":"breaking","affected_versions":">=3.0.0"},{"fix":"Always install `python-documentcloud` via `pip install python-documentcloud`. If `documentcloud` is already installed, uninstall it first: `pip uninstall documentcloud`.","message":"The PyPI package `documentcloud` (without the 'python-' prefix) is deprecated and refers to an older, unmaintained version of the library. Installing it gives you outdated functionality and possible compatibility issues.","severity":"gotcha","affected_versions":"All versions, if wrong package is installed"},{"fix":"After uploading, periodically call `document.refresh()` and check `document.status` and `document.public` in a loop until the document is fully processed and reflects the intended status.","message":"Because processing happens server-side, a newly uploaded document initially reports a status of 'pending' and may appear private even if it was uploaded as public. Reading its metadata or public status immediately after upload may return stale data.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"First, uninstall the incorrect package: `pip uninstall documentcloud`. Then, install the correct one: `pip install python-documentcloud`. 
Ensure your import statement is `from documentcloud import DocumentCloud`.","cause":"This typically occurs because the deprecated `documentcloud` PyPI package was installed instead of the correct `python-documentcloud` package, or because of a mix-up in import paths.","error":"ImportError: cannot import name 'DocumentCloud' from 'documentcloud'"},{"fix":"Verify that your `DC_USERNAME` and `DC_PASSWORD` environment variables are correctly set, or that the credentials passed directly to `DocumentCloud()` are accurate for a valid DocumentCloud account with API access.","cause":"The username or password provided to the `DocumentCloud` client constructor (or via environment variables) is incorrect, or the account lacks the necessary permissions.","error":"documentcloud.exceptions.CredentialsFailedError: Unable to obtain an access token due to bad login credentials"},{"fix":"Ensure that the identifier used is truly unique (e.g., a DocumentCloud numerical ID). If searching, use `client.documents.search()`, which is designed to return multiple results, and then iterate over them.","cause":"You used a method or query that expects a single, unique result (e.g., `client.documents.get(id)`) but multiple items matched the criteria, or the identifier was not specific enough.","error":"documentcloud.exceptions.MultipleObjectsReturnedError: The API returned multiple objects when it expected one"},{"fix":"Double-check the ID or slug of the resource you are trying to access. Confirm that your DocumentCloud account has the necessary permissions to view or modify that specific resource.","cause":"You attempted to access a document, project, or other resource that either does not exist or that the authenticated user does not have permission to view.","error":"documentcloud.exceptions.DoesNotExistError"}]}