Google Cloud Document AI

3.14.0 · active · verified Sat Apr 11

Google Cloud Document AI is a service that extracts structured information from unstructured and semi-structured documents using Google's machine-learning models, including natural language processing, computer vision, translation, and AutoML. It is used to automate tedious document-processing tasks, improve the accuracy of data extraction, and gain insight from document contents. The Python client library, at version 3.14.0 at the time of writing, lives in the actively maintained `google-cloud-python` monorepo and is updated frequently.

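Install

pip install google-cloud-documentai

Imports

from google.cloud import documentai_v1 as documentai
from google.api_core.client_options import ClientOptions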

Quickstart

This quickstart shows how to process a raw PDF document with a Document AI processor. Before running it, you need a Google Cloud project with the Document AI API enabled, an enabled processor in that project, and working credentials: either set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the path of a service account key file, or configure Application Default Credentials.
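Misconfigured environment variables are the most common reason the quickstart fails, so a small preflight check can be worth running first. The following is a minimal sketch, not part of the client library: `DocAISettings`, `preflight`, and `load_settings` are hypothetical names. It assumes the same environment variable names and defaults as the quickstart and, following the quickstart's own comment, treats only `us` and `eu` as valid locations. It uses only the standard library, so it runs without credentials, and it builds the same processor resource name and regional endpoint string the quickstart constructs by hand.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping

# Assumption: taken from the quickstart comment ("Format is 'us' or 'eu'").
SUPPORTED_LOCATIONS = {"us", "eu"}


@dataclass(frozen=True)
class DocAISettings:
    """Hypothetical container for the values the quickstart reads."""
    project_id: str
    location: str
    processor_id: str
    processor_version_id: str = "rc"

    @property
    def processor_name(self) -> str:
        # Same resource-name format the quickstart builds with an f-string.
        return (f"projects/{self.project_id}/locations/{self.location}"
                f"/processors/{self.processor_id}"
                f"/processorVersions/{self.processor_version_id}")

    @property
    def api_endpoint(self) -> str:
        # Regional endpoint the quickstart passes to ClientOptions.
        return f"{self.location}-documentai.googleapis.com"


def preflight(environ: Mapping[str, str]) -> List[str]:
    """Return a list of configuration problems (empty if none were found)."""
    problems = []
    for var in ("GCP_PROJECT_ID", "DOCUMENT_AI_PROCESSOR_ID"):
        if not environ.get(var):
            problems.append(f"{var} is not set")
    location = environ.get("GCP_REGION", "us")
    if location not in SUPPORTED_LOCATIONS:
        problems.append(
            f"GCP_REGION={location!r} is not one of {sorted(SUPPORTED_LOCATIONS)}")
    # When this variable is unset, the client can still find credentials through
    # other Application Default Credentials sources, so only a path that is set
    # but does not exist is reported.
    key_file = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_file and not os.path.isfile(key_file):
        problems.append(
            f"GOOGLE_APPLICATION_CREDENTIALS points to a missing file: {key_file}")
    return problems


def load_settings(environ: Mapping[str, str]) -> DocAISettings:
    """Validate the environment and build settings, raising ValueError on problems."""
    problems = preflight(environ)
    if problems:
        raise ValueError("; ".join(problems))
    return DocAISettings(
        project_id=environ["GCP_PROJECT_ID"],
        location=environ.get("GCP_REGION", "us"),
        processor_id=environ["DOCUMENT_AI_PROCESSOR_ID"],
        processor_version_id=environ.get("DOCUMENT_AI_PROCESSOR_VERSION_ID", "rc"),
    )


# Usage with an explicit mapping; pass os.environ to check the real environment.
example_env = {"GCP_PROJECT_ID": "my-project", "GCP_REGION": "eu",
               "DOCUMENT_AI_PROCESSOR_ID": "abc123"}
settings = load_settings(example_env)
print(settings.processor_name)
print(settings.api_endpoint)
```

Taking a mapping instead of reading `os.environ` directly keeps the check easy to test; in a real script you would call `load_settings(os.environ)` and feed `settings.processor_name` and `settings.api_endpoint` into the request and `ClientOptions`.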

import os
from google.cloud import documentai_v1 as documentai
from google.api_core.client_options import ClientOptions

project_id = os.environ.get('GCP_PROJECT_ID', 'your-project-id')
location = os.environ.get('GCP_REGION', 'us') # Format is 'us' or 'eu'
processor_id = os.environ.get('DOCUMENT_AI_PROCESSOR_ID', 'your-processor-id')
processor_version_id = os.environ.get('DOCUMENT_AI_PROCESSOR_VERSION_ID', 'rc') # Or specific version, e.g., 'pretrained-ocr-v1.0-2020-09-23'

# The full resource name of the processor version
# Alternatively, 'projects/{project_id}/locations/{location}/processors/{processor_id}' uses the processor's default version
processor_name = f"projects/{project_id}/locations/{location}/processors/{processor_id}/processorVersions/{processor_version_id}"

# Local file path to the document. DOCUMENT_AI_DOCUMENT_PATH and 'sample.pdf' are
# placeholder names; point them at a real PDF on disk.
file_path = os.environ.get('DOCUMENT_AI_DOCUMENT_PATH', 'sample.pdf')
with open(file_path, 'rb') as f:
    document_content = f.read()

mime_type = "application/pdf"

# Configure the client to use the regional endpoint that matches the processor's location
client_options = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
client = documentai.DocumentProcessorServiceClient(client_options=client_options)

# Wrap the raw bytes and their MIME type for an online (synchronous) request
raw_document = documentai.RawDocument(content=document_content, mime_type=mime_type)

# ProcessRequest also accepts optional fields such as process_options
# (for example, OCR settings or page selection); this quickstart omits them.
request = documentai.ProcessRequest(name=processor_name, raw_document=raw_document)

# You must enable the Document AI API in your Google Cloud project before running this code.
try:
    result = client.process_document(request=request)
    document = result.document
    print(f"Document processing complete. Text: {document.text}")
    if document.pages:
        print(f"Number of pages: {len(document.pages)}")
except Exception as e:
    print(f"Error processing document: {e}")
    print("Ensure GOOGLE_APPLICATION_CREDENTIALS environment variable is set or other auth method is configured.")
    print("Also, verify project_id, location, and processor_id are correct and the API is enabled.")
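The quickstart prints only `document.text`. Structured parts of the response, such as the paragraphs in `document.pages[i].paragraphs` or the items in `document.entities`, do not repeat their text; each carries a `text_anchor` whose `text_segments` hold `start_index`/`end_index` offsets into `document.text`. Below is a minimal sketch of resolving those offsets. `anchor_to_text` and `paragraph_texts` are hypothetical helper names, not library functions. Because they only read attributes, they work on the library's response objects and also on the plain `SimpleNamespace` stand-ins used in the usage lines, which are made-up data rather than real API output.

```python
from types import SimpleNamespace
from typing import Any, List


def anchor_to_text(text_anchor: Any, full_text: str) -> str:
    """Concatenate the slices of full_text referenced by a TextAnchor.

    Each segment covers full_text[start_index:end_index]. On the proto message an
    unset start_index reads as 0; the getattr default mirrors that for stand-ins.
    """
    pieces = []
    for segment in text_anchor.text_segments:
        start = int(getattr(segment, "start_index", 0) or 0)
        end = int(segment.end_index)
        pieces.append(full_text[start:end])
    return "".join(pieces)


def paragraph_texts(document: Any) -> List[str]:
    """Return the text of every paragraph on every page, in response order."""
    results = []
    for page in document.pages:
        for paragraph in page.paragraphs:
            results.append(anchor_to_text(paragraph.layout.text_anchor, document.text))
    return results


def _segment(start: int, end: int) -> SimpleNamespace:
    # Stand-in for a TextSegment; only the two offset fields are modeled.
    return SimpleNamespace(start_index=start, end_index=end)


def _paragraph(start: int, end: int) -> SimpleNamespace:
    # Stand-in for a page paragraph with a single-segment layout anchor.
    anchor = SimpleNamespace(text_segments=[_segment(start, end)])
    return SimpleNamespace(layout=SimpleNamespace(text_anchor=anchor))


fake_doc = SimpleNamespace(
    text="Hello Document AI!\nSecond line\n",
    pages=[SimpleNamespace(paragraphs=[_paragraph(0, 19), _paragraph(19, 31)])],
)
print(paragraph_texts(fake_doc))  # → ['Hello Document AI!\n', 'Second line\n']
```

With a real response you would call `paragraph_texts(result.document)`, or `anchor_to_text(entity.text_anchor, document.text)` for each entry in `document.entities`. Segments are concatenated rather than joined with a separator because a single element's text may span several non-contiguous ranges of `document.text`.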
