Google Cloud Document AI Toolbox

Version 0.15.2 · verified Fri May 01 · auth: no · language: Python

Toolbox for Google Cloud Document AI: a Python library of utilities that simplify working with the output of Document AI processors, including splitting documents, converting between formats, and extracting entities. The current version is 0.15.2, which requires Python 3.9 or later. It is maintained by Google as part of the google-cloud-python monorepo.

pip install google-cloud-documentai-toolbox
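error ModuleNotFoundError: No module named 'documentai_toolbox'
cause The toolbox is not installed, or the code imports it without the google.cloud namespace prefix (the package is installed as google.cloud.documentai_toolbox, not documentai_toolbox).
fix
pip install google-cloud-documentai-toolbox
Then import from the namespaced package: from google.cloud.documentai_toolbox import document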
error AttributeError: module 'google.cloud.documentai_toolbox' has no attribute 'Document'
cause Document is not exported at the package top level; it is defined in the document wrapper module.
fix
Use: from google.cloud.documentai_toolbox import document, then refer to document.Document
gotcha The toolbox Document does not inherit from google.cloud.documentai.Document; it wraps one or more protobuf Document shards. Build it from a processed Document with the document.Document.from_documentai_document class method instead of passing the protobuf to the constructor.
fix from google.cloud.documentai_toolbox import document; doc = document.Document.from_documentai_document(ai_document)
deprecated Python 3.9 support is deprecated in the google-cloud-python ecosystem. Future versions may drop 3.9 support.
fix Upgrade to Python 3.10 or later.
gotcha Installing google-cloud-documentai does not install the toolbox; it must be installed separately.
fix pip install google-cloud-documentai-toolbox

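Process a document with a Document AI processor and wrap the result with the toolbox Document wrapper. The example assumes an existing processor and Application Default Credentials, with the project and processor IDs supplied through the GOOGLE_CLOUD_PROJECT and PROCESSOR_ID environment variables.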

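import os

from google.api_core.client_options import ClientOptions
from google.cloud import documentai
from google.cloud.documentai_toolbox import document

project_id = os.environ.get('GOOGLE_CLOUD_PROJECT', '')
location = 'us'  # Processor region, e.g. 'us' or 'eu'
processor_id = os.environ.get('PROCESSOR_ID', '')
file_path = 'invoice.pdf'

# Initialize a client that targets the processor's regional endpoint
client = documentai.DocumentProcessorServiceClient(
    client_options=ClientOptions(api_endpoint=f'{location}-documentai.googleapis.com')
)
name = client.processor_path(project_id, location, processor_id)

# Read the file to process
with open(file_path, 'rb') as f:
    content = f.read()

# Process the document synchronously (online processing)
request = documentai.ProcessRequest(
    name=name,
    raw_document=documentai.RawDocument(content=content, mime_type='application/pdf'),
)
result = client.process_document(request=request)

# Wrap the returned documentai.Document using the toolbox class method
doc = document.Document.from_documentai_document(result.document)

# Each item is a toolbox Entity wrapper exposing type_ and mention_text
for entity in doc.entities:
    print(f'{entity.type_}: {entity.mention_text}')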
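Because the wrapper is built from an ordinary documentai.Document, the same class method can be exercised offline, without a processor or credentials. The sketch below builds a small in-memory Document with two made-up entities (the invoice text and the entity types `invoice_id` and `total_amount` are hypothetical placeholders for a real processor response) and wraps it. The function name `summarize_entities_offline` is also hypothetical, and it returns None when the Document AI packages are not installed.

```python
from typing import List, Optional, Tuple


def summarize_entities_offline() -> Optional[List[Tuple[str, str]]]:
    """Wrap an in-memory documentai.Document and list (type_, mention_text) pairs.

    Returns None when google-cloud-documentai or the toolbox is not installed.
    """
    try:
        from google.cloud import documentai
        from google.cloud.documentai_toolbox import document
    except ImportError:
        return None

    # Hypothetical content standing in for what a processor would return
    ai_document = documentai.Document(
        text="Invoice 123 Total 42.00",
        entities=[
            documentai.Document.Entity(type_="invoice_id", mention_text="123"),
            documentai.Document.Entity(type_="total_amount", mention_text="42.00"),
        ],
    )

    # Build the wrapper with the class method rather than the dataclass constructor
    wrapped = document.Document.from_documentai_document(ai_document)
    return [(entity.type_, entity.mention_text) for entity in wrapped.entities]


if __name__ == "__main__":
    print(summarize_entities_offline())
```

This is handy for unit-testing code that consumes the wrapper, since the entities in a real run come from the processor response instead of being constructed by hand.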