Google Cloud Document AI Toolbox
0.15.2 verified Fri May 01 auth: no python
Toolbox for Google Cloud Document AI: a Python library providing utilities to simplify working with Document AI processors, including splitting documents, converting between formats, and extracting entities. Current version 0.15.2, requires Python >=3.9. Maintained by Google, part of the google-cloud-python monorepo.
pip install google-cloud-documentai-toolbox
Common errors
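error ModuleNotFoundError: No module named 'documentai_toolbox'
cause The code imports a top-level `documentai_toolbox` module, which does not exist; the toolbox lives under the `google.cloud` namespace. A similar import error appears when the package is not installed at all.
fix
pip install google-cloud-documentai-toolbox
Then import with: from google.cloud import documentai_toolbox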
error AttributeError: module 'google.cloud.documentai_toolbox' has no attribute 'Document'
cause Document is not exported at the package top level; it is defined in the `document` wrapper module.
fix
Use: from google.cloud.documentai_toolbox import document, then reference document.Document
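When it is unclear which of the import errors above applies, a small probe can show which module paths actually resolve in the current environment. This is a minimal sketch: the helper name `importable` and the list of candidate paths are illustrative, not part of the toolbox.

```python
import importlib

# Candidate module paths to probe (illustrative list, adjust as needed).
CANDIDATES = ("google.cloud.documentai_toolbox", "documentai_toolbox")


def importable(module_name: str) -> bool:
    """Return True if `module_name` can be imported in this environment."""
    try:
        importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        return False
    return True


for name in CANDIDATES:
    status = "importable" if importable(name) else "not importable"
    print(f"{name}: {status}")
```

If neither path resolves, the package is most likely missing from the active environment; if only one resolves, that is the path to import from.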
Warnings
gotcha The toolbox Document does not inherit from google.cloud.documentai.Document; it wraps it. Build the wrapper from a protobuf Document with the `from_documentai_document` classmethod rather than passing the protobuf object to the constructor.
fix from google.cloud.documentai_toolbox import document; doc = document.Document.from_documentai_document(ai_document)
deprecated Python 3.9 support is deprecated in the google-cloud-python ecosystem. Future versions may drop 3.9 support.
fix Upgrade to Python 3.10 or later.
gotcha The toolbox is not installed automatically with google-cloud-documentai. It must be installed separately.
fix pip install google-cloud-documentai-toolbox
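The Python version and installation warnings above can be checked together before running any Document AI code. The following is a rough preflight sketch using only the standard library; the helper names `python_status` and `installed_version` are hypothetical, and the version thresholds simply mirror the notes in this section.

```python
import sys
from importlib import metadata
from typing import Optional, Tuple


def python_status(version: Tuple[int, int]) -> str:
    """Classify a (major, minor) Python version against the notes above."""
    if version < (3, 9):
        return "unsupported"
    if version < (3, 10):
        return "deprecated"
    return "ok"


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


current = (sys.version_info.major, sys.version_info.minor)
print(f"Python {current[0]}.{current[1]}: {python_status(current)}")

# The two distributions are installed independently of each other.
for dist in ("google-cloud-documentai", "google-cloud-documentai-toolbox"):
    print(f"{dist}: {installed_version(dist) or 'not installed'}")
```

Checking distribution metadata instead of importing keeps the check cheap and avoids triggering the import errors listed earlier.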
Imports
- Document
wrong: from google.cloud.documentai_toolbox import Document
correct: from google.cloud.documentai_toolbox import document (then use document.Document)
Quickstart
import os

from google.cloud import documentai
from google.cloud.documentai_toolbox import document

project_id = os.environ.get('GOOGLE_CLOUD_PROJECT', '')
location = 'us'
processor_id = os.environ.get('PROCESSOR_ID', '')
file_path = 'invoice.pdf'

# Initialize the client and build the processor resource name
client = documentai.DocumentProcessorServiceClient()
name = client.processor_path(project_id, location, processor_id)

# Read the file into memory
with open(file_path, 'rb') as f:
    content = f.read()

# Process the document
request = documentai.ProcessRequest(
    name=name,
    raw_document=documentai.RawDocument(content=content, mime_type='application/pdf'),
)
result = client.process_document(request=request)

# Wrap the Document AI Document object with the toolbox wrapper
doc = document.Document.from_documentai_document(result.document)
print(f"Entities: {doc.entities}")
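Printing the raw entity list is rarely the end goal; a common next step is grouping the extracted values by entity type. The sketch below assumes each wrapped entity exposes `type_` and `mention_text` attributes, and it uses stand-in objects with made-up values so that it runs without credentials or a processor. The helper name `group_entities` is hypothetical.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def group_entities(entities: Iterable[Any]) -> Dict[str, List[str]]:
    """Group entity mention texts by entity type, preserving input order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in entities:
        # Assumption: entities expose `type_` and `mention_text`.
        grouped[entity.type_].append(entity.mention_text)
    return dict(grouped)


# Stand-in entities for illustration only; in real use, pass doc.entities.
sample_entities = [
    SimpleNamespace(type_="invoice_id", mention_text="INV-001"),
    SimpleNamespace(type_="line_item", mention_text="Widget"),
    SimpleNamespace(type_="line_item", mention_text="Gadget"),
]

for entity_type, mentions in group_entities(sample_entities).items():
    print(f"{entity_type}: {mentions}")
```

With a real wrapped document, the call would be `group_entities(doc.entities)`, provided the attribute names match the installed toolbox version.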