Azure AI Document Intelligence
Microsoft Azure AI Document Intelligence Client Library for Python. This library provides access to Azure AI Document Intelligence (formerly Form Recognizer) services for processing documents and extracting data. It follows the Azure SDK guidelines for Python, offering features like layout analysis, prebuilt models for common document types, custom model building, and document classification.
Warnings
- breaking The package and client library were rebranded from `azure-ai-formrecognizer` to `azure-ai-documentintelligence`. This requires updating the import path (`azure.ai.formrecognizer` to `azure.ai.documentintelligence`) and the client class names (e.g., `DocumentAnalysisClient` or the legacy `FormRecognizerClient` to `DocumentIntelligenceClient`, `DocumentModelAdministrationClient` to `DocumentIntelligenceAdministrationClient`).
- breaking The structure of the `AnalyzeResult` object and how to access extracted data changed significantly in version 1.0.0 (aligned with service API 2024-11-30). Direct access to properties like `.forms` or `.receipts` is no longer available. Instead, results are accessed via `result.documents`, a list of `AnalyzedDocument` objects whose `fields` dictionary maps each field name to its extracted value.
- gotcha Asynchronous (async) client operations require an async HTTP transport such as `aiohttp`, which must be installed separately (`pip install aiohttp`). The async client is imported from `azure.ai.documentintelligence.aio`. Mixing synchronous and asynchronous clients or methods (for example, calling an async method without `await`) can lead to runtime errors or unexpected behavior.
- gotcha Authentication issues are common. Azure Active Directory (AAD, now Microsoft Entra ID) authentication is not supported on a regional endpoint; it requires a custom subdomain name for your resource. For key authentication, pass the endpoint and an `AzureKeyCredential` wrapping the key to `DocumentIntelligenceClient`, and make sure the `AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT` and `AZURE_DOCUMENT_INTELLIGENCE_KEY` environment variables read by the Quickstart are set correctly.
- deprecated Older API versions and specific models are being deprecated. For example, the `2022-08-31` API version and the `prebuilt-document` model are deprecated in favor of newer API versions (e.g., `2024-11-30`) and models like `prebuilt-layout` with the key-value pairs add-on enabled (in Python, `features=["keyValuePairs"]`).
- gotcha When training and using custom models, remember to pass the `model_id` (an alphanumeric string or UUID), not the human-readable model name, to `begin_analyze_document`.
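The custom-subdomain requirement for AAD authentication can be checked before a client is constructed. The following is a minimal sketch, not part of the library: the helper name `uses_custom_subdomain` is hypothetical, and it assumes regional endpoints end in `.api.cognitive.microsoft.com` while custom-subdomain endpoints look like `<resource-name>.cognitiveservices.azure.com`.

```python
from urllib.parse import urlparse

# Assumption: regional endpoints share this host suffix, for example
# https://westus2.api.cognitive.microsoft.com/, whereas custom-subdomain
# endpoints look like https://<resource-name>.cognitiveservices.azure.com/.
REGIONAL_SUFFIX = ".api.cognitive.microsoft.com"


def uses_custom_subdomain(endpoint: str) -> bool:
    """Heuristically report whether an endpoint is a custom subdomain.

    Returns False for regional endpoints and for strings with no host.
    """
    host = (urlparse(endpoint).hostname or "").lower()
    return bool(host) and not host.endswith(REGIONAL_SUFFIX)
```

A caller could run this check on the configured endpoint and fall back to key authentication, or fail early with a clear message, when it returns `False`.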
Install
- pip install azure-ai-documentintelligence
- pip install azure-ai-documentintelligence aiohttp
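For the async client, a rough sketch of how the pieces fit together is shown here. It assumes version 1.0.0 of the package, where the document URL is wrapped in `AnalyzeDocumentRequest(url_source=...)`. The function names `async_transport_available` and `analyze_layout` are illustrative and not part of the library. The SDK imports sit inside the function so the module still loads when the SDK is not installed.

```python
import asyncio
import importlib.util


def async_transport_available() -> bool:
    """Return True if the aiohttp package can be imported here."""
    return importlib.util.find_spec("aiohttp") is not None


async def analyze_layout(endpoint: str, key: str, url: str):
    """Analyze a document URL with the async client and return the result."""
    # Imported lazily (an illustrative choice) so this module can be
    # imported even where the SDK is missing.
    from azure.ai.documentintelligence.aio import DocumentIntelligenceClient
    from azure.ai.documentintelligence.models import AnalyzeDocumentRequest
    from azure.core.credentials import AzureKeyCredential

    client = DocumentIntelligenceClient(
        endpoint=endpoint, credential=AzureKeyCredential(key)
    )
    # The async client is an async context manager; leaving the block
    # closes the underlying transport.
    async with client:
        poller = await client.begin_analyze_document(
            "prebuilt-layout", AnalyzeDocumentRequest(url_source=url)
        )
        return await poller.result()
```

With real credentials, this would be driven by `asyncio.run(analyze_layout(endpoint, key, document_url))`, ideally after checking `async_transport_available()`.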
Imports
- DocumentIntelligenceClient
from azure.ai.documentintelligence import DocumentIntelligenceClient
- DocumentIntelligenceAdministrationClient
from azure.ai.documentintelligence import DocumentIntelligenceAdministrationClient
- AzureKeyCredential
from azure.core.credentials import AzureKeyCredential
Quickstart
import os

from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeDocumentRequest
from azure.core.credentials import AzureKeyCredential

# Set your Document Intelligence endpoint and key as environment variables
# e.g., AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT and AZURE_DOCUMENT_INTELLIGENCE_KEY
endpoint = os.environ.get("AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT", "<your-endpoint>")
key = os.environ.get("AZURE_DOCUMENT_INTELLIGENCE_KEY", "<your-key>")

if endpoint == "<your-endpoint>" or key == "<your-key>":
    print("Please set the AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT and AZURE_DOCUMENT_INTELLIGENCE_KEY environment variables.")
    print("You can find these in your Azure portal under your Document Intelligence resource's 'Keys and Endpoint' section.")
else:
    document_url = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/documentintelligence/azure-ai-documentintelligence/samples/sample_forms/forms/Invoice_1.pdf"
    document_intelligence_client = DocumentIntelligenceClient(
        endpoint=endpoint, credential=AzureKeyCredential(key)
    )
    print(f"Analyzing document from URL: {document_url}")

    # Use 'prebuilt-invoice' for invoices, 'prebuilt-receipt' for receipts, etc.
    # Or use your custom model_id for custom models.
    # This package has no begin_analyze_document_from_url; the URL is passed
    # in an AnalyzeDocumentRequest body instead.
    poller = document_intelligence_client.begin_analyze_document(
        "prebuilt-invoice", AnalyzeDocumentRequest(url_source=document_url)
    )
    result = poller.result()

    if result.documents:
        for idx, document in enumerate(result.documents):
            print(f"\n--- Document {idx + 1} Analysis ---")
            if document.doc_type:
                print(f"  Document type: {document.doc_type}")
            if document.fields:
                print("  Extracted Fields:")
                for name, field in document.fields.items():
                    if field.content:
                        # Confidence may be missing, so format it defensively.
                        confidence = (
                            f"{field.confidence:.2f}"
                            if field.confidence is not None
                            else "n/a"
                        )
                        print(f"    {name}: {field.content} (Confidence: {confidence})")
    else:
        print("No documents found in the analysis result.")
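When the extracted fields need to go into a CSV file or a DataFrame instead of being printed, the same traversal can be collected into plain dictionaries. This is a minimal sketch: the helper name `flatten_fields` is hypothetical, and it relies only on the attributes the Quickstart already reads (`documents`, `doc_type`, `fields`, `content`, `confidence`). The `SimpleNamespace` object is a stand-in for a real analysis result so the helper can be tried without calling the service.

```python
from types import SimpleNamespace


def flatten_fields(result) -> list[dict]:
    """Turn result.documents into one row per extracted field."""
    rows = []
    for index, document in enumerate(result.documents or []):
        for name, field in (document.fields or {}).items():
            rows.append(
                {
                    "document": index,
                    "doc_type": document.doc_type,
                    "field": name,
                    "content": field.content,
                    # May be None when the service reports no confidence.
                    "confidence": field.confidence,
                }
            )
    return rows


# Stand-in shaped like an analysis result (not real service output).
fake_result = SimpleNamespace(
    documents=[
        SimpleNamespace(
            doc_type="invoice",
            fields={"VendorName": SimpleNamespace(content="Contoso", confidence=0.93)},
        )
    ]
)
rows = flatten_fields(fake_result)
```

Passing the `result` returned by `poller.result()` in place of `fake_result` should give one dictionary per field, which can then be handed to `csv.DictWriter` or a similar tool.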