Azure AI Form Recognizer Client Library
The Azure AI Form Recognizer client library for Python, now part of Azure AI Document Intelligence, uses machine learning to extract text and structured data from documents. It provides layout extraction, prebuilt models (e.g., receipts, invoices, identity documents), custom model building and analysis, and document classification. The library is developed as part of the Azure SDK for Python, where new service features and bug fixes are released.
Warnings
- breaking For service API versions 2022-08-31 and later, use `DocumentAnalysisClient` and `DocumentModelAdministrationClient`. The older `FormRecognizerClient` supports only service API versions 2.1 and below and cannot target the newer versions.
- breaking As of SDK version 3.3.0, which makes service API version 2023-07-31 the default, several properties and models were removed: the `query_fields` keyword argument, `DocumentPage.images`, the `DocumentImage` model, `DocumentPage.annotations`, the `DocumentAnnotation` model, and `DocumentKeyValuePair.common_name`. Some `AnalysisFeature` enum members were also renamed.
- deprecated The service name 'Azure Form Recognizer' was officially renamed to 'Azure AI Document Intelligence' in July 2023. While the Python package `azure-ai-formrecognizer` retains its name, documentation and service terminology refer to 'Document Intelligence'.
- breaking As of `azure-ai-formrecognizer` version 3.3.3, Python 3.7 is no longer supported. The minimum required Python version is 3.8.
- gotcha Common 'Unauthorized' errors (HTTP 401) often indicate an incorrect endpoint, an invalid or expired subscription key, or attempting Azure Active Directory (AAD) authentication on a regional endpoint (AAD requires a custom subdomain).
- gotcha Errors like 'InvalidContentSourceFormat', 'InvalidContent', or `DecodeError: JSON is invalid` for specific documents can stem from corrupted files, unsupported file types, incorrect SAS URLs, or issues with blob storage paths (e.g., an extra leading '/' in container paths can create a virtual directory that is not recognized).
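The two gotchas above can be partly caught before any request is sent. The sketch below is a minimal preflight check, not part of the library: `endpoint_kind` and `clean_blob_prefix` are hypothetical helper names, and the hostname suffixes are assumptions that hold for the public Azure cloud only (sovereign clouds use different domains).

```python
from urllib.parse import urlparse


def endpoint_kind(endpoint: str) -> str:
    """Classify an endpoint by hostname suffix (public-cloud assumption)."""
    host = (urlparse(endpoint).hostname or "").lower()
    if host.endswith(".api.cognitive.microsoft.com"):
        # Regional endpoints accept key authentication, but not AAD tokens.
        return "regional"
    if host.endswith(".cognitiveservices.azure.com"):
        # Custom-subdomain endpoints are the kind AAD authentication needs.
        return "custom-subdomain"
    return "unknown"


def clean_blob_prefix(prefix: str) -> str:
    """Drop leading slashes so no empty virtual directory is created."""
    return prefix.lstrip("/")


if __name__ == "__main__":
    print(endpoint_kind("https://westus2.api.cognitive.microsoft.com/"))
    print(clean_blob_prefix("/training-data/invoices/"))
```

If `endpoint_kind` reports `regional` and the code authenticates with an AAD token credential, switching to the resource's custom-subdomain endpoint is the first thing to try for a 401.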
Install
- pip install azure-ai-formrecognizer
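After installing, it can help to confirm the environment matches the requirements listed in the warnings. The following is a small standard-library sketch; `python_supported` and `installed_version` are hypothetical helper names, and the 3.8 floor is taken from the warning above rather than read from the package metadata.

```python
import sys
from importlib import metadata

# Minimum Python version stated in the warnings for recent releases.
MIN_PYTHON = (3, 8)


def python_supported(version_info=sys.version_info) -> bool:
    """Return True when the interpreter is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(package: str = "azure-ai-formrecognizer"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_supported())
    print("azure-ai-formrecognizer:", installed_version() or "not installed")
```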
Imports
- DocumentAnalysisClient
from azure.ai.formrecognizer import DocumentAnalysisClient
- DocumentModelAdministrationClient
from azure.ai.formrecognizer import DocumentModelAdministrationClient
- AzureKeyCredential
from azure.core.credentials import AzureKeyCredential
Quickstart
import os
import sys

from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

# Read the endpoint and key from environment variables
endpoint = os.environ.get("AZURE_FORM_RECOGNIZER_ENDPOINT")
key = os.environ.get("AZURE_FORM_RECOGNIZER_KEY")

if not endpoint or not key:
    print("Please set the AZURE_FORM_RECOGNIZER_ENDPOINT and AZURE_FORM_RECOGNIZER_KEY environment variables.")
    sys.exit(1)

# Example document URL (replace with your own)
document_url = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-quickstart-code/master/python/FormRecognizer/rest/sample_data/Form_1.jpg"


def analyze_general_document():
    document_analysis_client = DocumentAnalysisClient(
        endpoint=endpoint, credential=AzureKeyCredential(key)
    )
    print(f"Analyzing document from: {document_url}")

    # Use the prebuilt-document model for general document analysis
    poller = document_analysis_client.begin_analyze_document_from_url(
        "prebuilt-document", document_url
    )
    result = poller.result()  # Blocks until the long-running operation finishes

    if result.documents:
        for idx, doc in enumerate(result.documents):
            print(f"----Detected Document #{idx + 1}-----")
            print(f"Document type: {doc.doc_type}")
            if doc.fields:
                print("Fields:")
                for name, field in doc.fields.items():
                    # Fall back to the raw content only when no typed value was parsed
                    field_value = field.value if field.value is not None else field.content
                    confidence = f"{field.confidence:.2f}" if field.confidence is not None else "n/a"
                    print(f"  {name}: {field_value} (Confidence: {confidence})")
    else:
        print("No documents detected.")
    print("---Analysis complete.---")


if __name__ == "__main__":
    analyze_general_document()
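The quickstart reads `result.documents`, but the analysis result also exposes `result.key_value_pairs`, which is what the SDK's general-document samples print for `prebuilt-document`. The sketch below shows one way to flatten those pairs; `key_value_rows` is a hypothetical helper, and the demo uses `SimpleNamespace` stand-ins instead of a real service response so it runs offline. It assumes only the attributes `key.content`, `value.content`, and `confidence` on each pair.

```python
from types import SimpleNamespace


def key_value_rows(result, min_confidence: float = 0.0):
    """Return (key, value, confidence) tuples from an analysis result.

    Works on any object exposing `key_value_pairs`; pairs without a
    detected value yield an empty string for the value.
    """
    rows = []
    for pair in getattr(result, "key_value_pairs", None) or []:
        key_text = pair.key.content if pair.key else ""
        value_text = pair.value.content if pair.value else ""
        confidence = pair.confidence if pair.confidence is not None else 0.0
        if confidence >= min_confidence:
            rows.append((key_text, value_text, confidence))
    return rows


if __name__ == "__main__":
    # Stand-in data shaped like the attributes used above (not real output).
    fake_result = SimpleNamespace(
        key_value_pairs=[
            SimpleNamespace(
                key=SimpleNamespace(content="Invoice No"),
                value=SimpleNamespace(content="1234"),
                confidence=0.95,
            ),
            SimpleNamespace(
                key=SimpleNamespace(content="Notes"),
                value=None,
                confidence=0.40,
            ),
        ]
    )
    for key_text, value_text, confidence in key_value_rows(fake_result, 0.5):
        print(f"{key_text}: {value_text} (Confidence: {confidence:.2f})")
```

With a real client, the same helper could be called on the object returned by `poller.result()` in the quickstart.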