Azure AI Form Recognizer Client Library

3.3.3 · active · verified Sat Apr 11

The Azure AI Form Recognizer client library for Python, now part of Azure AI Document Intelligence, uses machine learning to extract text and structured data from documents. It supports layout extraction, prebuilt models (e.g., receipts, invoices, identity documents), custom model building and analysis, and document classification. The library is maintained as part of the Azure SDK for Python and receives updates for new service features and bug fixes.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to create a DocumentAnalysisClient, authenticate using an API key, and analyze a document from a URL using the 'prebuilt-document' model to extract general key-value pairs and other structural information. Remember to replace placeholder endpoint and key values with your actual Azure Form Recognizer resource credentials and provide a URL to your document.

import os
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

# Set these environment variables for authentication
endpoint = os.environ.get('AZURE_FORM_RECOGNIZER_ENDPOINT', 'YOUR_FORM_RECOGNIZER_ENDPOINT')
key = os.environ.get('AZURE_FORM_RECOGNIZER_KEY', 'YOUR_FORM_RECOGNIZER_KEY')

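if endpoint == 'YOUR_FORM_RECOGNIZER_ENDPOINT' or key == 'YOUR_FORM_RECOGNIZER_KEY':
    # Stop with a non-zero exit status; SystemExit does not rely on the site-provided exit() helper.
    raise SystemExit("Please set the AZURE_FORM_RECOGNIZER_ENDPOINT and AZURE_FORM_RECOGNIZER_KEY environment variables.")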

# Example document URL (replace with your own)
document_url = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-quickstart-code/master/python/FormRecognizer/rest/sample_data/Form_1.jpg"

def analyze_general_document():
    document_analysis_client = DocumentAnalysisClient(
        endpoint=endpoint, credential=AzureKeyCredential(key)
    )

    print(f"Analyzing document from: {document_url}")
    # Use the prebuilt-document model for general document analysis
    poller = document_analysis_client.begin_analyze_document_from_url(
        "prebuilt-document", document_url
    )
    result = poller.result()

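    # The prebuilt-document model typically reports its findings as key-value pairs.
    if result.key_value_pairs:
        print("Key-value pairs:")
        for pair in result.key_value_pairs:
            key_text = pair.key.content if pair.key else None
            value_text = pair.value.content if pair.value else None
            print(f"  {key_text}: {value_text} (Confidence: {pair.confidence:.2f})")

    # result.documents is filled by models that return typed documents and is usually empty here.
    if result.documents:
        for idx, doc in enumerate(result.documents):
            print(f"----Detected Document #{idx+1}-----")
            print(f"Document type: {doc.doc_type}")
            if doc.fields:
                print("Fields:")
                for name, field in doc.fields.items():
                    # Compare against None so falsy values such as 0 or False are kept.
                    field_value = field.value if field.value is not None else field.content
                    confidence = f"{field.confidence:.2f}" if field.confidence is not None else "n/a"
                    print(f"  {name}: {field_value} (Confidence: {confidence})")
    else:
        print("No documents detected.")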

    print("---Analysis complete.---")

if __name__ == "__main__":
    analyze_general_document()
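
The key-value pairs that the quickstart aims to extract are exposed on the analysis result as `result.key_value_pairs`. Formatting them can be factored into a small helper that is testable without calling the service. What follows is a minimal sketch, not part of the library: `format_key_value_pairs` and `min_confidence` are hypothetical names, and the `SimpleNamespace` objects are stand-ins that only mimic the `key.content`, `value.content`, and `confidence` attributes of the SDK's key-value pair objects.

```python
from types import SimpleNamespace


def format_key_value_pairs(pairs, min_confidence=0.0):
    """Return printable lines for pairs at or above min_confidence.

    Hypothetical helper: `pairs` is any iterable of objects with `key` and
    `value` (each None or an object with `.content`) and `confidence`.
    """
    lines = []
    for pair in pairs or []:
        # Treat a missing confidence as 0.0 so the pair can still be filtered.
        confidence = pair.confidence if pair.confidence is not None else 0.0
        if confidence < min_confidence:
            continue
        key_text = pair.key.content if pair.key else "<no key>"
        value_text = pair.value.content if pair.value else "<no value>"
        lines.append(f"{key_text}: {value_text} (confidence {confidence:.2f})")
    return lines


# Stand-in data that mimics the attribute shape; this is not real service output.
sample_pairs = [
    SimpleNamespace(
        key=SimpleNamespace(content="Name"),
        value=SimpleNamespace(content="Contoso"),
        confidence=0.93,
    ),
    SimpleNamespace(
        key=SimpleNamespace(content="Date"),
        value=None,
        confidence=0.40,
    ),
]

for line in format_key_value_pairs(sample_pairs, min_confidence=0.5):
    print(line)
```

With a real analysis result, the equivalent call would be `format_key_value_pairs(result.key_value_pairs, min_confidence=0.5)`; the threshold of 0.5 is an arbitrary illustration, not a recommended value.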
