Azure AI Translation Document Client Library for Python
The Azure AI Translation Document client library for Python is part of Microsoft's Azure SDK and integrates Document Translation into applications. It translates whole documents across multiple languages and dialects while preserving the original structure and formatting. The library supports both asynchronous batch translation of multiple or complex files stored in Azure Blob Storage and synchronous single-file translation. The current stable version is 1.1.0.
Warnings
- breaking The Document Translation service transitioned to date-based API versioning. Service behavior for v1.0 is now aligned with the `2024-05-01` API version. Using older SDK versions or implicitly relying on v1.0 behavior without specifying the API version may lead to unexpected results.
- breaking The `document_translate` method of `SingleDocumentTranslationClient` was renamed to `translate`. The change is attributed to version 2.0.0 (released 2024-11-15 for other languages); the timing for the Python SDK is inferred from that date, not confirmed.
- gotcha Document Translation is only supported in *single-service* Translator resources, not multi-service Azure AI services resources. Also, it's not available in all Azure regions, and the Free (F0) tier does not support this feature.
- gotcha Common errors like `HttpResponseError: (InvalidDocumentAccessLevel)` or `Cannot access source document location with the current permissions` indicate issues with storage access. This usually means the SAS tokens are incorrect or expired, or the managed identity assigned to the Translator resource lacks the 'Storage Blob Data Contributor' role on the source/target storage accounts.
- gotcha When translating single files using SAS URLs, especially if experiencing issues, explicitly set the `storage_type` parameter to `StorageInputType.FILE` in `DocumentTranslationInput`. The service might not correctly infer the storage type in all scenarios.
- gotcha Users have reported issues with bullet points missing or being improperly formatted in translated PDF documents, indicating the service might not perfectly preserve all formatting elements for certain document types.
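Because an expired SAS token is one of the usual causes of the storage access errors listed above, it can help to rule that out locally before submitting a job. The following is a minimal sketch using only the standard library: it reads the `se` (signed expiry) query parameter from a SAS URL. The helper names `sas_expiry` and `sas_is_expired` are illustrative and not part of the SDK, and the sketch assumes the expiry is carried in the URL itself, which is not the case when it comes from a stored access policy.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def sas_expiry(url: str) -> Optional[datetime]:
    """Return the expiry time from a SAS URL's `se` parameter, or None if absent."""
    values = parse_qs(urlsplit(url).query).get("se")
    if not values:
        return None
    text = values[0]
    # Older Python versions do not parse a trailing "Z", so spell out the UTC offset.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    expiry = datetime.fromisoformat(text)
    # Treat a timestamp without an offset as UTC.
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)
    return expiry


def sas_is_expired(url: str, now: Optional[datetime] = None) -> bool:
    """Return True only when the URL carries an expiry that has already passed."""
    expiry = sas_expiry(url)
    if expiry is None:
        # No `se` parameter: expiry cannot be determined from the URL alone.
        return False
    current = now if now is not None else datetime.now(timezone.utc)
    return current >= expiry
```

A check like this only covers the expiry case. Wrong permissions on the token, or a missing role assignment for a managed identity, still have to be verified on the storage account.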
Install
pip install azure-ai-translation-document
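Since behavior differs between SDK versions, it may be worth confirming which version actually got installed. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_version` is illustrative, not part of the SDK:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "azure-ai-translation-document") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "azure-ai-translation-document is not installed")
```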
Imports
- DocumentTranslationClient
from azure.ai.translation.document import DocumentTranslationClient
- SingleDocumentTranslationClient
from azure.ai.translation.document import SingleDocumentTranslationClient
- DocumentTranslationInput
from azure.ai.translation.document import DocumentTranslationInput
- TranslationTarget
from azure.ai.translation.document import TranslationTarget
- AzureKeyCredential
from azure.core.credentials import AzureKeyCredential
Quickstart
import os
import sys

from azure.core.credentials import AzureKeyCredential
from azure.ai.translation.document import (
    DocumentTranslationClient,
    DocumentTranslationInput,
    TranslationTarget,
)

# Read the endpoint, key, and container SAS URLs from environment variables.
# Expected shapes:
#   endpoint:       https://YOUR_TRANSLATOR_RESOURCE_NAME.cognitiveservices.azure.com/
#   container URLs: https://YOUR_STORAGE_ACCOUNT.blob.core.windows.net/source?sas_token
endpoint = os.environ.get("AZURE_DOCUMENT_TRANSLATION_ENDPOINT")
key = os.environ.get("AZURE_DOCUMENT_TRANSLATION_KEY")
source_container_url = os.environ.get("AZURE_SOURCE_CONTAINER_URL")
target_container_url = os.environ.get("AZURE_TARGET_CONTAINER_URL")
target_language = "es"

# Stop early if any setting is missing. No placeholder defaults are used,
# because they would make this check always pass.
if not all([endpoint, key, source_container_url, target_container_url]):
    print("Please set the environment variables: AZURE_DOCUMENT_TRANSLATION_ENDPOINT, AZURE_DOCUMENT_TRANSLATION_KEY, AZURE_SOURCE_CONTAINER_URL, AZURE_TARGET_CONTAINER_URL")
    sys.exit(1)


def begin_batch_translation():
    client = DocumentTranslationClient(endpoint, AzureKeyCredential(key))

    inputs = [
        DocumentTranslationInput(
            source_url=source_container_url,
            targets=[
                # Positional arguments: target URL, then target language code.
                TranslationTarget(target_container_url, target_language)
            ],
        )
    ]

    print("Submitting batch translation job...")
    poller = client.begin_translation(inputs)
    print(f"Job ID: {poller.id}")
    # status() is a method on the poller, so it has to be called.
    print(f"Job status: {poller.status()}")

    # Wait for the job to complete
    result = poller.result()

    print("Translation job completed. Document statuses:")
    for document_status in result:
        print(f"Document ID: {document_status.id}")
        print(f"  Source document URL: {document_status.source_document_url}")
        print(f"  Translated document URL: {document_status.translated_document_url}")
        print(f"  Status: {document_status.status}")
        if document_status.error:
            print(f"  Error: {document_status.error.code} - {document_status.error.message}")


if __name__ == '__main__':
    begin_batch_translation()
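The `storage_type` gotcha listed under Warnings can be sketched as a variation on the Quickstart. This is a hedged sketch, not the SDK's prescribed pattern: the helper names `build_single_file_input` and `translate_single_file` are illustrative, both URLs are assumed to point at individual blobs (not containers), and the target language is passed positionally so the sketch does not depend on the keyword's name. The SDK import is done inside the function so the module loads even where the package is not installed.

```python
def build_single_file_input(source_blob_url, target_blob_url, language):
    """Build a translation input for one blob, stating the storage type explicitly."""
    # Imported lazily so this module can be loaded without the SDK installed.
    from azure.ai.translation.document import (
        DocumentTranslationInput,
        StorageInputType,
        TranslationTarget,
    )

    return DocumentTranslationInput(
        source_blob_url,
        [TranslationTarget(target_blob_url, language)],
        # Tell the service the source is a single file rather than a folder,
        # instead of relying on it to infer that from the URL.
        storage_type=StorageInputType.FILE,
    )


def translate_single_file(client, source_blob_url, target_blob_url, language):
    """Submit a one-document job and return the per-document statuses as a list."""
    job_input = build_single_file_input(source_blob_url, target_blob_url, language)
    poller = client.begin_translation([job_input])
    return list(poller.result())
```

Here `client` would be a `DocumentTranslationClient` constructed as in the Quickstart, for example `translate_single_file(client, source_blob_sas_url, target_blob_sas_url, "es")`, where the two URL variables are placeholders for SAS URLs of the source blob and the desired target blob.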