Amazon Textract Caller
This library provides a simplified Python interface for calling the Amazon Textract API. As of its latest PyPI release (0.2.4), it focuses on sending raw API requests and returning the raw JSON responses. Active development has largely shifted to the `amazon-textract-textractor` library, which adds more comprehensive document parsing and utility features. The `amazon-textract-caller` package itself has not seen updates since January 2021.
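To show what "raw API requests and responses" means in practice, the sketch below makes the same kind of synchronous call directly with `boto3`. It is not this package's implementation: `s3_uri_to_document` and `analyze_s3_document` are hypothetical helper names used only for illustration. The synchronous `AnalyzeDocument` operation handles images and single-page PDF/TIFF files. Multi-page documents need the asynchronous `StartDocumentAnalysis` API instead.

```python
from urllib.parse import urlparse


def s3_uri_to_document(s3_uri: str) -> dict:
    """Hypothetical helper: turn "s3://bucket/key" into Textract's Document parameter."""
    parsed = urlparse(s3_uri)
    key = parsed.path.lstrip("/")
    # Keys containing '?' or '#' would be split off by urlparse and need extra handling.
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"Not an S3 object URI: {s3_uri!r}")
    return {"S3Object": {"Bucket": parsed.netloc, "Name": key}}


def analyze_s3_document(s3_uri: str, feature_types=("FORMS", "TABLES")) -> dict:
    """Hypothetical helper: call synchronous AnalyzeDocument and return the raw JSON dict."""
    import boto3  # imported lazily so the URI helper above works without boto3 installed

    client = boto3.client("textract")  # uses boto3's default credential chain and region
    return client.analyze_document(
        Document=s3_uri_to_document(s3_uri),
        FeatureTypes=list(feature_types),
    )
```

The returned dictionary is the unparsed Textract response. Its `Blocks` list is the same structure the quickstart below inspects.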
Warnings
- breaking The `amazon-textract-caller` PyPI package (version 0.2.4) has not been updated since January 2021 and is effectively abandoned as a standalone package. Its functionality has been largely superseded and extended by the actively maintained `amazon-textract-textractor` library (v1.x), which resides in the same GitHub repository.
- gotcha This library is primarily a direct wrapper for the Textract API, returning raw Textract JSON responses. It does not provide the higher-level parsing, data extraction, and convenient object model utilities (like accessing forms or tables as Python objects) that are available in the `amazon-textract-textractor` library.
- gotcha This library relies on `boto3` for AWS API calls, so AWS credentials and a region must be configured (e.g., via environment variables such as `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `AWS_DEFAULT_REGION`, or via the AWS CLI configuration). Textract bills per page processed, so every API call incurs cost.
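Because the response is plain JSON, any structure has to be recovered by walking the `Blocks` list yourself. The sketch below assumes only the `BlockType` and `Text` fields of Textract blocks. It tallies block types and collects the text of `LINE` blocks. `summarize_blocks` and `line_texts` are hypothetical names, not functions provided by either library.

```python
from collections import Counter


def summarize_blocks(response: dict) -> Counter:
    """Hypothetical helper: count blocks of each BlockType in a raw Textract response."""
    return Counter(block["BlockType"] for block in response.get("Blocks", []))


def line_texts(response: dict) -> list:
    """Hypothetical helper: return the text of every LINE block, in response order."""
    return [
        block.get("Text", "")
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]
```

For example, `summarize_blocks(response).most_common()` gives a quick overview of what a call returned before you write more specific parsing code.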
Install
pip install amazon-textract-caller
Imports
- call_textract
from textractcaller.t_call import call_textract
- Textract_Features
from textractcaller.t_call import Textract_Features
Quickstart
from textractcaller.t_call import call_textract, Textract_Features
# AWS credentials and region are resolved by boto3's default chain
# (e.g., environment variables, the shared ~/.aws config files, or an IAM role).
# Replace with your actual S3 document URI (e.g., "s3://your-bucket/your-document.pdf")
s3_document_uri = "s3://YOUR_BUCKET/YOUR_DOCUMENT.pdf"
try:
    # Call Textract with the FORMS and TABLES features.
    # This package returns the raw JSON response from Textract.
    response = call_textract(
        input_document=s3_document_uri,
        features=[Textract_Features.FORMS, Textract_Features.TABLES],
    )
    print("Textract API call succeeded; raw JSON response received.")
    # print(response)  # Uncomment to see the full raw Textract JSON response
    print(f"Detected {len(response.get('Blocks', []))} blocks.")
    print("\nNote: For higher-level parsing and object models, consider the `amazon-textract-textractor` library.")
except Exception as e:
    # Broad catch for a quickstart; production code should handle specific botocore errors.
    print(f"Error during Textract API call: {e}")
    print("Ensure valid AWS credentials, a correct S3 URI, and appropriate permissions.")
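Since the quickstart requests the FORMS feature, a common next step is turning `KEY_VALUE_SET` blocks into a plain dictionary. This sketch relies on Textract's block relationships: a `KEY` block points to its `VALUE` block through a `VALUE` relationship, and both point to their `WORD` (or `SELECTION_ELEMENT`) blocks through `CHILD` relationships. `key_value_pairs` and `_child_text` are hypothetical helpers, not part of `amazon-textract-caller`. Duplicate key texts simply overwrite each other here, which a real parser may need to handle differently.

```python
def _child_text(block: dict, blocks_by_id: dict) -> str:
    """Join the text of a block's CHILD words; checkboxes contribute their SelectionStatus."""
    parts = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                parts.append(child["Text"])
            elif child["BlockType"] == "SELECTION_ELEMENT":
                parts.append(child["SelectionStatus"])
    return " ".join(parts)


def key_value_pairs(response: dict) -> dict:
    """Hypothetical helper: map each form key's text to its value's text."""
    blocks = response.get("Blocks", [])
    blocks_by_id = {block["Id"]: block for block in blocks}
    pairs = {}
    for block in blocks:
        # Only KEY entities start a pair; VALUE entities are reached via relationships.
        if block.get("BlockType") != "KEY_VALUE_SET" or "KEY" not in block.get("EntityTypes", []):
            continue
        key_text = _child_text(block, blocks_by_id)
        value_text = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                value_text = " ".join(
                    _child_text(blocks_by_id[value_id], blocks_by_id) for value_id in rel["Ids"]
                )
        pairs[key_text] = value_text
    return pairs
```

After a successful quickstart call, `print(key_value_pairs(response))` would show the detected form fields. The `amazon-textract-textractor` library provides this kind of parsing without hand-written traversal.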