Amazon Textract Caller
0.2.4 verified Mon Apr 13 auth: no python abandoned
This library provides a simplified Python interface for calling the Amazon Textract API. As of its latest PyPI release (0.2.4), it focuses on sending raw API requests and returning the raw responses. Active development has largely shifted to the `amazon-textract-textractor` library, which offers more comprehensive document parsing and utility features. The `amazon-textract-caller` package itself has not seen updates since January 2021.
pip install amazon-textract-caller
Common errors
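error ModuleNotFoundError: No module named 'textract' ↓
cause The code imports 'textract', which is not the import name of this library. The `amazon-textract-caller` package is imported as 'textractcaller'; 'textract' on PyPI is a separate, unrelated package.
fix
Install this library with 'pip install amazon-textract-caller' and import from 'textractcaller'. Run 'pip install textract' only if you actually intend to use the unrelated 'textract' package.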
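error ModuleNotFoundError: No module named 'amazon-textract-caller' ↓
cause 'amazon-textract-caller' is the PyPI distribution name, not an importable module name (hyphens are not valid in Python module names). The importable module is 'textractcaller'.
fix
Install the package using pip: 'pip install amazon-textract-caller', then import from 'textractcaller'.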
error ImportError: cannot import name 'call_textract' from 'textractcaller' ↓
cause The 'call_textract' function is not available in the 'textractcaller' module, possibly due to an outdated version.
fix
Ensure you have the latest version of 'amazon-textract-caller' installed: 'pip install --upgrade amazon-textract-caller'.
error AttributeError: module 'textractcaller' has no attribute 'call_textract' ↓
cause The 'call_textract' function is not defined in the 'textractcaller' module, possibly due to an incorrect import or outdated package.
fix
Verify the correct import statement and update the package: 'pip install --upgrade amazon-textract-caller'.
error TypeError: call_textract() got an unexpected keyword argument 'force_async_api' ↓
cause The 'call_textract' function does not accept the 'force_async_api' argument, likely due to a version mismatch.
fix
Check the function's documentation for the correct parameters and update the package if necessary: 'pip install --upgrade amazon-textract-caller'.
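Several of the errors above come down to which version is installed and what a function actually accepts. The following is a minimal diagnostic sketch using only the standard library; the helper names `installed_version` and `accepts_keyword` are hypothetical and not part of `amazon-textract-caller`.

```python
import inspect
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a PyPI distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


def accepts_keyword(func: Callable, name: str) -> bool:
    """Return True if `func` declares `name` as a parameter or takes **kwargs."""
    params = inspect.signature(func).parameters
    if name in params:
        return True
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


if __name__ == "__main__":
    # The distribution name uses hyphens; the import name (textractcaller) does not.
    found = installed_version("amazon-textract-caller")
    if found is None:
        print("amazon-textract-caller is not installed in this environment")
    else:
        print(f"amazon-textract-caller {found} is installed")
```

With the package installed, passing the imported `call_textract` function and the string 'force_async_api' to `accepts_keyword` would show whether that argument is supported before you call it.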
Warnings
breaking The `amazon-textract-caller` PyPI package (version 0.2.4) has not been updated since January 2021 and is effectively abandoned as a standalone package. Its functionality has been largely superseded and extended by the actively maintained `amazon-textract-textractor` library (v1.x), which resides in the same GitHub repository. ↓
fix For new projects, or if you need current features, bug fixes, or higher-level parsing, migrate to the `amazon-textract-textractor` library. This package might not be compatible with newer Textract API features or `boto3` versions.
gotcha This library is primarily a direct wrapper for the Textract API, returning raw Textract JSON responses. It does not provide the higher-level parsing, data extraction, and convenient object model utilities (like accessing forms or tables as Python objects) that are available in the `amazon-textract-textractor` library. ↓
fix If advanced document parsing, structured data extraction (e.g., easy access to key-value pairs, tables, or entity recognition), and an object-oriented representation of the document are required, consider `amazon-textract-textractor` for a more feature-rich experience.
gotcha This library relies on `boto3` for AWS API calls, which requires properly configured AWS credentials (e.g., via environment variables like `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_DEFAULT_REGION`, or AWS CLI configuration). Textract API calls incur costs based on usage. ↓
fix Ensure your AWS environment is correctly configured with valid credentials and permissions for Textract. Regularly monitor your AWS Textract usage and associated costs in the AWS console.
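As a quick pre-flight check for the credential warning above, the sketch below reports which of the named environment variables are unset. It is an assumption-laden helper (`missing_aws_env_vars` is a hypothetical name), and it covers environment variables only: `boto3` can also resolve credentials from other sources such as the AWS CLI configuration, so a non-empty result does not necessarily mean API calls will fail.

```python
import os
from typing import List, Mapping

# Environment variables named in the warning above.
AWS_ENV_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_env_vars(env: Mapping[str, str]) -> List[str]:
    """Return the AWS variables that are unset or empty in `env`, in a fixed order."""
    return [name for name in AWS_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_env_vars(os.environ)
    if missing:
        print("Not set in the environment:", ", ".join(missing))
        print("boto3 may still find credentials in the AWS CLI configuration.")
    else:
        print("All three AWS environment variables are set.")
```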
Imports
- call_textract
from textractcaller.t_call import call_textract
- Textract_Features
from textractcaller.t_call import Textract_Features
Quickstart
import os
from textractcaller.t_call import call_textract, Textract_Features

# AWS credentials and region are resolved by boto3 (environment variables or AWS CLI config).
# Do not hard-code credentials. To set a default region from code when none is configured:
# os.environ.setdefault('AWS_DEFAULT_REGION', 'us-east-1')

# Replace with your actual S3 document URI (e.g., "s3://your-bucket/your-document.pdf")
s3_document_uri = "s3://YOUR_BUCKET/YOUR_DOCUMENT.pdf"

try:
    # Call the Textract API with the specified features.
    # This package returns the raw JSON response from Textract.
    response = call_textract(
        input_document=s3_document_uri,
        features=[Textract_Features.FORMS, Textract_Features.TABLES],
    )
    print("Textract API call successful. Raw JSON response received:")
    # print(response)  # Uncomment to see the full raw Textract JSON response
    print(f"Detected {len(response.get('Blocks', []))} blocks.")
    print("\nNote: For higher-level parsing and object models, consider the `amazon-textract-textractor` library.")
except Exception as e:
    print(f"Error during Textract API call: {e}")
    print("Ensure valid AWS credentials, correct S3 URI, and appropriate permissions.")
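The quickstart reports only the number of blocks. Because the response is raw Textract JSON, reading text out of it means walking the 'Blocks' list yourself. The sketch below shows one way to do that; it assumes the standard Textract block fields 'BlockType' and 'Text', runs on a small hand-written sample rather than a real API response, and the helper names `line_texts` and `block_type_counts` are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List


def line_texts(response: Dict[str, Any]) -> List[str]:
    """Collect the text of every LINE block, in the order Textract returned them."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def block_type_counts(response: Dict[str, Any]) -> Dict[str, int]:
    """Count how many blocks of each BlockType the response contains."""
    return dict(Counter(b.get("BlockType", "UNKNOWN") for b in response.get("Blocks", [])))


# Hand-written sample shaped like a Textract response (not real API output).
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE", "Id": "p1"},
        {"BlockType": "LINE", "Id": "l1", "Text": "Invoice 1234"},
        {"BlockType": "WORD", "Id": "w1", "Text": "Invoice"},
        {"BlockType": "WORD", "Id": "w2", "Text": "1234"},
        {"BlockType": "LINE", "Id": "l2", "Text": "Total: 10.00"},
    ]
}

if __name__ == "__main__":
    for text in line_texts(sample_response):
        print(text)
    print(block_type_counts(sample_response))
```

Forms and tables need more work than this, since key-value pairs and cells are linked through block relationships; that is the parsing the `amazon-textract-textractor` library is described as providing.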