Amazon Textract PrettyPrinter
0.1.10 · verified Fri May 01 · auth: no · python
A helper library for pretty-printing Amazon Textract responses. It provides a simple interface for converting Textract JSON output into formatted text, HTML, CSV, Markdown, and other formats. It is part of the amazon-textract-textractor suite. The current PyPI version is 0.1.10, while the larger project it belongs to has versions up to 1.9.2. The release cadence is irregular; the PyPI package has not been updated since 2022. Requires Python >=3.6.
pip install amazon-textract-prettyprinter
Common errors
error ModuleNotFoundError: No module named 'textractprettyprinter'
cause The package is not installed in the active Python environment. The distribution name (amazon-textract-prettyprinter) differs from the import name (textractprettyprinter), so the two are easily confused.
fix Install the package: pip install amazon-textract-prettyprinter, then use: from textractprettyprinter.t_pretty_print import Textract_Pretty_Print, get_string
error AttributeError: 'NoneType' object has no attribute 'get'
cause The response passed to the pretty-print functions is None or invalid. This usually happens when the Textract API call fails or returns an error.
fix Check that call_textract or the boto3 Textract call returns a valid response containing a 'Blocks' key, and add error handling for Textract API failures.
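error ImportError: cannot import name 'Textract_PrettyPrint' from 'textractprettyprinter'
cause The import targets the package root instead of the textractprettyprinter.t_pretty_print module, and the name is misspelled: the module defines the enum Textract_Pretty_Print (underscores between all three words) and functions such as get_string.
fix Use: from textractprettyprinter.t_pretty_print import Textract_Pretty_Print, get_string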
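The AttributeError above is easier to diagnose when the response is checked before it reaches the pretty printer. A minimal sketch, assuming the response is the plain dict returned by Textract; ensure_textract_response is a hypothetical helper, not part of the library:

```python
def ensure_textract_response(response):
    """Return the response unchanged if it looks like Textract output.

    Hypothetical helper (not a library function): raises ValueError early
    instead of letting a None or malformed response fail deep inside the
    pretty printer.
    """
    if response is None:
        raise ValueError("Textract response is None; did the API call fail?")
    if not isinstance(response, dict):
        raise ValueError(f"Expected a dict, got {type(response).__name__}")
    if "Blocks" not in response:
        raise ValueError("Textract response has no 'Blocks' key")
    return response
```

Wrapping the Textract call result in this check turns a confusing AttributeError into an explicit message at the point where the bad response first appears.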
Warnings
breaking The PyPI package version (0.1.10) is outdated and does not match the GitHub project versioning. Some features may be missing or broken. Use the package from the main repository (amazon-textract-textractor) for the latest features.
fix Install from the main repository: pip install amazon-textract-textractor (which includes prettyprinter).
gotcha The import name is 'textractprettyprinter' (all lowercase, no hyphens or underscores), which differs from the distribution name. Many users mistake it for 'amazon_textract_prettyprinter'.
fix Use the correct import: from textractprettyprinter.t_pretty_print import Textract_Pretty_Print, get_string
gotcha The library does not call the Textract API itself; it only formats a response you already have. Use amazon-textract-caller (or boto3 directly) to obtain the response JSON first.
fix Use call_textract from textractcaller.t_call to get the response, then pass it to get_string.
gotcha CSV, HTML, and Markdown output requires optional dependencies. If you try to use them without installing extras, you'll get ModuleNotFoundError.
fix Install with extras: pip install amazon-textract-prettyprinter[html,csv,markdown]
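Because the library only formats a response you already have, it helps to know what that input looks like. As a rough illustration, the sketch below pulls the text of LINE blocks out of a Textract-shaped dict using only the standard library. The sample_response dict is a hand-written stand-in containing just the fields used here, and extract_lines is a hypothetical helper, not a library function:

```python
def extract_lines(response):
    """Collect the text of every LINE block, in response order."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


# Hand-written stand-in for a real Textract response (assumption: only
# the fields needed for this illustration are included).
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE", "Id": "p1"},
        {"BlockType": "LINE", "Id": "l1", "Text": "Invoice 1234"},
        {"BlockType": "WORD", "Id": "w1", "Text": "Invoice"},
        {"BlockType": "LINE", "Id": "l2", "Text": "Total: 10.00"},
    ]
}

print("\n".join(extract_lines(sample_response)))
```

The same dict could equally come from a JSON file saved after an earlier Textract call, which avoids repeating the API call while experimenting with output formats.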
Install
pip install amazon-textract-prettyprinter[html,csv]
Imports
- Textract_Pretty_Print, get_string
wrong: from amazon_textract_prettyprinter import Textract_PrettyPrint
correct: from textractprettyprinter.t_pretty_print import Textract_Pretty_Print, get_string
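Since the distribution name and the import name differ, a quick check of which module name is actually importable can save some guessing. A small sketch using the standard library's importlib; is_importable is a hypothetical helper name:

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# The import name has no hyphens or underscores; the distribution name does.
for name in ("textractprettyprinter", "amazon_textract_prettyprinter"):
    status = "found" if is_importable(name) else "not importable under this name"
    print(f"{name}: {status}")
```

If neither name is reported as found, the package is most likely missing from the active environment rather than misnamed in the import.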
Quickstart
import boto3
from textractcaller.t_call import Textract_Features, call_textract
from textractprettyprinter.t_pretty_print import (
    Pretty_Print_Table_Format,
    Textract_Pretty_Print,
    get_string,
)
# Call the Textract API (make sure AWS credentials are configured)
client = boto3.client('textract', region_name='us-east-1')
response = call_textract(
    input_document="s3://bucket/document.pdf",
    features=[Textract_Features.FORMS],
    boto3_textract_client=client,
)
# Pretty print the detected lines as text
text_output = get_string(textract_json=response, output_type=[Textract_Pretty_Print.LINES])
print(text_output)
# Pretty print form key/value pairs as CSV
csv_output = get_string(
    textract_json=response,
    output_type=[Textract_Pretty_Print.FORMS],
    table_format=Pretty_Print_Table_Format.csv,
)
print(csv_output)
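The CSV output is a plain string, so it can be parsed back into rows with the standard csv module for further processing. A small sketch, assuming two columns (key, value) per row as the Quickstart comment suggests; sample_csv is a made-up stand-in for real output and csv_text_to_rows is a hypothetical helper:

```python
import csv
import io


def csv_text_to_rows(csv_text):
    """Parse a CSV string into a list of rows, skipping blank lines."""
    return [row for row in csv.reader(io.StringIO(csv_text)) if row]


# Hypothetical sample; real output depends on the document.
sample_csv = 'Name,Jane Doe\r\n"Address","1 Main St, Springfield"\r\n'
rows = csv_text_to_rows(sample_csv)
for key, value in rows:
    print(f"{key} -> {value}")
```

Using csv.reader rather than splitting on commas keeps quoted values that contain commas intact.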