textract
textract is a Python library that extracts text from a wide variety of document formats, including PDFs, Word documents, images (via OCR), and audio files, through a single unified interface. The current stable version is 1.6.5, released in March 2022. Releases do not follow a strict schedule, but the project is actively maintained with bug fixes and feature additions.
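As a rough illustration of the unified interface, the sketch below wraps `textract.process`, which takes a file path and returns the extracted text as bytes. The helper name `extract_text` and its `encoding` parameter are hypothetical, introduced only for this example, and the fallback to `None` when the package is not installed is an assumption made so the snippet can run anywhere. Some formats also need external system tools, which this sketch does not check for.

```python
def extract_text(path, encoding="utf-8"):
    """Return the text of `path` as a str, or None if textract is unavailable.

    `extract_text` is a hypothetical helper for illustration, not part of textract.
    """
    try:
        import textract  # imported lazily so the sketch loads without the package
    except ImportError:
        return None

    # textract chooses a parser from the file extension and returns bytes.
    raw = textract.process(str(path))
    return raw.decode(encoding)
```

Calling `extract_text("report.pdf")` or `extract_text("notes.docx")` would go through the same code path, which is the point of the single entry function; the file names here are placeholders.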
API endpoints
full doc /v1/registry/textract
install /v1/registry/textract/install
compatibility /v1/registry/textract/compatibility