textract

Python library · version 1.6.5
verified May 24, 2026

textract is a Python library that extracts text from a wide variety of document formats, including PDFs, Word documents, images (via OCR), and audio files, through a single entry point, textract.process(). The current stable version is 1.6.5, released in March 2022. Releases do not follow a fixed schedule.
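The sketch below wraps textract.process() in a small helper. It assumes textract 1.6.5 is installed, along with whatever system tools it calls for a given format (for example Tesseract for OCR). textract.process() returns bytes, UTF-8 encoded by default, so the helper decodes them to str. The helper name extract_text and its process parameter are hypothetical and not part of textract. The process parameter exists only so the wrapper can run against a stub when textract is not installed.

```python
from typing import Callable, Optional


def extract_text(
    path: str,
    method: Optional[str] = None,
    process: Optional[Callable[..., bytes]] = None,
) -> str:
    """Return the text of `path` as a str, using textract by default.

    `process` is a hypothetical injection point (not part of textract) that
    lets this helper run against a stub when textract is not installed.
    """
    if process is None:
        # Imported lazily so this module loads even without textract installed.
        import textract

        process = textract.process

    kwargs = {}
    if method is not None:
        # Optional extraction backend; accepted values depend on the file type.
        kwargs["method"] = method

    raw = process(path, **kwargs)
    # textract.process() returns bytes, UTF-8 encoded by default.
    return raw.decode("utf-8")
```

Typical calls would be extract_text("report.pdf") for the default backend, or extract_text("scan.pdf", method="tesseract") to force OCR on a scanned PDF. The method values a given format accepts should be checked against the textract documentation.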
