textract

library · 1.6.5 · python

✓ verified May 24, 2026

textract is a Python library that extracts text from a wide variety of document formats, including PDFs, Word documents, images (via OCR), and audio files, through a single unified interface. The current stable version is 1.6.5, released in March 2022. Releases do not follow a strict schedule, but the project is actively maintained, with bug fixes and feature additions.
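
A minimal sketch of the unified interface, assuming textract is installed along with whatever system tools the target format needs. `textract.process` is the library's entry point and returns bytes; the `extract_text` wrapper, its `process` parameter, and the file name `report.pdf` are hypothetical names introduced here for illustration only.

```python
from typing import Callable, Optional


def extract_text(
    path: str,
    process: Optional[Callable[..., bytes]] = None,
    encoding: str = "utf-8",
) -> str:
    """Return the text of the document at `path` as a str.

    `process` is a hypothetical injection point (not part of textract) that
    lets the wrapper be exercised without the library installed. When it is
    omitted, textract.process is used.
    """
    if process is None:
        import textract  # third-party; imported lazily so the module loads without it

        process = textract.process

    raw = process(path)  # textract.process returns bytes
    # Replace undecodable bytes rather than failing on noisy OCR output.
    return raw.decode(encoding, errors="replace")


if __name__ == "__main__":
    # "report.pdf" is a placeholder path, not a file from this page.
    print(extract_text("report.pdf"))
```

The same call is meant to work across formats because textract picks a parser from the file extension, so swapping `report.pdf` for a `.docx` or `.png` file should not require changes to the wrapper.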

Resources

github: github.com/deanmalmgren/textract

package: pypi.org/project/textract/

API endpoints

full doc /v1/registry/textract

install /v1/registry/textract/install

compatibility /v1/registry/textract/compatibility