OCRmyPDF

library · 17.4.1 · python

✓ verified May 22, 2026

OCRmyPDF is a Python library and command-line application that adds an invisible OCR text layer to scanned PDF files, making them searchable. It uses the Tesseract OCR engine and other external tools to process documents, and it can produce optimized, archive-ready (PDF/A) files. The project is actively maintained: major versions typically appear about once a year, with minor and patch releases more often.
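As a rough illustration of the library side, the sketch below wraps the `ocrmypdf.ocr` entry point to turn a scanned PDF into a searchable PDF/A. The helper names `searchable_name` and `make_searchable` and the `_ocr` filename suffix are hypothetical, chosen only for this example. The keyword arguments (`language`, `output_type`, `deskew`) are assumed to behave as in earlier releases and should be checked against the documentation for the installed version. Tesseract must be installed separately.

```python
from pathlib import Path


def searchable_name(input_pdf: str) -> Path:
    """Hypothetical helper: derive an output path such as scan_ocr.pdf."""
    src = Path(input_pdf)
    return src.with_name(f"{src.stem}_ocr{src.suffix}")


def make_searchable(input_pdf: str, language: str = "eng") -> Path:
    """Hypothetical wrapper: OCR a scanned PDF and return the output path."""
    # Imported lazily so searchable_name() works without OCRmyPDF installed.
    import ocrmypdf

    output_pdf = searchable_name(input_pdf)
    ocrmypdf.ocr(
        input_pdf,
        output_pdf,
        language=[language],   # Tesseract language code(s); assumed installed
        output_type="pdfa",    # request an archive-ready PDF/A file
        deskew=True,           # straighten crooked scans before OCR
    )
    return output_pdf
```

Calling `make_searchable("scan.pdf")` would write `scan_ocr.pdf` next to the input. In a standalone script it is probably safest to make that call under an `if __name__ == "__main__":` guard, since OCRmyPDF may use multiple worker processes.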

indexed Sun Apr 12 · updated Wed May 27

Resources

docs ocrmypdf.readthedocs.io/

github github.com/ocrmypdf/OCRmyPDF

changelog github.com/ocrmypdf/OCRmyPDF/tree/main/docs/releasenotes

package pypi.org/project/ocrmypdf/

API endpoints

full doc /v1/registry/ocrmypdf

install /v1/registry/ocrmypdf/install

compatibility /v1/registry/ocrmypdf/compatibility