pdfmux

Transport: stdio

Smart PDF-to-Markdown router that picks the best extractor for each page, audits the output quality, and automatically re-extracts pages that fail the audit. It also provides per-page confidence scoring, bring-your-own-key (BYOK) LLM support, and RAG chunking.
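The per-page routing idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not pdfmux's implementation: the backends are hypothetical callables that turn one page's raw bytes into text, score() is a toy confidence heuristic, and the 0.9 threshold is arbitrary. Each page tries the backends in order and falls back to the next one only when confidence is too low.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical: an extractor turns one page's raw bytes into text.
Extractor = Callable[[bytes], str]


@dataclass
class PageResult:
    page: int
    text: str
    backend: str
    confidence: float


def score(text: str) -> float:
    """Toy confidence: share of printable or whitespace characters.
    Empty or blank output scores 0.0. Not pdfmux's actual scorer."""
    if not text.strip():
        return 0.0
    good = sum(ch.isprintable() or ch.isspace() for ch in text)
    return good / len(text)


def route_pages(
    pages: Sequence[bytes],
    backends: Sequence[Tuple[str, Extractor]],
    threshold: float = 0.9,
) -> List[PageResult]:
    """For each page, try backends in order (cheapest first). Stop at the
    first result whose confidence reaches `threshold`; if none does, keep
    the best-scoring attempt."""
    if not backends:
        raise ValueError("at least one backend is required")
    results: List[PageResult] = []
    for i, raw in enumerate(pages):
        best: Optional[PageResult] = None
        for name, extract in backends:
            text = extract(raw)
            attempt = PageResult(i, text, name, score(text))
            if best is None or attempt.confidence > best.confidence:
                best = attempt
            if attempt.confidence >= threshold:
                break  # good enough: skip the remaining fallback backends
        results.append(best)
    return results
```

With a cheap text-layer stand-in first and an OCR stand-in second, a page whose text layer decodes only to control characters scores 0.0 and falls through to the second backend, while a clean page never pays for the fallback.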

Install

npx -y pdfmux-mcp

Tools · 10

pdfmux convert: Extract a PDF to Markdown, JSON, or chunks with per-page confidence scoring, automatically routing each page to the best backend.
pdfmux stream: Stream pages as NDJSON as they finish; useful for long documents.
pdfmux watch: Watch a directory for new PDFs and convert them automatically.
pdfmux estimate: Predict the cost of extracting a PDF before running the extraction.
pdfmux diff: Compare two extractions side by side.
pdfmux doctor: Pre-flight check a directory to report which extras the batch needs.
batch_extract: Batch-extract PDFs, yielding (path, result) tuples as each one completes.
extract_text: Extract a PDF to a Markdown string.
extract_json: Extract a PDF to a dict with a locked schema.
chunk: Extract a PDF into RAG-ready chunks with token limits.
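Chunking to a token limit can be sketched as greedy packing of paragraphs. This is not pdfmux's chunk tool: chunk_markdown() is a hypothetical helper, and it approximates tokens as whitespace-separated words, whereas a real RAG pipeline would normally count tokens with the embedding model's tokenizer.

```python
from typing import List


def chunk_markdown(markdown: str, max_tokens: int = 512) -> List[str]:
    """Pack blank-line-separated paragraphs into chunks of at most
    `max_tokens` tokens, splitting any paragraph that is too long on its own.

    Assumption: a "token" is a whitespace-separated word. Whitespace inside
    each paragraph is collapsed to single spaces."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    chunks: List[str] = []
    current: List[str] = []  # paragraphs in the chunk being built
    count = 0                # approximate tokens in `current`
    for para in markdown.split("\n\n"):
        words = para.split()
        # Break an oversized paragraph into windows of max_tokens words.
        for start in range(0, len(words), max_tokens):
            piece = words[start:start + max_tokens]
            # Flush the current chunk if this piece would overflow it.
            if current and count + len(piece) > max_tokens:
                chunks.append("\n\n".join(current))
                current, count = [], 0
            current.append(" ".join(piece))
            count += len(piece)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Packing whole paragraphs keeps related sentences together in one chunk; only a paragraph that alone exceeds the limit is cut mid-text.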
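Yielding (path, result) tuples as each file completes, as batch_extract is described above, is the pattern that concurrent.futures.as_completed provides in the Python standard library. The sketch below is a hypothetical stand-in, not pdfmux's batch_extract: extract is any per-file converter you pass in, and results arrive in completion order rather than input order.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

R = TypeVar("R")


def batch_extract_sketch(
    paths: Iterable[str],
    extract: Callable[[str], R],
    max_workers: int = 4,
) -> Iterator[Tuple[str, R]]:
    """Run `extract` on every path in a thread pool and yield
    (path, result) pairs as soon as each one finishes."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the path it was submitted for.
        futures = {pool.submit(extract, p): p for p in paths}
        for fut in as_completed(futures):
            # fut.result() re-raises any exception from the worker.
            yield futures[fut], fut.result()
```

Yielding in completion order lets a caller index or display small files immediately instead of waiting behind one slow document.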
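NDJSON streaming, as used by pdfmux stream, writes one JSON object per line and flushes after each, so a consumer can process early pages while later ones are still being extracted. A minimal sketch, assuming hypothetical record fields (page, markdown, confidence); the actual output fields of pdfmux stream may differ.

```python
import json
import sys
from typing import Any, Dict, Iterable, Iterator, TextIO


def stream_ndjson(records: Iterable[Dict[str, Any]], out: TextIO = sys.stdout) -> int:
    """Write each record as one JSON line as soon as it is produced,
    flushing after every line. Returns the number of lines written."""
    n = 0
    for record in records:
        out.write(json.dumps(record, ensure_ascii=False) + "\n")
        out.flush()  # make the line visible to the reader right away
        n += 1
    return n


def read_ndjson(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Parse NDJSON line by line, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield json.loads(line)
```

Because each line is a complete JSON document, the reader never has to buffer the whole output before parsing.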

Environment variables

GEMINI_API_KEY
ANTHROPIC_API_KEY

Links

GitHub: github.com/NameetP/pdfmux
