pdfmux
Smart PDF-to-Markdown router that picks the best extractor for each page, audits output quality, and automatically re-extracts pages that fail the audit. Features confidence scoring, bring-your-own-key (BYOK) LLM support, and RAG chunking.
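The per-page routing and re-extraction loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not pdfmux's actual implementation: the extractor callables, the backend names `text_layer` and `ocr`, the helpers `best_of` and `route_pages`, the 0.8 confidence threshold, and taking the minimum page score as the document score are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical extractor interface: page index -> (markdown, confidence in [0, 1]).
Extractor = Callable[[int], Tuple[str, float]]


@dataclass
class PageResult:
    page: int
    backend: str
    text: str
    confidence: float


def best_of(page: int, extractors: Dict[str, Extractor]) -> PageResult:
    """Run every extractor on one page and keep the highest-confidence output.

    `extractors` must be non-empty (max() raises ValueError otherwise).
    """
    candidates = [PageResult(page, name, *fn(page)) for name, fn in extractors.items()]
    return max(candidates, key=lambda r: r.confidence)


def route_pages(
    num_pages: int,
    primary: Dict[str, Extractor],   # cheap backends, tried on every page
    fallback: Dict[str, Extractor],  # heavier backends, used only for re-extraction
    threshold: float = 0.8,          # hypothetical audit cutoff
) -> List[PageResult]:
    results = []
    for page in range(num_pages):
        # Route: choose the best cheap backend for this page.
        best = best_of(page, primary)
        # Audit: if confidence is too low, re-extract with the fallback backends
        # and keep the retry only if it actually scores higher.
        if best.confidence < threshold and fallback:
            retry = best_of(page, fallback)
            if retry.confidence > best.confidence:
                best = retry
        results.append(best)
    return results


# Usage with fake extractors: page 1 has a poor text layer.
fast = {"text_layer": lambda p: ("page text", 0.3 if p == 1 else 0.95)}
slow = {"ocr": lambda p: ("ocr text", 0.9)}
pages = route_pages(3, fast, slow)
print([(r.page, r.backend) for r in pages])
# [(0, 'text_layer'), (1, 'ocr'), (2, 'text_layer')]

# Assumed aggregate: a document is only as trustworthy as its weakest page.
doc_confidence = min(r.confidence for r in pages)
print(doc_confidence)
# 0.9
```

Trying cheap backends first and escalating only low-confidence pages keeps the expensive path (OCR or an LLM) limited to the pages that need it.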
Install
npx -y pdfmux-mcp
Tools · 10
- pdfmux convert: Extract a PDF to Markdown, JSON, or chunks with per-page confidence scoring, routing each page to the best backend automatically.
- pdfmux stream: Stream pages as NDJSON as each one finishes; useful for long documents.
- pdfmux watch: Watch a directory for new PDFs and convert them automatically.
- pdfmux estimate: Predict the cost of extracting a PDF before running it.
- pdfmux diff: Compare two extractions side by side.
- pdfmux doctor: Pre-flight a directory to check which optional extras the batch needs.
- batch_extract: Extract a batch of PDFs, yielding (path, result) tuples as each one completes.
- extract_text: Extract a PDF to a Markdown string.
- extract_json: Extract a PDF to a dict that follows a locked schema.
- chunk: Extract a PDF into RAG-ready chunks with token limits.
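The token-limited splitting behind the chunk tool can be approximated as greedy paragraph packing. This sketch is an assumption, not the library's algorithm: `chunk_markdown` is a hypothetical name, and it counts whitespace-separated words as tokens, whereas a real RAG pipeline would typically count tokens with the embedding model's own tokenizer.

```python
from typing import List


def chunk_markdown(markdown: str, max_tokens: int = 200) -> List[str]:
    """Greedily pack blank-line-separated paragraphs into chunks of at most
    max_tokens whitespace-separated words (a stand-in for real tokens)."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be positive")
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0

    def flush() -> None:
        # Emit the paragraphs collected so far as one chunk.
        nonlocal current, current_len
        if current:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0

    for para in (p.strip() for p in markdown.split("\n\n")):
        if not para:
            continue
        words = para.split()
        if len(words) > max_tokens:
            # Oversized paragraph: close the open chunk, then hard-split by words
            # (internal line breaks are collapsed to single spaces).
            flush()
            for i in range(0, len(words), max_tokens):
                chunks.append(" ".join(words[i:i + max_tokens]))
            continue
        if current_len + len(words) > max_tokens:
            flush()
        current.append(para)
        current_len += len(words)
    flush()
    return chunks


doc = "# Title\n\nOne two three.\n\nFour five six seven.\n\n" + " ".join(["w"] * 12)
chunks = chunk_markdown(doc, max_tokens=5)
print(len(chunks))
# 5
```

Packing whole paragraphs keeps related sentences in the same chunk; only a paragraph that by itself exceeds the limit is split mid-text.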
Environment variables
GEMINI_API_KEY
ANTHROPIC_API_KEY
Links
★ 65 GitHub stars