PyMuPDF Utilities for LLM/RAG
PyMuPDF4LLM (also aliased as `pdf4llm`) is a Python library built on PyMuPDF that converts PDF documents into clean, structured formats such as Markdown, JSON, and plain text, optimized for Large Language Model (LLM) and Retrieval-Augmented Generation (RAG) pipelines. It includes layout analysis and automatic OCR for scanned pages, and it supports multi-column layouts and image extraction. The library is actively maintained and frequently updated; the current stable version is 1.27.2.2.
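As a rough illustration of the PDF-to-Markdown conversion described above, the sketch below calls `pymupdf4llm.to_markdown` and then splits the result into heading-based chunks for a RAG index. The helper names `pdf_to_markdown` and `split_by_heading`, and the file name `input.pdf`, are hypothetical and introduced only for this example; the chunking step is a simple stand-in, not something the library prescribes.

```python
def pdf_to_markdown(pdf_path: str) -> str:
    """Convert a PDF file to one Markdown string using PyMuPDF4LLM."""
    # Imported lazily so split_by_heading works without the package installed.
    import pymupdf4llm  # pip install pymupdf4llm

    return pymupdf4llm.to_markdown(pdf_path)


def split_by_heading(markdown: str) -> list[str]:
    """Split Markdown into chunks, starting a new chunk at each heading line.

    Hypothetical helper: a naive chunker that does not special-case
    '#' characters inside fenced code blocks.
    """
    chunks: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        # A heading closes the chunk collected so far (if any).
        if line.startswith("#") and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    # Drop chunks that were only whitespace.
    return [chunk for chunk in chunks if chunk]


# Example usage (assumes a local file named input.pdf):
#   md = pdf_to_markdown("input.pdf")
#   for chunk in split_by_heading(md):
#       print(len(chunk), chunk[:60])
```

Splitting on headings keeps each chunk aligned with a document section, which tends to suit retrieval better than fixed-size character windows, though the right strategy depends on the corpus.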
Resources
API endpoints
full doc /v1/registry/pymupdf4llm
install /v1/registry/pymupdf4llm/install
compatibility /v1/registry/pymupdf4llm/compatibility