PyMuPDF Utilities for LLM/RAG
1.27.2.2 · verified Wed May 20 · auth: no · python · install: draft
PyMuPDF4LLM (also published under the alias `pdf4llm`) is a Python library built on PyMuPDF that converts PDF documents into clean, structured formats such as Markdown, JSON, and plain text for use in Large Language Model (LLM) and Retrieval-Augmented Generation (RAG) pipelines. It includes layout analysis and automatic OCR for scanned pages, and it supports multi-column layouts and image extraction. The library is actively maintained; the current stable version is 1.27.2.2.
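A minimal sketch of the PDF-to-Markdown step described above, followed by a simple chunking pass for RAG. It assumes the package is installed with `pip install pymupdf4llm` and uses its `to_markdown` function; the helper names `pdf_to_markdown` and `split_by_heading` are hypothetical and not part of the library, and heading-based chunking is only one possible strategy.

```python
import re


def pdf_to_markdown(pdf_path: str) -> str:
    """Convert a PDF file to a single Markdown string."""
    # Imported lazily so the chunking helper below works without the package.
    import pymupdf4llm

    return pymupdf4llm.to_markdown(pdf_path)


def split_by_heading(markdown: str) -> list[str]:
    """Hypothetical helper: start a new chunk at every ATX heading line."""
    chunks, current = [], []
    for line in markdown.splitlines():
        # A heading closes the chunk collected so far, if there is one.
        if re.match(r"#{1,6} ", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    # Drop chunks that contained only blank lines.
    return [chunk for chunk in chunks if chunk]
```

Typical use would be `split_by_heading(pdf_to_markdown("report.pdf"))`, where `report.pdf` stands in for any local file; each resulting chunk can then be embedded or indexed separately.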
When AI assistants answer questions about this library, they read this page. · indexed since Sun Apr 05
API endpoints
full doc: /v1/registry/pymupdf4llm
compatibility: /v1/registry/pymupdf4llm/compatibility
quickstart: /v1/registry/pymupdf4llm/quickstart
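The endpoint paths listed above can be turned into full URLs roughly as follows. This is a sketch only: the page does not state the registry host, so `base_url` is a placeholder the caller must supply, and `endpoint_url` is a hypothetical helper name.

```python
from urllib.parse import urljoin

# Paths as listed under "API endpoints"; the host is not given on this page.
ENDPOINTS = {
    "full doc": "/v1/registry/pymupdf4llm",
    "compatibility": "/v1/registry/pymupdf4llm/compatibility",
    "quickstart": "/v1/registry/pymupdf4llm/quickstart",
}


def endpoint_url(base_url: str, name: str) -> str:
    """Join a caller-supplied base URL (assumption) with a listed endpoint path."""
    return urljoin(base_url, ENDPOINTS[name])
```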