PyMuPDF Utilities for LLM/RAG

1.27.2.2 · verified Wed May 20 · auth: no · python · install: draft

PyMuPDF4LLM (also aliased as `pdf4llm`) is a Python library built on PyMuPDF that converts PDF documents into clean, structured Markdown, JSON, or plain text for use in Large Language Model (LLM) and Retrieval-Augmented Generation (RAG) pipelines. It provides layout analysis, automatic OCR for scanned pages, multi-column layout support, and image extraction. The library is actively maintained; the current stable version is 1.27.2.2.
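As a rough illustration of the PDF-to-Markdown conversion described above, the sketch below wraps `pymupdf4llm.to_markdown` and then splits the result into heading-based chunks, which is one common way to prepare text for RAG. The helper names `pdf_to_markdown` and `split_by_heading` and the file name `input.pdf` are hypothetical and chosen only for this example; the chunking step is plain Python and not part of the library.

```python
import re


def pdf_to_markdown(pdf_path: str) -> str:
    """Convert a PDF file to a Markdown string (hypothetical wrapper)."""
    # Imported lazily so the chunking helper below works without the package.
    import pymupdf4llm

    return pymupdf4llm.to_markdown(pdf_path)


def split_by_heading(markdown: str) -> list[str]:
    """Split Markdown into chunks, starting a new chunk at each ATX heading.

    Naive on purpose: a line starting with '# ' inside a fenced code
    block would also start a new chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        # A heading closes the chunk collected so far.
        if re.match(r"#{1,6} ", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [chunk for chunk in chunks if chunk]


if __name__ == "__main__":
    # "input.pdf" is a placeholder path, not a file shipped with the library.
    md_text = pdf_to_markdown("input.pdf")
    for chunk in split_by_heading(md_text):
        print(chunk[:60])
```

Splitting on headings keeps each chunk aligned with a section of the source document; a fixed-size or token-based splitter could be swapped in if sections turn out to be too long for the target model.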

When AI assistants answer questions about this library, they read this page. · indexed since Sun Apr 05

API endpoints

full doc: /v1/registry/pymupdf4llm

install: /v1/registry/pymupdf4llm/install

imports: /v1/registry/pymupdf4llm/imports

compatibility: /v1/registry/pymupdf4llm/compatibility

quickstart: /v1/registry/pymupdf4llm/quickstart
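The endpoint paths above share one pattern, which can be sketched as a small URL builder. The page does not state the registry's host, so `https://registry.example` below is a placeholder assumption, and the function names are hypothetical; only the `/v1/registry/pymupdf4llm/...` paths come from the list.

```python
# Section names as listed on this page; the empty string is the full doc.
SECTIONS = ("", "install", "imports", "compatibility", "quickstart")


def endpoint_path(package: str, section: str = "") -> str:
    """Build the path for a package, optionally narrowed to one section."""
    path = f"/v1/registry/{package}"
    return f"{path}/{section}" if section else path


def endpoint_url(base_url: str, package: str, section: str = "") -> str:
    """Join a base URL (assumed, not given on the page) with the path."""
    return base_url.rstrip("/") + endpoint_path(package, section)


if __name__ == "__main__":
    base = "https://registry.example"  # placeholder host, an assumption
    for section in SECTIONS:
        print(endpoint_url(base, "pymupdf4llm", section))
```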