PyMuPDF Utilities for LLM/RAG

library · 1.27.2.2 · python
verified Jun 28, 2026 install draft

PyMuPDF4LLM (also available under the alias `pdf4llm`) is a Python library built on PyMuPDF that converts PDF documents into clean, structured formats such as Markdown, JSON, and plain text for use with Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) pipelines. It performs layout analysis, applies OCR automatically to scanned pages, handles multi-column layouts, and extracts images. The library is actively maintained; the current stable version is 1.27.2.2.
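
As a rough illustration of the PDF-to-Markdown conversion described above, the sketch below calls `pymupdf4llm.to_markdown` on a file path and then splits the result at Markdown headings, which is one simple way to prepare chunks for a RAG index. The file names `input.pdf` and `output.md` and the helper `split_by_heading` are hypothetical and not part of the library; heading-based splitting is an assumption about how one might chunk, not something the library prescribes.

```python
import re
from pathlib import Path


def pdf_to_markdown(pdf_path: str) -> str:
    """Convert a PDF to a single Markdown string via pymupdf4llm.to_markdown."""
    # Imported lazily so split_by_heading stays usable without the library installed.
    import pymupdf4llm

    return pymupdf4llm.to_markdown(pdf_path)


def split_by_heading(markdown: str) -> list[str]:
    """Hypothetical helper: start a new chunk at every ATX heading line (# to ######)."""
    chunks: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        # A heading closes the chunk collected so far and opens a new one.
        if re.match(r"#{1,6}\s", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    # Drop chunks that contained only blank lines.
    return [c for c in chunks if c]


if __name__ == "__main__":
    md_text = pdf_to_markdown("input.pdf")  # placeholder path, replace with a real PDF
    Path("output.md").write_text(md_text, encoding="utf-8")
    for chunk in split_by_heading(md_text):
        print(chunk[:80])
```

Splitting at headings keeps each section's text together with its title, which tends to help retrieval; a production pipeline would likely also cap chunk length.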
