PyMuPDF Utilities for LLM/RAG

1.27.2.2 verified Wed May 20 auth: no python install: draft

PyMuPDF4LLM (also aliased as `pdf4llm`) is a Python library built on PyMuPDF that converts PDF documents into clean, structured formats such as Markdown, JSON, and plain text, optimized for Large Language Model (LLM) and Retrieval-Augmented Generation (RAG) pipelines. It performs layout analysis, applies OCR automatically to scanned pages, handles multi-column layouts, and extracts images. The library is actively maintained; the current stable version is 1.27.2.2.
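As a rough illustration of the Markdown-for-RAG workflow described above, the sketch below converts a PDF with `pymupdf4llm.to_markdown()` and then splits the result into heading-delimited chunks. The wrapper `pdf_to_markdown`, the helper `split_by_heading`, and the file name `report.pdf` are hypothetical names introduced here for illustration; they are not part of the library, and heading-based splitting is only one possible chunking strategy.

```python
import re


def pdf_to_markdown(pdf_path):
    """Hypothetical wrapper: convert a PDF to one Markdown string."""
    # Imported lazily so split_by_heading() works without the library installed.
    import pymupdf4llm  # third-party: pip install pymupdf4llm

    return pymupdf4llm.to_markdown(str(pdf_path))


def split_by_heading(markdown_text):
    """Hypothetical helper: split Markdown into chunks at ATX headings (# ... ######)."""
    chunks, current = [], []
    for line in markdown_text.splitlines():
        # A heading line closes the chunk collected so far and starts a new one.
        if re.match(r"#{1,6}\s", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    # Drop chunks that contained only whitespace.
    return [c for c in chunks if c]


# Example usage (assumes a local file named report.pdf):
#   md = pdf_to_markdown("report.pdf")
#   for chunk in split_by_heading(md):
#       print(chunk[:60])
```

This simple splitter treats any line that starts with `#` characters followed by whitespace as a heading, so it would also split on such lines inside fenced code blocks; a production pipeline would likely need a more careful Markdown parser.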

