PyMuPDF Layout
Version 1.27.2.2 | verified Tue May 12 | auth: no | python | install: stale
PyMuPDF Layout is a fast, lightweight Python package that integrates with PyMuPDF to provide AI-driven layout analysis for PDFs. It converts PDFs into structured data (Markdown, JSON, or plain text) using Graph Neural Networks trained on PDF internals, an approach described as giving a 10x speed improvement over vision-based tools with no GPU required. It is currently at version 1.27.2.2 and receives frequent updates, often alongside its companion library, PyMuPDF4LLM.
pip install pymupdf-layout pymupdf4llm
Common errors
error ModuleNotFoundError: No module named 'pymupdf_layout'
cause The `pymupdf-layout` library is intended to be imported as a submodule of `pymupdf` (i.e., `pymupdf.layout`), not as a top-level module named `pymupdf_layout`. Additionally, it needs to be imported before `pymupdf4llm` to activate its layout features.
fix Use `import pymupdf.layout` in your Python script, ensuring it is imported before `import pymupdf4llm` if you are using PyMuPDF4LLM for extraction.
error AttributeError: 'Page' object has no attribute 'extract_layout'
cause The `pymupdf-layout` features for structured data extraction (Markdown, JSON, or plain text) are typically accessed through functions in the companion library `pymupdf4llm`, such as `to_markdown()`, `to_json()`, or `to_text()`, after `pymupdf.layout` has been imported to enable the underlying layout analysis.
fix First, import `pymupdf.layout` and `pymupdf4llm`. Then use functions from `pymupdf4llm` on a Document object obtained via `pymupdf.open()`, for example: `md = pymupdf4llm.to_markdown(doc)`.
error ERROR: Failed building wheel for pymupdf
cause This error occurs during installation when `pip` cannot find a pre-compiled binary wheel for PyMuPDF (which `pymupdf-layout` depends on) for your system and attempts to build it from source. Building from source requires C/C++ development tools (such as Visual Studio on Windows or build-essential on Linux), which are often not present by default.
fix Ensure your `pip` is up to date (`python -m pip install --upgrade pip`). If the error persists, install the necessary C/C++ build tools for your operating system: on Windows, install Visual Studio 2019 (Community edition is sufficient); on Linux, install build-essential (e.g., `sudo apt-get install build-essential`).
Warnings
gotcha The `pymupdf.layout` module *must* be imported before `pymupdf4llm` to ensure that PyMuPDF's layout analysis features are activated. If the order is incorrect, `pymupdf4llm` will run without layout enhancement.
fix Ensure `import pymupdf.layout` appears before `import pymupdf4llm` in your code.
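The import-order rule above can be checked statically. The following is a minimal sketch using only the standard library `ast` module; `layout_imported_first` is a hypothetical helper, not part of PyMuPDF, and it only inspects import statements in a source string without executing them.

```python
import ast


def layout_imported_first(source: str) -> bool:
    """Return True if `pymupdf.layout` is imported before `pymupdf4llm`.

    Hypothetical lint helper: returns True when `pymupdf4llm` is not
    imported at all, and False when it is imported without (or before)
    `pymupdf.layout`.
    """
    targets = ("pymupdf.layout", "pymupdf4llm")
    first_seen = {}  # target name -> earliest (line, column, index) position
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            module = node.module or ""
            # Cover both `from pymupdf import layout` and `from pymupdf4llm import x`
            names = [module] + [f"{module}.{alias.name}" for alias in node.names]
        else:
            continue
        for index, name in enumerate(names):
            for target in targets:
                if name == target or name.startswith(target + "."):
                    position = (node.lineno, node.col_offset, index)
                    if target not in first_seen or position < first_seen[target]:
                        first_seen[target] = position
    if "pymupdf4llm" not in first_seen:
        return True
    return (
        "pymupdf.layout" in first_seen
        and first_seen["pymupdf.layout"] < first_seen["pymupdf4llm"]
    )


good = "import pymupdf.layout\nimport pymupdf4llm\n"
bad = "import pymupdf4llm\nimport pymupdf.layout\n"
print(layout_imported_first(good), layout_imported_first(bad))
```

The check is deliberately conservative: it flags any script that imports `pymupdf4llm` without an earlier `pymupdf.layout` import, even on versions where the layout module may be activated automatically.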
gotcha The `header=False` and `footer=False` parameters for omitting headers and footers are not applicable when extracting data using `pymupdf4llm.to_json()`. The JSON output is designed to be a comprehensive representation of all page data.
fix If header/footer exclusion is needed, process JSON output manually or use `to_markdown()` or `to_text()` with the respective parameters.
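As an illustration of the Markdown route, here is a minimal sketch that assumes `to_markdown()` accepts the `header` and `footer` keyword arguments exactly as named in the warning above. `markdown_without_margins` is a hypothetical wrapper, and the imports sit inside the function so that defining it does not require the packages to be installed.

```python
def markdown_without_margins(pdf_path):
    """Return Markdown for `pdf_path` with page headers and footers omitted.

    Hypothetical wrapper; assumes the `header`/`footer` keywords described
    in the warning are supported by the installed pymupdf4llm version.
    """
    import pymupdf
    import pymupdf.layout  # imported before pymupdf4llm to activate layout analysis
    import pymupdf4llm

    doc = pymupdf.open(pdf_path)
    try:
        return pymupdf4llm.to_markdown(doc, header=False, footer=False)
    finally:
        # Always release the file handle, even if extraction fails
        doc.close()
```

Calling `markdown_without_margins("report.pdf")` (the file name is a placeholder) would then return the Markdown string, while `to_json()` output would still need manual filtering.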
breaking Prior to `pymupdf4llm` version 1.27, `pymupdf-layout` had to be explicitly installed and imported. Since `pymupdf4llm` v1.27, `pymupdf-layout` is automatically installed and used, simplifying the setup but changing the dependency structure.
fix For older `pymupdf4llm` versions, explicitly `pip install pymupdf-layout` and `import pymupdf.layout`. For v1.27+, `pip install pymupdf4llm` is often sufficient, but explicitly importing `pymupdf.layout` is still good practice to ensure activation.
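To decide which of the two setups applies, a script can compare the installed `pymupdf4llm` version against 1.27. This is a minimal sketch using the standard library `importlib.metadata`; both function names are hypothetical, and the parsing only looks at the first two numeric components of the version string.

```python
import re
from importlib import metadata


def needs_explicit_layout(version: str) -> bool:
    """Return True when a pymupdf4llm version string is older than 1.27."""
    parts = []
    for piece in version.split(".")[:2]:
        match = re.match(r"\d+", piece)  # tolerate suffixes such as "27rc1"
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts) < (1, 27)


def installed_needs_explicit_layout():
    """Check the installed pymupdf4llm; return None if it is not installed."""
    try:
        return needs_explicit_layout(metadata.version("pymupdf4llm"))
    except metadata.PackageNotFoundError:
        return None


print(needs_explicit_layout("1.27.2.2"))  # False: layout is pulled in automatically
```

Even when this returns False, keeping the explicit `import pymupdf.layout` is harmless and matches the recommendation above.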
gotcha PyMuPDF Layout is licensed under PolyForm Noncommercial, which restricts commercial use. Review the license terms carefully for your specific application.
fix Consult the PolyForm Noncommercial license for details. For commercial use, contact Artifex Software for alternative licensing options.
gotcha For advanced document types (e.g., Office documents like DOCX, XLSX, PPTX), `PyMuPDF Pro` is required in addition to `PyMuPDF4LLM` to enable processing. PyMuPDF Layout itself primarily enhances PDF processing.
fix If processing non-PDF document formats, ensure you have the appropriate `PyMuPDF Pro` license and package installed alongside `pymupdf4llm`.
Install
pip install pymupdf4llm
Install compatibility stale last tested: 2026-05-12 v1.27.2.3 (up to date)
| python | os / libc | variant | status | wheel | install | import | disk | mem | side effects |
|---|---|---|---|---|---|---|---|---|---|
| 3.10 | alpine (musl) | pymupdf-layout | build_error | - | - | - | - | - | - |
| 3.10 | alpine (musl) | pymupdf-layout | | - | - | - | - | - | - |
| 3.10 | alpine (musl) | pymupdf4llm | | wheel | - | - | 175.9M | - | broken |
| 3.10 | alpine (musl) | pymupdf4llm | | - | - | - | - | - | - |
| 3.10 | slim (glibc) | pymupdf-layout | | wheel | 12.2s | 1.02s | 298M | 34.4M | clean |
| 3.10 | slim (glibc) | pymupdf-layout | | - | - | 1.09s | 298M | 34.4M | - |
| 3.10 | slim (glibc) | pymupdf4llm | | wheel | 11.7s | 0.98s | 298M | 34.4M | clean |
| 3.10 | slim (glibc) | pymupdf4llm | | - | - | 1.06s | 298M | 34.4M | - |
| 3.11 | alpine (musl) | pymupdf-layout | build_error | - | - | - | - | - | - |
| 3.11 | alpine (musl) | pymupdf-layout | | - | - | - | - | - | - |
| 3.11 | alpine (musl) | pymupdf4llm | | wheel | - | - | 159.8M | - | broken |
| 3.11 | alpine (musl) | pymupdf4llm | | - | - | - | - | - | - |
| 3.11 | slim (glibc) | pymupdf-layout | | wheel | 7.4s | 3.58s | 255M | 39.1M | clean |
| 3.11 | slim (glibc) | pymupdf-layout | | - | - | 3.94s | 254M | 39.1M | - |
| 3.11 | slim (glibc) | pymupdf4llm | | wheel | 7.2s | 3.64s | 255M | 39.1M | clean |
| 3.11 | slim (glibc) | pymupdf4llm | | - | - | 3.90s | 254M | 39.1M | - |
| 3.12 | alpine (musl) | pymupdf-layout | build_error | - | - | - | - | - | - |
| 3.12 | alpine (musl) | pymupdf-layout | | - | - | - | - | - | - |
| 3.12 | alpine (musl) | pymupdf4llm | | wheel | - | - | 150.2M | - | broken |
| 3.12 | alpine (musl) | pymupdf4llm | | - | - | - | - | - | - |
| 3.12 | slim (glibc) | pymupdf-layout | | wheel | 7.3s | 2.65s | 242M | 36.5M | clean |
| 3.12 | slim (glibc) | pymupdf-layout | | - | - | 2.96s | 241M | 36.5M | - |
| 3.12 | slim (glibc) | pymupdf4llm | | wheel | 7.2s | 2.72s | 242M | 36.5M | clean |
| 3.12 | slim (glibc) | pymupdf4llm | | - | - | 3.04s | 241M | 36.5M | - |
| 3.13 | alpine (musl) | pymupdf-layout | build_error | - | - | - | - | - | - |
| 3.13 | alpine (musl) | pymupdf-layout | | - | - | - | - | - | - |
| 3.13 | alpine (musl) | pymupdf4llm | | wheel | - | - | 146.7M | - | broken |
| 3.13 | alpine (musl) | pymupdf4llm | | - | - | - | - | - | - |
| 3.13 | slim (glibc) | pymupdf-layout | | wheel | 7.6s | 2.51s | 241M | 36.5M | clean |
| 3.13 | slim (glibc) | pymupdf-layout | | - | - | 2.92s | 240M | 36.5M | - |
| 3.13 | slim (glibc) | pymupdf4llm | | wheel | 7.3s | 2.38s | 241M | 36.5M | clean |
| 3.13 | slim (glibc) | pymupdf4llm | | - | - | 2.89s | 240M | 36.5M | - |
| 3.9 | alpine (musl) | pymupdf-layout | build_error | - | - | - | - | - | - |
| 3.9 | alpine (musl) | pymupdf-layout | | - | - | - | - | - | - |
| 3.9 | alpine (musl) | pymupdf4llm | | wheel | - | - | 147.6M | - | broken |
| 3.9 | alpine (musl) | pymupdf4llm | | - | - | - | - | - | - |
| 3.9 | slim (glibc) | pymupdf-layout | build_error | - | 1.7s | - | - | - | - |
| 3.9 | slim (glibc) | pymupdf-layout | | - | - | - | - | - | - |
| 3.9 | slim (glibc) | pymupdf4llm | | wheel | 3.3s | - | 218M | - | broken |
| 3.9 | slim (glibc) | pymupdf4llm | | - | - | - | - | - | - |
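Reading the table above, the only rows reported as "clean" are glibc images on Python 3.10 through 3.13. The sketch below encodes just that observation as a pre-install check; `matches_clean_row` is a hypothetical helper, it says nothing about platforms the table does not cover (macOS, Windows, newer Python versions), and `platform.libc_ver()` typically reports an empty name on non-glibc systems.

```python
import platform
import sys


def matches_clean_row(python_version, libc_name):
    """Return True if (python, libc) matches a row the table marks as clean.

    Hypothetical helper: a summary of the compatibility table only, not a
    statement about where wheels are actually published.
    """
    major_minor = tuple(python_version[:2])
    return libc_name == "glibc" and (3, 10) <= major_minor <= (3, 13)


def current_environment_matches():
    """Apply the same check to the running interpreter."""
    libc_name, _libc_version = platform.libc_ver()
    return matches_clean_row(sys.version_info, libc_name)


if not current_environment_matches():
    print("This environment is not among the configurations tested as clean.")
```

A False result is only a prompt to test the install in that environment, since untested combinations may still work.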
Imports
- layout: import pymupdf.layout
- pymupdf4llm: import pymupdf4llm
- pymupdf: import pymupdf
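Before relying on the imports listed above, a script can confirm that each module resolves, which gives a clearer message than a bare `ModuleNotFoundError`. This is a minimal sketch using the standard library `importlib.util.find_spec`; `module_available` is a hypothetical helper, and note that looking up a dotted name such as `pymupdf.layout` imports the parent package as a side effect.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located by the import system."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package (e.g. `pymupdf`) is missing or fails to import
        return False


required = ("pymupdf", "pymupdf.layout", "pymupdf4llm")
missing = [name for name in required if not module_available(name)]
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All modules found; import pymupdf.layout before pymupdf4llm.")
```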
Quickstart last tested: 2026-04-24
import json
import os

import pymupdf  # For document opening
import pymupdf.layout  # Crucial: import layout before pymupdf4llm
import pymupdf4llm  # For structured data extraction

# Create a dummy PDF file for demonstration
doc = pymupdf.open()
page = doc.new_page()
page.insert_text((50, 50), "Header text\n", fontname="helv", fontsize=12)
page.insert_text((50, 80), "This is a sample paragraph. It demonstrates basic text extraction.")
page.insert_text((50, 110), "Another paragraph with some more content to showcase layout analysis.")
doc.save("sample_document.pdf")
doc.close()

# Open the document with PyMuPDF
document = pymupdf.open("sample_document.pdf")

# Extract structured data as Markdown
markdown_output = pymupdf4llm.to_markdown(document)
print("\n--- Markdown Output ---")
print(markdown_output)

# Extract structured data as JSON (note: header/footer filtering not applicable to JSON)
json_output = pymupdf4llm.to_json(document)
print("\n--- JSON Output ---")
# If to_json() already returned a JSON string, decode it first so it is not encoded twice
parsed = json.loads(json_output) if isinstance(json_output, str) else json_output
# For brevity, print only the first part of the pretty-printed JSON
print(json.dumps(parsed, indent=2)[:1000])

# Close the document before removing the dummy file (an open file cannot be deleted on Windows)
document.close()
os.remove("sample_document.pdf")