PyMuPDF

1.27.2.2 verified Tue May 12 auth: no python install: draft quickstart: draft

PyMuPDF is a Python binding for MuPDF, a lightweight viewer, renderer, and toolkit for PDF, XPS, and e-book formats. It reads and writes PDF documents, renders pages to images, extracts and searches text, adds annotations, and manipulates document structure. The library is actively maintained, and its releases often track updates of the underlying MuPDF library. The version described here is 1.27.2.2.

pip install --upgrade pymupdf

Common errors

error AttributeError: module 'fitz' has no attribute 'open'

cause An unrelated PyPI package named `fitz` is usually installed alongside `PyMuPDF` and shadows the `fitz` module that PyMuPDF ships as a legacy alias, so `import fitz` resolves to the wrong package. An outdated PyMuPDF installation can cause the same conflict.

fix

Uninstall both the conflicting `fitz` package and `PyMuPDF`, then reinstall `PyMuPDF` in a clean Python environment (for example, a fresh virtual environment). The recommended import is `import pymupdf`. To keep the `fitz` name in older code, use `import pymupdf as fitz`.
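Before uninstalling anything, it can help to see which distributions are actually present. The following is a minimal sketch that uses only the standard library; the helper name `installed_versions` is hypothetical and not part of PyMuPDF.

```python
from importlib import metadata


def installed_versions(names=("PyMuPDF", "fitz")):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    versions = installed_versions()
    for name, version in versions.items():
        print(f"{name}: {version or 'not installed'}")
    # A separately installed 'fitz' distribution is the usual source of the conflict.
    if versions["fitz"] is not None:
        print("A separate 'fitz' distribution is installed; consider removing it.")
```

If the script reports a version for `fitz`, that package is a likely candidate for removal before reinstalling `PyMuPDF`.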

error ModuleNotFoundError: No module named 'pymupdf'

cause PyMuPDF is not installed in the environment of the Python interpreter that runs the code, or the installation failed. It can also occur with older PyMuPDF releases that predate the `pymupdf` module name and expose only `fitz`.

fix

Install the library with `pip install pymupdf`, and upgrade with `pip install --upgrade pymupdf` if an older release is present. Verify that your IDE or terminal uses the interpreter where PyMuPDF was installed. If your code previously relied on `import fitz`, switch to `import pymupdf`, or use `import pymupdf as fitz` for backward compatibility.
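To confirm which interpreter is running and whether it can see the module at all, a short standard-library check such as the following may be useful. The helper name `locate_module` is an assumption made for this sketch.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the path the current interpreter would import `name` from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# Compare this path with the interpreter that `pip` installed into.
print("Interpreter:", sys.executable)
print("pymupdf found at:", locate_module("pymupdf"))
```

Running the script with the same command your IDE or scheduler uses shows whether that interpreter matches the one targeted by `pip`. Installing with `python -m pip install pymupdf` ties the install to a specific interpreter.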

error AttributeError: 'Document' object has no attribute 'loadPage'

cause PyMuPDF transitioned from camelCase to snake_case for many of its method names (e.g., `loadPage` became `load_page`). This error indicates you are using an older method name with a newer version of the library.

fix

Update your code to use the snake_case method names: `loadPage` becomes `load_page`, and `newPage` becomes `new_page`. Consult the PyMuPDF documentation for the correct method names for your installed version.
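When migrating a larger codebase, a mechanical conversion can suggest the likely new name for each old one. The sketch below is only a heuristic: the function `camel_to_snake` is hypothetical, and each suggested name should still be checked against the documentation, since not every rename is guaranteed to follow the pattern.

```python
import re

# Split where a lowercase letter or digit meets an uppercase letter,
# and where an acronym ends before a capitalized word.
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def camel_to_snake(name):
    """Convert a camelCase identifier to a snake_case candidate."""
    return _BOUNDARY.sub("_", name).lower()


for old in ("loadPage", "newPage", "getText"):
    print(old, "->", camel_to_snake(old))
```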

error ERROR: Failed building wheel for PyMuPDF

cause This error occurs during installation when `pip` attempts to build `PyMuPDF` from source because a pre-compiled wheel is not available for your specific operating system, Python version, or architecture. This often requires C/C++ development tools to be installed, or there might be conflicts with other package requirements (e.g., `paddleocr` requiring an older `PyMuPDF` version).

fix

Ensure you have the necessary build tools (e.g., Visual Studio on Windows, Xcode command line tools on macOS, `build-essential` on Linux). For version conflicts with other libraries, try installing a compatible PyMuPDF version, e.g., `pip install PyMuPDF==X.Y.Z`, or switch to a Python version for which wheels are available.
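Whether a wheel exists depends on the Python version, operating system, architecture, and C library. A small sketch like the following collects those facts so they can be compared with the files published for a given release. The function name `describe_platform` is hypothetical, and `platform.libc_ver()` may report empty values on non-glibc systems such as Alpine.

```python
import platform
import sys


def describe_platform():
    """Collect the platform facts that determine which wheels pip can use."""
    libc_name, libc_version = platform.libc_ver()
    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "implementation": platform.python_implementation(),
        "system": platform.system(),
        "machine": platform.machine(),
        # Empty on platforms where the C library cannot be detected.
        "libc": f"{libc_name} {libc_version}".strip() or "unknown",
    }


for key, value in describe_platform().items():
    print(f"{key}: {value}")
```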

Warnings

breaking Supported Python versions have changed between minor releases. For instance, version 1.26.5 supported Python 3.9-3.14, while 1.26.6 narrowed this to 3.10-3.14. Always check the release notes for the exact supported Python versions before upgrading, especially in automated environments.

fix Consult the release notes for your target PyMuPDF version and ensure your Python environment matches the supported range. Upgrade or downgrade Python if necessary.
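In automated environments, a guard that runs before installation or import can make a version mismatch fail early with a clear message. This sketch assumes the 3.10-3.14 range quoted above for 1.26.6; the constants and the function `python_supported` are illustrative and should be adjusted to the release you target.

```python
import sys

# Assumption: range taken from the 1.26.6 example above; adjust per release notes.
SUPPORTED_MIN = (3, 10)
SUPPORTED_MAX = (3, 14)


def python_supported(version=None, minimum=SUPPORTED_MIN, maximum=SUPPORTED_MAX):
    """Return True if (major, minor) lies within the inclusive supported range."""
    if version is None:
        version = sys.version_info[:2]
    return minimum <= tuple(version[:2]) <= maximum


if not python_supported():
    print("Warning: this Python version is outside the assumed supported range.")
```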

breaking The `pymupdf embed-extract` command's safety has been improved. It now refuses to write to an existing file or outside the current directory by default, preventing accidental overwrites or unauthorized file creation.

fix If you relied on previous behavior, you must explicitly handle file output, likely by ensuring the target file does not exist or by operating within the designated current directory. Check the documentation for new options to override this safety measure if intended.
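Scripts that write extracted attachments themselves can apply the same two checks. The sketch below is not the command's implementation; it only mirrors the described behavior with the standard library. The helper `safe_write` is hypothetical, and the bytes would come from wherever your code obtains the embedded file, for example `Document.embfile_get()`.

```python
import tempfile
from pathlib import Path


def safe_write(name, data, base_dir="."):
    """Write `data` to `name` under `base_dir`, refusing overwrites and escapes."""
    base = Path(base_dir).resolve()
    target = (base / name).resolve()
    # Reject paths such as '../x' that resolve outside the base directory.
    if not target.is_relative_to(base):
        raise ValueError(f"refusing to write outside {base}: {name}")
    # Mode 'x' raises FileExistsError instead of overwriting an existing file.
    with open(target, "xb") as handle:
        handle.write(data)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print("wrote", safe_write("attachment.bin", b"payload", base_dir=tmp).name)
        try:
            safe_write("attachment.bin", b"again", base_dir=tmp)
        except FileExistsError:
            print("second write refused: file exists")
```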

breaking The behavior of `get_textpage_ocr()` changed to OCR *all* page areas outside legible text, not just previously limited ones. This can lead to different or more extensive OCR results than in prior versions.

fix Review any code that relies on the output of `get_textpage_ocr()` and adjust expectations or post-processing logic to account for potentially more comprehensive OCR data.

gotcha Forgetting to close document objects (`doc.close()`) can lead to resource leaks (e.g., open file handles) or temporary files not being cleaned up, especially when working with many documents or in long-running processes.

fix Always call `doc.close()` when you are finished with a document. Alternatively, use a `with` statement, `with pymupdf.open('file.pdf') as doc: ...`, so the document is closed automatically, even if an exception is raised inside the block.

gotcha PyMuPDF uses a coordinate system where the origin (0,0) is at the top-left corner of the page. Y-coordinates increase downwards, and X-coordinates increase to the right. This can be counter-intuitive for users familiar with bottom-left origin systems.

fix Always remember that `Point(x, y)` and `Rect(x0, y0, x1, y1)` define positions relative to the top-left corner, with `y` increasing as you move down the page.
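When coordinates come from a tool that uses a bottom-left origin, only the y value needs to be flipped against the page height. The helpers below are a plain-arithmetic sketch with hypothetical names. They assume an unrotated page whose visible area starts at (0, 0), and the 792-point height is that of a US Letter page.

```python
def top_left_to_bottom_left(x, y, page_height):
    """Convert a point from top-left origin (y down) to bottom-left origin (y up)."""
    return x, page_height - y


def bottom_left_to_top_left(x, y, page_height):
    """Convert a point from bottom-left origin (y up) to top-left origin (y down)."""
    return x, page_height - y


LETTER_HEIGHT = 792  # points; US Letter is 612 x 792

# A point 50 units below the top edge is 742 units above the bottom edge.
print(top_left_to_bottom_left(50, 50, LETTER_HEIGHT))
```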

gotcha The test output 'PDF file not found. Creating a dummy PDF.' suggests that a required input PDF file was not present in the test environment, causing the test script to create a placeholder. This is an environmental or test setup issue, not a direct error or breaking change within the PyMuPDF library functionality.

fix Ensure that all expected input files for the test script are correctly placed and accessible in the test environment. Review the test setup documentation or script logic for expected file paths.

breaking PyMuPDF, being a C/C++ wrapper, requires certain system-level C/C++ runtime libraries (e.g., `libstdc++.so.6`). In minimal environments, such as Alpine Linux, these libraries may not be present by default, leading to `ImportError` during module loading.

fix Ensure that your environment includes the necessary C/C++ runtime libraries. For Alpine Linux, this typically means installing `g++` or `libstdc++`. Add `RUN apk add g++` (or `apk add build-base` which includes `g++`) to your Dockerfile or installation script before installing PyMuPDF.

Install compatibility draft last tested: 2026-05-12

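| python | os / libc | status | wheel | install | import | disk |
|---|---|---|---|---|---|---|
| 3.10 | alpine (musl) | | wheel | - | - | 78.5M |
| 3.10 | alpine (musl) | | - | - | - | - |
| 3.10 | slim (glibc) | | wheel | 3.1s | 0.31s | 78M |
| 3.10 | slim (glibc) | | - | - | 0.47s | 78M |
| 3.11 | alpine (musl) | | wheel | - | - | 81.9M |
| 3.11 | alpine (musl) | | - | - | - | - |
| 3.11 | slim (glibc) | | wheel | 2.7s | 2.57s | 81M |
| 3.11 | slim (glibc) | | - | - | 3.05s | 81M |
| 3.12 | alpine (musl) | | wheel | - | - | 73.6M |
| 3.12 | alpine (musl) | | - | - | - | - |
| 3.12 | slim (glibc) | | wheel | 2.6s | 1.63s | 73M |
| 3.12 | slim (glibc) | | - | - | 1.92s | 73M |
| 3.13 | alpine (musl) | | wheel | - | - | 73.3M |
| 3.13 | alpine (musl) | | - | - | - | - |
| 3.13 | slim (glibc) | | wheel | 2.7s | 1.57s | 73M |
| 3.13 | slim (glibc) | | - | - | 1.80s | 73M |
| 3.9 | alpine (musl) | | wheel | - | - | 76.4M |
| 3.9 | alpine (musl) | | - | - | - | - |
| 3.9 | slim (glibc) | | wheel | 3.2s | 0.25s | 76M |
| 3.9 | slim (glibc) | | - | - | 0.24s | 76M |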

Imports

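pymupdf
wrong
```
import fitz
```
correct
```
import pymupdf; doc = pymupdf.open('file.pdf')
```
`import pymupdf` is the recommended import for current releases. `fitz` remains available as a legacy alias, named after Fitz, the graphics library developed for MuPDF, and many older examples and community resources still use it. It can also collide with an unrelated PyPI package of the same name. For older code, `import pymupdf as fitz` keeps existing references working.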
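Code that must run against both newer and older installations can try the module names in order. This is a sketch with a hypothetical helper, `import_first`. Note that falling back to `fitz` can pick up an unrelated package of the same name if one is installed.

```python
import importlib


def import_first(*names):
    """Import and return the first module in `names` that is available."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(f"none of these modules could be imported: {', '.join(names)}")


# Intended use (left commented so the sketch runs without PyMuPDF installed):
# pymupdf = import_first("pymupdf", "fitz")
# doc = pymupdf.open("file.pdf")
```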

Quickstart draft last tested: 2026-04-24

This quickstart demonstrates how to open a PDF document (or create a dummy one if not found), extract text from its first page, and then properly close the document. Replace 'input.pdf' with the path to your actual PDF file.

import pymupdf

# Open a document
try:
    doc = pymupdf.open("input.pdf")  # Replace with your PDF file
except pymupdf.FileNotFoundError:
    print("PDF file not found. Creating a dummy PDF.")
    doc = pymupdf.open()  # Create a new, empty PDF
    page = doc.new_page()  # new_page() returns the page it creates
    page.insert_text(pymupdf.Point(50, 50), "Hello, PyMuPDF!")
    doc.save("input.pdf")
    doc.close()
    doc = pymupdf.open("input.pdf")  # Reopen the saved file for reading

# Get the first page
page = doc[0]

# Extract text
text = page.get_text()
print(f"Extracted text:\n{text}")

# Close the document
doc.close()