PyMuPDF-pro
PyMuPDF-pro is a commercial extension for the open-source PyMuPDF library. It adds support for Office documents (e.g., .doc, .docx, .ppt, .pptx, .xls, .xlsx) and other formats that PyMuPDF does not handle natively, covering text and table extraction and conversion to PDF. The current version is 1.27.2.2; releases typically track PyMuPDF releases.
Common errors
-
AttributeError: module 'fitz.utils' has no attribute 'set_license' or AttributeError: module 'fitz' has no attribute 'open_office_document'
cause PyMuPDF-pro might not be correctly installed, or `PyMuPDF` (which provides the `fitz` module) is missing, preventing PyMuPDF-pro from patching the `fitz` module.
fix Ensure both PyMuPDF and PyMuPDF-pro are installed: `pip install PyMuPDF PyMuPDF-pro`. Verify that `import fitz` is executed before attempting to use PyMuPDF-pro specific functions.
-
RuntimeError: License key expired or not valid
cause The license key provided to `fitz.utils.set_license()` is either incorrect, expired, or has not been properly registered with Artifex Software.
fix Verify your PyMuPDF-pro license key. Contact Artifex Software support if you suspect the key is invalid or expired. Ensure the key is correctly passed to `fitz.utils.set_license()`.
-
fitz.EmptyFilename: cannot open path/to/nonexistent.docx
cause The file path provided to `fitz.open_office_document()` does not point to an existing file, or the path is inaccessible/incorrect.
fix Double-check the file path. Ensure the file exists at the specified location and that your application has read permissions for it. Use absolute paths to avoid ambiguity.
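The path checks suggested for the last error can be run before any document call. The following is a minimal sketch using only the standard library. The helper name `check_office_path` and the `OFFICE_SUFFIXES` set (copied from the formats listed at the top of this page) are illustrative assumptions, not part of PyMuPDF-pro.

```python
import os
from pathlib import Path

# Suffixes copied from the formats this page lists (assumption; adjust as needed).
OFFICE_SUFFIXES = {".doc", ".docx", ".ppt", ".pptx", ".xls", ".xlsx"}


def check_office_path(path):
    """Return an absolute Path if `path` is a readable Office file, else raise."""
    # Resolve to an absolute path so later error messages are unambiguous.
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"No such file: {resolved}")
    if resolved.suffix.lower() not in OFFICE_SUFFIXES:
        raise ValueError(f"Unexpected file type: {resolved.suffix or '(none)'}")
    if not os.access(resolved, os.R_OK):
        raise PermissionError(f"File is not readable: {resolved}")
    return resolved
```

Raising the standard `FileNotFoundError`, `ValueError`, and `PermissionError` keeps the three failure modes distinguishable before the library is involved at all.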
Warnings
- breaking PyMuPDF-pro requires a valid commercial license key to unlock its functionality. Without a key set via `fitz.utils.set_license()`, most operations on Office documents will fail with a license error.
- gotcha PyMuPDF-pro extends the `fitz` module of PyMuPDF. Therefore, PyMuPDF must be installed alongside PyMuPDF-pro for the extensions to be active and for functions like `fitz.open_office_document` to be available.
- gotcha PyMuPDF-pro supports Python versions 3.9 and higher. Using it with older Python versions will result in installation or runtime errors.
- gotcha Processing large or complex Office documents (especially conversion or detailed extraction) can require significant CPU and memory, and run time varies with document complexity and available system resources.
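The Python version requirement above can be checked at startup so that an unsupported interpreter fails with a clear message. This is a small sketch; the function name `python_is_supported` is hypothetical, and the `(3, 9)` minimum is taken from the warning on this page.

```python
import sys

MIN_PYTHON = (3, 9)  # minimum version stated on this page


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True if the interpreter version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); patch level does not matter here.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not python_is_supported():
        found = sys.version.split()[0]
        sys.exit(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, found {found}")
```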
Install
-
pip install PyMuPDF-pro
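After installing, it can help to confirm which distributions are actually present in the active environment. The sketch below uses the standard library's `importlib.metadata`; the distribution names in `REQUIRED` are the ones written on this page and should be treated as assumptions to verify against your package index. The function name `installed_versions` is hypothetical.

```python
from importlib import metadata

# Distribution names as written on this page (assumption; verify before relying on them).
REQUIRED = ("PyMuPDF", "PyMuPDF-pro")


def installed_versions(names=REQUIRED):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```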
Imports
- fitz
import fitz
- fitz.utils.set_license
import fitz
- fitz.open_office_document
import fitz
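Because all three entries resolve through `import fitz`, a quick attribute check shows whether the extension names are actually present on the imported module. The sketch below takes any module-like object, so it does not require PyMuPDF to be installed; the attribute names are the ones listed on this page and are assumptions here, and `missing_pro_names` is a hypothetical helper.

```python
def missing_pro_names(module):
    """Return the entry points named on this page that `module` does not provide."""
    missing = []
    # `utils` may itself be absent, in which case getattr returns None.
    if not hasattr(getattr(module, "utils", None), "set_license"):
        missing.append("utils.set_license")
    if not hasattr(module, "open_office_document"):
        missing.append("open_office_document")
    return missing
```

Calling `missing_pro_names(fitz)` right after `import fitz` returns an empty list when both names are available, and otherwise names what is missing instead of failing later with an `AttributeError`.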
Quickstart
import os
import fitz  # PyMuPDF-pro extends PyMuPDF's 'fitz' module
from pathlib import Path

# --- IMPORTANT: License Key Setup ---
# PyMuPDF-pro requires a commercial license key.
# Obtain your key from Artifex Software and set it as an environment variable,
# or replace 'YOUR_ACTUAL_LICENSE_KEY_HERE'.
license_key = os.environ.get('PYMUPDFPRO_LICENSE', 'YOUR_ACTUAL_LICENSE_KEY_HERE')

if license_key == 'YOUR_ACTUAL_LICENSE_KEY_HERE':
    print("WARNING: Please set the 'PYMUPDFPRO_LICENSE' environment variable or replace the placeholder.")
    print("Without a valid license, PyMuPDF-pro features will not function correctly.")
else:
    try:
        fitz.utils.set_license(license_key)
        print("PyMuPDF-pro license key setup attempted.")
    except AttributeError:
        print("ERROR: 'fitz.utils.set_license' not found. Is PyMuPDF-pro installed?")
        exit(1)
    except Exception as e:
        print(f"ERROR: Failed to set PyMuPDF-pro license: {e}")
        exit(1)

# --- Example: Convert a DOCX file to PDF ---
# Replace 'path/to/your/document.docx' with an actual Office file path.
# You can use any supported format like .doc, .docx, .ppt, .pptx, .xls, .xlsx.
input_file = Path(os.environ.get('PYMUPDFPRO_INPUT_FILE', 'path/to/your/document.docx'))
output_file = Path("output.pdf")

if not input_file.exists() or 'path/to/your/document.docx' in str(input_file):
    print(f"\nWARNING: Input file '{input_file}' not found or is a placeholder.")
    print("Please provide a valid path to an Office document (e.g., .docx) for conversion.")
    # For a truly runnable example without manual setup, one might create a dummy.
    # For this quickstart, we'll indicate failure if the file isn't provided.
    exit(1)

try:
    # Use fitz.open_office_document (functionality added by PyMuPDF-pro)
    doc = fitz.open_office_document(str(input_file))
    doc.save(str(output_file))  # Save as PDF (default format)
    doc.close()
    print(f"\nSuccessfully converted '{input_file.name}' to '{output_file.name}'.")
except fitz.EmptyFilename:
    print(f"ERROR: Input file path is invalid or empty: '{input_file}'.")
    exit(1)
except Exception as e:
    print(f"An error occurred during conversion: {e}")
    # Common error here if license is invalid: 'RuntimeError: License key expired or not valid'
    exit(1)
finally:
    # Clean up the output file for a clean runnable example
    if output_file.exists():
        os.remove(output_file)
        print(f"Cleaned up output file: {output_file}")