pdf2image Library
pdf2image is a Python library that wraps the command-line utilities `pdftoppm` and `pdftocairo` (both part of the Poppler PDF rendering toolkit) to convert PDF documents into a list of PIL Image objects. It provides a convenient Pythonic interface for tasks such as document display, data processing, and thumbnail creation. The current version is 1.17.0, and the project maintains an active release cadence.
Warnings
- gotcha pdf2image is only a wrapper: it **requires the Poppler command-line utilities** (`pdftoppm` and `pdftocairo`) to be installed on your system. If Poppler is not installed, or its `bin` directory is not on your system's PATH, conversion fails with `PDFInfoNotInstalledError` or a similar error.
- gotcha Converting large PDF files without specifying an `output_folder` can lead to excessive memory consumption, potentially causing the process to be killed.
- deprecated Version 1.13.0 was explicitly deprecated shortly after its release due to an issue with `convert_from_bytes` not respecting the `use_pdftocairo` parameter.
- breaking Version 1.12.0 introduced a deadlock on Windows when using `convert_from_path` with multiple threads and was subsequently removed from PyPI.
- gotcha Using outdated versions of Poppler can lead to `PDFPageCountError`, `Syntax Error`, or other unexpected issues when processing certain PDFs.
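Because a missing Poppler installation is the most common failure listed above, it can help to check for the utilities before converting anything. The following is a minimal sketch using only the standard library; the helper names `find_poppler_tools` and `missing_poppler_tools` are hypothetical and not part of pdf2image.

```python
import shutil

# The two Poppler utilities named in the warnings above.
POPPLER_TOOLS = ("pdftoppm", "pdftocairo")


def find_poppler_tools(tools=POPPLER_TOOLS):
    """Map each tool name to its resolved location on PATH, or None if absent."""
    return {tool: shutil.which(tool) for tool in tools}


def missing_poppler_tools(tools=POPPLER_TOOLS):
    """Return the names of the tools that could not be found on PATH."""
    return [tool for tool, path in find_poppler_tools(tools).items() if path is None]


if __name__ == "__main__":
    missing = missing_poppler_tools()
    if missing:
        print("Poppler utilities not found on PATH:", ", ".join(missing))
    else:
        print("Poppler utilities found:", find_poppler_tools())
```

This only confirms that the executables are discoverable on PATH. It does not check the Poppler version, so the outdated-Poppler issue described above can still occur.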
Install
pip install pdf2image
Imports
- convert_from_path
from pdf2image import convert_from_path
- convert_from_bytes
from pdf2image import convert_from_bytes
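As a rough illustration of when each import applies, the sketch below picks `convert_from_path` for a file on disk and `convert_from_bytes` for PDF data already held in memory (for example, an upload). The dispatcher `load_pdf_pages` is a hypothetical helper, not part of pdf2image, and it assumes Poppler is installed.

```python
import os


def load_pdf_pages(source, dpi=200):
    """Convert a PDF given either as a filesystem path or as raw bytes."""
    if isinstance(source, (bytes, bytearray)):
        # In-memory PDF content, e.g. read from a network response.
        from pdf2image import convert_from_bytes
        return convert_from_bytes(bytes(source), dpi=dpi)
    if isinstance(source, (str, os.PathLike)):
        # A PDF stored on disk.
        from pdf2image import convert_from_path
        return convert_from_path(os.fspath(source), dpi=dpi)
    raise TypeError("source must be a path or the PDF content as bytes")
```

The pdf2image imports sit inside the branches only so the helper can be defined in environments where the library is absent. In application code, the top-level imports shown above are the usual choice.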
Quickstart
import os
import tempfile

from pdf2image import convert_from_path

# NOTE: Poppler must be installed and on your PATH for this code to run.
# Replace 'dummy.pdf' with the path to an existing PDF.
pdf_path = 'dummy.pdf'

if not os.path.exists(pdf_path):
    # Nothing to convert; a real application should raise or handle this case.
    print(f"Please create '{pdf_path}' in the current directory or provide a valid path.")
else:
    try:
        # Rendering into a temporary folder avoids holding every page in memory.
        with tempfile.TemporaryDirectory() as path:
            images = convert_from_path(
                pdf_path,
                output_folder=path,
                fmt='jpeg',
                dpi=200,
            )
            # Save inside the with block: the images may be loaded lazily from files in `path`.
            for i, image in enumerate(images):
                output_filename = f"output_page_{i + 1}.jpeg"
                image.save(output_filename, 'JPEG')
                print(f"Saved {output_filename}")
        print("PDF conversion successful.")
    except Exception as e:
        print(f"An error occurred during PDF conversion: {e}")
        print("Please ensure Poppler is installed and its 'bin' directory is in your system's PATH.")
        # Raw string so the backslashes in the example Windows path print literally.
        print(r"On Windows, you may need to pass poppler_path=r'C:\path\to\poppler\bin' to convert_from_path.")
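
The quickstart renders every page in a single call. For large PDFs, one possible alternative is to convert a few pages at a time using the `first_page` and `last_page` parameters of `convert_from_path`, with the page count read via `pdfinfo_from_path`. The sketch below is one way to do that under those assumptions; `page_batches` and `convert_in_batches` are hypothetical helper names, and the batch size of 10 is arbitrary.

```python
def page_batches(total_pages, batch_size):
    """Yield 1-based, inclusive (first_page, last_page) ranges covering all pages."""
    if total_pages < 0 or batch_size < 1:
        raise ValueError("total_pages must be >= 0 and batch_size must be >= 1")
    for first in range(1, total_pages + 1, batch_size):
        yield first, min(first + batch_size - 1, total_pages)


def convert_in_batches(pdf_path, batch_size=10, dpi=200):
    """Yield PIL images one by one, rendering only batch_size pages per call."""
    # Imported here so page_batches stays usable without pdf2image installed.
    from pdf2image import convert_from_path, pdfinfo_from_path

    total_pages = pdfinfo_from_path(pdf_path)["Pages"]
    for first, last in page_batches(total_pages, batch_size):
        yield from convert_from_path(
            pdf_path, dpi=dpi, first_page=first, last_page=last
        )
```

A caller could iterate with `for image in convert_in_batches('dummy.pdf'):` and save or process each page before the next batch is rendered. Each batch launches Poppler again, so this trades some speed for lower peak memory.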