pdf2image Library

1.17.0 · active · verified Sun Apr 05

pdf2image is a Python library that wraps the command-line utilities `pdftoppm` and `pdftocairo` (both shipped with the Poppler PDF rendering library) to convert PDF documents into a list of PIL Image objects. It provides a Pythonic interface for tasks such as document display, data processing, and thumbnail creation. The current version is 1.17.0, and the project is listed as actively maintained.
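Because the library shells out to these two utilities, it can help to confirm they are reachable before attempting a conversion. The following is a minimal sketch using only the standard library; `POPPLER_TOOLS` and `find_poppler_tools` are hypothetical names chosen for illustration and are not part of pdf2image.

```python
import shutil

# The two Poppler command-line tools that pdf2image wraps.
POPPLER_TOOLS = ("pdftoppm", "pdftocairo")


def find_poppler_tools():
    """Map each Poppler tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in POPPLER_TOOLS}


if __name__ == "__main__":
    for tool, location in find_poppler_tools().items():
        print(f"{tool}: {location or 'not found on PATH'}")
```

If either entry comes back as `None`, installing Poppler or passing `poppler_path` to the conversion call is the likely remedy.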

Warnings

Install

Imports

Quickstart

This quickstart converts a PDF file into a list of PIL Image objects with `convert_from_path` and saves each page as a JPEG image. It uses `output_folder` so that large PDFs are rendered to disk rather than held entirely in memory, and `fmt` to set the output image format. Poppler must be installed and its binaries must be on the system PATH for the library to work.

import os
import tempfile
from pdf2image import convert_from_path

# NOTE: Poppler must be installed and on your PATH for this code to run.
# This example assumes 'dummy.pdf' exists in the current directory;
# replace it with the path to your own PDF.
if not os.path.exists('dummy.pdf'):
    # Nothing to convert: skip the conversion and tell the user why.
    print("Please create a 'dummy.pdf' file in the current directory or provide a valid path.")
else:
    try:
        with tempfile.TemporaryDirectory() as path:
            images = convert_from_path(
                'dummy.pdf', 
                output_folder=path, 
                fmt='jpeg', 
                dpi=200
            )

            for i, image in enumerate(images):
                output_filename = f"output_page_{i+1}.jpeg"
                image.save(output_filename, 'JPEG')
                print(f"Saved {output_filename}")
        print("PDF conversion successful.")
    except Exception as e:
        print(f"An error occurred during PDF conversion: {e}")
        print("Please ensure Poppler is installed and its 'bin' directory is in your system's PATH.")
        print(r"For Windows, you might need to specify poppler_path=r'C:\path\to\poppler\bin' in convert_from_path.")
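For very large PDFs, another way to limit memory use is to convert a few pages at a time with the `first_page` and `last_page` parameters of `convert_from_path`. The sketch below is one possible approach, not the library's prescribed method: `page_ranges` and `convert_in_batches` are hypothetical helper names, the batch size of 10 is an arbitrary assumption, and the page count is read from the `"Pages"` entry returned by `pdfinfo_from_path`.

```python
import os


def page_ranges(total_pages, batch_size):
    """Return 1-based, inclusive (first, last) page ranges covering the document."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]


def convert_in_batches(pdf_path, output_dir, batch_size=10, dpi=200):
    """Convert a PDF a few pages at a time and save each page as a JPEG."""
    # Imported here so page_ranges() stays usable without pdf2image installed.
    from pdf2image import convert_from_path, pdfinfo_from_path

    total_pages = pdfinfo_from_path(pdf_path)["Pages"]
    saved = []
    for first, last in page_ranges(total_pages, batch_size):
        # Only the pages in this batch are rendered and held in memory.
        images = convert_from_path(pdf_path, dpi=dpi, first_page=first, last_page=last)
        for page_number, image in enumerate(images, start=first):
            target = os.path.join(output_dir, f"page_{page_number}.jpeg")
            image.save(target, "JPEG")
            saved.append(target)
    return saved
```

Calling `convert_in_batches('dummy.pdf', 'out')` would write `page_1.jpeg`, `page_2.jpeg`, and so on into an existing `out` directory, assuming Poppler is available.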
