PyPDF2 (DEPRECATED: migrate to pypdf)

3.0.1 · deprecated · verified Sun Mar 29

PyPDF2 is a pure-Python library for PDF manipulation: splitting, merging, cropping, and transforming pages. The `PyPDF2` package on PyPI is deprecated, and 3.0.1 is its final release. It receives no further updates, and its API corresponds to that of the early `pypdf` 3.x releases, which makes migration largely a matter of changing the import name. All active development, new features, and security fixes now land in the `pypdf` project (currently at version 6.x.x), which is the recommended library for new and ongoing PDF processing in Python.
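
During a migration it can help to check which of the two packages is actually importable in a given environment. The following is a minimal sketch using only the standard library. The helper name `find_pdf_backend` and the `CANDIDATES` tuple are hypothetical, introduced here for illustration; they are not part of either library.

```python
import importlib
import importlib.util

# Candidate module names, preferred first. "pypdf" is the maintained project;
# "PyPDF2" is the deprecated name, kept here only as a fallback.
# (CANDIDATES and find_pdf_backend are illustrative names, not library API.)
CANDIDATES = ("pypdf", "PyPDF2")


def find_pdf_backend(candidates=CANDIDATES):
    """Return the first importable top-level module name, or None."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


backend_name = find_pdf_backend()
if backend_name is None:
    print("No PDF backend found; install pypdf.")
else:
    backend = importlib.import_module(backend_name)
    # PdfReader and PdfWriter are exposed under both module names in 3.x.
    PdfReader, PdfWriter = backend.PdfReader, backend.PdfWriter
    version = getattr(backend, "__version__", "unknown")
    print(f"Using {backend_name} {version}")
```

A fallback like this is best treated as a temporary bridge: once `pypdf` is installed everywhere, a plain `from pypdf import PdfReader, PdfWriter` is simpler and avoids silently running on the unmaintained package.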

Warnings

Install

Imports

Quickstart

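This quickstart demonstrates basic PDF operations with `pypdf`, the actively maintained successor to `PyPDF2`: reading a file, extracting text, copying its pages into a new document, and appending a blank page. If `example.pdf` does not exist, the script first creates a two-page blank placeholder; blank pages contain no text, so the extraction step prints 'No text' in that case.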

from pypdf import PdfReader, PdfWriter
import os

# Create a dummy PDF for demonstration if it doesn't exist
dummy_pdf_path = "example.pdf"
if not os.path.exists(dummy_pdf_path):
    writer = PdfWriter()
    writer.add_blank_page(width=72, height=72)
    writer.add_blank_page(width=72, height=72)
    with open(dummy_pdf_path, "wb") as f:
        writer.write(f)

# --- Example: read a PDF, extract text, and copy its pages into a new file with pypdf ---

# Create a PdfReader object
reader = PdfReader(dummy_pdf_path)

# Get number of pages
num_pages = len(reader.pages)
print(f"Number of pages: {num_pages}")

# Extract text from the first page
first_page = reader.pages[0]
text = first_page.extract_text()
print(f"Text from first page: '{text.strip() if text else 'No text'}'")

# Create a PdfWriter object to build the output document
writer = PdfWriter()

# Add all pages from the reader to the writer
for page in reader.pages:
    writer.add_page(page)

# Add a blank page
writer.add_blank_page(width=72, height=72)

# Write the output PDF to a file
output_pdf_path = "merged_output.pdf"
with open(output_pdf_path, "wb") as fp:
    writer.write(fp)

print(f"Successfully created {output_pdf_path} with {len(writer.pages)} pages.")

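# Clean up the files used by this example.
# Note: this deletes example.pdf even if it existed before the script ran,
# so run the example in a scratch directory.
os.remove(dummy_pdf_path)
os.remove(output_pdf_path)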
