pypdf

6.9.2 verified Tue May 12 auth: no python install: verified quickstart: verified

pypdf is a free and open-source pure-Python library for PDF file manipulation. It can split, merge, crop, and transform PDF files, as well as add custom data, viewing options, and passwords. It also supports retrieving text and metadata from PDFs. Currently at version 6.9.2, the library is actively maintained with a rapid release cadence, often seeing multiple updates per month that fix bugs, patch security issues, and improve performance.

pip install pypdf
error ModuleNotFoundError: No module named 'pypdf'
cause The 'pypdf' library is not installed in the current Python environment.
fix
pip install pypdf
error ImportError: cannot import name 'PdfFileReader' from 'pypdf'
cause The class 'PdfFileReader' was renamed to 'PdfReader' in the PEP8 API cleanup that predates the rename from PyPDF2 to pypdf; the old name cannot be imported from current pypdf releases.
fix
from pypdf import PdfReader
# Similarly, PdfFileWriter was renamed to PdfWriter
from pypdf import PdfWriter
error AttributeError: 'PdfReader' object has no attribute 'getNumPages'
cause The methods 'getNumPages()' and 'getPage()' belong to the old camelCase PyPDF2 API, which has been removed; pypdf exposes pages through the list-like attribute 'reader.pages'.
fix
from pypdf import PdfReader

reader = PdfReader("example.pdf")
num_pages = len(reader.pages)  # Get number of pages
first_page = reader.pages[0]  # Access a specific page
error TypeError: 'list' object is not callable
cause 'reader.pages' is a property that returns a list-like sequence of pages, not a method. Calling it as 'reader.pages()' raises this error; the type named in the message may vary by version.
fix
from pypdf import PdfReader

reader = PdfReader("example.pdf")
page_count = len(reader.pages)  # Access as attribute, not a method
breaking The library was renamed from PyPDF2 to pypdf. This involved a package name change and significant class and method renames (e.g., `PdfFileReader` to `PdfReader`, `PdfFileWriter` to `PdfWriter`, `PdfFileMerger` to `PdfMerger`, which was itself later removed in favor of `PdfWriter`, and methods like `getNumPages()` to `len(reader.pages)`).
fix Update your import statements from `from PyPDF2 import ...` to `from pypdf import ...` and adapt to the new class and method names following PEP8 conventions (e.g., `reader.getNumPages()` becomes `len(reader.pages)`). Consult the migration guide for a full list of changes.
deprecated Support for abbreviations in `decode_stream_data` was deprecated.
fix Avoid using abbreviations when calling `decode_stream_data`. Review the official documentation for the recommended usage.
gotcha Calling `PageObject.replace_contents()` for pages not assigned to a `PdfWriter` is deprecated. This can lead to unexpected behavior and will be removed in pypdf 7.0.0.
fix Ensure that `PageObject.replace_contents()` is only called on `PageObject` instances that are part of a `PdfWriter` object. The documentation advises against using it directly on pages from a `PdfReader`.
gotcha Older versions of pypdf might experience significant performance degradation (O(n²) complexity) when dealing with frequent `NameObject` read/write operations, especially with complex or large PDF files.
fix Upgrade to pypdf 6.9.0 or later, which includes performance improvements for `NameObject` handling.
gotcha Processing untrusted or malformed PDF files, especially with older versions of pypdf, can lead to security vulnerabilities such as infinite loops, excessive resource consumption, or crashes. Numerous security fixes address issues like circular references and stream length limits.
fix Keep `pypdf` updated to the latest version to benefit from security patches. Consider using `strict=True` when initializing `PdfReader` to raise exceptions for non-standard compliant PDFs, allowing for explicit error handling.
pip install "pypdf[crypto]"
python  os / libc      variant  wheel  install  import  disk
3.10    alpine (musl)  pypdf    wheel  -        2.88s   23.8M
3.10    alpine (musl)  pypdf    -      -        2.87s   23.7M
3.10    alpine (musl)  crypto   wheel  -        2.90s   40.0M
3.10    alpine (musl)  crypto   -      -        1.98s   38.9M
3.10    slim (glibc)   pypdf    wheel  1.9s     4.02s   24M
3.10    slim (glibc)   pypdf    -      -        4.99s   24M
3.10    slim (glibc)   crypto   wheel  2.7s     4.09s   40M
3.10    slim (glibc)   crypto   -      -        5.11s   39M
3.11    alpine (musl)  pypdf    wheel  -        0.77s   23.1M
3.11    alpine (musl)  pypdf    -      -        0.75s   23.1M
3.11    alpine (musl)  crypto   wheel  -        0.75s   39.9M
3.11    alpine (musl)  crypto   -      -        0.76s   38.8M
3.11    slim (glibc)   pypdf    wheel  1.7s     0.64s   24M
3.11    slim (glibc)   pypdf    -      -        0.61s   24M
3.11    slim (glibc)   crypto   wheel  2.4s     0.72s   40M
3.11    slim (glibc)   crypto   -      -        0.63s   39M
3.12    alpine (musl)  pypdf    wheel  -        0.60s   14.8M
3.12    alpine (musl)  pypdf    -      -        0.60s   14.8M
3.12    alpine (musl)  crypto   wheel  -        0.63s   31.5M
3.12    alpine (musl)  crypto   -      -        0.60s   30.4M
3.12    slim (glibc)   pypdf    wheel  1.7s     0.65s   15M
3.12    slim (glibc)   pypdf    -      -        0.64s   15M
3.12    slim (glibc)   crypto   wheel  2.3s     0.67s   32M
3.12    slim (glibc)   crypto   -      -        0.66s   31M
3.13    alpine (musl)  pypdf    wheel  -        0.58s   14.6M
3.13    alpine (musl)  pypdf    -      -        0.58s   14.4M
3.13    alpine (musl)  crypto   wheel  -        0.59s   31.3M
3.13    alpine (musl)  crypto   -      -        0.60s   30.1M
3.13    slim (glibc)   pypdf    wheel  1.7s     0.61s   15M
3.13    slim (glibc)   pypdf    -      -        0.60s   15M
3.13    slim (glibc)   crypto   wheel  2.4s     0.69s   32M
3.13    slim (glibc)   crypto   -      -        0.62s   30M
3.9     alpine (musl)  pypdf    wheel  -        0.26s   20.3M
3.9     alpine (musl)  pypdf    -      -        0.27s   20.2M
3.9     alpine (musl)  crypto   wheel  -        0.27s   37.2M
3.9     alpine (musl)  crypto   -      -        0.28s   36.1M
3.9     slim (glibc)   pypdf    wheel  2.0s     0.24s   21M
3.9     slim (glibc)   pypdf    -      -        0.22s   21M
3.9     slim (glibc)   crypto   wheel  3.2s     0.24s   38M
3.9     slim (glibc)   crypto   -      -        0.23s   37M

This quickstart demonstrates how to create two simple PDF files and then merge them into a single output PDF using PdfReader and PdfWriter.

from pypdf import PdfReader, PdfWriter

# Create dummy PDF files for the example (two blank pages each).
# add_blank_page() already appends the new page to the writer,
# so no extra add_page() call is needed.
writer = PdfWriter()
writer.add_blank_page(width=72, height=72)
writer.add_blank_page(width=72, height=72)
with open("document1.pdf", "wb") as f:
    writer.write(f)

writer = PdfWriter()
writer.add_blank_page(width=72, height=72)
writer.add_blank_page(width=72, height=72)
with open("document2.pdf", "wb") as f:
    writer.write(f)

# Merge multiple PDF files into one
writer = PdfWriter()

# Add pages from document1.pdf
reader1 = PdfReader("document1.pdf")
for page in reader1.pages:
    writer.add_page(page)

# Add pages from document2.pdf
reader2 = PdfReader("document2.pdf")
for page in reader2.pages:
    writer.add_page(page)

# Write the merged PDF to a new file
with open("merged_document.pdf", "wb") as output_pdf:
    writer.write(output_pdf)

print("PDFs merged successfully into merged_document.pdf")