pypdf
pypdf is a free and open-source pure-Python library for PDF file manipulation. It can split, merge, crop, and transform PDF files, as well as add custom data, viewing options, and passwords. It also supports retrieving text and metadata from PDFs. Currently at version 6.9.2, the library is actively maintained with a rapid release cadence, often with multiple releases per month that deliver bug fixes, security fixes, and performance improvements.
Warnings
- breaking The library was renamed from PyPDF2 to pypdf. This involved a package name change and significant class and method renames (e.g., `PdfFileReader` to `PdfReader`, `PdfFileWriter` to `PdfWriter`, `PdfFileMerger` to `PdfMerger`, and methods like `getNumPages()` to `len(reader.pages)`). `PdfMerger` was itself later deprecated and then removed in pypdf 5.0.0; use `PdfWriter` for merging instead.
- deprecated Support for abbreviations in `decode_stream_data` was deprecated.
- gotcha Calling `PageObject.replace_contents()` on a page that is not assigned to a `PdfWriter` is deprecated. Doing so can lead to unexpected behavior, and support for it will be removed in pypdf 7.0.0.
- gotcha Older versions of pypdf might experience significant performance degradation (O(n²) complexity) when dealing with frequent `NameObject` read/write operations, especially with complex or large PDF files.
- gotcha Processing untrusted or malformed PDF files, especially with older versions of pypdf, can lead to security vulnerabilities such as infinite loops, excessive resource consumption, or crashes. Numerous security fixes address issues like circular references and stream length limits.
Install
- pip install pypdf
- pip install pypdf[crypto]
Imports
- PdfReader
from pypdf import PdfReader
- PdfWriter
from pypdf import PdfWriter
- PdfMerger (removed in pypdf 5.0.0; merge with `PdfWriter.append()` instead)
from pypdf import PdfWriter
Quickstart
from pypdf import PdfReader, PdfWriter

# Create dummy PDF files for the example (two blank pages each)
with open("document1.pdf", "wb") as f:
    writer = PdfWriter()
    writer.add_blank_page(width=72, height=72)
    writer.add_blank_page(width=72, height=72)
    writer.write(f)

with open("document2.pdf", "wb") as f:
    writer = PdfWriter()
    writer.add_blank_page(width=72, height=72)
    writer.add_blank_page(width=72, height=72)
    writer.write(f)

# Merge multiple PDF files into one
writer = PdfWriter()

# Add pages from document1.pdf
reader1 = PdfReader("document1.pdf")
for page in reader1.pages:
    writer.add_page(page)

# Add pages from document2.pdf
reader2 = PdfReader("document2.pdf")
for page in reader2.pages:
    writer.add_page(page)

# Write the merged PDF to a new file
with open("merged_document.pdf", "wb") as output_pdf:
    writer.write(output_pdf)

print("PDFs merged successfully into merged_document.pdf")