pdfrw Library
pdfrw is a pure-Python library for reading and writing PDF files. It supports operations such as subsetting, merging, rotating, and modifying PDF metadata. The latest release, 0.4 (2017), focused mainly on improving Python 3 compatibility and adding proper Unicode support. The library still works, but releases have been sporadic and some sources suggest development has ceased.
Warnings
- gotcha: pdfrw has limited support for compressed PDF files and no built-in support for encrypted ones. Such files may need to be uncompressed or decrypted with an external tool like `pdftk` before `pdfrw` can process them reliably.
- deprecated: Development appears to have ceased; the last release (v0.4) dates from 2017. The library remains functional for many tasks, but it receives no ongoing maintenance, so bugs may go unaddressed and compatibility issues may arise with newer Python versions or PDF specifications. Some sources explicitly state it is 'not maintained anymore'.
- breaking: Versions of pdfrw prior to v0.2 supported only Python 2; Python 3 support was introduced in v0.2. Older Python 2 codebases might require adaptation for Python 3 environments, particularly in string handling.
- gotcha: Proper Unicode support for text strings in PDFs was added in v0.4. Earlier versions might mishandle non-ASCII or international characters, which could lead to corrupted text or errors.
- gotcha: When merging or manipulating PDFs, pdfrw might not preserve certain PDF features such as bookmarks (outlines), because it often reconstructs page display information. The output PDF can therefore lose navigation elements.
- gotcha: Version 0.3 fixed several `PageMerge` bugs, specifically related to multiple program runs and state save/restore. In earlier versions, `PageMerge` operations could be unreliable or behave unexpectedly.
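The pre-processing step from the first warning could be scripted roughly as follows. This is a minimal sketch, assuming `pdftk` is installed and on the PATH; the helper names `pdftk_command` and `preprocess_for_pdfrw` are hypothetical and not part of pdfrw.

```python
import shutil
import subprocess


def pdftk_command(src, dst, password=None):
    """Build a pdftk command line that writes an uncompressed copy of src.

    If a password is given, it is passed as the input password so that
    pdftk can decrypt the file while copying it.
    """
    cmd = ["pdftk", src]
    if password is not None:
        cmd += ["input_pw", password]
    cmd += ["output", dst, "uncompress"]
    return cmd


def preprocess_for_pdfrw(src, dst, password=None):
    """Run pdftk (assumed to be installed) and return the output path."""
    if shutil.which("pdftk") is None:
        raise RuntimeError("pdftk was not found on PATH")
    # check=True raises CalledProcessError if pdftk exits with an error.
    subprocess.run(pdftk_command(src, dst, password), check=True)
    return dst
```

The file written to `dst` can then be opened with `PdfReader(dst)` as usual. Keeping the command construction separate from the subprocess call makes it easy to inspect or log the exact command before running it.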
Install
pip install pdfrw
Imports
- PdfReader
from pdfrw import PdfReader
- PdfWriter
from pdfrw import PdfWriter
- PageMerge
from pdfrw import PageMerge
- *
from pdfrw import *
Quickstart
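from pdfrw import PdfReader, PdfWriter, PdfDict, PdfName, PdfArray
# Create a dummy input PDF for the example
# In a real scenario, 'input.pdf' would already exist.
# pdfrw does not draw text itself, so the dummy page is a blank US Letter page.
blank_page = PdfDict(Type=PdfName.Page, MediaBox=PdfArray([0, 0, 612, 792]), Resources=PdfDict())
writer = PdfWriter()
writer.addpage(blank_page)
writer.write("input.pdf")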
# Read an existing PDF
reader = PdfReader("input.pdf")
# Create a new PdfWriter object
writer = PdfWriter()
# Add all pages from the reader to the writer
writer.addpages(reader.pages)
# Write the content to a new PDF file (e.g., creating a copy)
writer.write("output_copy.pdf")
print("PDF 'input.pdf' read and copied to 'output_copy.pdf'")