pdfrw2 - PDF Reader/Writer Library
pdfrw2 is a maintenance fork of the unmaintained pdfrw library, providing a Pythonic way to read, write, and manipulate PDF files. It allows for tasks like merging, splitting, watermarking, and adding annotations to PDFs. The current version is 0.5.0, and it follows an as-needed release cadence for bug fixes and compatibility updates.
Common errors
- ModuleNotFoundError: No module named 'pdfrw2'
  cause: Attempting to import the package using its PyPI name `pdfrw2` instead of its module name `pdfrw`.
  fix: Change your import statements from `import pdfrw2` to `import pdfrw`, or `from pdfrw2 import ...` to `from pdfrw import ...`.
- AttributeError: 'NoneType' object has no attribute 'encrypt'
  cause: Trying to open or manipulate an encrypted PDF without the `pycryptodome` library installed. `pdfrw2` loads encryption support dynamically.
  fix: Install the optional dependency for encryption: `pip install pycryptodome`.
- File "/path/to/pdfrw/pdfreader.py", line XYZ, in __init__
  raise PdfParseError('File has no pages?')
  cause: The file being read is empty, corrupted, or not a valid PDF document that pdfrw2 can parse.
  fix: Ensure the input file is a valid, well-formed PDF. Check the file's integrity and content, for example by opening it in a standard PDF viewer.
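Before handing a file to the parser, a cheap structural check can catch empty or obviously non-PDF input early. The sketch below uses only the standard library and relies on two general properties of PDF files: a `%PDF-` marker near the start and a `%%EOF` marker near the end. The helper name `looks_like_pdf` and the 1024-byte search windows are illustrative assumptions, not part of pdfrw2, and passing this check does not guarantee that the file will parse.

```python
from pathlib import Path


def looks_like_pdf(path, window=1024):
    """Return True if the file has a PDF header and an end-of-file marker.

    Hypothetical helper: a quick sanity check only, not a full validation.
    """
    data = Path(path).read_bytes()
    if not data:
        # An empty file can never be a valid PDF.
        return False
    has_header = b"%PDF-" in data[:window]   # header marker near the start
    has_eof = b"%%EOF" in data[-window:]     # end marker near the end
    return has_header and has_eof
```

A caller could run `looks_like_pdf("input.pdf")` first and report a clear message instead of a parser traceback when it returns `False`.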
Warnings
- gotcha The PyPI package name is `pdfrw2`, but the Python module to import is `pdfrw`. Attempting to `import pdfrw2` will result in a `ModuleNotFoundError`.
- gotcha Handling encrypted PDF files requires the optional `pycryptodome` package. Without it, attempts to open or process encrypted PDFs will fail with an error indicating missing encryption support.
- gotcha If migrating code from very old versions of `pdfrw` (the unmaintained predecessor), you might encounter `ImportError` related to `collections` vs. `collections.abc` for abstract base classes. `pdfrw2` has addressed this, but ensure your Python environment is compatible.
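Because the distribution name and the module name differ, it can help to check both separately when diagnosing an environment. The following sketch uses only the standard library (`importlib.metadata` and `importlib.util`); the function name `describe_install` is a hypothetical helper, not something pdfrw2 provides.

```python
from importlib import metadata, util


def describe_install(dist_name="pdfrw2", module_name="pdfrw"):
    """Report the installed distribution version and module importability."""
    try:
        # Looks up the PyPI distribution (what `pip install` refers to).
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    # Looks up the importable module (what `import` refers to).
    importable = util.find_spec(module_name) is not None
    return {
        "distribution": dist_name,
        "version": version,
        "module": module_name,
        "importable": importable,
    }


print(describe_install())
```

If `version` is `None`, the distribution is not installed in the active environment; if `importable` is `False`, `import pdfrw` would fail there.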
Install
- pip install pdfrw2
Imports
- PdfReader
  wrong: from pdfrw2 import PdfReader
  correct: from pdfrw import PdfReader
- PdfWriter
  wrong: from pdfrw2 import PdfWriter
  correct: from pdfrw import PdfWriter
- PageMerge
  correct: from pdfrw import PageMerge
Quickstart
import os
from pdfrw import PdfReader, PdfWriter

# Create a dummy input PDF for demonstration if none exists.
# In a real scenario you would use existing PDF files, or generate them
# with a library such as reportlab.
def create_dummy_pdf(filename):
    # Minimal hand-written PDF with an empty page tree (zero pages).
    # Written in binary mode so the byte offsets in the xref table stay exact.
    with open(filename, 'wb') as f:
        f.write(b'%PDF-1.4\n1 0 obj <</Type/Catalog/Pages 2 0 R>> endobj\n2 0 obj <</Type/Pages/Count 0>> endobj\nxref\n0 3\n0000000000 65535 f\n0000000009 00000 n\n0000000054 00000 n\ntrailer<</Size 3/Root 1 0 R>>startxref\n93\n%%EOF')

input_pdf_path = 'input.pdf'
output_pdf_path = 'output.pdf'

if not os.path.exists(input_pdf_path):
    print(f"Creating dummy {input_pdf_path} for quickstart...")
    create_dummy_pdf(input_pdf_path)

try:
    # Read an existing PDF
    trailer = PdfReader(input_pdf_path)
    # Create a new PDF writer
    writer = PdfWriter()
    # Add all pages from the input PDF to the writer
    writer.addpages(trailer.pages)
    # Write the copied pages to a new file
    writer.write(output_pdf_path)
    print(f"Successfully copied '{input_pdf_path}' to '{output_pdf_path}'")
except Exception as e:
    print(f"An error occurred: {e}")
finally:
    # Clean up dummy files
    # os.remove(input_pdf_path)  # Uncomment to remove after running
    # os.remove(output_pdf_path)  # Uncomment to remove after running
    pass
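The same `PdfReader`, `addpages`, and `write` calls used in the quickstart extend naturally to merging several files. The sketch below is one possible arrangement, not an official recipe: `merge_pdfs` is a hypothetical helper name, and the import is deferred into the function only so that the argument check runs even when the package is missing.

```python
def merge_pdfs(input_paths, output_path):
    """Concatenate the pages of several PDFs into one output file.

    Hypothetical helper built on the calls shown in the quickstart.
    """
    input_paths = list(input_paths)
    if not input_paths:
        raise ValueError("at least one input PDF is required")

    # Deferred import: the module is named `pdfrw`, not `pdfrw2`.
    from pdfrw import PdfReader, PdfWriter

    writer = PdfWriter()
    for path in input_paths:
        # Append every page of each input, in the order given.
        writer.addpages(PdfReader(path).pages)
    writer.write(output_path)
    return output_path
```

A call such as `merge_pdfs(['a.pdf', 'b.pdf'], 'merged.pdf')` would write the pages of `a.pdf` followed by those of `b.pdf`; the file names here are placeholders.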