PyPDFium2
5.6.0 | verified Tue May 12 | auth: no | python | install: verified
pypdfium2 is an ABI-level Python 3 binding to PDFium, a powerful and liberally licensed library for PDF rendering, inspection, manipulation, and creation. It provides both low-level, ctypes-based access to the raw PDFium API and higher-level Pythonic helpers (the support model) for common use cases. The library is actively maintained, with frequent releases often tied to new PDFium versions. The current version is 5.6.0.
pip install pypdfium2
Warnings
breaking In v5.0.0, `PdfDocument.render()` and `PdfBitmap.get_info()` were removed. `PdfDocument.render()` had performance issues due to bitmap transfer overhead in multiprocessing.
fix Use `PdfPage.render()` with a loop or process pool for rendering pages. Retrieve bitmap information directly from the `PdfBitmap` object instead of `PdfBitmap.get_info()`.
gotcha PDFium is not thread-safe. Simultaneous calls to PDFium functions across different threads (even with different documents) can lead to crashes or corruption.
fix For parallelizing expensive PDFium tasks like rendering, use `multiprocessing` (processes) instead of `threading` (threads). If using threads, ensure only one PDFium call is made at a time (e.g., using a mutex).
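If threads cannot be avoided, the mutex approach might look like the sketch below. It uses only the standard library. `PDFIUM_LOCK`, `serialized`, and `fake_pdfium_call` are hypothetical names, and the fake call is a stand-in for any real PDFium call you would wrap the same way.

```python
import functools
import threading

# One process-wide lock shared by every wrapped call. A re-entrant lock is
# used so a wrapped function may call another wrapped function safely.
PDFIUM_LOCK = threading.RLock()


def serialized(func):
    """Wrap func so that only one thread at a time can execute it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with PDFIUM_LOCK:
            return func(*args, **kwargs)
    return wrapper


@serialized
def fake_pdfium_call(state):
    # Hypothetical stand-in for a PDFium call: a read-modify-write that
    # could interleave across threads if it were not serialized.
    current = state["calls"]
    state["calls"] = current + 1


def worker(state, repeats):
    for _ in range(repeats):
        fake_pdfium_call(state)


state = {"calls": 0}
threads = [threading.Thread(target=worker, args=(state, 1000)) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(state["calls"])
```

A lock like this only prevents concurrent calls. It does not make the work run in parallel, which is why processes remain the better fit for expensive tasks such as rendering.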
gotcha When using the Windows-only API members included in the bindings (since v5.5.0), it is strongly recommended to `ctypes.cast()` your `HDC` object to `pypdfium2.raw.HDC` before passing it to `FPDF_RenderPage()` to ensure type compatibility.
fix Explicitly cast `HDC` objects: `ctypes.cast(your_hdc, pypdfium2.raw.HDC)`.
gotcha Opening password-protected PDFs on `s390x` and `musllinux_armv7l` architectures is known to be broken (as of v5.6.0). Builds for these platforms are provided but are considered 'use at own risk' with no warranty.
fix Avoid using password-protected PDFs on these specific architectures or conduct thorough testing before deployment. Consider alternative platforms if this is a critical use case.
gotcha There is a risk of Python garbage collection prematurely freeing objects that are still needed by PDFium's C API, leading to non-deterministic segmentation faults (dangling object issues). This applies when using the raw API or if Python-managed resources are passed to it.
fix Ensure that any Python objects (like byte buffers, callback functions) whose memory is managed by Python but referenced by PDFium's C functions are explicitly kept alive for the entire duration they are needed by PDFium. Reference these objects in an accompanying class or similar mechanism.
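The keep-alive pattern can be sketched with plain `ctypes`, without calling PDFium at all. `BufferHolder` is a hypothetical class name. The idea is that whatever object you hand to the C side is stored on a Python object whose lifetime you control, and the same applies to `ctypes` callback objects.

```python
import ctypes


class BufferHolder:
    """Owns a ctypes buffer for as long as C code may read from it."""

    def __init__(self, data: bytes):
        # Copy the bytes into a ctypes array and keep it as an attribute,
        # so it stays alive while this holder is referenced.
        self._buffer = (ctypes.c_ubyte * len(data)).from_buffer_copy(data)
        self.size = len(data)
        # Pointer form, as a C function taking a byte pointer would expect.
        self.pointer = ctypes.cast(self._buffer, ctypes.POINTER(ctypes.c_ubyte))


# Keep the holder referenced (for example on the object that wraps the
# document) until the C library no longer needs the memory.
holder = BufferHolder(b"%PDF-1.7")
roundtrip = bytes(holder.pointer[:holder.size])
print(roundtrip)
```

Dropping the last reference to the holder is then the single, explicit point at which the memory may be released.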
breaking Versions 4.30.1 and 5.0.0b1 were yanked from PyPI due to text extraction regressions in the underlying PDFium library. Although this affected only those releases, it shows that changes in the underlying PDFium can introduce regressions.
fix Always use the latest stable version of `pypdfium2` and consult release notes for any known issues or specific PDFium updates that might affect critical functionality like text extraction or rendering.
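As a rough guard, a startup check could flag the two yanked releases named above. This sketch uses only `importlib.metadata` from the standard library. `YANKED` and `check_version` are hypothetical names, and the set contains only the versions listed in this warning.

```python
from importlib.metadata import PackageNotFoundError, version

# Releases this page reports as yanked from PyPI.
YANKED = {"4.30.1", "5.0.0b1"}


def check_version(installed):
    """Return a warning string for a yanked release, otherwise None."""
    if installed in YANKED:
        return f"pypdfium2 {installed} was yanked from PyPI; upgrade to a newer release"
    return None


try:
    installed = version("pypdfium2")
except PackageNotFoundError:
    installed = None  # pypdfium2 is not installed in this environment

if installed is not None:
    message = check_version(installed)
    print(message or f"pypdfium2 {installed} is not one of the yanked releases")
```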
Install compatibility verified last tested: 2026-05-12
python  os / libc      status  wheel install  import  disk
3.10    alpine (musl)  wheel   -              0.12s   28.4M
3.10    alpine (musl)  -       -              0.10s   28.4M
3.10    slim (glibc)   wheel   1.9s           0.08s   27M
3.10    slim (glibc)   -       -              0.08s   27M
3.11    alpine (musl)  wheel   -              0.37s   30.4M
3.11    alpine (musl)  -       -              0.45s   30.3M
3.11    slim (glibc)   wheel   2.0s           0.33s   29M
3.11    slim (glibc)   -       -              0.37s   28M
3.12    alpine (musl)  wheel   -              0.21s   22.2M
3.12    alpine (musl)  -       -              0.23s   22.2M
3.12    slim (glibc)   wheel   1.8s           0.24s   20M
3.12    slim (glibc)   -       -              0.27s   20M
3.13    alpine (musl)  wheel   -              0.20s   22.0M
3.13    alpine (musl)  -       -              0.23s   21.8M
3.13    slim (glibc)   wheel   1.9s           0.23s   20M
3.13    slim (glibc)   -       -              0.26s   20M
3.9     alpine (musl)  wheel   -              0.09s   27.9M
3.9     alpine (musl)  -       -              0.10s   27.9M
3.9     slim (glibc)   wheel   2.3s           0.10s   26M
3.9     slim (glibc)   -       -              0.10s   26M
Imports
- pdfium: `import pypdfium2 as pdfium`
- raw: `import pypdfium2.raw as pdfium_c`
- internal: `import pypdfium2.internal as pdfium_i`
Quickstart last tested: 2026-04-24
import pypdfium2 as pdfium
import os
# Create a new, empty PDF document with one A4 page
pdf = pdfium.PdfDocument.new()
page = pdf.new_page(595, 842) # A4 size in points (width, height)
page.close()
output_filename = 'example_output.pdf'
pdf.save(output_filename, version=17)
pdf.close()
print(f"Created PDF: {output_filename}")
# Example of opening an existing PDF and reading its page count.
# To keep this runnable, reopen the file that was just created.
if os.path.exists(output_filename):
    existing_pdf = pdfium.PdfDocument(output_filename)
    print(f"Opened existing PDF with {len(existing_pdf)} pages.")
    existing_pdf.close()
    os.remove(output_filename)  # Clean up the dummy file
    print(f"Cleaned up {output_filename}")
else:
    print(f"Error: {output_filename} not found for reading example.")