RPA Framework PDF Library
RPA Framework PDF Library (`rpaframework-pdf`) is a Python library for managing PDF documents. Its features include extracting text, adding watermarks, encrypting and decrypting documents, and merging and splitting PDFs. It is part of the broader RPA Framework, an actively maintained collection of open-source libraries for Robotic Process Automation (RPA) that can be used from both Robot Framework and Python. The current version is 10.0.3, released on March 6, 2026.
Warnings
- gotcha The library primarily works with text-based PDFs. It cannot reliably extract information from image-based (scanned) PDF files. For such cases, specialized external services wrapped by the `RPA.DocumentAI` library are recommended.
- gotcha Historically, keywords like `Get Text From PDF` would parse the entire document even when only specific pages were requested, leading to performance issues with very large PDF files. While improvements have been made, users should be mindful of performance when processing exceptionally large documents.
- gotcha Older versions of `rpaframework-pdf` (e.g., 7.1.5) and the broader `rpaframework` meta-package were incompatible with `robotframework` versions >= 6.0, so installing them often downgraded `robotframework`. Newer versions of `rpaframework` (including `rpaframework-pdf`) have updated their dependencies to support more recent Python and Robot Framework versions, but users on older `rpaframework` setups might still encounter this dependency conflict.
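The first two gotchas above can be partly handled in calling code. The following is a minimal sketch, not part of the library: `page_batches`, `pages_without_text`, and `extract_in_batches` are hypothetical helper names. It assumes that `get_text_from_pdf` accepts a `pages` argument containing 1-based page numbers and returns a dictionary mapping each page number to that page's text, and that `get_number_of_pages` takes the source path; check both against your installed version. Because older releases reportedly parsed the whole document regardless of `pages`, batching may not reduce processing time there.

```python
from typing import Dict, Iterator, List


def page_batches(total_pages: int, batch_size: int) -> Iterator[List[int]]:
    """Yield 1-based page numbers in consecutive batches (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(1, total_pages + 1, batch_size):
        yield list(range(start, min(start + batch_size, total_pages + 1)))


def pages_without_text(text_by_page: Dict[int, str]) -> List[int]:
    """Return pages whose extracted text is empty or whitespace only.

    Such pages are often scanned images, although a genuinely blank
    page looks the same to this check.
    """
    return sorted(
        page for page, text in text_by_page.items() if not (text or "").strip()
    )


def extract_in_batches(pdf_path: str, batch_size: int = 25) -> Dict[int, str]:
    """Extract text one batch of pages at a time and collect the results."""
    # Imported lazily so the pure helpers above work without the library.
    from RPA.PDF import PDF

    pdf = PDF()
    total = pdf.get_number_of_pages(pdf_path)
    result: Dict[int, str] = {}
    for batch in page_batches(total, batch_size):
        # Assumption: `pages` accepts a list of 1-based page numbers.
        result.update(pdf.get_text_from_pdf(pdf_path, pages=batch))
    return result
```

Pages reported by `pages_without_text` are candidates for an OCR-based service rather than proof that a page is scanned.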
Install
pip install rpaframework-pdf
Imports
- PDF
from RPA.PDF import PDF
Quickstart
# Requires 'fpdf2' to create the dummy PDF for this quickstart:
# pip install rpaframework-pdf fpdf2
import os

from fpdf import FPDF  # provided by the 'fpdf2' package
from RPA.PDF import PDF


def create_dummy_pdf(filename):
    """Write a one-page, text-based PDF to use as sample input."""
    pdf = FPDF()
    pdf.add_page()
    # Helvetica is a built-in core font, so no font file is needed
    pdf.set_font("Helvetica", size=12)
    pdf.cell(200, 10, txt="Hello from RPA Framework!", ln=True, align="C")
    pdf.cell(200, 10, txt="This is a test document.", ln=True, align="L")
    pdf.output(filename)


def main():
    input_pdf = "test_document.pdf"
    output_txt = "extracted_text.txt"

    # Create a dummy PDF for demonstration
    create_dummy_pdf(input_pdf)
    print(f"Created: {input_pdf}")

    pdf_lib = PDF()

    # Example 1: Get text from PDF
    print(f"\nExtracting text from {input_pdf}...")
    text_data = pdf_lib.get_text_from_pdf(input_pdf)

    # text_data is a dictionary keyed by page number (1-indexed);
    # each value is that page's text as a single string
    extracted_text = text_data.get(1, "")  # Get text from the first page

    with open(output_txt, "w", encoding="utf-8") as f:
        f.write(extracted_text)
    print(f"Extracted text saved to {output_txt}:")
    print(extracted_text)

    # Clean up dummy files
    os.remove(input_pdf)
    os.remove(output_txt)
    print("\nCleaned up dummy files.")


if __name__ == "__main__":
    main()
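The quickstart covers text extraction only. As a rough sketch of two other capabilities named in the introduction, encrypting and merging, the block below wraps the `encrypt_pdf` and `add_files_to_pdf` methods. The wrapper names (`tagged_output_path`, `encrypt_copy`, `merge_pdfs`) are hypothetical, and the parameter names used (`user_pwd`, `files`, `target_document`) are assumptions based on the library's documented keywords, so confirm them against your installed version.

```python
from pathlib import Path
from typing import List


def tagged_output_path(source: str, tag: str) -> str:
    """Build an output path such as 'report_encrypted.pdf' (hypothetical helper)."""
    path = Path(source)
    return str(path.with_name(f"{path.stem}_{tag}{path.suffix}"))


def encrypt_copy(source: str, password: str) -> str:
    """Write a password-protected copy next to the source and return its path."""
    # Imported lazily so the path helper works without the library.
    from RPA.PDF import PDF

    output = tagged_output_path(source, "encrypted")
    # Assumption: encrypt_pdf(source_path, output_path, user_pwd=...)
    PDF().encrypt_pdf(source, output, user_pwd=password)
    return output


def merge_pdfs(files: List[str], target: str) -> str:
    """Combine the given PDFs, in order, into a single target document."""
    from RPA.PDF import PDF

    # Assumption: add_files_to_pdf(files=..., target_document=...)
    PDF().add_files_to_pdf(files=files, target_document=target)
    return target
```

Writing to a new file rather than overwriting the source keeps the original available if a later step fails.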