RPA Framework PDF Library

10.0.3 · active · verified Tue Apr 14

RPA Framework PDF Library (`rpaframework-pdf`) is a Python library for managing PDF documents. It supports extracting text, adding watermarks, encrypting and decrypting documents, and merging and splitting PDFs. It is part of the broader RPA Framework, an actively maintained collection of open-source libraries for Robotic Process Automation (RPA) that can be used from both Robot Framework and Python. The current version is 10.0.3, released on March 6, 2026.
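
To make the merge and encrypt features listed above more concrete, here is a minimal sketch that chains them. The helper name `merge_then_encrypt` is hypothetical, and the keyword names and parameters (`add_files_to_pdf(files, target_document)` and `encrypt_pdf(source_path, output_path, user_pwd)`) are assumptions about the `RPA.PDF` API that should be checked against the installed version. The helper takes the library instance as an argument, so it can be exercised with a stand-in object.

```python
from typing import Iterable


def merge_then_encrypt(
    pdf_lib,
    sources: Iterable[str],
    merged_path: str,
    encrypted_path: str,
    password: str,
) -> str:
    """Merge `sources` into one PDF, then write a password-protected copy.

    `pdf_lib` is expected to behave like an RPA.PDF `PDF()` instance.
    """
    # Assumed keyword: add_files_to_pdf(files=..., target_document=...)
    pdf_lib.add_files_to_pdf(files=list(sources), target_document=merged_path)
    # Assumed keyword: encrypt_pdf(source_path=..., output_path=..., user_pwd=...)
    pdf_lib.encrypt_pdf(
        source_path=merged_path,
        output_path=encrypted_path,
        user_pwd=password,
    )
    return encrypted_path
```

With the library installed, this would presumably be called as `merge_then_encrypt(PDF(), ["a.pdf", "b.pdf"], "merged.pdf", "merged_locked.pdf", "secret")`, where `PDF` is imported from `RPA.PDF` and the file names are placeholders.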

Warnings

Install

pip install rpaframework-pdf

Imports

from RPA.PDF import PDF

Quickstart

This quickstart extracts text from a PDF with `rpaframework-pdf`. It first creates a simple PDF file using `fpdf2` (which must be installed separately), then calls `RPA.PDF`'s `get_text_from_pdf` keyword, which returns a dictionary keyed by page number. The first page's text is printed and saved to a text file, and the generated files are removed at the end.

import os
from fpdf import FPDF
from RPA.PDF import PDF

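def create_dummy_pdf(filename):
    """Write a one-page PDF with two lines of text using fpdf2."""
    pdf = FPDF()
    pdf.add_page()
    # "Helvetica" is a built-in core font; recent fpdf2 releases only accept "Arial" with a deprecation warning
    pdf.set_font("Helvetica", size=12)
    # new_x/new_y replace the deprecated ln=True: continue at the left margin of the next line
    pdf.cell(200, 10, "Hello from RPA Framework!", new_x="LMARGIN", new_y="NEXT", align="C")
    pdf.cell(200, 10, "This is a test document.", new_x="LMARGIN", new_y="NEXT", align="L")
    pdf.output(filename)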


def main():
    input_pdf = "test_document.pdf"
    output_txt = "extracted_text.txt"

    # Create a dummy PDF for demonstration
    create_dummy_pdf(input_pdf)
    print(f"Created: {input_pdf}")

    pdf_lib = PDF()

    # Example 1: Get text from PDF
    print(f"\nExtracting text from {input_pdf}...")
    text_data = pdf_lib.get_text_from_pdf(input_pdf)
    # text_data is a dictionary mapping 1-indexed page numbers to each page's text (a string)
    extracted_text = text_data.get(1, "")  # Get text from the first page

    with open(output_txt, "w", encoding="utf-8") as f:
        f.write(extracted_text)
    print(f"Extracted text saved to {output_txt}:")
    print(extracted_text)

    # Release the library's handle on the PDF, then clean up dummy files
    pdf_lib.close_all_pdfs()
    os.remove(input_pdf)
    os.remove(output_txt)
    print("\nCleaned up dummy files.")

if __name__ == "__main__":
    # Requires 'fpdf2' to create the dummy PDF for this quickstart
    # pip install rpaframework-pdf fpdf2
    main()
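
The quickstart reads only the first page. For multi-page documents, a small helper can combine every page in order. This is a sketch that assumes the value returned by `get_text_from_pdf` is a dictionary mapping 1-indexed page numbers to text strings; the helper name `join_pages` and the sample data are hypothetical, and the block runs without any third-party package.

```python
from typing import Mapping


def join_pages(text_by_page: Mapping[int, str], separator: str = "\n\n") -> str:
    """Concatenate page texts in ascending page-number order."""
    return separator.join(text_by_page[page] for page in sorted(text_by_page))


# Stand-in for a get_text_from_pdf result (illustrative data only)
sample = {2: "Second page text", 1: "First page text"}
print(join_pages(sample))
```

Sorting the keys makes the result independent of the order in which the pages appear in the dictionary.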
