{"id":6228,"library":"rpaframework-pdf","title":"RPA Framework PDF Library","description":"RPA Framework PDF Library (`rpaframework-pdf`) is a Python library for managing PDF documents. It provides features such as extracting text, adding watermarks, encrypting/decrypting documents, and merging/splitting PDFs. It is part of the broader RPA Framework, an actively maintained collection of open-source libraries for Robotic Process Automation (RPA), designed for both Robot Framework and Python. The current version is 10.0.3, released on March 6, 2026.","status":"active","version":"10.0.3","language":"en","source_language":"en","source_url":"https://github.com/robocorp/rpaframework","tags":["RPA","PDF","automation","robotframework"],"install":[{"cmd":"pip install rpaframework-pdf","lang":"bash","label":"Install `rpaframework-pdf`"}],"dependencies":[{"reason":"Core library for PDF manipulation and processing, upgraded to >=6.6 for security.","package":"pypdf"},{"reason":"Used for advanced text extraction from PDFs, upgraded to >=20251107 for security.","package":"pdfminer-six"},{"reason":"A foundational support package for other RPA Framework libraries, often included as a transitive dependency.","package":"rpaframework-core","optional":true}],"imports":[{"symbol":"PDF","correct":"from RPA.PDF import PDF"}],"quickstart":{"code":"import os\nfrom fpdf import FPDF\nfrom RPA.PDF import PDF\n\ndef create_dummy_pdf(filename):\n    pdf = FPDF()\n    pdf.add_page()\n    pdf.set_font(\"Arial\", size=12)\n    pdf.cell(200, 10, txt=\"Hello from RPA Framework!\", ln=True, align=\"C\")\n    pdf.cell(200, 10, txt=\"This is a test document.\", ln=True, align=\"L\")\n    pdf.output(filename)\n\n\ndef main():\n    input_pdf = \"test_document.pdf\"\n    output_txt = \"extracted_text.txt\"\n\n    # Create a dummy PDF for demonstration\n    create_dummy_pdf(input_pdf)\n    print(f\"Created: {input_pdf}\")\n\n    pdf_lib = PDF()\n\n    # Extract text from the PDF\n    
print(f\"\\nExtracting text from {input_pdf}...\")\n    text_data = pdf_lib.get_text_from_pdf(input_pdf)\n    # text_data maps page numbers (1-indexed) to each page's text as a single string\n    extracted_text = text_data.get(1, \"\")  # Text of the first page\n\n    with open(output_txt, \"w\", encoding=\"utf-8\") as f:\n        f.write(extracted_text)\n    print(f\"Extracted text saved to {output_txt}:\")\n    print(extracted_text)\n\n    # Release file handles held by the library before deleting the files\n    pdf_lib.close_all_pdfs()\n\n    # Clean up dummy files\n    os.remove(input_pdf)\n    os.remove(output_txt)\n    print(\"\\nCleaned up dummy files.\")\n\nif __name__ == \"__main__\":\n    # Requires 'fpdf2' to create the dummy PDF for this quickstart\n    # pip install rpaframework-pdf fpdf2\n    main()","lang":"python","description":"This quickstart demonstrates how to use `rpaframework-pdf` to extract text from a PDF. It first creates a simple PDF file using `fpdf2` (which must be installed separately) and then uses the `get_text_from_pdf` keyword of `RPA.PDF` to read its content. The text of the first page is saved to a text file and printed. Finally, the script closes the PDF and removes the files it created."},"warnings":[{"fix":"Ensure PDFs are text-based, or use `RPA.DocumentAI` for image-based PDFs.","message":"The library primarily works with text-based PDFs. It cannot reliably extract information from image-based (scanned) PDF files. For such cases, specialized external services wrapped by the `RPA.DocumentAI` library are recommended.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Upgrade to the latest version of `rpaframework-pdf`. For extremely large PDFs, consider processing strategies that minimize full document parsing where possible.","message":"Historically, keywords like `Get Text From PDF` would parse the entire document even when only specific pages were requested, leading to performance issues with very large PDF files. 
Performance has since improved, but it should still be considered when processing exceptionally large documents.","severity":"gotcha","affected_versions":"< 7.1.6 (improvements in #300, 7.1.6)"},{"fix":"Use the latest `rpaframework-pdf` and `rpaframework` versions, which generally have broader compatibility. If you must keep an older setup, pin `robotframework` to `~=5.0.0` or isolate it in a dedicated virtual environment.","message":"Older versions of `rpaframework-pdf` (e.g., 7.1.5) and the broader `rpaframework` meta-package were incompatible with `robotframework` >= 6.0 and often downgraded `robotframework` during installation. Newer releases of `rpaframework` and `rpaframework-pdf` have updated their dependencies to support more recent Python and Robot Framework versions, but environments still pinned to older `rpaframework` releases may encounter this dependency conflict.","severity":"gotcha","affected_versions":"<= 7.1.5"}],"env_vars":null,"last_verified":"2026-04-14T00:00:00.000Z","next_check":"2026-07-13T00:00:00.000Z","problems":[]}