{"library":"pdftext","title":"PDFText","description":"pdftext is a Python library for fast, accurate extraction of structured text from PDF documents. It focuses on parsing text efficiently, detecting elements such as tables and links, and handling complex layouts. The current version is 0.6.3; the project is actively maintained, with frequent minor releases that fix bugs and add features.","language":"python","status":"active","last_verified":"Sun May 17","install":{"commands":["pip install pdftext"],"cli":{"name":"pdftext","version":"Traceback (most recent call last):"}},"imports":["from pdftext import PDFText"],"auth":{"required":false,"env_vars":[]},"quickstart":{"code":"import os\nfrom pdftext import PDFText\n\n# Path to the PDF to process; replace with a valid path to your own file.\npdf_path = os.path.join(os.path.dirname(__file__), 'example.pdf')\n\n# This example needs a real PDF at pdf_path; without one it reports a\n# FileNotFoundError. To generate a small test file, if you have FPDF installed:\n# from fpdf import FPDF\n# pdf = FPDF()\n# pdf.add_page()\n# pdf.set_font('Arial', 'B', 16)\n# pdf.cell(40, 10, 'Hello, pdftext!')\n# pdf.output(pdf_path)\n\ntry:\n    # Initialize PDFText with the path to your PDF\n    pdf_processor = PDFText(pdf_path)\n\n    # Extract all text as a single string\n    full_text = pdf_processor.as_text()\n    print(\"--- Full Text ---\")\n    print(full_text)\n\n    # Extract text as blocks\n    text_blocks = pdf_processor.as_blocks()\n    print(\"\\n--- Text Blocks ---\")\n    for i, block in enumerate(text_blocks[:2]):  # Print the first 2 blocks\n        print(f\"Block {i+1}: {block.text[:100]}...\")\n\n    # Extract text as lines (for detailed layout analysis)\n    text_lines = pdf_processor.as_lines()\n    print(\"\\n--- Text Lines (first 5) ---\")\n    for i, line in enumerate(text_lines[:5]):\n        print(f\"Line {i+1}: {line.text}\")\n\n    # Extract tables (if any)\n    tables = pdf_processor.as_tables()\n    if tables:\n        print(\"\\n--- Tables (first) ---\")\n        print(tables[0].to_csv())\n    else:\n        print(\"\\nNo tables found.\")\n\nexcept FileNotFoundError:\n    print(f\"Error: PDF file not found at {pdf_path}. 
Please create or specify a valid PDF.\")\nexcept Exception as e:\n    print(f\"An error occurred: {e}\")","lang":"python","description":"This quickstart demonstrates how to initialize `PDFText` with a PDF file, extract the full text, retrieve text as structured blocks and lines, and extract tables. It requires a PDF file named 'example.pdf' at the specified path.","tag":null,"tag_description":null,"last_tested":null,"results":[]},"compatibility":{"tag":null,"tag_description":null,"last_tested":"2026-05-17","installed_version":"0.3.14","pypi_latest":"0.6.3","is_stale":true,"summary":{"python_range":"3.9–3.13","success_rate":90,"avg_install_s":5.6,"avg_import_s":null,"wheel_type":"wheel"},"results":[{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"pdftext","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":"44.9M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"pdftext","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":4.3,"import_time_s":null,"mem_mb":null,"disk_size":"35M"},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"pdftext","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":"47.9M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"pdftext","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":3.4,"import_time_s":null,"mem_mb":null,"disk_size":"38M"},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine 
(musl)","variant":"pdftext","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":"39.5M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"pdftext","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":2.9,"import_time_s":null,"mem_mb":null,"disk_size":"30M"},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"pdftext","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":"39.3M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"pdftext","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":3.3,"import_time_s":null,"mem_mb":null,"disk_size":"30M"},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"pdftext","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"pdftext","exit_code":0,"wheel_type":"wheel","failure_reason":null,"import_side_effects":"broken","install_time_s":14.2,"import_time_s":null,"mem_mb":null,"disk_size":"215M"}]}}