{"library":"pdftotext","title":"pdftotext","description":"pdftotext is a Python package that extracts text from PDF documents. It is a C++ extension built on the Poppler PDF rendering library, not a wrapper around the `pdftotext` command-line utility, so installing it from source requires Poppler's C++ development files and a compiler. The current version is 3.0.0. Releases are infrequent, and major updates are rarer than minor bug fixes.","language":"python","status":"active","last_verified":"Fri Apr 17","install":{"commands":["pip install pdftotext"],"cli":{"name":"pdftotext","version":""}},"imports":["import pdftotext"],"auth":{"required":false,"env_vars":[]},"quickstart":{"code":"import pdftotext\nimport os\n\n# Create a minimal hand-written PDF for demonstration; substitute a real PDF for meaningful output\ndummy_pdf_content = b\"%PDF-1.4\\n1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj 2 0 obj<</Type/Pages/Count 1/Kids[3 0 R]>>endobj 3 0 obj<</Type/Page/Parent 2 0 R/MediaBox[0 0 612 792]/Contents 4 0 R>>endobj 4 0 obj<</Length 44>>stream\\nBT /F1 24 Tf 100 700 Td (Hello, pdftotext!) Tj ET\\nendstream\\nendobj\\nxref\\n0 5\\n0000000000 65535 f\\n0000000009 00000 n\\n0000000055 00000 n\\n0000000109 00000 n\\n0000000216 00000 n\\ntrailer<</Size 5/Root 1 0 R>>startxref 303\\n%%EOF\"\nwith open(\"dummy.pdf\", \"wb\") as f:\n    f.write(dummy_pdf_content)\n\n# Load your PDF file\ntry:\n    with open(\"dummy.pdf\", \"rb\") as f:\n        pdf = pdftotext.PDF(f)\n\n    # Get all text from the document (each element is a page)\n    full_text = \"\\n\\n\".join(pdf)\n    print(\"--- Full PDF Text ---\")\n    print(full_text)\n\n    # Get text from a specific page (e.g., the first page)\n    if len(pdf) > 0:\n        first_page_text = pdf[0]\n        print(\"\\n--- First Page Text ---\")\n        print(first_page_text)\n    else:\n        print(\"\\nNo pages found in PDF.\")\nexcept pdftotext.Error as e:\n    # pdftotext.Error is raised when Poppler cannot read the document\n    print(f\"Error processing PDF: {e}. Check that the file is a valid, readable PDF.\")\nfinally:\n    # Clean up the dummy file\n    if os.path.exists(\"dummy.pdf\"):\n        os.remove(\"dummy.pdf\")\n","lang":"python","description":"This quickstart demonstrates how to load a PDF, extract all text by joining its pages, and access text from individual pages using list-like indexing. It also catches `pdftotext.Error`, which the library raises when a PDF cannot be read.","tag":null,"tag_description":null,"last_tested":null,"results":[]},"compatibility":null}