{"id":6714,"library":"marker-pdf","title":"Marker PDF","description":"Marker PDF is a Python library that converts PDF documents to markdown with high speed and accuracy. It uses OCR and layout analysis models to preserve the structure and content of the original document. As of version 1.10.2, it is actively developed, with frequent minor releases focused on model improvements, performance, and bug fixes.","status":"active","version":"1.10.2","language":"en","source_language":"en","source_url":"https://github.com/datalab-to/marker","tags":["PDF","markdown","OCR","document conversion","machine learning","NLP"],"install":[{"cmd":"pip install marker-pdf","lang":"bash","label":"Install stable release"}],"dependencies":[],"imports":[{"symbol":"PdfConverter","correct":"from marker.converters.pdf import PdfConverter"},{"symbol":"create_model_dict","correct":"from marker.models import create_model_dict"},{"symbol":"text_from_rendered","correct":"from marker.output import text_from_rendered"}],"quickstart":{"code":"import os\nfrom marker.converters.pdf import PdfConverter\nfrom marker.models import create_model_dict\nfrom marker.output import text_from_rendered\n\n# Path to the PDF to convert; override it with the MARKER_PDF_PATH environment variable.\npdf_path = os.environ.get('MARKER_PDF_PATH', 'sample.pdf')\n\n# Create a blank one-page PDF if the file is missing so the example can run end to end\nif not os.path.exists(pdf_path):\n    try:\n        from pypdf import PdfWriter\n        writer = PdfWriter()\n        writer.add_blank_page(width=72, height=72)\n        with open(pdf_path, 'wb') as f:\n            writer.write(f)\n        print(f\"Created a dummy PDF at {pdf_path} for quickstart.\")\n    except ImportError:\n        print(\"To create a dummy PDF, install pypdf: `pip install pypdf`\")\n        print(f\"Please replace '{pdf_path}' with a path to a real PDF file.\")\n        pdf_path = None # Prevent execution if dummy couldn't be created\n\nif pdf_path and os.path.exists(pdf_path):\n    print(f\"Converting PDF: {pdf_path}\")\n    # Load the models once (weights download on first use) and build the converter\n    converter = PdfConverter(artifact_dict=create_model_dict())\n    rendered = converter(pdf_path)\n    # text_from_rendered returns (text, output format extension, images dict)\n    full_text, _, images = text_from_rendered(rendered)\n\n    print(\"--- Markdown Output ---\")\n    print(full_text[:500]) # Print first 500 characters of markdown\n    print(f\"Extracted images: {list(images.keys())}\")\nelse:\n    print(\"Skipping conversion: PDF path not valid or dummy PDF creation failed.\")","lang":"python","description":"This quickstart converts a single PDF file to markdown with `PdfConverter`. It loads the models with `create_model_dict`, runs the converter on the PDF path, and uses `text_from_rendered` to retrieve the markdown text and the extracted images. If no PDF exists at the given path, it attempts to create a blank one with `pypdf`."},"warnings":[{"fix":"Remove `format_lines` from your converter configuration and CLI invocations. Consider `force_ocr` or other `config` options if you were trying to control OCR behavior.","message":"The `format_lines` option was removed from the Python API and CLI in `v1.8.3`. Code that relied on it to tune output formatting must drop it.","severity":"breaking","affected_versions":">=1.8.3"},{"fix":"Ensure your environment has sufficient resources. For production, consider using GPU acceleration if available. For performance tuning, experiment with converter `config` options, though results may vary.","message":"Marker PDF uses deep learning models for OCR and layout analysis, which are computationally intensive. Conversion can consume significant CPU and RAM, especially for large, complex, or image-heavy PDFs. Model updates can also affect performance (e.g., 'block mode' in `v1.9.0` made it 'a bit slower').","severity":"gotcha","affected_versions":"all"},{"fix":"For critical applications requiring consistent output, pin your `marker-pdf` version. Review the output for complex PDFs and consider adjusting options in the converter `config`. 
For tables, `v1.10.0` introduced the `html_tables_in_markdown` option to render tables using HTML tags instead of markdown syntax, which can improve rendering in some cases.","message":"The quality and exact formatting of the generated markdown can vary significantly based on the input PDF's structure, clarity, and the specific version of Marker PDF used. Frequent model updates (e.g., in `v1.10.0`, `v1.8.3`) aim to improve accuracy but can lead to subtle differences in output between versions.","severity":"gotcha","affected_versions":"all"},{"fix":"Review the license terms for both the code and the model weights in the repository to ensure compliance with your use case. Consult with legal counsel if you have questions regarding commercial or redistribution terms.","message":"The license for the Marker PDF model weights changed to a modified OpenRAIL-M license around `v1.8.5`, while the code is GPL-licensed. This is a significant change to the usage rights and commercial terms for the library and its models.","severity":"gotcha","affected_versions":">=1.8.5"}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z","problems":[]}