magic-pdf

1.3.12 verified Sat May 09 auth: no python

magic-pdf is a practical tool for converting PDF to Markdown and is part of the MinerU project by OpenDataLab. The current version is 1.3.12, which requires Python >=3.10, <3.14. The package is actively maintained, with frequent releases.

pip install magic-pdf
error ModuleNotFoundError: No module named 'magic_pdf'
cause The package is not installed, or it was imported with the wrong name (hyphen instead of underscore).
fix
Run 'pip install magic-pdf' and use 'import magic_pdf'.
error AttributeError: module 'magic_pdf' has no attribute 'parse'
cause Calling the old function name 'parse', which was deprecated in v1.3.0 in favor of 'parse_pdf'.
fix
Use 'magic_pdf.parse_pdf()' instead.
error KeyError: 'markdown'
cause Accessing the 'markdown' key on a result from an older version (<1.3.0), which stores the Markdown content under 'text'.
fix
Check version: if <1.3.0 use result['text'], else use result['markdown'].
gotcha The package name on PyPI is 'magic-pdf', but the import uses underscore: 'magic_pdf'.
fix Use 'pip install magic-pdf' and 'import magic_pdf'.
breaking Version 1.3.0 changed the output structure: the Markdown content is now under 'markdown' key instead of 'text'.
fix Access result['markdown'] for v1.3.0+, or result['text'] for older versions.
deprecated The function 'magic_pdf.parse' was deprecated in v1.3.0 in favor of 'magic_pdf.parse_pdf'.
fix Use 'parse_pdf' instead of 'parse'.
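
The version-dependent key described in the entries above can be handled in one place. The following is a minimal sketch using only the standard library; the helper names (version_tuple, markdown_key, installed_version) are hypothetical and not part of magic-pdf, and the 1.3.0 cutoff is taken from this page rather than verified against the package.

```python
from __future__ import annotations

import re
from importlib import metadata


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '1.3.12' into (1, 3, 12), padding short versions such as '1.3' with zeros."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad so that '1.3' compares equal to '1.3.0'.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def markdown_key(version: str) -> str:
    """Pick the result key per this page: 'markdown' from 1.3.0 on, otherwise 'text'."""
    return "markdown" if version_tuple(version) >= (1, 3, 0) else "text"


def installed_version(dist_name: str = "magic-pdf") -> str | None:
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

With these helpers, a caller could look up installed_version() once and then read result[markdown_key(version)] instead of hard-coding either key. Note that the lookup uses the PyPI name with a hyphen, not the import name.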

Convert a PDF to Markdown and print the result.

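import magic_pdf  # installed as 'magic-pdf', imported with an underscore

# Parse the PDF; on v1.3.0+ the Markdown content is under the 'markdown' key.
result = magic_pdf.parse_pdf('sample.pdf', output_dir='./output')
print(result['markdown'])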
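
For scripts that must run across several installed versions, the import and function-name errors listed earlier can be turned into clearer messages. This is a rough sketch under the assumptions this page makes: that the module exposes either 'parse_pdf' (v1.3.0+) or 'parse' (older). The load_parser helper is hypothetical and only looks the names up; it does not confirm their signatures.

```python
import importlib
import importlib.util


def load_parser(module_name: str = "magic_pdf"):
    """Return the first parse function found on the module, or raise a clear error."""
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module_name) is None:
        raise ModuleNotFoundError(
            f"{module_name!r} is not importable; try 'pip install magic-pdf' "
            "and import it with an underscore"
        )
    module = importlib.import_module(module_name)
    # Prefer the newer name from this page, then fall back to the older one.
    for name in ("parse_pdf", "parse"):
        func = getattr(module, name, None)
        if callable(func):
            return func
    raise AttributeError(
        f"{module_name!r} exposes neither 'parse_pdf' nor 'parse'"
    )
```

A caller would then write parse = load_parser() followed by result = parse('sample.pdf', output_dir='./output'), keeping the same arguments as the example above.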