Pypandoc
Pypandoc is a Python wrapper for Pandoc, the universal document converter. It lets Python code convert files and text strings between formats such as Markdown, HTML, DOCX, LaTeX, ODT, and EPUB. It can also produce PDF, which is an output-only format that Pandoc generates through LaTeX by default. The library is actively maintained and released regularly; the current version is 1.17.
Warnings
- breaking Python 2 support was officially dropped in pypandoc v1.8. Users on Python 2 must use an older version of pypandoc.
- breaking The `pypandoc.convert()` function was deprecated in v1.7.0 and removed in v1.8, so calling it now raises an `AttributeError`. Use `pypandoc.convert_file()` or `pypandoc.convert_text()` instead.
- gotcha Pypandoc requires a working Pandoc installation. If you install plain `pypandoc` (rather than `pypandoc_binary` or `pypandoc[tinytex]`), you must install Pandoc separately, either through your system's package manager, manually, or by calling `pypandoc.download_pandoc()`. If Pandoc cannot be found, pypandoc raises an `OSError`.
- gotcha Pandoc's sandbox mode (for security) is enabled by default for Pandoc versions >= 2.15 when used via pypandoc v1.7.0 and newer. This might affect how certain filters or file operations behave.
- gotcha When passing `extra_args` to `pypandoc.convert_file` or `convert_text`, each option and its value (like `-V key=value`) must be separate list items, e.g. `['-V', 'key=value']`. The list is handed to `subprocess.Popen`, which turns every item into exactly one command-line argument.
- gotcha For PDF conversion, Pandoc requires a LaTeX distribution (e.g., TeX Live, MiKTeX). Installing `pypandoc[tinytex]` simplifies this by automatically downloading and managing TinyTeX; without it, you must install and configure LaTeX yourself.
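The last two gotchas meet in a PDF conversion. The sketch below uses hypothetical helper names (`find_latex_engine`, `build_extra_args`) that are not part of pypandoc. It looks for a LaTeX engine on `PATH` with `shutil.which` and builds an `extra_args` list in which every option and every value is its own item. The engine names and the `fontsize`/`papersize` variables are illustrative choices; `--pdf-engine` is Pandoc's option for selecting the LaTeX program.

```python
import shutil


def find_latex_engine(candidates=("xelatex", "lualatex", "pdflatex")):
    """Return the first LaTeX engine found on PATH, or None if there is none."""
    for name in candidates:
        if shutil.which(name):
            return name
    return None


def build_extra_args(variables, engine=None):
    """Turn {"key": "value"} pairs into ["-V", "key=value", ...] for pandoc."""
    args = []
    for key, value in variables.items():
        # Option and value are separate list items, never one "-V key=value" string
        args.extend(["-V", f"{key}={value}"])
    if engine is not None:
        args.extend(["--pdf-engine", engine])
    return args


engine = find_latex_engine()
extra_args = build_extra_args({"fontsize": "11pt", "papersize": "a4"}, engine)
print(extra_args)
if engine is None:
    print("No LaTeX engine on PATH; PDF conversion will fail until one is installed.")
```

The resulting list can then be passed on as `pypandoc.convert_file('input.md', 'pdf', outputfile='output.pdf', extra_args=extra_args)`.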
Install
- pip install pypandoc
- pip install pypandoc_binary
- pip install pypandoc[tinytex]
- conda install -c conda-forge pypandoc
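After installing, a quick check tells you whether pypandoc can actually reach a Pandoc binary. This is a sketch with a hypothetical helper name (`check_pandoc`); it calls `pypandoc.get_pandoc_version()` and `pypandoc.get_pandoc_formats()`, which raise `OSError` when no Pandoc is found. The exact format lists depend on the installed Pandoc version.

```python
import importlib.util


def check_pandoc():
    """Return a short status string describing the pypandoc/Pandoc setup."""
    if importlib.util.find_spec("pypandoc") is None:
        return "pypandoc is not installed"
    import pypandoc

    try:
        version = pypandoc.get_pandoc_version()
        inputs, outputs = pypandoc.get_pandoc_formats()
    except OSError:
        # Plain `pypandoc` without a system Pandoc ends up here;
        # pypandoc.download_pandoc() can fetch a Pandoc binary.
        return "pypandoc is installed but Pandoc was not found"
    return f"Pandoc {version}: {len(inputs)} input and {len(outputs)} output formats"


print(check_pandoc())
```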
Imports
- convert_file
import pypandoc
pypandoc.convert_file('input.md', 'pdf', outputfile='output.pdf')
- convert_text
import pypandoc
html_output = pypandoc.convert_text('# Hello', 'html', format='md')
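Code that used the removed `pypandoc.convert()` with either a path or a string now has to pick one of these two functions. The sketch below is a hypothetical shim (`convert_compat` is not part of pypandoc) that makes the choice by checking whether `source` names an existing file; in new code, calling `convert_file()` or `convert_text()` directly is clearer.

```python
import os


def convert_compat(source, to, format=None, **kwargs):
    """Hypothetical stand-in for the removed pypandoc.convert().

    Treats `source` as a path if such a file exists, otherwise as text.
    """
    is_path = os.path.isfile(source)
    if not is_path and format is None:
        # convert_text() needs the input format; it cannot be guessed from a string
        raise ValueError("format is required when source is a text string")

    import pypandoc  # imported here so the checks above run without pypandoc

    if is_path:
        return pypandoc.convert_file(source, to, format=format, **kwargs)
    return pypandoc.convert_text(source, to, format=format, **kwargs)
```

For example, `convert_compat('# Hello', 'html', format='md')` goes through `convert_text()`, while `convert_compat('input.md', 'html')` goes through `convert_file()` when `input.md` exists.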
Quickstart
import os
import pypandoc
# Ensure Pandoc is available (not needed if you installed pypandoc_binary or pypandoc[tinytex])
# pypandoc.download_pandoc()  # uncomment if you installed plain pypandoc and have no Pandoc
# Create a sample Markdown file
with open('input.md', 'w', encoding='utf-8') as f:
    f.write('# Hello from Pypandoc\n\nThis is a test document.')
# Convert markdown file to PDF (requires pandoc and a LaTeX distribution like TinyTeX)
# For seamless PDF conversion, install with `pip install pypandoc[tinytex]`
pypandoc.convert_file('input.md', 'pdf', outputfile='output.pdf')
print("Converted input.md to output.pdf")
# Convert markdown string to HTML
html_output = pypandoc.convert_text('# Another Title\n\nSome **bold** text.', 'html', format='md')
print("Converted Markdown string to HTML:")
print(html_output)
# Clean up dummy files
os.remove('input.md')
os.remove('output.pdf')
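The quickstart writes into the current directory and only deletes its files if every step succeeds, so a failed PDF conversion leaves `input.md` behind. A variant that works inside a temporary directory avoids that. This sketch uses a hypothetical helper (`markdown_file_to_html`), converts to HTML so no LaTeX engine is needed, and returns `None` when pypandoc or Pandoc is missing.

```python
import importlib.util
import os
import tempfile


def markdown_file_to_html(markdown):
    """Convert Markdown via a temporary .md file; return HTML, or None if unavailable."""
    if importlib.util.find_spec("pypandoc") is None:
        return None
    import pypandoc

    # The directory and both files inside it are deleted when the block exits
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "input.md")
        dst = os.path.join(tmp, "output.html")
        with open(src, "w", encoding="utf-8") as f:
            f.write(markdown)
        try:
            pypandoc.convert_file(src, "html", format="markdown", outputfile=dst)
        except OSError:
            return None  # no Pandoc binary was found
        with open(dst, encoding="utf-8") as f:
            return f.read()


html = markdown_file_to_html("# Hello from Pypandoc\n\nThis is a test document.")
print(html)
```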