Pandoc Documents for Python

2.4 · active · verified Tue Apr 14

Pandoc is an open-source command-line tool for converting documents between formats such as Markdown, HTML, LaTeX, PDF, and Word. The `pandoc` Python library (version 2.4, released August 7, 2024) provides Python bindings to Pandoc's document model, so documents can be analyzed, created, and transformed from Python. It relies on the Haskell-based Pandoc executable, which must be installed separately. The library is actively maintained, with releases that add support for recent versions of the Pandoc executable.

Warnings

Install
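
Because the library depends on a separately installed Pandoc executable, it can help to confirm that the executable is reachable before using the bindings. The following is a minimal sketch using only the Python standard library. The helper name `find_pandoc` is hypothetical and not part of the `pandoc` package, and the sketch assumes the executable is named `pandoc` and is on `PATH`.

```python
import shutil
import subprocess


def find_pandoc():
    """Return (path, first line of `pandoc --version`), or None if pandoc is not on PATH."""
    path = shutil.which("pandoc")
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    lines = result.stdout.splitlines()
    # The first line of the output normally carries the version string
    return path, (lines[0] if lines else "")


info = find_pandoc()
if info is None:
    print("pandoc executable not found on PATH; install it before using the library")
else:
    print(f"found {info[0]}: {info[1]}")
```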

Imports

Quickstart

This quickstart demonstrates how to read a Markdown string into a Pandoc document object, access and modify its Abstract Syntax Tree (AST) using `pandoc.types`, and then write the modified document back to a Markdown string. This showcases the core functionality for programmatic document manipulation.

import pandoc
from pandoc.types import Str

# Read a simple Markdown string into a Pandoc document object
text = "Hello world!"
doc = pandoc.read(text)
print(f"Initial document: {doc}")

# Access and modify an element in the document's Abstract Syntax Tree (AST)
# For "Hello world!", doc is Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
# doc[1] is the list of blocks, so the paragraph is at doc[1][0]
# Para holds a single argument, its list of inlines, at paragraph[0]
paragraph = doc[1][0]

# Replace the 'world!' element with 'Python!'
# The inlines are 0: Str('Hello'), 1: Space(), 2: Str('world!'),
# so the element to replace is paragraph[0][2], i.e. doc[1][0][0][2]
paragraph[0][2] = Str('Python!')

# Write the modified document back to a markdown string
modified_text = pandoc.write(doc)
print(f"Modified document text: {modified_text.strip()}")

# Convert to a different format (like every read and write call, this runs the pandoc executable)
doc_to_convert = pandoc.read("# My Title\n\nHello from Pandoc!", format='markdown')
html_output = pandoc.write(doc_to_convert, format='html')
print(f"HTML output:\n{html_output}")
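
The index path into the AST is easy to get wrong, because constructor arguments and list positions alternate at each level. The sketch below works through the nesting with a stand-in class so it runs without the Pandoc executable. `Node` is hypothetical and is not the `pandoc.types` implementation; it only assumes that each element behaves like a mutable sequence of its constructor arguments, as the quickstart comments describe.

```python
class Node:
    """Hypothetical stand-in for an AST element: a named, mutable sequence of arguments."""

    def __init__(self, name, *args):
        self.name = name
        self.args = list(args)

    def __getitem__(self, index):
        return self.args[index]

    def __setitem__(self, index, value):
        self.args[index] = value

    def __len__(self):
        return len(self.args)

    def __repr__(self):
        return f"{self.name}({', '.join(map(repr, self.args))})"


# Mirrors the shape Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])
doc = Node(
    "Pandoc",
    Node("Meta", {}),
    [Node("Para", [Node("Str", "Hello"), Node("Space"), Node("Str", "world!")])],
)

blocks = doc[1]          # second argument of Pandoc: the list of blocks
paragraph = blocks[0]    # first block: the Para element
inlines = paragraph[0]   # Para's only argument: the list of inlines
inlines[2] = Node("Str", "Python!")  # same slot as doc[1][0][0][2]

print(doc)
```

In this model the paragraph has exactly one argument, so indexing it with `paragraph[2]` raises `IndexError`; the inline elements sit one level deeper, in `paragraph[0]`.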
