MarkItDown

0.1.5 · active · verified Thu Apr 09

MarkItDown is a Python utility library for converting files in formats such as DOCX, PDF, and CSV to Markdown. It accepts input from local files, URLs, and data URIs. The current version is 0.1.5, and the project is actively developed, with regular maintenance and feature releases.

Warnings

Install

pip install markitdown
pip install 'markitdown[docx]'  # optional extra for DOCX conversion

Imports

from markitdown import MarkItDown

Quickstart

Initializes MarkItDown and converts a plain-text data URI to Markdown. The commented-out code shows how to convert a local file by file URI; format-specific conversion requires the matching optional dependency (for example, markitdown[docx] for DOCX files).

from markitdown import MarkItDown
import base64

markitdown = MarkItDown()

# Convert a data URI containing plain text to Markdown
text_content = "Hello from MarkItDown! This is a test.\n\n- Item 1\n- Item 2"
base64_content = base64.b64encode(text_content.encode('utf-8')).decode('utf-8')
data_uri = f"data:text/plain;base64,{base64_content}"

result = markitdown.convert_uri(data_uri)
print(f"Converted Markdown:\n{result.markdown}")

# Example of how you would convert a local file (replace with an actual path)
# try:
#     result = markitdown.convert_uri("file:///path/to/your/document.docx")
#     print(f"Converted DOCX:\n{result.markdown}")
# except Exception as e:
#     print(f"Could not convert DOCX: {e} (Ensure you have installed markitdown[docx] and the file exists)")
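The quickstart builds its data URI by hand and hard-codes a file:// URI. As a minimal sketch using only the standard library, the two URI forms could be produced by small helpers. The names text_to_data_uri and path_to_file_uri are hypothetical and are not part of MarkItDown; "document.docx" is a placeholder path.

```python
import base64
from pathlib import Path


def text_to_data_uri(text: str, mime_type: str = "text/plain") -> str:
    """Hypothetical helper: wrap text in a base64-encoded data URI."""
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def path_to_file_uri(path: str) -> str:
    """Hypothetical helper: turn a local path into an absolute file:// URI."""
    # resolve() makes the path absolute, which as_uri() requires.
    return Path(path).resolve().as_uri()


data_uri = text_to_data_uri("Hello from MarkItDown!")
file_uri = path_to_file_uri("document.docx")  # placeholder path, need not exist

print(data_uri)
print(file_uri)
```

Either string can then be passed to markitdown.convert_uri(...) as in the quickstart above. Using Path.as_uri() avoids hand-escaping spaces and other special characters in local paths.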
