MarkItDown
MarkItDown is a Python utility library for converting file formats such as DOCX, PDF, and CSV into Markdown. It accepts several input sources, including local files, URLs, and data URIs. The current version is 0.1.5, and the project ships regular maintenance and feature releases.
Warnings
- gotcha Starting with v0.1.0, MarkItDown uses a plugin-based architecture and optional dependency groups (e.g., `[docx]`, `[pdf]`, `[all]`). A bare `pip install markitdown` does not pull in these extras, so converters for some file types may be unavailable until you install the matching group.
- deprecated The `convert_url` method was renamed to `convert_uri` in v0.1.1. While `convert_url` remains an alias for backward compatibility, new code should prefer `convert_uri`.
- gotcha The `onnxruntime` dependency has changed several times (pinned on Windows in v0.1.3, upper bound removed in v0.1.5). You may hit `onnxruntime` version conflicts, especially in environments where other packages also constrain `onnxruntime`.
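The version-related warnings above can be checked at runtime. The sketch below is an illustration, not part of MarkItDown: `installed_version` and `pick_convert_method` are hypothetical helper names. It reads installed versions with the standard-library `importlib.metadata` and prefers `convert_uri`, falling back to `convert_url` on releases older than v0.1.1.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Any, Callable, Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def pick_convert_method(converter: Any) -> Callable[..., Any]:
    """Prefer convert_uri (v0.1.1+); fall back to the older convert_url name.

    Raises AttributeError if the object has neither method.
    """
    method = getattr(converter, "convert_uri", None)
    if method is None:
        method = getattr(converter, "convert_url")
    return method


if __name__ == "__main__":
    # Report the versions most relevant to the warnings above.
    for dist in ("markitdown", "onnxruntime"):
        print(f"{dist}: {installed_version(dist) or 'not installed'}")
```

With a `MarkItDown()` instance `md`, `pick_convert_method(md)(uri)` calls whichever method name the installed release provides.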
Install
- pip install markitdown
- pip install markitdown[all]
- pip install markitdown[docx,pdf]
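Because the extras are optional, it can help to confirm which ones are importable before converting. This is a minimal sketch, assuming the extras map to the modules listed in `EXTRA_MODULES` (`mammoth` for `[docx]`, `pdfminer` from pdfminer.six for `[pdf]`, and so on). That mapping and the `available_extras` helper are hypothetical and should be checked against MarkItDown's `pyproject.toml`.

```python
import importlib.util
from typing import Dict, List

# ASSUMPTION: module names each optional group is believed to install.
# Verify against the markitdown package metadata before relying on this.
EXTRA_MODULES: Dict[str, List[str]] = {
    "docx": ["mammoth"],
    "pdf": ["pdfminer"],
    "pptx": ["pptx"],
    "xlsx": ["pandas", "openpyxl"],
}


def available_extras(extra_modules: Dict[str, List[str]] = EXTRA_MODULES) -> Dict[str, bool]:
    """Map each extra to True if every top-level module it needs can be found.

    find_spec locates a top-level module without importing it.
    """
    return {
        extra: all(importlib.util.find_spec(mod) is not None for mod in modules)
        for extra, modules in extra_modules.items()
    }


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        hint = "" if ok else f"  (try: pip install 'markitdown[{extra}]')"
        print(f"{extra}: {'available' if ok else 'missing'}{hint}")
```

The quotes in the printed hint keep shells such as zsh from treating the square brackets as a glob pattern.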
Imports
- MarkItDown
from markitdown import MarkItDown
Quickstart
import base64

from markitdown import MarkItDown

markitdown = MarkItDown()

# Convert a data URI containing plain text to Markdown
text_content = "Hello from MarkItDown! This is a test.\n\n- Item 1\n- Item 2"
base64_content = base64.b64encode(text_content.encode('utf-8')).decode('utf-8')
data_uri = f"data:text/plain;base64,{base64_content}"
result = markitdown.convert_uri(data_uri)
print(f"Converted Markdown:\n{result.markdown}")

# Example of converting a local file (replace with an actual path)
# try:
#     result = markitdown.convert_uri("file:///path/to/your/document.docx")
#     print(f"Converted DOCX:\n{result.markdown}")
# except Exception as e:
#     print(f"Could not convert DOCX: {e} (ensure markitdown[docx] is installed and the file exists)")
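When the input is not already a URI, small helpers can build one for `convert_uri`. This sketch uses only the standard library; `to_data_uri`, `file_to_data_uri`, and `to_file_uri` are hypothetical names, not MarkItDown APIs. `Path.as_uri()` produces a correctly escaped absolute `file://` URI, which avoids hand-writing paths like `file:///path/to/your/document.docx`.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Optional


def to_data_uri(data: bytes, mime_type: str = "application/octet-stream") -> str:
    """Encode raw bytes as a base64 data URI (RFC 2397)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def file_to_data_uri(path: str, mime_type: Optional[str] = None) -> str:
    """Read a local file into a data URI, guessing the MIME type from its extension."""
    guessed, _ = mimetypes.guess_type(path)
    return to_data_uri(Path(path).read_bytes(), mime_type or guessed or "application/octet-stream")


def to_file_uri(path: str) -> str:
    """Build an absolute file:// URI; as_uri() handles escaping and Windows drive letters."""
    return Path(path).resolve().as_uri()


if __name__ == "__main__":
    print(to_data_uri(b"Hello from MarkItDown!", "text/plain"))
    print(to_file_uri("document.docx"))  # hypothetical relative path, resolved against the CWD
```

Either string can then be passed to `markitdown.convert_uri(...)` just as `data_uri` is above; whether a given file type converts still depends on which extras are installed.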