mdit-plain
mdit-plain is a Python renderer for the markdown-it-py library that converts Markdown documents into plain text by stripping out the markup. Its primary purpose is to support Natural Language Processing (NLP) and other text-based analyses that need unformatted content. The current version is 1.0.1, released in January 2023; there have been no releases since.
Common errors
- ModuleNotFoundError: No module named 'markdown_it'
  cause: The core dependency 'markdown-it-py' is not installed.
  fix: Install the required dependency: `pip install markdown-it-py`
- ImportError: cannot import name 'RendererPlain' from 'mdit_plain.renderer'
  cause: There is a typo in the imported name, or a different module named `mdit_plain` (for example a local file) is shadowing the installed package. If mdit-plain is not installed at all, Python raises ModuleNotFoundError instead.
  fix: Ensure mdit-plain is installed (`pip install mdit-plain`) and the import statement is `from mdit_plain.renderer import RendererPlain`.
- Markdown syntax (e.g., `**bold**`, `[link](url)`) is still present in the output after rendering.
  cause: The MarkdownIt parser was not configured to use `mdit_plain.renderer.RendererPlain`. It is likely using markdown-it-py's default HTML renderer.
  fix: When initializing `MarkdownIt`, pass `renderer_cls=RendererPlain`: `parser = MarkdownIt(renderer_cls=RendererPlain)`.
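The first two errors come down to which packages are present in the active environment. A minimal diagnostic sketch, using only the standard library, might look like the following. The function name `diagnose_install` and the `PACKAGES` mapping are illustrative and not part of mdit-plain or markdown-it-py.

```python
from importlib import metadata
from importlib.util import find_spec

# Import name -> distribution (pip) name, as given in the errors above.
PACKAGES = {
    "markdown_it": "markdown-it-py",
    "mdit_plain": "mdit-plain",
}


def diagnose_install() -> dict:
    """Report, per import name, whether it is importable and which version is installed."""
    report = {}
    for module_name, dist_name in PACKAGES.items():
        # find_spec returns None for a missing top-level module instead of raising.
        importable = find_spec(module_name) is not None
        try:
            version = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            version = None
        report[module_name] = {"importable": importable, "version": version}
    return report


if __name__ == "__main__":
    for module_name, info in diagnose_install().items():
        hint = "ok" if info["importable"] else f"missing: pip install {PACKAGES[module_name]}"
        print(f"{module_name}: {hint} (version: {info['version']})")
```

Running this in the same interpreter that raised the error shows whether the problem is a missing package or something else, such as a typo in the import statement.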
Warnings
- breaking mdit-plain relies on the internal API of markdown-it-py. Major version updates in markdown-it-py (e.g., v4.0.0, released after mdit-plain 1.0.1) may introduce breaking internal API changes that can cause mdit-plain to fail or produce incorrect output.
- gotcha mdit-plain primarily strips Markdown syntax. It does not perform advanced text normalization, such as intelligently collapsing multiple spaces, standardizing inconsistent line breaks, or converting complex Markdown structures (like tables) into highly readable plain text formats. The output might retain some whitespace artifacts.
- gotcha The library has not seen updates since January 2023. While functional for its core purpose, this indicates limited active maintenance for new Markdown features, bug fixes for edge cases, or compatibility with future Python versions or markdown-it-py releases.
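Because the renderer does not normalize whitespace, a small post-processing step can help before feeding the text to downstream tools. The sketch below is one possible approach using only the standard library; `normalize_plain_text` is a hypothetical helper, not something mdit-plain provides, and it does nothing to make tables more readable.

```python
import re


def normalize_plain_text(text: str) -> str:
    """Tidy whitespace in already-rendered plain text."""
    # Standardize Windows and old-Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Collapse runs of spaces/tabs inside each line and trim line edges.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]
    text = "\n".join(lines)
    # Allow at most one blank line between paragraphs.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Note that trimming line edges also removes indentation, so this is unsuitable if the rendered text contains code blocks whose indentation matters.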
Install
-
pip install mdit-plain
Imports
- RendererPlain
from mdit_plain.renderer import RendererPlain
- MarkdownIt
wrong: from markdown_it_py import MarkdownIt
correct: from markdown_it import MarkdownIt
Quickstart
from markdown_it import MarkdownIt
from mdit_plain.renderer import RendererPlain

markdown_text = """
# Header One

This is **some** *markdown* text with a [link](https://example.com).

* List item 1
* List item 2

> A blockquote.
"""

# Initialize MarkdownIt parser with the plain text renderer
parser = MarkdownIt(renderer_cls=RendererPlain)

# Render the markdown to plain text
plain_text = parser.render(markdown_text)
print(plain_text)

# Expected Output:
# Header One
#
# This is some markdown text with a link.
#
# * List item 1
# * List item 2
#
# > A blockquote.
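To confirm that the plain renderer actually took effect, the rendered string can be scanned for leftover inline markup such as `**bold**` or `[link](url)`. The following is a rough heuristic sketch, assuming those two constructs are the ones of interest; `find_leftover_markup` and `LEFTOVER_PATTERNS` are hypothetical names, and the patterns are not an exhaustive Markdown detector.

```python
import re

# Illustrative patterns for two common inline constructs; not exhaustive.
LEFTOVER_PATTERNS = {
    "bold": re.compile(r"\*\*[^*\n]+\*\*"),
    "link": re.compile(r"\[[^\]\n]+\]\([^)\n]+\)"),
}


def find_leftover_markup(text: str) -> dict:
    """Return {construct_name: [matches]} for any patterns found in text."""
    found = {}
    for name, pattern in LEFTOVER_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[name] = matches
    return found
```

An empty result for `find_leftover_markup(plain_text)` suggests the markup was stripped. A non-empty result usually means the parser was created without `renderer_cls=RendererPlain`, although literal asterisks or brackets in the source text can also trigger matches.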