html2text

2025.4.15 · active · verified Wed Apr 08

html2text is a Python library that converts HTML into clean, easy-to-read plain ASCII text that is also valid Markdown. It exposes options for customizing the conversion, such as how links are rendered and whether lines are wrapped. The project is actively maintained, with regular releases.

Warnings

Install

pip install html2text

Imports

import html2text

Quickstart

This quickstart demonstrates both the simple `html2text()` function for basic conversion and the `HTML2Text` class for more granular control over the output, such as ignoring links or disabling line wrapping.

import html2text

html_content = """
<h1>Welcome</h1>
<p>Hello, <b>world</b>! This is a <a href="https://example.com">link</a>.</p>
<ul>
  <li>Item 1</li>
  <li>Item 2</li>
</ul>
"""

# Basic conversion
plain_text = html2text.html2text(html_content)
print("--- Basic Conversion ---")
print(plain_text)

# Custom conversion with options (e.g., ignoring links and no line wrapping)
h = html2text.HTML2Text()
h.ignore_links = True  # Drop link markup and URLs; keep only the link text
h.body_width = 0       # 0 disables line wrapping

custom_text = h.handle(html_content)
print("\n--- Custom Conversion (No links, no wrap) ---")
print(custom_text)
