{"id":1293,"library":"html2text","title":"html2text","description":"html2text is a Python library that converts HTML into clean, easy-to-read plain ASCII text that is also valid Markdown. The `HTML2Text` class exposes many options for controlling the conversion, such as link handling, line wrapping, and image output. The project is actively maintained and releases regularly.","status":"active","version":"2025.4.15","language":"en","source_language":"en","source_url":"https://github.com/Alir3z4/html2text/","tags":["html","markdown","text conversion","web scraping"],"install":[{"cmd":"pip install html2text","lang":"bash","label":"Install latest version"}],"dependencies":[],"imports":[{"note":"Direct function for simple conversion. Its only optional arguments are `baseurl` and `bodywidth`.","symbol":"html2text","correct":"import html2text\ntext = html2text.html2text(html_content)"},{"note":"Class-based approach for custom configuration, such as ignoring links or setting the body width.","symbol":"HTML2Text","correct":"from html2text import HTML2Text\nh = HTML2Text()\nh.ignore_links = True\ntext = h.handle(html_content)"}],"quickstart":{"code":"import html2text\n\nhtml_content = \"\"\"\n<h1>Welcome</h1>\n<p>Hello, <b>world</b>! 
This is a <a href=\"https://example.com\">link</a>.</p>\n<ul>\n  <li>Item 1</li>\n  <li>Item 2</li>\n</ul>\n\"\"\"\n\n# Basic conversion\nplain_text = html2text.html2text(html_content)\nprint(\"--- Basic Conversion ---\")\nprint(plain_text)\n\n# Custom conversion with options (e.g., ignoring links and no line wrapping)\nh = html2text.HTML2Text()\nh.ignore_links = True  # Keep link text but drop the URLs\nh.body_width = 0       # Disable line wrapping\n\ncustom_text = h.handle(html_content)\nprint(\"\\n--- Custom Conversion (No links, no wrap) ---\")\nprint(custom_text)","lang":"python","description":"This quickstart demonstrates both the simple `html2text()` function for basic conversion and the `HTML2Text` class for finer control over the output, such as ignoring links or disabling line wrapping."},"warnings":[{"fix":"Upgrade your Python environment to 3.9 or newer.","message":"Python 2 support was removed in release 2019.8.11, and later releases have raised the minimum Python version further. The current release requires Python 3.9 or newer.","severity":"breaking","affected_versions":"<2019.8.11"},{"fix":"Fetch the HTML with a dedicated HTTP client (e.g., `requests`) and pass the HTML string to `html2text.html2text()` or `HTML2Text().handle()`.","message":"Support for retrieving HTML over the network by passing a URL directly to the library was removed in release 2019.8.11. Earlier versions emitted deprecation warnings for this feature.","severity":"breaking","affected_versions":"<2019.8.11"},{"fix":"Create an instance, set the options you need as attributes, then call `handle()`, e.g. `h = html2text.HTML2Text(); h.ignore_links = True; h.handle(html)`.","message":"To configure conversion options (e.g., `ignore_links`, `body_width`, `images_as_html`), create an instance of `html2text.HTML2Text()`, set the options as attributes on it, and then call its `handle()` method. 
The top-level `html2text.html2text()` function accepts only `baseurl` and `bodywidth` as optional arguments. Other options such as `ignore_links` cannot be passed to it.","severity":"gotcha","affected_versions":"All"},{"fix":"When using `HTML2Text()`, set `h.body_width = 0`. With the top-level function, pass `bodywidth=0`, e.g. `html2text.html2text(html, bodywidth=0)`.","message":"By default, `html2text` wraps long lines at 78 characters. Disabling wrapping is often desirable for programmatic parsing, or when inserted line breaks would alter the Markdown output. To disable it, set the `body_width` option to `0`.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-08T00:00:00.000Z","next_check":"2026-07-07T00:00:00.000Z"}