{"id":5959,"library":"html-to-markdown","title":"HTML to Markdown Converter","description":"html-to-markdown is a high-performance Python library for converting HTML to Markdown, powered by a Rust core. Currently at version 3.1.0, it offers a clean Python API and aims for consistent output across multiple language bindings. The library is actively maintained with ongoing development and performance enhancements.","status":"active","version":"3.1.0","language":"en","source_language":"en","source_url":"https://github.com/kreuzberg-dev/html-to-markdown.git","tags":["html","markdown","conversion","rust","performance","markup"],"install":[{"cmd":"pip install html-to-markdown","lang":"bash","label":"Install with pip"}],"dependencies":[],"imports":[{"note":"Primary function for basic HTML to Markdown conversion.","symbol":"convert","correct":"from html_to_markdown import convert"},{"note":"Used to customize conversion behavior (e.g., heading style, output format).","symbol":"ConversionOptions","correct":"from html_to_markdown import ConversionOptions"},{"note":"Use to extract Markdown content along with structured metadata (headers, links, images).","symbol":"convert_with_metadata","correct":"from html_to_markdown import convert_with_metadata"}],"quickstart":{"code":"from html_to_markdown import convert, ConversionOptions\n\nhtml_content = \"\"\"\n<h1>Welcome</h1>\n<p>This is <strong>bold</strong> and <em>italic</em> text.</p>\n<ul>\n    <li>Item 1</li>\n    <li>Item 2</li>\n</ul>\n\"\"\"\n\n# Basic conversion\nmarkdown_output = convert(html_content)\nprint(f\"Default Markdown:\\n{markdown_output}\")\n\n# Conversion with options\noptions = ConversionOptions(\n    heading_style=\"atx\",\n    list_indent_width=2,\n    output_format=\"commonmark\"\n)\nformatted_markdown = convert(html_content, options)\nprint(f\"\\nFormatted Markdown (CommonMark):\\n{formatted_markdown}\")\n\n# Example for Djot output (another lightweight markup language)\ndjot_options = 
ConversionOptions(output_format=\"djot\")\ndjot_output = convert(html_content, djot_options)\nprint(f\"\\nDjot Output:\\n{djot_output}\")","lang":"python","description":"This quickstart demonstrates basic HTML to Markdown conversion using the `convert` function. It also shows how to apply `ConversionOptions` to customize the output format, such as specifying heading styles or using Djot instead of standard Markdown."},"warnings":[{"fix":"Consult the library's CHANGELOG for specific migration steps. Consider using `html_to_markdown.v1_compat` if direct migration is complex.","message":"Version 2.x introduced a complete rewrite with a Rust core, leading to significant performance gains but also breaking changes in the API. While a `v1_compat` module was provided, users upgrading from 1.x should review the changelog for necessary code adjustments.","severity":"breaking","affected_versions":"<2.0.0 to 2.x.x+"},{"fix":"Pre-process HTML with a tool like BeautifulSoup to simplify or strip unwanted elements before conversion. Review converted Markdown carefully for fidelity.","message":"Markdown is a less expressive format than HTML. Complex HTML structures, inline styles, and certain advanced tags (e.g., `<script>`, `<style>`) will be simplified or entirely removed during conversion, potentially leading to a loss of original formatting or functionality.","severity":"gotcha","affected_versions":"All versions"},{"fix":"For critical code blocks or tables, manually verify the output. Consider using the 'Visitor Pattern' feature (`convert_with_visitor`) for fine-grained control over specific element conversions, especially for `pre` and `code` tags.","message":"Conversion of complex HTML tables (e.g., with `colspan`, `rowspan`, nested elements) and `<code>`/`<pre>` blocks might not perfectly retain original formatting or indentation in Markdown. 
This can lead to less readable or incorrectly structured output.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Use `from html_to_markdown import convert_with_metadata` and process the returned dictionary, e.g., `result = convert_with_metadata(html_content); markdown_content = result['content']; metadata = result['metadata']`.","message":"The primary `convert()` function returns only the Markdown string. If you need to extract structured metadata such as titles, links, or headings from the HTML during conversion, you must use `convert_with_metadata()`, which returns a dictionary containing both content and metadata.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-14T00:00:00.000Z","next_check":"2026-07-13T00:00:00.000Z","problems":[]}