Markdownify
Markdownify is a Python library that converts HTML content to Markdown. The current version is 1.2.2. The project is actively developed, with frequent releases that add features and fix issues.
Common errors
- ModuleNotFoundError: No module named 'markdownify'
  cause: The `markdownify` library is not installed in the Python environment that runs your code.
  fix: Install the package using pip: `pip install markdownify`
- AttributeError: 'NoneType' object has no attribute 'strip'
  cause: This error occurs when `markdownify` is called with `None` instead of a string. The library expects HTML content as a string input to its primary function. A similar `AttributeError`, such as `'int' object has no attribute 'strip'`, would occur if an integer or another non-string type were passed.
  fix: Ensure that the input to the `markdownify` function is always a string containing HTML content. For example, `markdownify('')` for an empty string, or `markdownify(html_content_variable)` where `html_content_variable` holds a string.
- ValueError: The `strip` and `convert` options are mutually exclusive.
  cause: The `markdownify` library does not allow the `strip` option (a blacklist of tags that are not converted) and the `convert` option (a whitelist of the only tags to convert) to be used together, because they define conflicting strategies for tag handling.
  fix: Choose either the `strip` option or the `convert` option, but not both, when calling the `markdownify` function. For example, `markdownify(html, strip=['a'])` or `markdownify(html, convert=['b', 'i'])`.
Warnings
- breaking The interface for custom tag conversion functions (e.g., `convert_*()`) changed significantly in version 1.0.0. If you have custom conversion logic, it will need to be updated.
- gotcha The `strip` and `convert` options for `markdownify` are mutually exclusive. You cannot use both simultaneously.
- gotcha The BeautifulSoup parser can be customized via the `bs4_options` argument (added in v1.2.0). String or list values are treated as parser 'features' (e.g., 'lxml', 'html5lib'), while dictionary values are passed as full keyword arguments to the BeautifulSoup constructor.
- gotcha By default, `markdownify` escapes asterisks (`*`) and underscores (`_`) that might be interpreted as Markdown formatting. If you want to disable this behavior, you need to explicitly set `escape_asterisks=False` or `escape_underscores=False`.
Install
- `pip install markdownify`
Imports
- markdownify
from markdownify import markdownify as md
Quickstart
from markdownify import markdownify as md

html_content = "<h1>Hello World</h1><p>This is <b>bold</b> and <em>italic</em> text with a <a href=\"http://example.com\">link</a>.</p>"
markdown_output = md(html_content)
print(markdown_output)