Bleach
Bleach is an allowed-list-based HTML sanitizing library for Python (current version 6.3.0) that escapes or strips markup and attributes according to a configurable safelist. It can also safely linkify text, including setting `rel` attributes on the generated links. Bleach is designed for sanitizing text from untrusted sources and is built on html5lib, which makes it robust against malformed HTML fragments. The maintainers deprecated the project in January 2023, citing the lack of active maintenance of its upstream dependency `html5lib`; it is now in minimum-maintenance mode, and new projects are discouraged from adopting it.
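As a minimal sketch of the "escapes or strips" distinction, the snippet below runs the same input through `bleach.clean()` twice, once with the default behavior and once with `strip=True`. The input string and the one-tag safelist are illustrative choices, not recommendations.

```python
import bleach

untrusted = 'An <script>alert("evil")</script> example with <b>bold</b> text.'

# Default behavior: tags outside the safelist are escaped, so they render
# as literal text instead of being interpreted as markup.
escaped = bleach.clean(untrusted, tags={"b"})

# strip=True: tags outside the safelist are removed; their inner text stays.
stripped = bleach.clean(untrusted, tags={"b"}, strip=True)

print(escaped)
print(stripped)
```

In both cases the allowed `<b>` element passes through unchanged; only the handling of the disallowed `<script>` tag differs.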
Warnings
- deprecated The Bleach project was officially deprecated on January 23, 2023, due to its reliance on the unmaintained `html5lib` library. It is now in a minimum-maintenance mode, and new projects are explicitly discouraged from using it.
- breaking In 6.0.0, the `tags` and `protocols` arguments of `bleach.clean()`, `bleach.sanitizer.Cleaner`, and `bleach.html5lib_shim.BleachHTMLParser` changed from lists to sets. Likewise, the `skip_tags` and `recognized_tags` arguments of `bleach.linkify()` and `bleach.linkifier.Linker` changed from lists to sets.
- breaking In 5.0.0, CSS sanitization of `style` attributes was completely rewritten. Code that sanitizes CSS must be updated, and the feature now requires installing `bleach` with the `[css]` extra: `pip install 'bleach[css]'`.
- breaking Attribute callables (functions passed as the `attributes` argument) for `clean()` and `linkify()` changed their signature. They now take three arguments, `tag`, `attribute_name`, and `attribute_value`, rather than only `attribute_name` and `attribute_value`.
- gotcha The output of `bleach.clean()` is intended for use specifically in an HTML *content* context (e.g., `<div>{{ cleaned_text }}</div>`). It is NOT safe for use in HTML attributes, CSS, JavaScript, JSON, or other contexts without further appropriate escaping (e.g., using a template engine's `escape` function).
- breaking Bleach dropped support for older Python versions: 3.6 (v6.0.0), 3.7 (v6.1.0), 3.8 (v6.2.0), and 3.9 (v6.3.0). The current version (6.3.0) requires Python >=3.10.
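Two of the breaking changes above, set-valued `tags` and the three-argument attribute callable, can be sketched together. The host name `mydomain.com` and the function name `allow_img_attr` are hypothetical and exist only for this illustration.

```python
from urllib.parse import urlparse

import bleach

# Hypothetical policy: <img> may keep descriptive attributes, and may keep
# src only when the URL is relative or hosted on TRUSTED_HOST.
TRUSTED_HOST = "mydomain.com"


def allow_img_attr(tag, name, value):
    """Attribute filter using the (tag, attribute_name, attribute_value) signature."""
    if name in ("alt", "height", "width"):
        return True
    if name == "src":
        host = urlparse(value).netloc
        return not host or host == TRUSTED_HOST
    return False


cleaned = bleach.clean(
    '<img src="http://example.com/x.png" alt="an example">',
    tags={"img"},  # a set, not a list
    attributes={"img": allow_img_attr},
)
print(cleaned)
```

Here the `src` attribute is dropped because it points at an untrusted host, while `alt` is kept.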
Install
- pip install bleach
- pip install 'bleach[css]'
Imports
- bleach
import bleach
- clean
bleach.clean(...)
- linkify
bleach.linkify(...)
- Cleaner
from bleach.sanitizer import Cleaner
Quickstart
import bleach
# Sanitize HTML
html_input = 'An <script>alert("evil")</script> example with <b>bold</b> text.'
cleaned_html = bleach.clean(
    html_input,
    tags={'b', 'i', 'strong', 'em', 'a', 'p', 'br'},
    attributes={'a': ['href', 'title']}
)
print(f"Cleaned HTML: {cleaned_html}")
# Linkify text
text_with_urls = 'Check out example.com or mailto:user@example.com'
linkified_text = bleach.linkify(text_with_urls)
print(f"Linkified text: {linkified_text}")
# Using a Cleaner instance for performance/configurability
from bleach.sanitizer import Cleaner
my_cleaner = Cleaner(
    tags={'p', 'span'},
    attributes={'span': ['title']},
    # To allow 'style', also pass css_sanitizer=CSSSanitizer(...) (needs 'bleach[css]')
)
# 'style' is not allow-listed, so it is dropped; <img> is not an allowed tag, so it is escaped
complex_html = '<p style="color: red;">Safe paragraph</p><img src="x.jpg">'
cleaned_complex = my_cleaner.clean(complex_html)
print(f"Cleaned with Cleaner: {cleaned_complex}")
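Linkification has a reusable counterpart to `Cleaner` in `bleach.linkifier.Linker`. The sketch below shows one way to control the `rel` and `target` attributes of generated links with the built-in `nofollow` and `target_blank` callbacks, and to leave `<pre>` content untouched. The sample strings are illustrative.

```python
from bleach.callbacks import nofollow, target_blank
from bleach.linkifier import Linker

# Callbacks run on every link; skip_tags (a set) names elements whose
# contents should not be linkified.
linker = Linker(callbacks=[nofollow, target_blank], skip_tags={"pre"})

linked = linker.linkify("Docs live at http://example.com today.")
untouched = linker.linkify("<pre>http://example.com</pre>")

print(linked)
print(untouched)
```

Building the `Linker` once and calling `linkify()` repeatedly avoids rebuilding the configuration for each input.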