Bleach

6.3.0 · deprecated · verified Sat Mar 28

Bleach is an allowed-list-based HTML sanitizing library for Python (current version 6.3.0). It escapes or strips disallowed tags and removes disallowed attributes, based on a configurable allowed list. It can also linkify text safely, including setting `rel` attributes on the generated links. Bleach is designed for sanitizing text from untrusted sources and is built on html5lib, which makes it robust against malformed HTML fragments. The project was deprecated in January 2023, citing the lack of active maintenance of its upstream dependency `html5lib`. It is now in minimum-maintenance mode, and new projects are discouraged from adopting it.
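The sketch below makes the escape-versus-strip distinction and the `rel` handling concrete. It calls `bleach.clean()` with and without `strip=True`, then calls `bleach.linkify()` with the built-in `nofollow` and `target_blank` callbacks from `bleach.callbacks`. The input strings are made up for illustration. Exact serialization can vary between versions, so treat the printed strings as indicative.

```python
import bleach
from bleach.callbacks import nofollow, target_blank

# Hypothetical untrusted input: a script tag plus a link carrying a
# javascript: URL and an event-handler attribute
untrusted = (
    '<b>ok</b> <script>alert(1)</script> '
    '<a href="javascript:alert(1)" onclick="steal()">link</a>'
)
allowed_tags = {'b', 'a'}        # tag names that may appear in the output
allowed_attrs = {'a': ['href']}  # only href may survive on <a>

# Default (strip=False): disallowed tags are escaped and display as text;
# onclick is removed, and the javascript: href is dropped (protocol not allowed)
escaped = bleach.clean(untrusted, tags=allowed_tags, attributes=allowed_attrs)

# strip=True: disallowed tags are removed, but their text content is kept
stripped = bleach.clean(
    untrusted, tags=allowed_tags, attributes=allowed_attrs, strip=True
)

# Linkify with explicit callbacks: nofollow adds rel="nofollow" and
# target_blank adds target="_blank" (both leave mailto: links alone)
linked = bleach.linkify('Docs at example.com', callbacks=[nofollow, target_blank])

print(escaped)
print(stripped)
print(linked)
```

Passing `callbacks` explicitly replaces linkify's default list. This is why `nofollow` is included alongside `target_blank` here.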

Warnings

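Install

pip install bleach

For CSS sanitization via `bleach.css_sanitizer`, install the optional extra, which pulls in tinycss2:

pip install "bleach[css]"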

Imports

import bleach
from bleach.sanitizer import Cleaner

Quickstart

This example demonstrates basic HTML sanitization using `bleach.clean()` and URL linkification with `bleach.linkify()`. It also shows how to use a `Cleaner` instance for more advanced or repeated sanitization tasks with custom allowed tags and attributes.

import bleach

# Sanitize HTML: tags outside the allowed set (here <script>) are escaped, not removed, by default
html_input = 'An <script>alert("evil")</script> example with <b>bold</b> text.'
cleaned_html = bleach.clean(
    html_input,
    tags={'b', 'i', 'strong', 'em', 'a', 'p', 'br'},
    attributes={'a': ['href', 'title']}
)
print(f"Cleaned HTML: {cleaned_html}")

# Linkify text: the default callback adds rel="nofollow" to each new link except mailto: links
text_with_urls = 'Check out example.com or mailto:user@example.com'
linkified_text = bleach.linkify(text_with_urls)
print(f"Linkified text: {linkified_text}")

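# Using a Cleaner instance: configure once, then reuse it for many inputs
from bleach.sanitizer import Cleaner
my_cleaner = Cleaner(
    tags={'p', 'span'},
    attributes={'span': ['style']},
    # With style allowed but no CSS sanitizer, Bleach warns and empties every
    # style value; for vetted CSS, pass a bleach.css_sanitizer.CSSSanitizer
    # (requires 'bleach[css]')
    css_sanitizer=None
)
# <p> may not carry style here, so that attribute is removed; <img> is escaped
complex_html = '<p style="color: red;">Safe paragraph</p><img src="x.jpg">'
cleaned_complex = my_cleaner.clean(complex_html)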
print(f"Cleaned with Cleaner: {cleaned_complex}")
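The quickstart cleans and linkifies in separate calls. Bleach also documents a single-pass pattern: you pass `bleach.linkifier.LinkifyFilter` to a reusable `Cleaner` through its `filters` argument. The sketch below assumes that pattern. The sample comments are hypothetical, and `callbacks=[nofollow]` is passed explicitly rather than relying on the filter's own defaults.

```python
from functools import partial

from bleach.callbacks import nofollow
from bleach.linkifier import LinkifyFilter
from bleach.sanitizer import Cleaner

# Build the Cleaner once: it sanitizes first, then LinkifyFilter turns bare
# URLs into links; text inside <pre> is skipped by the filter
cleaner = Cleaner(
    tags={'b', 'pre'},
    filters=[partial(LinkifyFilter, callbacks=[nofollow], skip_tags={'pre'})],
)

# Hypothetical user comments processed with the same Cleaner instance
comments = [
    '<script>alert(1)</script> see example.com',
    '<b>bold</b> and <pre>example.org stays plain</pre>',
]
results = [cleaner.clean(comment) for comment in comments]
for result in results:
    print(result)
```

The filters run after sanitization. As a result, the `<a>` tags created by `LinkifyFilter` survive even though `a` is not in `tags`. Listing `pre` in `skip_tags` leaves URLs inside `<pre>` untouched.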
