nh3 HTML Sanitizer
nh3 is a Python binding to Ammonia, an HTML sanitizer written in Rust. It provides fast, configurable, whitelist-based HTML sanitization and is notably faster than pure-Python alternatives such as `Bleach`. It has largely replaced `Bleach`, which was deprecated after its `html5lib` dependency became unmaintained. The library is actively maintained (current version: 0.3.4) and receives regular updates to both the Python bindings and the underlying Rust components.
Warnings
- breaking: nh3 is a modern replacement for the deprecated `Bleach` library. Although the APIs are similar, configuration and underlying behavior may differ, so users migrating from `Bleach` should review their sanitization rules carefully.
- gotcha: By default, `nh3.clean()` allows a relatively broad set of HTML tags (around 75) and attributes. This may be too permissive for many applications and could let through potentially unsafe content unless the allowed tags and attributes are restricted explicitly.
- breaking: Because `nh3` is a binding to the Rust `ammonia` crate, security vulnerabilities in the underlying Rust library can affect it. Update regularly to pick up critical fixes. For example, recent updates addressed `RUSTSEC-2025-0071`.
- gotcha: When integrating with web frameworks like Django, calling `nh3.clean()` only in templates or view logic may not be sufficient. Unsanitized data can still be saved to the database unless sanitization is applied at the input level (e.g., in the form field or model field).
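The last warning can be illustrated with a small, framework-agnostic sketch: the value is sanitized at the point where input is turned into a storable record, so nothing downstream ever sees the raw HTML. The names `COMMENT_TAGS`, `sanitize_comment`, and `build_comment_record` are hypothetical and not part of nh3 or Django. The sanitizer is passed in as a parameter (defaulting to an nh3-backed function) only so the input layer can be unit-tested with a stub.

```python
from typing import Callable, Dict

# Hypothetical allowlist for user comments; adjust to your application.
COMMENT_TAGS = {"b", "i", "p"}


def sanitize_comment(html: str) -> str:
    """Sanitize with nh3 using a restricted tag set instead of the defaults."""
    import nh3  # imported here so this module loads even where nh3 is absent

    return nh3.clean(html, tags=COMMENT_TAGS)


def build_comment_record(
    raw_body: str,
    sanitizer: Callable[[str], str] = sanitize_comment,
) -> Dict[str, str]:
    """Return the record a (hypothetical) persistence layer would store.

    Sanitization happens here, at the input boundary, rather than later
    in a template or view.
    """
    return {"body": sanitizer(raw_body)}
```

In a Django project the same idea would typically live in a form's or model's cleaning step; the exact hook depends on how the application accepts input.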
Install
pip install nh3
Imports
- clean
from nh3 import clean
- Cleaner
from nh3 import Cleaner
Quickstart
import nh3
# Basic sanitization
unclean_html = "<b><img src='' onerror='alert(\'hax\')'>XSS?</b><script>alert('malicious')</script><p>Hello</p>"
cleaned_html = nh3.clean(unclean_html)
print(f"Cleaned HTML: {cleaned_html}")
# Customizing allowed tags
custom_cleaned = nh3.clean(unclean_html, tags={"b", "p"})
print(f"Custom cleaned (b, p only): {custom_cleaned}")
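# Using Cleaner for reusable configuration.
# "a" and "img" must be listed in `tags`; otherwise their attribute rules never apply.
my_cleaner = nh3.Cleaner(
    tags={"a", "b", "i", "img", "p"},
    attributes={
        "a": {"href"},
        "img": {"src"},
    },
    url_schemes={"http", "https"},
)
# The javascript: scheme is not in url_schemes, and <script> is not an allowed tag.
reusable_cleaned = my_cleaner.clean("<a href='javascript:alert(\"xss\")'>Click</a><b>Bold</b><i>Italic</i><script>evil</script>")
print(f"Reusable cleaner output: {reusable_cleaned}")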