Confusable Homoglyphs

3.3.1 · active · verified Sun Apr 12

Confusable Homoglyphs (version 3.3.1) is a Python library for detecting potential homograph attacks by identifying visually similar Unicode characters (homoglyphs). It can check whether a string mixes scripts and whether it contains characters that could be confused with characters from a preferred set of scripts, such as Latin. The library is actively maintained, with regular updates to its underlying Unicode data and Python compatibility.
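To make the idea of a mixed-script check concrete, here is a minimal standard-library sketch. It is not the library's implementation: it assumes the first word of a character's Unicode name (for example `GREEK` or `LATIN`) is a good enough stand-in for its script, and the helper names `script_of`, `scripts_in`, and `looks_mixed_script` are hypothetical.

```python
import unicodedata


def script_of(ch):
    """Rough script label: the first word of the Unicode character name.

    Assumption for illustration only; the library relies on its own
    Unicode data files rather than on character names.
    """
    try:
        return unicodedata.name(ch).split()[0]
    except ValueError:
        # Characters without a name (e.g. some control characters)
        return "UNKNOWN"


def scripts_in(text):
    """Collect the rough script labels of all alphabetic characters."""
    return {script_of(ch) for ch in text if ch.isalpha()}


def looks_mixed_script(text):
    """True when the alphabetic characters span more than one label."""
    return len(scripts_in(text)) > 1


# "\u0391" is GREEK CAPITAL LETTER ALPHA, which renders like a Latin "A".
print(looks_mixed_script("\u0391laskaJazz"))
print(looks_mixed_script("AlaskaJazz"))
```

A mixed-script result alone is only a hint. Deciding whether a character is actually confusable with another requires a homoglyph table, which is what the library supplies.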

Warnings

Install

Imports

Quickstart

This quickstart uses `is_dangerous` to flag a string that both mixes scripts and contains characters from outside the preferred aliases (here `'latin'`) that could be confused with characters inside them. It then uses `is_confusable` with `greedy=True` to report every confusable character in a string rather than stopping at the first one.

from confusable_homoglyphs import confusables

# Check if a string contains mixed scripts and dangerous confusable characters
text_dangerous = "ΑlaskaJazz" # First char is Greek Alpha
text_safe = "AlaskaJazz" # All Latin

is_dangerous_result = confusables.is_dangerous(text_dangerous, preferred_aliases=['latin'])
print(f"Is '{text_dangerous}' dangerous? {is_dangerous_result}")

is_dangerous_result_safe = confusables.is_dangerous(text_safe, preferred_aliases=['latin'])
print(f"Is '{text_safe}' dangerous? {is_dangerous_result_safe}")

# Check for specific confusable characters within a string.
# The string below contains a Cyrillic "о" in place of a Latin "o".
text_confusable = "microsоft"

is_confusable_result = confusables.is_confusable(text_confusable, greedy=True, preferred_aliases=['latin'])
print(f"Is '{text_confusable}' confusable? {is_confusable_result}")

text_not_confusable = "microsoft"
is_confusable_result_safe = confusables.is_confusable(text_not_confusable, greedy=True, preferred_aliases=['latin'])
print(f"Is '{text_not_confusable}' confusable? {is_confusable_result_safe}")
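The value printed for `is_confusable` is not a plain boolean. The following sketch shows one way to turn it into readable lines, assuming the result is either `False` or a list of dictionaries with `character`, `alias`, and `homoglyphs` keys, where each homoglyph entry carries the look-alike under `c`. The helper `summarize_confusables` and the hand-written `sample` are hypothetical and only illustrate that assumed shape.

```python
def summarize_confusables(result):
    """Turn an is_confusable-style result into readable lines.

    Assumes `result` is False (nothing found) or a list of dicts with
    'character', 'alias', and 'homoglyphs' keys.
    """
    if not result:
        return []
    lines = []
    for entry in result:
        # Each homoglyph entry is assumed to hold the look-alike under 'c'.
        lookalikes = ", ".join(h["c"] for h in entry.get("homoglyphs", []))
        lines.append(
            f"{entry['character']!r} ({entry['alias']}) resembles: {lookalikes}"
        )
    return lines


# Hand-written sample in the assumed shape, not real library output.
sample = [
    {
        "character": "\u043e",  # CYRILLIC SMALL LETTER O
        "alias": "CYRILLIC",
        "homoglyphs": [{"c": "o", "n": "LATIN SMALL LETTER O"}],
    }
]

for line in summarize_confusables(sample):
    print(line)
```

In practice the argument would be the value returned by `confusables.is_confusable(...)` in the quickstart, so the same helper covers both the confusable and the clean case.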
