{"id":5159,"library":"confusables","title":"Confusables","description":"Confusables is a Python package for analyzing and matching words that look alike but use different Unicode characters. It uses the official Unicode confusable characters list to detect homoglyphs, which is useful for identifying malicious fake website names, normalizing text data, or detecting attempts to bypass profanity filters. The library is currently at version 1.2.0 and is updated as needed, particularly when the Unicode character set changes.","status":"active","version":"1.2.0","language":"en","source_language":"en","source_url":"https://github.com/woodgern/confusables","tags":["unicode","security","fuzzy matching","confusable characters","homoglyphs","text processing"],"install":[{"cmd":"pip install confusables","lang":"bash","label":"Install latest version"}],"dependencies":[{"reason":"Runtime dependency","package":"python","optional":false},{"reason":"Build dependency, typically handled by pip","package":"python-setuptools","optional":true}],"imports":[{"symbol":"is_confusable","correct":"from confusables import is_confusable"},{"symbol":"confusable_characters","correct":"from confusables import confusable_characters"},{"symbol":"confusable_regex","correct":"from confusables import confusable_regex"},{"symbol":"normalize","correct":"from confusables import normalize"}],"quickstart":{"code":"from confusables import is_confusable, confusable_regex, normalize\n\n# Check if two strings are confusable\nprint(f\"'rover' vs 'ƦỏV3ℛ': {is_confusable('rover', 'ƦỏV3ℛ')}\")\n\n# Generate a regex that matches confusable variations of a string\nregex_pattern = confusable_regex('admin', include_character_padding=True)\nprint(f\"Regex for 'admin': {regex_pattern}\")\n\n# Normalize a string to its confusable ASCII counterparts\nnormalized_forms = normalize('micrоsoft', prioritize_alpha=True)\nprint(f\"Normalized forms of 'micrоsoft': {normalized_forms}\")","lang":"python","description":"This quickstart demonstrates the core functionality: checking whether two strings are confusable, generating a regular expression that matches confusable variations of a string, and normalizing a string to a list of possible \"normal forms\" with ASCII priority."},"warnings":[{"fix":"Remove the `match_subword` argument from calls to `confusable_regex()`. The function now always performs subword matching.","message":"The `match_subword` option was removed from the `confusable_regex()` function in version 1.0.0. The function now behaves as if `match_subword` were always true.","severity":"breaking","affected_versions":">=1.0.0"},{"fix":"Review applications that rely on specific confusable character sets. The updated Unicode data may produce more comprehensive (or simply different) matches.","message":"Version 1.0.0 updated to Unicode Confusables version 12.1.0 and now matches every Unicode character with itself. This may change the set of characters considered confusable compared to older versions.","severity":"breaking","affected_versions":">=1.0.0"},{"fix":"Be aware that confusable matching is not an exact science. If you need strict, consistent matching across versions, retest critical use cases with each new library version.","message":"The definition of 'confusable' is intentionally loose because it depends on human interpretation, and it may become more or less strict in future versions. This could subtly alter matching behavior between releases.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-13T00:00:00.000Z","next_check":"2026-07-12T00:00:00.000Z"}