Tangled Up in Unicode
Tangled Up in Unicode is a Python library, currently at version 0.2.0, that provides access to the Unicode Character Database (UCD). It serves as an alternative to Python's standard `unicodedata` module, offering the latest UCD versions and extended character properties. Releases are typically aligned with new Unicode Standard versions.
Warnings
- gotcha Installing `tangled-up-in-unicode` can make the `site-packages` directory very large (1.8 GB or more in a `venv`), because large `.pyc` files are generated from the library's internal data dictionaries.
- gotcha Prior to version 0.0.7, a script query that missed the lookup table raised an `IndexError`. From version 0.0.7 onward, the query returns the string `'Unknown'` instead.
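Code that has to run against both older and newer releases can normalize the two behaviors described above. The following is a minimal sketch, not part of the library: `safe_script`, `old_style_lookup`, and `new_style_lookup` are hypothetical names, and the two stand-in lookups only mimic the documented behaviors so the example runs without the package installed.

```python
from typing import Callable


def safe_script(lookup: Callable[[str], str], char: str) -> str:
    """Return lookup(char), mapping the pre-0.0.7 IndexError to 'Unknown'."""
    try:
        return lookup(char)
    except IndexError:
        # Releases before 0.0.7 raised here; later ones return 'Unknown' themselves.
        return "Unknown"


# Hypothetical stand-ins that mimic the two behaviors described above.
def old_style_lookup(char: str) -> str:
    raise IndexError("script not in lookup table")


def new_style_lookup(char: str) -> str:
    return "Unknown"


print(safe_script(old_style_lookup, "\U000F0000"))
print(safe_script(new_style_lookup, "\U000F0000"))
```

With the library installed, the real lookup can be passed in place of the stand-ins, for example `safe_script(unicodedata.script, char)`.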
Install
pip install tangled-up-in-unicode
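After installing, it may be worth checking how much disk space the package actually occupies in your environment. The sketch below uses only the standard library; `package_size_bytes` is a hypothetical helper, and it returns 0 when the package is not installed or is a single-file module.

```python
import importlib.util
from pathlib import Path


def package_size_bytes(package_name: str) -> int:
    """Sum the sizes of all files under an installed package's directories."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        # Not installed, or a single-file module with no package directory.
        return 0
    total = 0
    for location in spec.submodule_search_locations:
        for path in Path(location).rglob("*"):
            if path.is_file():
                total += path.stat().st_size
    return total


size = package_size_bytes("tangled_up_in_unicode")
print(f"tangled_up_in_unicode: {size / 1e6:.1f} MB on disk")
```

The total includes any `.pyc` files that already exist under the package directory, so running it again after the first import can show a larger figure.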
Imports
- tangled_up_in_unicode
import tangled_up_in_unicode as unicodedata
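Because the alias matches the name of the standard module, a project can treat the library as an optional dependency. This is a sketch of one possible pattern, not a recommendation from the library itself; it relies only on `name()`, which both modules provide.

```python
try:
    # Preferred: the library's UCD data
    import tangled_up_in_unicode as unicodedata
except ImportError:
    # Fallback: the UCD version bundled with the Python interpreter
    import unicodedata

print(unicodedata.name("$"))
```

Functions that exist only in the library, such as `script()` and `block()`, are not available on the fallback path, so code using this pattern should stick to the shared API or guard those calls.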
Quickstart
import tangled_up_in_unicode as unicodedata
char = '$'
print(f"--- Properties for '{char}' ---")
print(f"Name: {unicodedata.name(char)}")
print(f"Category (Short): {unicodedata.category(char)}")
print(f"Bidirectional (Short): {unicodedata.bidirectional(char)}")
# script() and block() have no counterpart in the standard unicodedata module
print(f"Script: {unicodedata.script(char)}")
print(f"Block: {unicodedata.block(char)}")
print(f"UCD Version: {unicodedata.unidata_version}")