unicodedataplus
16.0.0.post1 verified Mon Apr 27 auth: no python
An enhanced drop-in replacement for Python's unicodedata module, providing additional Unicode properties like script extensions, Indic positional/syllabic categories, property value aliases, and up-to-date data (current version 16.0.0.post1, Unicode 16.0.0). Released on PyPI with occasional updates aligned with new Unicode versions.
pip install unicodedataplus
Common errors
error ModuleNotFoundError: No module named 'unicodedataplus'
cause The package is not installed in the active environment.
fix Run `pip install unicodedataplus` in your environment.
error AttributeError: module 'unicodedataplus' has no attribute 'script_extensions'
cause The installed version of unicodedataplus predates that attribute.
fix Upgrade to version 12.1.0 or later: `pip install --upgrade unicodedataplus`.
error ValueError: character must be a string of length 1
cause Passed a string with more than one character (e.g., `unicodedataplus.name('AB')`). The exception type and wording can differ by version; stdlib `unicodedata` reports this case as a TypeError.
fix Pass a single-character string: `unicodedataplus.name('A')`.
Warnings
breaking unicodedataplus requires Python 3.6+; Python 2 is not supported (dropped in version 12+).
fix Use Python 3.6 or later.
gotcha The module is a drop-in replacement for unicodedata, but some functions exist only in unicodedataplus (e.g., script_extensions). Code that calls them will raise AttributeError if it is later run against stdlib unicodedata.
fix Import unicodedataplus explicitly and don't assume stdlib unicodedata offers the same functions.
deprecated The functions `script_extensions`, `indic_syllabic_category`, and `indic_positional_category` are available, but the underlying properties are not yet finalized in the Unicode Standard; the API may change.
fix Check the documentation for the latest API. Use with caution if stability is needed.
gotcha Installation may fail on systems without a working C compiler (e.g., Windows with missing VC++ build tools) because the package compiles a C extension. Use a binary wheel if available.
fix Prefer installing from precompiled wheels via pip (e.g., `pip install unicodedataplus --only-binary=:all:`). If unavailable, install a C compiler (e.g., Microsoft Build Tools for Windows).
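The drop-in gotcha above can be handled defensively. The following is a minimal sketch, assuming you want code that keeps running when only the standard library is available. The helper name `script_extensions_or_none` and the `HAVE_PLUS` flag are hypothetical and not part of either module.

```python
# Prefer unicodedataplus; fall back to the stdlib module if it is not installed.
try:
    import unicodedataplus as ud
    HAVE_PLUS = True
except ImportError:
    import unicodedata as ud
    HAVE_PLUS = False


def script_extensions_or_none(ch):
    """Return Script_Extensions for ch, or None when the function is unavailable.

    Hypothetical helper: it only checks whether the attribute exists.
    """
    func = getattr(ud, "script_extensions", None)
    if func is None:
        return None  # stdlib unicodedata has no script_extensions
    return func(ch)


print(ud.name("A"))                    # works with either module
print(script_extensions_or_none("A"))  # None under the stdlib fallback
```

Feature detection with `getattr` keeps the failure mode explicit, rather than letting an AttributeError surface deep inside calling code.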
Imports
- unicodedataplus
import unicodedataplus
- lookup
unicodedataplus.lookup(name)
- name
unicodedataplus.name(chr, default=None)
- decimal
unicodedataplus.decimal(chr, default=None)
- category
unicodedataplus.category(chr)
- bidirectional
unicodedataplus.bidirectional(chr)
- combining
unicodedataplus.combining(chr)
- east_asian_width
unicodedataplus.east_asian_width(chr)
- mirrored
unicodedataplus.mirrored(chr)
- decomposition
unicodedataplus.decomposition(chr)
- normalize
unicodedataplus.normalize(form, unistr)
- ucd_3_2_0
unicodedataplus.ucd_3_2_0
- unidata_version
unicodedataplus.unidata_version
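As a rough illustration of the per-character functions listed above, the sketch below walks a string one character at a time, which avoids passing a multi-character string to `name`. It assumes these functions behave like their stdlib counterparts and falls back to stdlib `unicodedata` if the package is missing; `describe` is a hypothetical helper.

```python
try:
    import unicodedataplus as ud
except ImportError:  # fallback so the sketch still runs without the package
    import unicodedata as ud


def describe(text):
    """Yield (character, name, category) for each character in text."""
    for ch in text:
        # The second argument is returned instead of raising ValueError
        # when the character has no name (e.g. control characters).
        yield ch, ud.name(ch, "<unnamed>"), ud.category(ch)


for ch, name, cat in describe("A\n"):
    print(repr(ch), name, cat)

# normalize() takes the form first, then the string.
composed = ud.normalize("NFC", "e\u0301")  # e + combining acute accent
print(len(composed))
```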
Quickstart
import unicodedataplus as ud
# Check the Unicode version
print(ud.unidata_version)
# Get the name of a character
print(ud.name('A')) # Output: LATIN CAPITAL LETTER A
# Get the script extensions for a character
# Note: the Script_Extensions property is returned as a list of scripts
print(ud.script_extensions('A'))
# Look up a character by name
print(ud.lookup('LATIN CAPITAL LETTER A')) # Output: A
# Check a property (e.g., Indic syllabic category of U+0901 DEVANAGARI SIGN CANDRABINDU)
print(ud.indic_syllabic_category('\u0901'))
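If code depends on characters from a recent Unicode release, it may help to check `unidata_version` before relying on them. This is a sketch, assuming the attribute is a dotted version string as in stdlib `unicodedata`; `unicode_version_tuple` is a hypothetical helper, and the stdlib fallback is only there so the snippet runs without the package installed.

```python
try:
    import unicodedataplus as ud
except ImportError:
    import unicodedata as ud


def unicode_version_tuple(module=ud):
    """Parse a dotted version string such as '16.0.0' into a tuple of ints."""
    return tuple(int(part) for part in module.unidata_version.split("."))


version = unicode_version_tuple()
print(version)

# Compare tuples rather than strings: '9.0.0' > '16.0.0' as strings.
if version < (16, 0, 0):
    print("Unicode data older than 16.0.0; newer characters may be unnamed")
```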