unicodedataplus

16.0.0.post1 verified Mon Apr 27 auth: no python

An enhanced drop-in replacement for Python's unicodedata module, providing additional Unicode properties like script extensions, Indic positional/syllabic categories, property value aliases, and up-to-date data (current version 16.0.0.post1, Unicode 16.0.0). Released on PyPI with occasional updates aligned with new Unicode versions.

pip install unicodedataplus
error ModuleNotFoundError: No module named 'unicodedataplus'
cause Package not installed.
fix
Run pip install unicodedataplus in your environment.
error AttributeError: module 'unicodedataplus' has no attribute 'script_extensions'
cause Using an older version of unicodedataplus that doesn't have that attribute.
fix
Upgrade to version 12.1.0 or later: pip install --upgrade unicodedataplus.
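error TypeError: name() argument 1 must be a unicode character, not str
cause Passed a string with more than one character (e.g., `unicodedataplus.name('AB')`). Per-character functions accept exactly one character; the exact wording of the message may differ between Python versions.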
fix
Pass a single-character string: unicodedataplus.name('A').
breaking unicodedataplus requires Python 3.6+; Python 2 is not supported (dropped in version 12+).
fix Use Python 3.6 or later.
gotcha The module is a drop-in replacement for unicodedata, but the reverse does not hold: some functions exist only in unicodedataplus (e.g., script_extensions). Code that calls them raises AttributeError if it is run against stdlib unicodedata instead.
fix Import unicodedataplus explicitly wherever the extended functions are used, and don't assume stdlib unicodedata offers the same functions.
deprecated The functions `script_extensions`, `indic_syllabic_category`, and `indic_positional_category` are available, but the underlying properties are not yet finalized in the Unicode standard; the API may change.
fix Check the documentation for the latest API. Use with caution if stability is needed.
gotcha Installation may fail on systems without a working C compiler (e.g., Windows with missing VC++ build tools) as the package compiles a C extension. Use a binary wheel if available.
fix Prefer installing from precompiled wheels via pip (e.g., `pip install unicodedataplus --only-binary=:all:`). If unavailable, install a C compiler (e.g., Microsoft Build Tools for Windows).
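
The import and attribute errors listed above can also be guarded against in code. The following is a minimal sketch, assuming a fallback to the standard library is acceptable when unicodedataplus is absent; the helper name `get_script_extensions` and the `HAVE_PLUS` flag are hypothetical and not part of either module.

```python
# Prefer unicodedataplus, but fall back to the stdlib module if it is missing.
try:
    import unicodedataplus as ud
    HAVE_PLUS = True
except ImportError:
    import unicodedata as ud
    HAVE_PLUS = False


def get_script_extensions(ch):
    """Hypothetical helper: return script extensions, or None if unsupported."""
    # getattr with a default avoids AttributeError on stdlib unicodedata
    # or on an older unicodedataplus release that lacks the function.
    func = getattr(ud, "script_extensions", None)
    if func is None:
        return None
    return func(ch)


print(ud.unidata_version)          # Unicode version of whichever module loaded
print(get_script_extensions("A"))  # None when the function is unavailable
```

Returning None instead of raising lets calling code decide whether the extended property is required or merely a nice-to-have.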

Basic usage: import and use as a drop-in for unicodedata, plus extended property access.

import unicodedataplus as ud

# Check the Unicode version
print(ud.unidata_version)

# Get the name of a character
print(ud.name('A'))  # Output: LATIN CAPITAL LETTER A

# Get the script extensions for a character
# Note: the result is a list of scripts (Unicode property 'Script_Extensions')
print(ud.script_extensions('A'))

# Look up a character by name
print(ud.lookup('LATIN CAPITAL LETTER A'))  # Output: A

# Check an extended property (e.g., Indic syllabic category of U+0901 DEVANAGARI SIGN CANDRABINDU)
print(ud.indic_syllabic_category('\u0901'))
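
Because per-character functions such as name() accept exactly one character, longer strings have to be processed one character at a time. The sketch below shows one way to do that; the helper `describe` is hypothetical, and the stdlib fallback is only there so the example runs without unicodedataplus installed.

```python
try:
    import unicodedataplus as ud
except ImportError:
    import unicodedata as ud  # stdlib fallback; name() takes the same arguments


def describe(text):
    """Hypothetical helper: return (code point, name) pairs for each character."""
    # The second argument to name() is a default returned for characters
    # that have no name (e.g., control characters) instead of raising ValueError.
    return [(f"U+{ord(ch):04X}", ud.name(ch, "<no name>")) for ch in text]


for code_point, char_name in describe("AB"):
    print(code_point, char_name)
```

Passing a default to name() keeps the loop from stopping on unnamed characters such as newlines.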