unicodedata2

17.0.1 · active · verified Mon Apr 13

unicodedata2 is a backport of the `unicodedata` module from the Python standard library, updated to track the latest version of the Unicode standard. It provides access to the Unicode character database, so you can query character properties (name, category, numeric value) and normalize Unicode strings. The current version is 17.0.1; new major versions are typically released to align with updates to the Unicode standard.

Warnings

Install

Imports

Quickstart

This quickstart imports `unicodedata2` and uses its core functions `name()`, `category()`, and `normalize()` to inspect and process Unicode characters and strings. It also shows why normalization matters for string comparison: the precomposed 'é' and the sequence 'e' plus a combining acute accent render identically but compare unequal until both strings are normalized to the same form.

import unicodedata2

# Get character name
char = 'é'
name = unicodedata2.name(char)
print(f"Character: '{char}', Name: {name}")

# Get character category
category = unicodedata2.category(char)
print(f"Category for '{char}': {category}")

# Normalize a Unicode string
s1 = 'café'
s2 = 'cafe\u0301' # 'e' followed by combining acute accent

print(f"String 1: '{s1}', String 2: '{s2}'")
print(f"Are they equal? {s1 == s2}")

normalized_s1 = unicodedata2.normalize('NFC', s1)
normalized_s2 = unicodedata2.normalize('NFC', s2)

print(f"Normalized S1 (NFC): '{normalized_s1}'")
print(f"Normalized S2 (NFC): '{normalized_s2}'")
print(f"Are they equal after NFC? {normalized_s1 == normalized_s2}")
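
The quickstart above can be extended with a common pattern: falling back to the standard-library `unicodedata` when `unicodedata2` is not installed, and wrapping normalization in a small comparison helper. This is a minimal sketch, assuming the older Unicode data in the standard library is an acceptable fallback for your use case. `nfc_equal` and `describe` are hypothetical helper names for illustration, not part of the library.

```python
try:
    import unicodedata2 as ud  # backport with newer Unicode data
except ImportError:
    # Assumption: the stdlib module is an acceptable fallback
    import unicodedata as ud


def nfc_equal(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization (hypothetical helper)."""
    return ud.normalize("NFC", a) == ud.normalize("NFC", b)


def describe(char: str) -> dict:
    """Collect a few properties of one character (hypothetical helper)."""
    return {
        # The second argument is a default returned instead of raising ValueError
        "name": ud.name(char, "<unnamed>"),
        "category": ud.category(char),
        "numeric": ud.numeric(char, None),
    }


# Version of the Unicode database the imported module was built against
print(f"Unicode data version: {ud.unidata_version}")

# Precomposed vs. decomposed 'café'
print(nfc_equal("caf\u00e9", "cafe\u0301"))

# U+00BD VULGAR FRACTION ONE HALF has a numeric value; 'A' does not
print(describe("\u00bd"))
print(describe("A"))
```

Passing a default to `name()` and `numeric()` avoids a `ValueError` for characters that lack the property, which is usually more convenient when scanning arbitrary text.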
