pyunormalize Unicode Normalization Library

17.0.0 · active · verified Thu Apr 09

pyunormalize is a pure-Python library for Unicode normalization (NFC, NFD, NFKC, NFKD) that operates independently of Python's built-in Unicode database. It uses its own dedicated data, ensuring strict conformance to the latest Unicode Standard (currently v17.0.0). New major versions are typically released to align with updates to the Unicode Standard.
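One way to see what "independent of Python's built-in Unicode database" means in practice is to compare the version the interpreter ships with against the version the library reports. The snippet below is a minimal sketch: `unicodedata.unidata_version` is standard library, `UCD_VERSION` is the name used in the Quickstart, and the `ImportError` fallback is an assumption added only so the snippet still runs where pyunormalize is not installed.

```python
import unicodedata

# Version of the Unicode database bundled with the running interpreter.
builtin_version = unicodedata.unidata_version

try:
    # Version of the data bundled with pyunormalize (name taken from the Quickstart).
    from pyunormalize import UCD_VERSION as library_version
except ImportError:
    # Assumption for illustration: tolerate a missing install instead of failing.
    library_version = None

print(f"Interpreter Unicode data: {builtin_version}")
print(f"pyunormalize Unicode data: {library_version or 'pyunormalize not installed'}")
```

If the two versions differ, characters added in the newer standard may normalize differently between `unicodedata.normalize` and pyunormalize.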

Warnings

Install

Imports

Quickstart

This example imports the four Unicode normalization forms (NFC, NFD, NFKC, NFKD) as functions from pyunormalize, applies each to a sample string, and prints the version of the Unicode Character Database (UCD) that the library uses.

from pyunormalize import NFC, NFD, NFKC, NFKD, UCD_VERSION

# Example string with accented characters
text = "élève"

# Normalize to different forms
nfc_text = NFC(text)
nfd_text = NFD(text)
nfkc_text = NFKC(text)
nfkd_text = NFKD(text)

print(f"Original: {text}")
print(f"NFC: {nfc_text}")
print(f"NFD: {nfd_text}")
print(f"NFKC: {nfkc_text}")
print(f"NFKD: {nfkd_text}")
print(f"Unicode database version: {UCD_VERSION}")
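Because composed and decomposed strings usually render identically, the printed output above can look the same for every form. The sketch below makes the difference visible by printing code points instead. The helper `code_points` and the sample string are hypothetical names chosen for illustration, and the `unicodedata` fallback is an assumption that lets the snippet run without pyunormalize installed (it relies on the interpreter's own database, so results could differ for very recent characters).

```python
import unicodedata

try:
    from pyunormalize import NFC, NFD, NFKC, NFKD
except ImportError:
    # Assumed fallback for illustration only: wrap the standard library so the
    # sketch still runs when pyunormalize is not installed.
    def _form(name):
        return lambda s: unicodedata.normalize(name, s)

    NFC, NFD, NFKC, NFKD = (_form(n) for n in ("NFC", "NFD", "NFKC", "NFKD"))


def code_points(s):
    """Return the code points of s as a space-separated 'U+XXXX' string."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


# "e" followed by a combining acute accent, a space, and the "fi" ligature.
sample = "e\u0301 \ufb01"

forms = [("NFC", NFC), ("NFD", NFD), ("NFKC", NFKC), ("NFKD", NFKD)]
results = {name: fn(sample) for name, fn in forms}

for name, value in results.items():
    print(f"{name:4} -> {code_points(value)}")
```

The canonical forms (NFC, NFD) only compose or decompose the accented letter and leave the ligature alone, while the compatibility forms (NFKC, NFKD) also replace the ligature with the plain letters "f" and "i".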
