{"id":2249,"library":"pyunormalize","title":"pyunormalize Unicode Normalization Library","description":"pyunormalize is a pure-Python library for Unicode normalization (NFC, NFD, NFKC, NFKD) that works independently of Python's built-in `unicodedata` module. It ships its own Unicode Character Database (UCD) data, so its output conforms to the Unicode version it was built for (currently 17.0.0) regardless of which Unicode version the Python interpreter bundles. New major versions are typically released to track updates to the Unicode Standard.","status":"active","version":"17.0.0","language":"en","source_language":"en","source_url":"https://github.com/mlodewijck/pyunormalize/","tags":["unicode","normalization","nfc","nfd","nfkc","nfkd","text-processing","internationalization"],"install":[{"cmd":"pip install pyunormalize","lang":"bash","label":"Install stable version"}],"dependencies":[],"imports":[{"symbol":"NFC","correct":"from pyunormalize import NFC"},{"symbol":"NFD","correct":"from pyunormalize import NFD"},{"symbol":"NFKC","correct":"from pyunormalize import NFKC"},{"symbol":"NFKD","correct":"from pyunormalize import NFKD"},{"note":"UCD_VERSION is exported at package level, so `import pyunormalize; pyunormalize.UCD_VERSION` is equivalent to the direct import.","symbol":"UCD_VERSION","correct":"from pyunormalize import UCD_VERSION"},{"note":"`unicodedata.normalize` relies on the Unicode database bundled with the running Python interpreter, which may be older than pyunormalize's. To get pyunormalize's results, call its NFC/NFD/NFKC/NFKD functions instead.","wrong":"import unicodedata; unicodedata.normalize('NFC', text)","symbol":"normalize (general function)","correct":"from pyunormalize import NFC, NFD, NFKC, NFKD"}],"quickstart":{"code":"from pyunormalize import NFC, NFD, NFKC, NFKD, UCD_VERSION\n\n# \"élève\" spelled with precomposed letters (U+00E9, U+00E8)\ntext = \"\\u00e9l\\u00e8ve\"\n\n\ndef codepoints(s):\n    \"\"\"Return the code points of s as space-separated hex values.\"\"\"\n    return \" \".join(f\"{ord(ch):04X}\" for ch in s)\n\n\nprint(f\"Original: {text} ({codepoints(text)})\")\n\n# NFD and NFKD split each accented letter into a base letter plus a\n# combining accent; NFC and NFKC keep (or restore) the precomposed letters.\nfor name, form in ((\"NFC\", NFC), (\"NFD\", NFD), (\"NFKC\", NFKC), (\"NFKD\", NFKD)):\n    result = form(text)\n    print(f\"{name}: {result} ({codepoints(result)})\")\n\nprint(f\"Unicode database version: {UCD_VERSION}\")","lang":"python","description":"Applies each of the four normalization forms (NFC, NFD, NFKC, NFKD) to the same string and prints the code points of each result. This makes the difference between composed and decomposed forms visible even where the rendered text looks identical. The example then prints the Unicode Character Database (UCD) version that pyunormalize uses."},"warnings":[{"fix":"Upgrade the Python environment to 3.8 or newer, or pin `pyunormalize<17.0.0`.","message":"pyunormalize 17.0.0 dropped official support for Python 3.6 and 3.7. Projects on those interpreters must either move to Python 3.8+ or stay on an earlier pyunormalize release.","severity":"breaking","affected_versions":"17.0.0+"},{"fix":"Compare `pyunormalize.UCD_VERSION` with `unicodedata.unidata_version` to see which Unicode versions are in play, and avoid mixing results from the two libraries in comparisons that must agree.","message":"pyunormalize uses its own Unicode Character Database (UCD) rather than the one bundled with the interpreter's `unicodedata` module. This independence is the library's core feature. However, when migrating from `unicodedata.normalize`, results can differ if the interpreter's Unicode version is older or newer than pyunormalize's. Under Unicode's normalization stability policy, such differences should be confined to text containing characters that are unassigned in the older of the two databases.","severity":"gotcha","affected_versions":"All"},{"fix":"Normalize input before validating it, so a check cannot be bypassed by a string that only takes on a dangerous form after normalization. Prefer the canonical forms (NFC, NFD) when strict equivalence is required, and apply the broader NFKC/NFKD mappings deliberately in security-sensitive comparisons. Handle visually confusable characters with a separate check.","message":"Normalization makes equivalent strings comparable; it does not make input safe. Different code point sequences can normalize to the same result: under NFKC/NFKD, for example, fullwidth 'Ａ' (U+FF21) becomes 'A'. Visually confusable characters from different scripts, such as Latin 'a' and Cyrillic 'а' (U+0430), are left distinct by every normalization form. Both effects can be exploited in path handling, input validation, and string comparisons during authentication.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}