text-unidecode
text-unidecode is a basic, lightweight Python port of the Perl Text::Unidecode library. It converts Unicode text into a 'good enough' 7-bit ASCII representation by context-free, character-by-character transliteration. The current version is 1.3, released in 2019; the project is stable but infrequently updated.
Warnings
- gotcha Transliteration is context-free and character-by-character, so results can be inaccurate or unexpected for languages with complex writing systems (e.g., Japanese, Thai) and for characters with language-specific conventions (e.g., German umlauts are mapped to 'A', 'O', 'U' rather than 'Ae', 'Oe', 'Ue'). The library is not designed for high-quality, language-aware transliteration.
- breaking The transliteration tables may be updated in future versions to improve mappings or fix inconsistencies, so the ASCII output for a given Unicode input can change between versions. This can break systems that rely on stable ASCII representations (e.g., generated URL slugs or identifiers).
- gotcha The `text-unidecode` project's own documentation suggests an alternative library, `unidecode`, which is also a port of Text::Unidecode. It states that `unidecode` offers 'better memory usage and better transliteration quality' but is GPL-licensed. Users should evaluate `unidecode` if their project is compatible with GPL and requires potentially superior results.
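One way to work around the umlaut gotcha is to apply language-specific replacements before the generic transliteration step. The following is a minimal sketch, not part of text-unidecode: the `GERMAN_UMLAUTS` table and the `transliterate_german` helper are hypothetical names, and the generic transliterator is passed in as the `fallback` argument so the helper does not depend on any particular library.

```python
import unicodedata
from typing import Callable

# Hypothetical pre-mapping table (an assumption, not part of text-unidecode):
# the conventional German digraphs for umlauts.
GERMAN_UMLAUTS = str.maketrans({
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ä": "ae", "ö": "oe", "ü": "ue",
})

def transliterate_german(text: str, fallback: Callable[[str], str]) -> str:
    """Apply German umlaut conventions, then hand the rest to `fallback`."""
    # Normalize to NFC so decomposed forms (letter + combining diaeresis)
    # become the single precomposed characters the table expects.
    composed = unicodedata.normalize("NFC", text)
    return fallback(composed.translate(GERMAN_UMLAUTS))
```

With the library installed, `transliterate_german("Äpfel", unidecode)` would pass the remaining characters through `unidecode`. The sketch is deliberately simple: an all-caps word such as "ÜBER" becomes "UeBER", so stricter casing rules would need extra handling.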
Install
pip install text-unidecode
Imports
- unidecode
from text_unidecode import unidecode
Quickstart
from text_unidecode import unidecode
unicode_text = "Héllø Wörld! Како си? 北亰"
ascii_text = unidecode(unicode_text)
print(f"Original: {unicode_text}")
print(f"ASCII: {ascii_text}")
# German umlauts: mapped to plain A, O, U rather than Ae, Oe, Ue (see Warnings)
unicode_german = "Äpfel, Öfen, Übermut"
ascii_german = unidecode(unicode_german)
print(f"German Original: {unicode_german}")
print(f"German ASCII: {ascii_german}")
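The Warnings section mentions URL slugs as a typical use of the ASCII output. A minimal sketch of that use case follows, assuming a hypothetical `slugify` helper that is not part of text-unidecode; the transliteration function is passed in as `transliterate`, so `unidecode` can be supplied when the library is installed.

```python
import re
from typing import Callable

def slugify(text: str, transliterate: Callable[[str], str]) -> str:
    """Transliterate to ASCII, then collapse non-alphanumerics into hyphens."""
    ascii_text = transliterate(text).lower()
    # Replace every run of characters outside a-z and 0-9 with one hyphen,
    # then trim hyphens left over at either end.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text).strip("-")
```

Because the transliteration tables may change between releases, it is probably safer to store a generated slug once than to recompute it on demand, and to pin the dependency (for example `text-unidecode==1.3` in a requirements file).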