text-unidecode

Version 1.3, verified Tue May 12. Auth: no. Python install: verified. Quickstart: verified.

text-unidecode is a basic, lightweight Python port of the Perl Text::Unidecode library; it installs the `text_unidecode` module. It converts Unicode text into a 'good enough' 7-bit ASCII representation by context-free, character-by-character transliteration: each character is replaced on its own, without regard to its neighbors or to the language of the text. The current version, 1.3, was released in 2019; the project is stable but rarely updated.
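To make "context-free, character-by-character" concrete, here is a toy sketch of the technique. The mapping table `TOY_TABLE` and the function `toy_unidecode` are hypothetical and cover only a handful of characters; the real library ships its own, much larger tables. Dropping unmapped characters is likewise an assumption of this sketch, not a documented behavior of the library.

```python
# Hypothetical toy table for illustration only; the real library ships
# its own, far larger transliteration data.
TOY_TABLE = {
    "é": "e",
    "ø": "o",
    "ö": "o",
    "Ä": "A",       # diaeresis dropped, not expanded to "Ae"
    "\u041a": "K",  # Cyrillic capital KA
    "\u0430": "a",  # Cyrillic small A
    "\u043a": "k",  # Cyrillic small KA
    "\u043e": "o",  # Cyrillic small O
}


def toy_unidecode(text: str) -> str:
    """Transliterate each character on its own, ignoring its neighbors."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # ASCII passes through unchanged
        else:
            # Assumption of this sketch: unmapped characters are dropped
            out.append(TOY_TABLE.get(ch, ""))
    return "".join(out)


print(toy_unidecode("Héllø Wörld!"))  # Hello World!
```

Because each lookup sees only one character, no rule can depend on the surrounding word, which is exactly why language-specific conventions are out of reach for this approach.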

pip install text-unidecode
error ModuleNotFoundError: No module named 'text_unidecode'
cause The `text-unidecode` package, which provides the `text_unidecode` module, is not installed in the active Python environment.
fix
pip install text-unidecode
error TypeError: 'module' object is not callable
cause After `import text_unidecode`, the code calls the module itself (`text_unidecode(...)`) instead of the `unidecode` function inside it.
fix
Call text_unidecode.unidecode('your text'), or import the function directly with from text_unidecode import unidecode and then call unidecode('your text').
error ModuleNotFoundError: No module named 'unidecode'
cause The code imports `unidecode`, but the module installed by the `text-unidecode` package is named `text_unidecode`. The `unidecode` module belongs to the separate, GPL-licensed `Unidecode` package.
fix
Use the correct module name in the import statement: from text_unidecode import unidecode or import text_unidecode.
error NameError: name 'unidecode' is not defined
cause The `unidecode` function was called without first being imported from the `text_unidecode` module.
fix
Import the function before using it: from text_unidecode import unidecode
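The module-name mix-ups above can also be diagnosed programmatically. This is an illustrative sketch, not part of either library: `MODULE_FOR_DIST`, `import_status`, and `get_unidecode` are hypothetical names. It relies on the standard `importlib.util.find_spec`, which reports whether a module is importable without actually importing it, and it returns the `unidecode` function rather than the module.

```python
import importlib.util
from typing import Callable, Dict, Optional

# Distribution name (what you pip install) -> module name (what you import).
# Unidecode is a separate, GPL-licensed package, listed only for diagnosis.
MODULE_FOR_DIST = {"text-unidecode": "text_unidecode", "Unidecode": "unidecode"}


def import_status() -> Dict[str, bool]:
    """Report, per distribution, whether its module is importable here."""
    return {
        dist: importlib.util.find_spec(mod) is not None
        for dist, mod in MODULE_FOR_DIST.items()
    }


def get_unidecode() -> Optional[Callable[[str], str]]:
    """Return text-unidecode's unidecode function, or None if it is missing."""
    if not import_status()["text-unidecode"]:
        return None  # fix: pip install text-unidecode
    import text_unidecode
    # Calling text_unidecode(...) would raise
    # "TypeError: 'module' object is not callable"; return the function instead.
    return text_unidecode.unidecode
```

Printing `import_status()` shows at a glance whether the problem is a missing install (`text-unidecode` is `False`) or an import of the wrong module name.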
gotcha text-unidecode performs basic, context-free character-by-character transliteration. This can lead to inaccurate or unexpected results for languages with complex writing systems (e.g., Japanese, Thai) or for characters with language-specific transliteration conventions (e.g., German umlauts, which are mapped to 'A', 'O', 'U' instead of 'Ae', 'Oe', 'Ue'). It's not designed for high-quality, language-aware transliteration.
fix For critical applications, check the output carefully or pre-process strings to apply language-specific rules first. For better transliteration quality, consider language-specific libraries or the separate `Unidecode` package (imported as `unidecode`) if its GPL license is acceptable.
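As one example of the pre-processing suggested above, German umlauts can be expanded before generic transliteration runs. The rule table and function names are hypothetical; the sketch uses only the standard `str.maketrans`/`str.translate`, and `to_ascii_german` assumes text-unidecode is installed.

```python
# Hypothetical German rules: expand umlauts and ß before generic transliteration
GERMAN_RULES = str.maketrans({
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ä": "ae", "ö": "oe", "ü": "ue",
    "ß": "ss",
})


def preprocess_german(text: str) -> str:
    """Apply the German expansions; all other characters are left untouched."""
    return text.translate(GERMAN_RULES)


def to_ascii_german(text: str) -> str:
    """German rules first, then text-unidecode for everything else (requires the package)."""
    from text_unidecode import unidecode
    return unidecode(preprocess_german(text))


print(preprocess_german("Äpfel, Öfen, Übermut"))  # Aepfel, Oefen, Uebermut
```

These rules are still context-free, so an all-caps word such as `ÜBER` comes out as `UeBER`; producing `UEBER` would require looking at neighboring characters.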
breaking The transliteration tables within the library can be updated in future versions to improve mappings or fix inconsistencies. This means that the ASCII output for a given Unicode input might change between versions, potentially breaking systems that rely on consistent, stable ASCII representations (e.g., for generating unique URL slugs or identifiers).
fix If output stability matters (e.g., for URL slugs), store the generated ASCII string (for example, in a database) instead of regenerating it, or pin the dependency to an exact version (e.g., `text-unidecode==1.3` in a requirements file) so that upgrades cannot change it unexpectedly.
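A minimal sketch of the "store the generated string" approach, assuming slugs are keyed by a record ID. `make_slug` and `get_or_create_slug` are hypothetical helpers, and a plain dict stands in for the database; in real use you would pass `text_unidecode.unidecode` as `to_ascii`.

```python
import re
from typing import Callable, MutableMapping


def make_slug(text: str, to_ascii: Callable[[str], str]) -> str:
    """Transliterate, lowercase, and collapse non-alphanumeric runs into hyphens."""
    ascii_text = to_ascii(text).lower()
    return re.sub(r"[^a-z0-9]+", "-", ascii_text).strip("-")


def get_or_create_slug(
    key: str,
    text: str,
    store: MutableMapping[str, str],
    to_ascii: Callable[[str], str],
) -> str:
    """Compute a slug once; later calls return the stored value unchanged."""
    if key not in store:
        store[key] = make_slug(text, to_ascii)
    return store[key]
```

Once a slug is stored, a later library upgrade that changes the transliteration tables only affects newly created records, not existing URLs.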
gotcha The `text-unidecode` project's own documentation points to an alternative port of Text::Unidecode, the `Unidecode` package (imported as `unidecode`), noting that it has 'better memory usage and better transliteration quality' but is GPL-licensed. Evaluate `Unidecode` if your project can accept the GPL and needs potentially better results.
fix Review project licensing requirements and compare `text-unidecode` with the `unidecode` library to choose the best fit for quality and license compatibility.
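When comparing the two libraries, it can help to run both over strings taken from your own data and inspect only the disagreements. `diff_outputs` is a hypothetical helper that accepts any two transliteration functions, so it does not depend on either package being installed.

```python
from typing import Callable, Iterable, List, Tuple

Transliterator = Callable[[str], str]


def diff_outputs(
    samples: Iterable[str], first: Transliterator, second: Transliterator
) -> List[Tuple[str, str, str]]:
    """Return (sample, first_output, second_output) wherever the two disagree."""
    diffs = []
    for sample in samples:
        a, b = first(sample), second(sample)
        if a != b:
            diffs.append((sample, a, b))
    return diffs
```

For a real comparison, pass `text_unidecode.unidecode` and `unidecode.unidecode` (from the `Unidecode` package) as `first` and `second`.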
| Python | OS / libc | Status | Wheel install | Import | Disk |
|---|---|---|---|---|---|
| 3.9 | alpine (musl) | wheel | - | 0.04s | 17.6M |
| 3.9 | alpine (musl) | - | - | 0.04s | 17.6M |
| 3.9 | slim (glibc) | wheel | 1.7s | 0.03s | 18M |
| 3.9 | slim (glibc) | - | - | 0.04s | 18M |
| 3.10 | alpine (musl) | wheel | - | 0.03s | 18.1M |
| 3.10 | alpine (musl) | - | - | 0.04s | 18.1M |
| 3.10 | slim (glibc) | wheel | 1.5s | 0.02s | 19M |
| 3.10 | slim (glibc) | - | - | 0.02s | 19M |
| 3.11 | alpine (musl) | wheel | - | 0.06s | 19.9M |
| 3.11 | alpine (musl) | - | - | 0.07s | 19.9M |
| 3.11 | slim (glibc) | wheel | 1.6s | 0.05s | 20M |
| 3.11 | slim (glibc) | - | - | 0.06s | 20M |
| 3.12 | alpine (musl) | wheel | - | 0.04s | 11.8M |
| 3.12 | alpine (musl) | - | - | 0.05s | 11.8M |
| 3.12 | slim (glibc) | wheel | 1.4s | 0.05s | 12M |
| 3.12 | slim (glibc) | - | - | 0.05s | 12M |
| 3.13 | alpine (musl) | wheel | - | 0.04s | 11.5M |
| 3.13 | alpine (musl) | - | - | 0.05s | 11.4M |
| 3.13 | slim (glibc) | wheel | 1.5s | 0.04s | 12M |
| 3.13 | slim (glibc) | - | - | 0.05s | 12M |

Demonstrates how to import the `unidecode` function from the `text_unidecode` module and use it to transliterate a Unicode string into a basic ASCII equivalent.

```python
# The package is installed as text-unidecode but imported as text_unidecode
from text_unidecode import unidecode

unicode_text = "Héllø Wörld! Како си? 北亰"
ascii_text = unidecode(unicode_text)

print(f"Original: {unicode_text}")
print(f"ASCII: {ascii_text}")

# German umlauts lose their diaeresis (Ä -> A) rather than expanding to Ae
unicode_german = "Äpfel, Öfen, Übermut"
ascii_german = unidecode(unicode_german)
print(f"German Original: {unicode_german}")
print(f"German ASCII: {ascii_german}")
```