Unidecode

1.4.0 verified Tue May 12 auth: no python install: verified

Unidecode is a Python library that provides ASCII transliterations of Unicode text. It converts non-ASCII Unicode characters into their closest ASCII approximations, which is useful for tasks like generating URL slugs or integrating with legacy systems. The current version is 1.4.0, with releases occurring as improvements to transliteration tables are made, rather than on a fixed schedule.

pip install unidecode
error ModuleNotFoundError: No module named 'unidecode'
cause The unidecode library has not been installed in the current Python environment.
fix
pip install unidecode
error TypeError: 'module' object is not callable
cause The unidecode module was imported directly, but the user attempted to call the module object as a function instead of calling the 'unidecode' function defined within it.
fix
import unidecode

result = unidecode.unidecode("Héllø Wörld")

# Alternatively, import the function directly:
# from unidecode import unidecode
# result = unidecode("Héllø Wörld")
error ImportError: cannot import name 'Unidecode' from 'unidecode'
cause The 'unidecode' function was imported with incorrect capitalization; the function name is lowercase 'unidecode'.
fix
from unidecode import unidecode

text = unidecode("München")
error TypeError: expected string or bytes-like object, got int
cause The unidecode function received an integer (or another non-string value), but it expects a Unicode string (`str`). The exact exception type and message can vary with the library version.
fix
from unidecode import unidecode

value = 123
text = unidecode(str(value))  # Ensure the input is a string
breaking The output of `unidecode()` is not guaranteed to be stable across different versions of the library. Improvements to transliteration tables can cause the ASCII approximation for certain Unicode characters to change in new releases.
fix If using `unidecode()` to generate persistent identifiers like URL slugs, either lock your `unidecode` dependency to a specific version or generate the slug once and store it in your database, rather than re-generating on the fly.
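One way to follow the "generate once and store" suggestion is to compute the slug only when a record is first saved and to reuse the stored value afterwards. The sketch below uses a plain dict as a stand-in for a database table; `get_or_create_slug`, `slug_store`, and the article ID are hypothetical names chosen for illustration.

```python
from unidecode import unidecode


def get_or_create_slug(slug_store, article_id, title):
    """Return the stored slug, generating it only the first time it is needed."""
    if article_id not in slug_store:
        # Transliterate once; later library upgrades cannot change this value.
        slug_store[article_id] = unidecode(title).lower().replace(" ", "-")
    return slug_store[article_id]


slug_store = {}  # hypothetical stand-in for a database table
first = get_or_create_slug(slug_store, 42, "Café München")
# A second call returns the stored slug instead of re-running unidecode().
again = get_or_create_slug(slug_store, 42, "Café München")
print(first, first == again)
```

When slugs must be regenerated on the fly, pinning the dependency instead (for example `unidecode==1.4.0` in a requirements file) keeps the output stable.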
gotcha Unidecode performs a context-free, character-by-character mapping and is not language-specific. Transliterations may therefore not match the linguistic rules or cultural expectations of a given language (e.g., German 'ä', 'ö', 'ü' become 'a', 'o', 'u' rather than 'ae', 'oe', 'ue'; East Asian languages may have simplified mappings).
fix For language-specific or more sophisticated transliterations (especially for Japanese, Chinese, Korean), consider using libraries designed for those specific languages or implement pre-processing rules before using `unidecode()`.
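As an illustration of pre-processing rules, the sketch below applies the common German convention for umlauts before handing the text to `unidecode()`. The table `GERMAN_RULES` and the helper `transliterate_german` are hypothetical names, not part of the library, and the rules cover only the umlaut case.

```python
import unicodedata

from unidecode import unidecode

# Hypothetical pre-processing table following the common German convention.
GERMAN_RULES = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})


def transliterate_german(text):
    """Apply German umlaut rules first, then let unidecode handle the rest."""
    # NFC composes 'u' + combining diaeresis into 'ü' so the table matches it.
    composed = unicodedata.normalize("NFC", text)
    return unidecode(composed.translate(GERMAN_RULES))


print(unidecode("München"))             # default character-by-character mapping
print(transliterate_german("München"))  # umlaut rules applied first
```

The same pattern extends to other languages by swapping in a different table, as long as the rules run before the generic transliteration.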
gotcha Unidecode requires a Python build with 'wide' Unicode characters (UCS-4 build) to correctly handle characters outside the Basic Multilingual Plane (BMP). 'Narrow' builds, which encode such characters as surrogate pairs, are not supported and produce incorrect transliterations for mathematical symbols, emoji, etc.
fix Check that `sys.maxunicode > 0xffff`. Since Python 3.3 (PEP 393) there is no narrow/wide distinction and `sys.maxunicode` is always `0x10ffff`, so this only affects legacy interpreters.
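A quick sanity check, sketched below, is to inspect `sys.maxunicode` and transliterate a string that contains characters outside the BMP. The sample string uses mathematical sans-serif letters; the variable names are illustrative only.

```python
import sys

from unidecode import unidecode

# On Python 3.3 and later this is always 0x10FFFF (PEP 393).
wide_build = sys.maxunicode > 0xFFFF
print(f"wide Unicode build: {wide_build}")

# MATHEMATICAL SANS-SERIF SMALL K, M and H all live outside the BMP.
speed = "30 \U0001d5c4\U0001d5c6/\U0001d5c1"
print(unidecode(speed))
```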
gotcha The `unidecode` function expects a Unicode string (Python 3 `str`) as input. Passing byte data (e.g., from reading a file in binary mode) will raise an exception or produce incorrect output.
fix Always ensure your input is a properly decoded Unicode string before passing it to `unidecode()`. If reading from a file, open it in text mode with the correct encoding (e.g., `open('file.txt', 'r', encoding='utf-8')`).
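When the data source cannot be changed to text mode, the bytes can be decoded explicitly before transliteration. The helper `to_ascii` below is a hypothetical wrapper, and UTF-8 is an assumed default encoding that should be replaced with the encoding the data actually uses.

```python
from unidecode import unidecode


def to_ascii(value, encoding="utf-8"):
    """Decode bytes to str if needed, then transliterate to ASCII."""
    if isinstance(value, (bytes, bytearray)):
        value = value.decode(encoding)
    return unidecode(value)


raw = "München".encode("utf-8")  # bytes, as read from a binary file or socket
print(to_ascii(raw))
```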
gotcha The output of `unidecode` is a lossy approximation. Characters without a mapping are dropped (replaced with `''`) or, depending on the error handling, replaced with a generic placeholder such as `?`. Combined with the non-linguistic approach, this means transliterated output shown directly to users may be perceived as incorrect or even offensive.
fix Use `unidecode` primarily for internal system identifiers, search indexing, or compatibility with ASCII-only systems, not as a user-facing display mechanism. Always consider the context and potential user perception of the transliterated text.
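To see where information is lost, the `errors` argument of `unidecode()` can be set to something other than its default. The sketch below assumes a Unidecode release that provides `errors` and `UnidecodeError` (the 1.4.0 release described here does); the sample string and variable names are illustrative.

```python
from unidecode import UnidecodeError, unidecode

text = "ID-\ue000-42"  # U+E000 is a Private Use Area code point with no mapping

ignored = unidecode(text)                       # default: errors="ignore"
replaced = unidecode(text, errors="replace")    # substitutes replace_str ("?" by default)
preserved = unidecode(text, errors="preserve")  # keeps the original character
print(repr(ignored), repr(replaced), repr(preserved))

lossy_index = None
try:
    unidecode(text, errors="strict")
except UnidecodeError as exc:
    # exc.index is the position of the first character without a mapping.
    lossy_index = exc.index
print(f"first unmapped character at index {lossy_index}")
```

Note that `errors="preserve"` can return non-ASCII output, so it is only suitable where the consumer tolerates it.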
python  os / libc      wheel  install  import  disk
3.10    alpine (musl)  wheel  -        0.01s   19.9M
3.10    alpine (musl)  -      -        0.01s   19.9M
3.10    slim (glibc)   wheel  1.6s     0.00s   20M
3.10    slim (glibc)   -      -        0.00s   20M
3.11    alpine (musl)  wheel  -        0.02s   21.7M
3.11    alpine (musl)  -      -        0.02s   21.7M
3.11    slim (glibc)   wheel  1.7s     0.01s   22M
3.11    slim (glibc)   -      -        0.01s   22M
3.12    alpine (musl)  wheel  -        0.01s   13.6M
3.12    alpine (musl)  -      -        0.01s   13.6M
3.12    slim (glibc)   wheel  1.5s     0.01s   14M
3.12    slim (glibc)   -      -        0.01s   14M
3.13    alpine (musl)  wheel  -        0.01s   13.4M
3.13    alpine (musl)  -      -        0.01s   13.3M
3.13    slim (glibc)   wheel  1.6s     0.01s   14M
3.13    slim (glibc)   -      -        0.01s   14M
3.9     alpine (musl)  wheel  -        0.01s   19.4M
3.9     alpine (musl)  -      -        0.01s   19.4M
3.9     slim (glibc)   wheel  1.9s     0.01s   20M
3.9     slim (glibc)   -      -        0.01s   20M

Demonstrates how to import and use the `unidecode` function for basic transliteration and a common use case like generating URL slugs.

import re

from unidecode import unidecode

# Basic transliteration
text_unicode = 'Łódź, 北京, Español'
text_ascii = unidecode(text_unicode)
print(f"Original: {text_unicode}")
print(f"Transliterated: {text_ascii}")

# Example for URL slug generation (common use case)
article_title = 'The "Café" where you can find "Piñatas"!'
# Transliterate, lowercase, then collapse every run of non-alphanumeric
# characters (spaces, quotes, punctuation) into a single hyphen.
slug = re.sub(r'[^a-z0-9]+', '-', unidecode(article_title).lower()).strip('-')
print(f"\nURL Slug: {slug}")