AnyASCII
anyascii is a Python library that provides fast and accurate Unicode-to-ASCII transliteration. It converts any Unicode string into an ASCII representation, making it suitable for filenames, URLs, and other contexts where only ASCII characters are permitted. The current version is 0.3.3. New releases mainly bring data improvements or packaging changes.
Warnings
- Breaking: Python 2 support was dropped in version 0.2.0. If your project requires Python 2 compatibility, pin `anyascii` to `<0.2` (e.g., `anyascii==0.1.7`).
- Gotcha: Transliteration is inherently lossy. The output of `anyascii` is a best-effort ASCII approximation and may not preserve every semantic or linguistic nuance of the original Unicode string.
- Gotcha: `anyascii` performs Unicode-to-ASCII transliteration only. It does not lowercase text, strip extra whitespace, or handle character compositions beyond what direct ASCII mapping requires. For broader text cleaning, combine it with other libraries.
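The last warning can be illustrated with a minimal sketch that pairs `anyascii` with standard-library cleanup to build a URL slug. The `slugify` helper is hypothetical and not part of the library. The `except ImportError` branch is only a rough stand-in (it strips accents and drops everything else) so the sketch still runs when the package is not installed.

```python
import re
import unicodedata

try:
    from anyascii import anyascii
except ImportError:
    # Assumption: rough stand-in used only when anyascii is not installed.
    # It removes accents and silently drops any other non-ASCII character.
    def anyascii(text):
        decomposed = unicodedata.normalize("NFKD", text)
        return decomposed.encode("ascii", "ignore").decode("ascii")


def slugify(text):
    """Hypothetical helper: transliterate, lowercase, and hyphenate."""
    ascii_text = anyascii(text)
    lowered = ascii_text.lower()
    # Collapse every run of non-alphanumeric characters into one hyphen.
    hyphenated = re.sub(r"[^a-z0-9]+", "-", lowered)
    # Remove hyphens left over from leading or trailing punctuation.
    return hyphenated.strip("-")


print(slugify("  Crème   Brûlée!  "))  # prints: creme-brulee
```

Keeping transliteration as the first step means the lowercasing and regex only ever see ASCII, so the later steps stay simple.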
Install
pip install anyascii
Imports
- anyascii
from anyascii import anyascii
Quickstart
from anyascii import anyascii
# Example 1: Basic transliteration
text1 = '你好,世界'
result1 = anyascii(text1)
print(f"'{text1}' -> '{result1}'")
# Example 2: European characters
text2 = 'Hello, world! Pýthön æøåß®©'
result2 = anyascii(text2)
print(f"'{text2}' -> '{result2}'")
# Example 3: Mixed script
text3 = 'Ελληνικά, Русский, 日本語'
result3 = anyascii(text3)
print(f"'{text3}' -> '{result3}'")
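Because transliteration is lossy, as noted under Warnings, distinct Unicode strings can map to the same ASCII output. This matters when the result is used as a filename or key. The sketch below groups inputs by their transliteration to surface such collisions. `find_collisions` is a hypothetical helper, and the `except ImportError` branch is only a rough accent-stripping stand-in so the example runs without the package installed.

```python
import unicodedata
from collections import defaultdict

try:
    from anyascii import anyascii
except ImportError:
    # Assumption: rough stand-in used only when anyascii is not installed.
    def anyascii(text):
        decomposed = unicodedata.normalize("NFKD", text)
        return decomposed.encode("ascii", "ignore").decode("ascii")


def find_collisions(names):
    """Hypothetical helper: map each ASCII form to the inputs that share it."""
    groups = defaultdict(list)
    for name in names:
        groups[anyascii(name)].append(name)
    # Keep only ASCII forms produced by more than one distinct input.
    return {
        ascii_name: originals
        for ascii_name, originals in groups.items()
        if len(originals) > 1
    }


collisions = find_collisions(["résumé.txt", "resume.txt", "notes.txt"])
print(collisions)  # prints: {'resume.txt': ['résumé.txt', 'resume.txt']}
```

If collisions are unacceptable, one option is to append a counter or a short hash of the original string to the transliterated name.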