Fold-to-ASCII
A Python port of the Apache Lucene ASCII Folding Filter, this library converts alphabetic, numeric, and symbolic Unicode characters outside the Basic Latin block into their ASCII equivalents where such equivalents exist. The current version is 1.0.2.post1. It is a stable, infrequently updated utility focused on character folding rather than full transliteration.
Common errors
- NameError: name 'fold' is not defined
  cause: The `fold` function was not imported into the current scope.
  fix: Add `from fold_to_ascii import fold` at the top of your script or module.
- TypeError: fold() takes exactly one argument (0 given)
  cause: The `fold` function was called without a string argument.
  fix: Pass a string to the function, e.g., `fold('your text here')`.
- AttributeError: 'module' object has no attribute 'fold'
  cause: The import style and the call style do not match, for example `import fold_to_ascii` followed by a call on the wrong object.
  fix: Use `from fold_to_ascii import fold` and call `fold('text')`, or keep `import fold_to_ascii` and call `fold_to_ascii.fold('text')`.
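The two import styles named in the fixes above can be sketched as follows. The `try`/`except ImportError` guard and the `fold_checked` wrapper are hypothetical additions for illustration, not part of the library; the guard only exists so the snippet also runs where the package is not installed.

```python
# Guarded import so this sketch also runs where fold-to-ascii is not installed.
try:
    import fold_to_ascii                # style 1: call as fold_to_ascii.fold(...)
    from fold_to_ascii import fold      # style 2: call as fold(...)
except ImportError:
    fold_to_ascii = None
    fold = None


def fold_checked(text):
    """Hypothetical wrapper: validate the argument, then fold it if possible."""
    if not isinstance(text, str):
        # Fail early with a clear message instead of an error from deeper in the call.
        raise TypeError(f"expected str, got {type(text).__name__}")
    if fold is None:
        # Package missing: return the input unchanged instead of failing.
        return text
    return fold(text)


print(fold_checked("plain ASCII text"))
if fold_to_ascii is not None:
    # Both call styles reach the same function object.
    print(fold_to_ascii.fold is fold)
```

Returning the input unchanged when the package is missing is a design choice for this sketch only; in production code, letting the `ImportError` propagate is usually the better signal.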
Warnings
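- gotcha Characters without a direct ASCII equivalent (e.g., CJK characters, emojis, or many symbols) have no mapping in the Lucene folding filter, so they do not survive folding as readable text: they come out as a placeholder such as '?' or are dropped, depending on how replacement is configured. Inspect the output before relying on it for such input.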
- gotcha This library performs ASCII 'folding', which is a specific type of character mapping. It is not a general-purpose 'transliteration' library that attempts to convert characters into phonetic or semantic equivalents across different languages. It strictly follows the rules of the Apache Lucene ASCII Folding Filter.
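To see in advance which characters a string contains outside the Basic Latin block, a small audit helper can be useful. The sketch below uses only the standard library's `unicodedata` module; `non_ascii_report` is a hypothetical helper name, and it does not consult the library's mapping table, so it shows which characters are candidates for folding, not what they fold to.

```python
import unicodedata


def non_ascii_report(text):
    """Return (char, Unicode name) pairs for each distinct non-ASCII
    character in text, in order of first appearance."""
    seen = []
    for ch in text:
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in seen]


# Escapes spell out "Crème brûlée, 你好 👋" with precomposed accented letters.
sample = "Cr\u00e8me br\u00fbl\u00e9e, \u4f60\u597d \U0001F44B"
report = non_ascii_report(sample)
for ch, name in report:
    print(f"{ch!r}: {name}")
```

Accented Latin letters in the report are the kind of character the folding filter maps to ASCII, while ideographs and emoji are the kind it has no mapping for.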
Install
pip install fold-to-ascii
Imports
- fold
from fold_to_ascii import fold
Quickstart
from fold_to_ascii import fold
# Example 1: Characters with direct ASCII equivalents
text_with_accents = "Crème brûlée is delicious!"
ascii_text = fold(text_with_accents)
print(f"Original: {text_with_accents}")
print(f"Folded: {ascii_text}")
# Example 2: Characters without direct ASCII equivalents
text_with_non_folding = "Hello, 你好 👋 world!"
ascii_text_non_folding = fold(text_with_non_folding)
print(f"Original: {text_with_non_folding}")
print(f"Folded: {ascii_text_non_folding}")