uroman - Universal Romanizer
uroman is a universal romanizer that converts text in any script to the Latin alphabet. Version 1.3.1.1 is the current stable release. Version 1.3.1 was a rewrite of the library from Perl to Python and improved support for languages including Coptic, Thai, Khmer, and Tibetan. New releases periodically extend language support and features.
Warnings
- breaking: Version 1.3.1 was a complete rewrite of the library from Perl to Python. Code written against earlier (pre-1.3.1) versions is incompatible with the Python version.
- gotcha: The uroman Python library (v1.3.1 and later) requires Python 3.10 or newer. Installing or running it on an older Python version fails with installation or runtime errors.
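The Python version requirement can be checked before importing the library. The sketch below is illustrative only: `MIN_PYTHON`, `python_is_supported`, and `require_supported_python` are hypothetical names, not part of uroman, and the minimum version is taken from the warning above.

```python
import sys

# Minimum interpreter version, taken from the warning above (assumption:
# it applies to every Python release of uroman from v1.3.1 onward).
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True when the (major, minor) pair meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def require_supported_python() -> None:
    """Exit with a readable message instead of failing later at import time."""
    if not python_is_supported():
        found = ".".join(str(part) for part in sys.version_info[:3])
        needed = ".".join(str(part) for part in MIN_PYTHON)
        raise SystemExit(f"uroman needs Python {needed} or newer, found {found}")
```

Calling `require_supported_python()` at the top of a script turns a confusing install or import failure into a single clear message.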
Install
pip install uroman
Imports
- Uroman
import uroman as ur
uroman = ur.Uroman()
romanized_text = uroman.romanize_string("...")
Quickstart
import uroman as ur

# Create the romanizer once; construction loads uroman's data tables.
uroman = ur.Uroman()

text_in_any_script = "你好世界"
romanized_text = uroman.romanize_string(text_in_any_script)
print(f"Original: {text_in_any_script}")
print(f"Romanized: {romanized_text}")

# Example with another script
text_arabic = "مرحبا بالعالم"
print(f"Arabic: {text_arabic} -> {uroman.romanize_string(text_arabic)}")
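When many strings need romanizing, it may help to build the romanizer once and reuse it. The following is a minimal sketch, not the library's own batch API: `romanize_all` and `load_romanizer` are hypothetical helper names, and `load_romanizer` assumes the package exposes a `Uroman` class with a `romanize_string` method, as described in the project's README.

```python
from typing import Callable, Iterable


def romanize_all(texts: Iterable[str], romanize: Callable[[str], str]) -> list[str]:
    """Apply one romanizer callable to every string, preserving input order."""
    return [romanize(text) for text in texts]


def load_romanizer() -> Callable[[str], str]:
    """Build the uroman romanizer once and return its string-level method.

    Assumption: uroman provides Uroman().romanize_string(str) -> str.
    """
    # Imported lazily so romanize_all itself has no hard dependency on uroman.
    import uroman as ur

    return ur.Uroman().romanize_string
```

A call such as `romanize_all(["你好世界", "مرحبا بالعالم"], load_romanizer())` would then pay the data-loading cost a single time. Passing the romanizer in as a callable also lets the helper be exercised with a stand-in function when uroman is not installed.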