NeMo Text Processing
Version 1.1.0, verified Mon Apr 27; auth: no; language: python
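NVIDIA NeMo text processing toolkit for ASR and TTS pipelines, providing text normalization (TN) and inverse text normalization (ITN) for over 20 languages. TN converts written text to its spoken form for TTS (e.g. "Dec 3rd" → "December third"); ITN does the reverse for ASR output. Current version: 1.1.0 (August 2024). The release cadence is irregular, with minor version bumps every few months.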
pip install nemo-text-processing
Common errors
error ImportError: cannot import name 'get_normalizer' from 'nemo_text_processing.text_normalization'
cause The function was moved to nemo_text_processing.utils in v1.0.0.
fix
Change import to: from nemo_text_processing.utils import get_normalizer
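Code that must run against both pre-1.0.0 and newer releases can try both import paths in order. A minimal sketch, assuming the two module paths named on this page; `import_first` is a hypothetical helper, not part of NeMo:

```python
import importlib
from typing import Any, Iterable, List, Tuple


def import_first(candidates: Iterable[Tuple[str, str]]) -> Any:
    """Return the first attribute importable from a list of (module, name) pairs."""
    errors: List[str] = []
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError as exc:  # also covers ModuleNotFoundError
            errors.append(f"{module_name}: {exc}")
            continue
        if hasattr(module, attr):
            return getattr(module, attr)
        errors.append(f"{module_name}: no attribute {attr!r}")
    raise ImportError("no candidate import worked: " + "; ".join(errors))


# Hypothetical usage: prefer the v1.0.0+ location, fall back to the older one.
# get_normalizer = import_first([
#     ("nemo_text_processing.utils", "get_normalizer"),
#     ("nemo_text_processing.text_normalization", "get_normalizer"),
# ])
```

Listing the new path first means up-to-date installs never touch the deprecated location.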
error pynini.Arc: "no such symbol" error during normalization
cause Incompatible pynini version (e.g., pynini 2.1.6 with Python 3.12).
fix
Install pynini from conda-forge (conda install -c conda-forge pynini), or downgrade Python to 3.10 or 3.11.
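A quick pre-flight check can flag a likely mismatch before normalization fails. A minimal sketch; `check_pynini_env` is a hypothetical helper, and the 3.10/3.11 range is taken from the fix above. It only reports the Python version and whether pynini is installed; it cannot detect the symbol error itself:

```python
import importlib.util
import sys
from typing import List

# Python versions suggested by the fix above (from this page, not from NeMo itself).
SUGGESTED_PYTHON = {(3, 10), (3, 11)}


def check_pynini_env() -> List[str]:
    """Return warnings about the Python version and pynini availability."""
    warnings: List[str] = []
    version = sys.version_info[:2]
    if version not in SUGGESTED_PYTHON:
        warnings.append(
            f"Python {version[0]}.{version[1]} detected; 3.10 or 3.11 is suggested for pynini."
        )
    # find_spec only checks that the package can be located; it does not import it.
    if importlib.util.find_spec("pynini") is None:
        warnings.append("pynini not found; try: conda install -c conda-forge pynini")
    return warnings


if __name__ == "__main__":
    for message in check_pynini_env():
        print("warning:", message)
```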
Warnings
gotcha The Normalizer expects input_case='cased' or input_case='lower_cased'; relying on the default may produce unexpected results for some texts.
fix Explicitly set input_case parameter.
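When the casing of incoming text varies by source, the value can be chosen explicitly from a sample of the data. A minimal sketch; `pick_input_case` is a hypothetical heuristic (any uppercase letter means 'cased'), not a NeMo function. It decides once per dataset, and the resulting normalizer is then reused:

```python
from typing import Iterable


def pick_input_case(samples: Iterable[str]) -> str:
    """Hypothetical heuristic: 'cased' if any sample contains an uppercase letter."""
    for text in samples:
        if any(ch.isupper() for ch in text):
            return "cased"
    return "lower_cased"


# Usage (requires nemo_text_processing):
# from nemo_text_processing.text_normalization.normalize import Normalizer
# case = pick_input_case(["he was born on dec 3rd", "she left in may"])
# normalizer = Normalizer(input_case=case, lang="en")
```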
breaking In v1.0.0, the get_normalizer function moved from nemo_text_processing.text_normalization to nemo_text_processing.utils.
fix Update the import to: from nemo_text_processing.utils import get_normalizer
gotcha Normalization depends on pynini (a WFST library). Installation problems on Windows or Python 3.12+ can cause runtime errors; on Windows, consider installing pynini from conda-forge.
fix Use conda install -c conda-forge pynini on Windows, or use a Linux environment.
Imports
- NEMO_TN_EN
  wrong: from nemo_text_processing.tn import NEMO_TN_EN
  correct: from nemo_text_processing.text_normalization.en.utils import NEMO_TN_EN
- Normalizer
  from nemo_text_processing.text_normalization.normalize import Normalizer
- get_normalizer
  wrong: from nemo_text_processing.text_normalization import get_normalizer
  correct: from nemo_text_processing.utils import get_normalizer
Quickstart
from nemo_text_processing.text_normalization.normalize import Normalizer
# input_case='cased': the input keeps its original capitalization
normalizer = Normalizer(input_case='cased', lang='en')
text = "He was born on Dec 3rd, 1978."
# verbose=True also prints intermediate (tagged) output while normalizing
normalized = normalizer.normalize(text, verbose=True)
print(normalized)
# Output: "He was born on December third, nineteen seventy eight."
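Constructing a Normalizer loads its grammars and can be slow, so batch jobs typically build one instance and reuse it for every line. A minimal sketch; `normalize_lines` and the file name `input.txt` are hypothetical, and the helper accepts any callable so the real `normalizer.normalize` is passed in via `functools.partial`:

```python
from typing import Callable, Iterable, Iterator


def normalize_lines(normalize: Callable[[str], str], lines: Iterable[str]) -> Iterator[str]:
    """Normalize each non-empty line with one shared callable; blank lines stay empty."""
    for line in lines:
        stripped = line.strip()
        yield normalize(stripped) if stripped else ""


# Usage with NeMo (requires nemo_text_processing and pynini):
# import functools
# from nemo_text_processing.text_normalization.normalize import Normalizer
# normalizer = Normalizer(input_case="cased", lang="en")  # built once
# with open("input.txt", encoding="utf-8") as f:
#     for out in normalize_lines(functools.partial(normalizer.normalize, verbose=False), f):
#         print(out)
```

Taking a plain callable keeps the helper testable without NeMo installed.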