NeMo Text Processing

1.1.0 verified Mon Apr 27 auth: no python

NVIDIA NeMo text processing toolkit for ASR and TTS, providing text normalization (TN) and inverse text normalization (ITN) for over 20 languages. Current version 1.1.0 (August 2024). Release cadence is irregular with minor version bumps every few months.

pip install nemo-text-processing
error ImportError: cannot import name 'get_normalizer' from 'nemo_text_processing.text_normalization'
cause The function was moved to nemo_text_processing.utils in v1.0.0.
fix Change the import to: from nemo_text_processing.utils import get_normalizer
error pynini.Arc - no such symbol error during normalization
cause Incompatible pynini version (e.g., pynini 2.1.6 with Python 3.12).
fix Install pynini from conda-forge, or downgrade Python to 3.10 or 3.11.
gotcha The normalizer accepts input_case='cased' or 'lower_cased'. Leaving it implicit, or passing a value that does not match the casing of the input text, may produce unexpected results.
fix Set input_case explicitly, as in Normalizer(input_case='cased', lang='en').
breaking In v1.0.0, the get_normalizer function moved from nemo_text_processing.text_normalization to nemo_text_processing.utils.
fix Update the import to: from nemo_text_processing.utils import get_normalizer
gotcha Normalization depends on pynini, a WFST library. Installation problems on Windows or with Python 3.12+ can surface as runtime errors.
fix Use conda install -c conda-forge pynini on Windows, or use a Linux environment.
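
The pynini and import-path entries above can be checked before any normalizer is built. The following is a minimal sketch using only the standard library; the helper names pynini_warnings and import_first_available are hypothetical and not part of nemo-text-processing, and the version and platform thresholds simply mirror the entries above.

```python
import importlib
import importlib.util
import sys


def pynini_warnings(version_info=sys.version_info, platform=sys.platform):
    """Return warnings for the pynini issues described in the entries above."""
    warnings = []
    # Threshold taken from the entries above, not from pynini itself.
    if tuple(version_info[:2]) >= (3, 12):
        warnings.append(
            "Python 3.12+ detected: consider Python 3.10/3.11 "
            "or a conda-forge pynini build."
        )
    if platform.startswith("win"):
        warnings.append(
            "Windows detected: consider `conda install -c conda-forge pynini` "
            "or a Linux environment."
        )
    # find_spec returns None when a top-level package cannot be located.
    if importlib.util.find_spec("pynini") is None:
        warnings.append("pynini is not importable in this environment.")
    return warnings


def import_first_available(name, module_paths):
    """Return attribute `name` from the first module that provides it."""
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            continue  # module missing in this version; try the next path
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"{name!r} not found in any of: {', '.join(module_paths)}")


if __name__ == "__main__":
    for message in pynini_warnings():
        print("warning:", message)
```

Assuming the import locations listed in the entries above, a version-tolerant lookup would be import_first_available("get_normalizer", ["nemo_text_processing.utils", "nemo_text_processing.text_normalization"]), which tries the newer path first and falls back to the older one.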

Basic text normalization using English normalizer

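from nemo_text_processing.text_normalization.normalize import Normalizer

# Build the English normalizer once; set input_case explicitly.
normalizer = Normalizer(input_case='cased', lang='en')
text = "He was born on Dec 3rd, 1978."
# verbose=True also prints intermediate tagging details.
normalized = normalizer.normalize(text, verbose=True)
print(normalized)
# Expected output (exact wording may vary by version):
# "He was born on December third, nineteen seventy eight."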
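
When several sentences need normalizing, it can help to build the normalizer once and reuse it. The sketch below uses only the Normalizer constructor and normalize call shown above; build_normalizer and normalize_lines are hypothetical helper names, and the import is deferred so the module loads even where pynini is not installed.

```python
def build_normalizer(lang="en", input_case="cased"):
    """Create a Normalizer; the import is deferred because it requires pynini."""
    from nemo_text_processing.text_normalization.normalize import Normalizer

    return Normalizer(input_case=input_case, lang=lang)


def normalize_lines(lines, normalizer):
    """Normalize each non-empty line with an already-built normalizer."""
    results = []
    for line in lines:
        stripped = line.strip()
        if stripped:  # skip blank lines rather than sending them to the grammar
            results.append(normalizer.normalize(stripped, verbose=False))
    return results
```

A typical call would be normalize_lines(open("input.txt"), build_normalizer()), where input.txt is a placeholder file name. Any object with a compatible normalize method can be passed in, which keeps the helper testable without the WFST grammars.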