NVIDIA NeMo Toolkit
NeMo is an open-source, PyTorch-based toolkit for developing state-of-the-art conversational AI models for Automatic Speech Recognition (ASR), Text-to-Speech (TTS), Large Language Models (LLMs), and Natural Language Processing (NLP). The current version is 2.7.2. Releases are frequent, typically monthly or bi-monthly, with patch releases for security and critical fixes.
Warnings
- breaking The entire `nemo.collections.nlp` module was removed in NeMo v2.6.1. Any code that imports or uses classes/functions from this module will break.
- gotcha NeMo requires a PyTorch build that matches your CUDA version (if using a GPU) to be installed *before* NeMo itself. Installing NeMo without pre-installing PyTorch can lead to dependency conflicts or incorrect CUDA setups.
- gotcha Some functionalities, especially for ASR, had compatibility issues with NumPy 2.0 prior to NeMo v2.6.1.
- gotcha Users of `numba-cuda` and `cuda-python` packages (often implicit dependencies for GPU acceleration) experienced installation and usage issues in earlier versions.
- gotcha NeMo models (especially large language models or pre-trained ASR/TTS models) require significant GPU memory and disk space for downloading model checkpoints. Running on CPU is possible but much slower.
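The PyTorch, CUDA, and NumPy warnings above can be checked before installing or upgrading NeMo. The following is a minimal sketch, not part of NeMo: the `preflight` function and its report keys are hypothetical names chosen for illustration, and it only reads what the installed packages report about themselves.

```python
import importlib.util


def preflight():
    """Collect the environment facts that the warnings above depend on."""
    report = {}
    if importlib.util.find_spec("torch") is None:
        report["torch"] = "missing: install PyTorch before nemo_toolkit"
    else:
        import torch

        report["torch"] = torch.__version__
        # torch.version.cuda is None for CPU-only builds of PyTorch.
        report["cuda_build"] = torch.version.cuda or "CPU-only build"
        report["gpu_visible"] = torch.cuda.is_available()
    if importlib.util.find_spec("numpy") is not None:
        import numpy

        report["numpy"] = numpy.__version__
        # A major version of 2 matters for NeMo releases older than 2.6.1.
        report["numpy_major"] = int(numpy.__version__.split(".")[0])
    return report


for key, value in preflight().items():
    print(f"{key}: {value}")
```

If `gpu_visible` is false, models can still run on CPU, only much more slowly, as noted above.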
Install
- PyTorch first (CUDA 12.1 wheels), then NeMo
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
pip install nemo_toolkit[all]
- NeMo only
pip install nemo_toolkit[all]
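After installation it can help to confirm which NeMo version was actually resolved, since the removal of `nemo.collections.nlp` depends on it. This is a rough sketch using only the standard library. It assumes the installed distribution is named `nemo_toolkit`, as in the pip command above. The helpers `version_tuple` and `installed_version` are hypothetical names, and the version parsing is deliberately simplistic.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn '2.7.2' or '2.6.1rc0' into a comparable (major, minor, patch) tuple."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def installed_version(distribution):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


NLP_REMOVED_IN = (2, 6, 1)  # taken from the warning about nemo.collections.nlp

found = installed_version("nemo_toolkit")
if found is None:
    print("nemo_toolkit is not installed")
else:
    removed = version_tuple(found) >= NLP_REMOVED_IN
    status = "removed" if removed else "expected to be present"
    print(f"nemo_toolkit {found}: nemo.collections.nlp {status}")
```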
Imports
- nemo.collections.asr
import nemo.collections.asr as nemo_asr
- nemo.collections.tts
import nemo.collections.tts as nemo_tts
- nemo.collections.nlp
This module was removed in NeMo 2.6.1. Refer to NeMo documentation for alternatives.
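Code that must run against NeMo releases on both sides of the removal can probe for a collection before importing it. The sketch below is one possible pattern, not a NeMo API: `has_module` is a hypothetical helper built on the standard library's `importlib.util.find_spec`. Probing a dotted name imports its parent packages, so the check itself may take a moment when NeMo is installed.

```python
import importlib.util


def has_module(dotted_name):
    """Return True if the module can be located, False if it or a parent is missing."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # Raised when a parent package (for example 'nemo') is absent or fails to import.
        return False


for name in ("nemo.collections.asr", "nemo.collections.tts", "nemo.collections.nlp"):
    print(f"{name}: {'available' if has_module(name) else 'not available'}")
```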
Quickstart
import nemo.collections.asr as nemo_asr
# This will download and load the pretrained model from NVIDIA's NGC cloud
# The first run takes time to download the model (~1.5 GB)
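# ASRModel is the generic entry point; from_pretrained resolves the concrete model class for the checkpoint
asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="stt_en_fastconformer_hybrid_large_ctc_rnnt")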
# Path to an audio file. Replace it with your own recording, or download a sample:
#   wget https://nemo-public.s3.us-east-2.amazonaws.com/example_samples/audio_0.wav
filepath = "./audio_0.wav"
# If the file is missing, synthesize a 5-second 440 Hz tone so the script still runs.
# A pure tone contains no speech, so the transcription will likely be empty.
import os
if not os.path.exists(filepath):
    try:
        import torch
        import torchaudio
        sample_rate = 16000
        duration_seconds = 5
        t = torch.arange(sample_rate * duration_seconds) / sample_rate  # time axis in seconds
        waveform = torch.sin(2 * torch.pi * 440 * t).unsqueeze(0)  # shape: (1, num_samples)
        torchaudio.save(filepath, waveform, sample_rate)
        print(f"Created dummy audio file: {filepath}")
    except ImportError:
        print(f"Warning: '{filepath}' not found and torchaudio is not installed to create a dummy file. Quickstart may fail.")
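transcriptions = asr_model.transcribe([filepath])
# Depending on the NeMo version, each result is a plain string or an object with a .text attribute
first = transcriptions[0]
print(f"Transcription: {getattr(first, 'text', first)}")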
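When transcribing several files, it can be convenient to normalize the results into plain strings before further processing. The following is a small sketch under the assumption that each result is either a string or an object exposing a `.text` attribute. `to_text` and `FakeHypothesis` are hypothetical names used for illustration only, and the fake class stands in for whatever `transcribe` returns so the example runs without a model.

```python
def to_text(result):
    """Return the transcript for one result, whether it is a str or has a .text attribute."""
    if isinstance(result, str):
        return result
    return getattr(result, "text", str(result))


class FakeHypothesis:
    """Hypothetical stand-in for a result object that carries its transcript in .text."""

    def __init__(self, text):
        self.text = text


# In real use, 'results' would come from asr_model.transcribe([...]).
results = ["hello world", FakeHypothesis("good morning")]
texts = [to_text(r) for r in results]
print(texts)
```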