NVIDIA NeMo Toolkit

2.7.2 · active · verified Sun Apr 12

NeMo is an open-source, PyTorch-based toolkit for building state-of-the-art conversational AI models, including Automatic Speech Recognition (ASR), Text-to-Speech (TTS), Large Language Models (LLMs), and other Natural Language Processing (NLP) tasks. It is currently at version 2.7.2 and receives frequent updates, typically monthly or bi-monthly, with patch releases for security and critical fixes.

Warnings

Install
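Per NVIDIA's published instructions, NeMo is typically installed from PyPI; the `[all]` extra pulls in every collection. Exact prerequisites (system libraries such as `libsndfile1`/`ffmpeg`, a recent PyTorch) vary by release, so treat this as a sketch:

```shell
# Some releases require Cython to be installed before NeMo itself
pip install Cython
# Full toolkit with all collections (ASR, NLP, TTS)
pip install "nemo_toolkit[all]"
```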

Imports

Quickstart

This quickstart demonstrates how to load a pre-trained ASR model and transcribe an audio file. The model is downloaded automatically on the first run. Ensure a `.wav` audio file exists at the specified `filepath`; most NeMo English ASR models expect 16 kHz mono audio.

import nemo.collections.asr as nemo_asr

# This will download and load the pretrained model from NVIDIA's NGC cloud
# The first run takes time to download the model (~1.5 GB)
# ASRModel.from_pretrained resolves the correct model class for any pretrained
# name, including hybrid CTC/RNNT checkpoints like this one
asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="stt_en_fastconformer_hybrid_large_ctc_rnnt")

# Path to an audio file (replace with your own or download an example)
# For demonstration, we'll use a placeholder. In a real scenario, you'd have an actual .wav file.
# Example audio can be found in NeMo's tutorials or downloaded from public datasets.
# For example: !wget https://nemo-public.s3.us-east-2.amazonaws.com/example_samples/audio_0.wav
filepath = "./audio_0.wav" # Ensure this file exists for the code to run

# For demonstration, let's create a dummy file if it doesn't exist
import os
if not os.path.exists(filepath):
    try:
        import torchaudio
        import torch
        sample_rate = 16000
        duration_seconds = 5
        waveform = torch.sin(2 * torch.pi * 440 * torch.arange(0, sample_rate * duration_seconds) / sample_rate).unsqueeze(0)
        torchaudio.save(filepath, waveform, sample_rate)
        print(f"Created dummy audio file: {filepath}")
    except ImportError:
        print(f"Warning: '{filepath}' not found and torchaudio not installed to create a dummy file. Quickstart may fail.")


transcriptions = asr_model.transcribe([filepath])
# Depending on the NeMo version and model type, entries are plain strings
# or Hypothesis objects carrying the text in a .text attribute
result = transcriptions[0]
print(f"Transcription: {getattr(result, 'text', result)}")
