SpeechBrain

1.1.0 · active · verified Sat Apr 11

SpeechBrain is an open-source, all-in-one speech toolkit written in Python on top of PyTorch. It supports research and development of neural speech processing systems and provides pretrained models and training recipes for tasks such as automatic speech recognition (ASR), voice activity detection (VAD), speaker recognition, and speech enhancement. The current version is 1.1.0, with releases typically tied to research milestones and new model introductions.

Warnings

Install

Imports

Quickstart

This quickstart loads a pretrained Automatic Speech Recognition (ASR) model with `from_hparams`, transcribes a dummy audio tensor, and then removes the temporary download directory. Because the input is random noise, the printed transcription is not meaningful; the point is to exercise the model loading and inference path.

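import os
import shutil

import torch
# SpeechBrain 1.0+ exposes inference classes under speechbrain.inference;
# speechbrain.pretrained is the deprecated pre-1.0 location.
from speechbrain.inference.ASR import EncoderDecoderASR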

# Ensure a temporary directory for model downloads
savedir = "tmpdir_asr_quickstart"

# Initialize ASR model
try:
    asr_model = EncoderDecoderASR.from_hparams(
        source="speechbrain/asr-crdnn-rnnlm-librispeech",
        savedir=savedir
    )

    # Create a dummy audio tensor of shape (batch_size, samples).
    # This model expects single-channel, 16 kHz audio.
    sample_rate = 16000
    duration_seconds = 3
    # Random noise standing in for a short clip
    dummy_audio = torch.randn(1, sample_rate * duration_seconds)
    # Relative length of each waveform in the batch (1.0 = longest, no padding)
    wav_lens = torch.tensor([1.0])

    # Perform ASR; transcribe_batch returns (predicted_words, predicted_tokens)
    predicted_words, predicted_tokens = asr_model.transcribe_batch(dummy_audio, wav_lens)
    print(f"Transcription: {predicted_words[0]}")

except Exception as e:
    print(f"An error occurred: {e}")
finally:
    # Clean up the downloaded model directory
    if os.path.exists(savedir):
        shutil.rmtree(savedir, ignore_errors=True)
        print(f"Cleaned up temporary directory: {savedir}")
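
When a batch holds clips of different durations, `transcribe_batch` takes the padded waveforms together with each clip's length expressed as a fraction of the longest clip in the batch. The following is a minimal sketch of how those fractions might be computed; `relative_lengths` is a hypothetical helper written for illustration and is not part of SpeechBrain.

```python
from typing import List, Sequence


def relative_lengths(sample_counts: Sequence[int]) -> List[float]:
    """Return each clip's length as a fraction of the longest clip.

    Hypothetical helper (not a SpeechBrain function). The longest clip
    maps to 1.0; shorter clips, which would be zero-padded up to the
    longest one, map to values below 1.0.
    """
    if not sample_counts:
        raise ValueError("sample_counts must not be empty")
    longest = max(sample_counts)
    if longest <= 0:
        raise ValueError("at least one clip must contain samples")
    return [count / longest for count in sample_counts]


# Two clips at 16 kHz: 3 seconds and 1.5 seconds
print(relative_lengths([48000, 24000]))
```

Assuming the clips have already been zero-padded into a single `(batch_size, samples)` tensor, the result can be wrapped with `torch.tensor(...)` and passed as the second argument to `transcribe_batch`.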
