Silero Models

0.5.5 · active · verified Fri Apr 17

Silero Models provides a collection of pre-trained, enterprise-grade Text-to-Speech (TTS) models, primarily for Russian and other CIS languages, along with Speech-to-Text (STT) models. Inference runs on PyTorch, and the models are designed for high-quality, fast speech generation. The library is actively maintained, with frequent releases focused on expanding language support and improving model quality; the current version is 0.5.5.

Quickstart

This quickstart loads a pre-trained Silero TTS model with `torch.hub.load` and synthesizes speech from Russian text, using the 'v5_ru' model and the 'baya' speaker. Note that `torch.hub.load` receives the model ID through its `speaker` argument; the voice itself is chosen later, at synthesis time. Make sure `torch`, `torchaudio`, and `soundfile` are installed: they are required prerequisites even though they are not direct dependencies of the `silero` PyPI package itself.
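Before running the quickstart, it can help to confirm that these packages are importable. The sketch below uses the standard-library `importlib.util.find_spec`, which searches the import path without executing the module. The helper name `missing_prerequisites` and the `PREREQUISITES` tuple are hypothetical, not part of Silero.

```python
import importlib.util
from typing import Iterable, List

# Prerequisites named in the quickstart text (their import names match the pip names)
PREREQUISITES = ('torch', 'torchaudio', 'soundfile')


def missing_prerequisites(modules: Iterable[str] = PREREQUISITES) -> List[str]:
    """Return the top-level modules that cannot be found (hypothetical helper).

    find_spec only locates a top-level module; it does not import or run it.
    """
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    missing = missing_prerequisites()
    if missing:
        print('Missing prerequisites:', ', '.join(missing))
    else:
        print('All prerequisites found.')
```

Checking up front gives one clear message listing every missing package, instead of stopping at the first failed import.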

import torch
import torchaudio  # not used directly below; importing it fails fast if it is missing

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# --- TTS Quickstart ---
# Define model parameters
model_id = 'v5_ru'    # model release; passed to torch.hub.load as `speaker`
language = 'ru'
speaker = 'baya'      # voice used at synthesis time
put_accent = True     # place word stress automatically
put_yo = True         # restore the letter 'ё' automatically
sample_rate = 48000   # or 24000, 8000
text = 'В недрах тундры выдры в гетрах тырят в вёдра ядра кедров.'

try:
    # Load the Silero TTS model from torch.hub (downloaded on first use, then cached).
    # The second return value is an example text, unused here.
    model, _ = torch.hub.load(repo_or_dir='snakers4/silero-models',
                              model='silero_tts',
                              language=language,
                              speaker=model_id)
    model.to(device)

    # Synthesize audio; the accent and 'ё' options are synthesis arguments, not load arguments
    audio_tensor = model.apply_tts(text=text,
                                   speaker=speaker,
                                   sample_rate=sample_rate,
                                   put_accent=put_accent,
                                   put_yo=put_yo)

    # Example of saving audio (requires 'soundfile')
    # import soundfile as sf
    # output_path = 'output_audio.wav'
    # sf.write(output_path, audio_tensor.cpu().numpy(), sample_rate)
    # print(f'Audio saved to {output_path}')

    print(f"Successfully synthesized audio. Tensor shape: {audio_tensor.shape}, Sample Rate: {sample_rate}")
except Exception as e:
    print(f"An error occurred during TTS synthesis: {e}")
    print("Please ensure PyTorch, TorchAudio and SoundFile are installed, and that GitHub is reachable for the first download.")
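If you would rather not depend on an extra package just to save the result, the synthesized audio can also be written as a 16-bit PCM WAV file with the standard-library `wave` module. This is a minimal sketch. It assumes the model returns a 1-D float tensor with samples nominally in [-1, 1], and it clips anything outside that range. The function name `write_wav_pcm16` is hypothetical.

```python
import array
import sys
import wave
from typing import Iterable


def write_wav_pcm16(path: str, samples: Iterable[float], sample_rate: int) -> float:
    """Write mono float samples (nominally in [-1.0, 1.0]) as a 16-bit PCM WAV file.

    Hypothetical helper; returns the duration of the written audio in seconds.
    """
    pcm = array.array('h')  # signed 16-bit integers
    for x in samples:
        # Clip out-of-range values so they cannot overflow int16
        x = max(-1.0, min(1.0, float(x)))
        pcm.append(round(x * 32767))
    # WAV stores samples little-endian; swap bytes on big-endian hosts
    if sys.byteorder == 'big':
        pcm.byteswap()
    with wave.open(path, 'wb') as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())
    return len(pcm) / sample_rate
```

For example, `write_wav_pcm16('output_audio.wav', audio_tensor.cpu().tolist(), sample_rate)` would save the quickstart result and return its length in seconds. Routing a long tensor through a Python list is slower than `soundfile`, so this approach mainly makes sense when avoiding the extra dependency.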
