Silero Models
Silero Models provides a collection of pre-trained, enterprise-grade Text-to-Speech (TTS) models, focused primarily on Russian and CIS languages, along with speech-to-text models. It uses PyTorch for inference and offers fast, high-quality speech generation. The library is actively maintained and frequently updated; the current version is 0.5.5, and development focuses on expanding language support and improving model quality.
Common errors
- ModuleNotFoundError: No module named 'torchaudio'
  cause: The `torchaudio` library, which Silero relies on for audio processing, is not installed.
  fix: Install `torchaudio`: `pip install torchaudio`.
- RuntimeError: Requested model is not available for language 'xx'. Available languages: [...]
  cause: The language or model ID passed to `torch.hub.load` is either incorrect or not supported by the current Silero model version.
  fix: Check the available languages and model IDs in the official Silero Models repository's `models.yml` file or the quickstart guide. Make sure the model you request is still supported, especially after the v5.0 breaking changes.
- RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
  cause: A common PyTorch error: the tensors involved in an operation are not on the same computational device (e.g., one on GPU, one on CPU).
  fix: Move the model, any manually processed input tensors, and any other relevant data to the same device. `model.to(device)` moves a model in place, but `Tensor.to` returns a new tensor, so reassign the result: `input_tensor = input_tensor.to(device)`.
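The device-mismatch fix can be illustrated with a minimal sketch in plain PyTorch. The `torch.nn.Linear` layer is only a stand-in for a loaded model and is not part of Silero; the point is that the model and its inputs are both placed on one `device` before the forward pass.

```python
import torch

# Pick one device and use it for everything.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model (an assumption for illustration); a loaded model is moved the same way.
model = torch.nn.Linear(4, 2).to(device)

# Tensor.to() returns a new tensor, so the result must be reassigned.
inputs = torch.randn(3, 4)
inputs = inputs.to(device)

with torch.no_grad():
    outputs = model(inputs)

print(outputs.device, tuple(outputs.shape))
```

If the reassignment of `inputs` is dropped on a CUDA machine, the forward pass would be expected to raise the device-mismatch error quoted above.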
Warnings
- breaking Silero v5.0 and later versions (e.g., v5.2) deprecated and removed legacy models (v1 and v2) and tools. Attempting to load these older models will result in errors.
- breaking As of v5.1, `torchaudio` was removed as a direct dependency of the `silero` pip package. While `silero` installs without it, `torchaudio` is still essential for most functionalities (e.g., audio I/O, many model operations).
- gotcha The core dependencies `torch`, `torchaudio`, and `soundfile` are not always automatically installed by `pip install silero`. Missing these will lead to `ModuleNotFoundError` or `RuntimeError` during model loading or inference.
- gotcha The license for Silero Models changed to GNU AGPL 3.0 in v5.4. Previous versions used CC BY-NC 4.0. Users should be aware of the implications of the AGPL-3.0 license for commercial or proprietary use.
Install
- pip install torch torchaudio silero soundfile
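Because the warnings note that `torch`, `torchaudio`, and `soundfile` are not always pulled in automatically, a small preflight check can report what is missing before any model is loaded. The sketch below uses only the standard library; `missing_modules` and `REQUIRED_MODULES` are hypothetical names chosen for illustration, not part of Silero.

```python
import importlib.util

# Top-level modules the install command is meant to provide.
REQUIRED_MODULES = ["torch", "torchaudio", "soundfile"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
        print("Try: pip install " + " ".join(missing))
    else:
        print("All required modules were found.")
```

`importlib.util.find_spec` locates a module without importing it, so the check stays fast even when `torch` is installed.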
Imports
- torch
import torch
- torchaudio
import torchaudio
- silero.utils.save_audio
from silero.utils import save_audio
- silero.utils.read_audio
from silero.utils import read_audio
Quickstart
import torch
import torchaudio
# Ensure PyTorch is available and get device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# --- TTS Quickstart ---
# Define model parameters
model_id = 'v5_ru'
language = 'ru'
speaker = 'baya'
put_accent = True
put_yo = True
sample_rate = 48000  # 8000 and 24000 are also supported
text = 'В недрах тундры выдры в гетрах тырят в вёдра ядра кедров.'
try:
    # Load the Silero TTS model from torch.hub; the `speaker` argument here takes the model ID
    model, _ = torch.hub.load(repo_or_dir='snakers4/silero-models',
                              model='silero_tts',
                              language=language,
                              speaker=model_id)
    model.to(device)
    # Synthesize audio; accent and "ё" placement are options of apply_tts, not of torch.hub.load
    audio_tensor = model.apply_tts(text=text,
                                   speaker=speaker,
                                   sample_rate=sample_rate,
                                   put_accent=put_accent,
                                   put_yo=put_yo)
    # Example of saving audio (requires 'soundfile')
    # from silero.utils import save_audio
    # output_path = 'output_audio.wav'
    # save_audio(audio_tensor.cpu(), output_path, sample_rate)
    # print(f'Audio saved to {output_path}')
    print(f"Successfully synthesized audio. Tensor shape: {audio_tensor.shape}, Sample Rate: {sample_rate}")
except Exception as e:
    print(f"An error occurred during TTS synthesis: {e}")
    print("Please ensure PyTorch, TorchAudio, and potentially SoundFile are installed.")
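The quickstart leaves the saving step commented out. As an alternative that needs no extra packages, the sketch below writes samples to a 16-bit PCM WAV file with the standard-library `wave` module. `write_wav_int16` is a hypothetical helper, not a Silero function, and it assumes the audio is a mono, one-dimensional sequence of floats in the range [-1, 1].

```python
import math
import struct
import wave


def write_wav_int16(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for value in samples:
        clipped = max(-1.0, min(1.0, float(value)))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return len(samples)


if __name__ == "__main__":
    # One second of a 440 Hz tone stands in for synthesized speech.
    rate = 48000
    tone = [0.3 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
    write_wav_int16("tone_example.wav", tone, rate)
```

With the quickstart's output, a call along the lines of `write_wav_int16('output_audio.wav', audio_tensor.cpu().tolist(), sample_rate)` should work, provided `audio_tensor` is one-dimensional.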