Coqui TTS

0.22.0 · active · verified Thu Apr 16

Coqui TTS is a deep learning toolkit for text-to-speech synthesis, providing state-of-the-art models and training utilities. It is actively maintained with frequent releases, currently at version `0.22.0`, and supports Python 3.9 through 3.11. It generates high-quality synthetic speech from text across many languages and speaker styles.

Common errors

Warnings

Install
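The package is published on PyPI under the name `TTS`; a minimal install (assuming the Python 3.9–3.11 range noted above) looks like:

```shell
# Install Coqui TTS from PyPI (package name: TTS)
pip install TTS

# For GPU inference, install a CUDA-enabled torch build first, e.g.:
# pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118
```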

Imports
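The high-level entry point used in the quickstart below lives in `TTS.api`; `torch` is imported only for device detection:

```python
# High-level synthesis API (the only Coqui TTS import most scripts need)
from TTS.api import TTS

# Used here solely to check for CUDA availability
import torch
```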

Quickstart

This quickstart initializes a TTS model (here, Tacotron2 trained on LJSpeech for English) and synthesizes speech to an audio file, using a GPU automatically when one is available. It includes basic error handling and hints for common issues.

import torch
from TTS.api import TTS

# Determine device (CUDA if available, otherwise CPU)
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Initialize TTS with a common English model (downloads on first use)
try:
    tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC").to(device)

    # Generate speech and save to file
    text_to_synthesize = "Hello, this is a test from the Coqui TTS library."
    output_filepath = "output_audio.wav"
    tts.tts_to_file(text=text_to_synthesize, file_path=output_filepath)
    print(f"Speech synthesized to {output_filepath}")

except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure you have installed TTS and its dependencies correctly.")
    print("For GPU support, install torch with CUDA first (e.g., pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118)")
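Very long passages can exceed what a single synthesis call handles comfortably. One common workaround, sketched here with a naive splitter (this helper is illustrative, not part of the Coqui TTS API), is to break the text into sentences and call `tts_to_file` once per chunk:

```python
import re

def split_into_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., !, or ? followed by whitespace.
    # Real pipelines may prefer a proper sentence tokenizer; this keeps
    # the example dependency-free.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

sentences = split_into_sentences(
    "Hello, this is a test. It has two sentences!"
)
print(sentences)  # -> ['Hello, this is a test.', 'It has two sentences!']
```

Each chunk can then be synthesized to its own file (e.g. `tts.tts_to_file(text=chunk, file_path=f"part_{i}.wav")`) and the resulting audio concatenated.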
