Coqui Text-to-Speech (TTS)

0.27.5 · active · verified Thu Apr 16

Coqui TTS is a deep learning library for advanced Text-to-Speech synthesis, supporting a wide range of models and languages. It enables tasks like voice cloning, multi-speaker TTS, and emotional speech generation. The current version is 0.27.5, and the library maintains an active development pace with frequent patch and minor releases.

Common errors

Warnings

Install
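A minimal install sketch, assuming the `coqui-tts` package name implied by the quickstart note below; per that note, PyTorch and Torchaudio are installed separately from 0.27.4 onward:

```shell
# Install the TTS library (assumed PyPI package name: coqui-tts).
pip install coqui-tts

# For coqui-tts 0.27.4+, install PyTorch and Torchaudio manually;
# choose the build matching your CUDA setup from pytorch.org.
pip install torch torchaudio
```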

Imports

Quickstart

This quickstart shows how to initialize a Coqui TTS model and synthesize speech to an audio file, using a default single-speaker English VITS model. Remember to install PyTorch and Torchaudio manually if using coqui-tts versions 0.27.4 or newer. Set `gpu=True` for GPU acceleration if a compatible GPU and matching PyTorch build are available.

from TTS.api import TTS
import os

# This will download and load a default VITS model into memory.
# For GPU, set gpu=True. Ensure appropriate PyTorch version is installed.
# If you encounter issues, try a different model_name, e.g., 'tts_models/multilingual/multi-dataset/xtts_v2'
# For XTTS, you'd also need a speaker_wav file.
# For simpler, single-speaker models, speaker_wav is often optional.

tts = TTS(model_name="tts_models/en/ljspeech/vits", progress_bar=True, gpu=False)

# Synthesize speech to a file.
# Note: the `language` argument is only valid for multilingual models
# (e.g. XTTS); passing it to this single-language VITS model raises an error.
output_file = "coqui_output.wav"
tts.tts_to_file(
    text="Hello from Coqui TTS, the ultimate text-to-speech library!",
    file_path=output_file,
)

print(f"Speech saved to {os.path.abspath(output_file)}")
