Coqui Text-to-Speech (TTS)
Coqui TTS is a deep learning library for advanced Text-to-Speech synthesis, supporting a wide range of models and languages. It enables tasks like voice cloning, multi-speaker TTS, and emotional speech generation. The current version is 0.27.5, and the library maintains an active development pace with frequent patch and minor releases.
Common errors
- `ModuleNotFoundError: No module named 'torch'`
  - Cause: Since `coqui-tts` v0.27.4, PyTorch and its related dependencies are no longer installed automatically.
  - Fix: Install PyTorch and Torchaudio explicitly for your system, e.g. `pip install torch==2.3.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cpu` for CPU, or replace `cpu` with your CUDA version (e.g. `cu121`).
- `TypeError: synthesize() got an unexpected keyword argument 'speaker_id'`
  - Cause: The `speaker_id` argument to `synthesize()` was deprecated in v0.27.0 and may be removed or raise errors in newer versions.
  - Fix: Use the `speaker_wav` argument or other model-specific parameters for speaker selection; `speaker_id` is no longer the recommended approach.
- `RuntimeError: XTTS inference failed due to transformers version incompatibility. Please ensure transformers is within the supported range.`
  - Cause: Certain versions of the `transformers` library are incompatible with Coqui TTS models, particularly XTTS.
  - Fix: Check the Coqui TTS release notes for the `transformers` version recommended for your `coqui-tts` release; you may need to downgrade or upgrade, e.g. `pip install transformers==4.51.0` or `pip install "transformers>=5.0.0"`.
- `AttributeError: 'TTS' object has no attribute 'speakers'`
  - Cause: This error occurs when accessing `tts.speakers` on models that do not support explicit speaker IDs or that manage speakers differently (e.g. XTTS takes a `speaker_wav` reference clip).
  - Fix: Verify that the loaded model actually exposes multiple speakers via `tts.speakers`. For models like XTTS, pass a `speaker_wav` file directly instead of selecting from an indexed speaker list; consult the documentation for your specific model.
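The `transformers` mismatch above can be caught before loading a model with a small pre-flight version check. A sketch, where the `low`/`high` bounds are illustrative placeholders, not official limits (consult your `coqui-tts` release notes for the real supported range):

```python
def parse_version(v: str) -> tuple:
    """Parse a plain 'X.Y.Z' version string into an integer tuple for comparison."""
    return tuple(int(part) for part in v.split(".")[:3])

def transformers_supported(ver: str, low: str = "4.40.0", high: str = "4.52.0") -> bool:
    # low/high are placeholder bounds for illustration only --
    # check the coqui-tts release notes for the actual supported range.
    return parse_version(low) <= parse_version(ver) < parse_version(high)

print(transformers_supported("4.51.0"))  # True for these placeholder bounds
```

Running this before constructing a `TTS` instance turns a confusing inference-time `RuntimeError` into an actionable message.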
Warnings
- breaking Starting from v0.27.4, `coqui-tts` no longer installs `torch`, `torchaudio`, and `torchcodec` by default. Users must install these PyTorch dependencies manually to match their system (CPU/CUDA) and desired PyTorch version.
- breaking The old caching mechanism for Bark and Tortoise models has been removed. Additionally, the `speaker_id` argument in the `synthesize()` method is deprecated.
- gotcha Compatibility issues can arise with the `transformers` library, leading to incorrect output or inference errors. Specific `transformers` versions might be required for certain `coqui-tts` releases.
- gotcha Coqui TTS has strict Python version requirements, currently `>=3.10,<3.15`.
Install
- CPU:
  pip install coqui-tts
  pip install torch==2.3.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cpu
- CUDA 12.1:
  pip install coqui-tts
  pip install torch==2.3.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu121
- All optional extras:
  pip install 'coqui-tts[all]'
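After installing, a quick way to confirm that the `TTS` package and the manually installed PyTorch pieces are all importable (the package names checked are the ones the commands above install):

```python
import importlib.util

def check_install(names=("torch", "torchaudio", "TTS")) -> dict:
    """Map each package name to whether it can be found by the import system."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

for name, ok in check_install().items():
    print(f"{name}: {'ok' if ok else 'MISSING -- see the install commands above'}")
```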
Imports
- TTS
  from TTS.api import TTS
Quickstart
from TTS.api import TTS
import os
# Download and load a single-speaker English VITS model into memory.
# For GPU inference, set gpu=True (requires a CUDA build of PyTorch).
# For multilingual or voice-cloning models, use e.g.
# 'tts_models/multilingual/multi-dataset/xtts_v2', which also requires a
# speaker_wav reference clip and a language argument.
tts = TTS(model_name="tts_models/en/ljspeech/vits", progress_bar=True, gpu=False)
# Synthesize speech to a file.
# Replace 'coqui_output.wav' with your desired output path.
output_file = "coqui_output.wav"
# Note: do not pass `language` here -- this VITS model is English-only, and
# providing a language for a single-language model raises an error.
tts.tts_to_file(
    text="Hello from Coqui TTS, the ultimate text-to-speech library!",
    file_path=output_file,
)
print(f"Speech saved to {os.path.abspath(output_file)}")
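The quickstart uses a single-speaker model; for the voice cloning mentioned in the introduction, XTTS v2 takes a `speaker_wav` reference clip instead of a speaker ID. A minimal sketch, untested here since the model download is large, and the reference clip path is your own:

```python
from pathlib import Path

def clone_voice(text: str, speaker_wav: str, out_path: str = "cloned.wav",
                language: str = "en") -> str:
    """Synthesize `text` in the voice of the reference clip `speaker_wav`."""
    # Lazy import so the function can be defined without coqui-tts installed.
    from TTS.api import TTS

    # XTTS v2 is multilingual and clones voices from a short reference clip;
    # the model weights (several GB) are downloaded on first use.
    tts = TTS(model_name="tts_models/multilingual/multi-dataset/xtts_v2")
    tts.tts_to_file(
        text=text,
        file_path=out_path,
        speaker_wav=speaker_wav,  # reference audio instead of a speaker_id
        language=language,        # required for multilingual models
    )
    return str(Path(out_path).resolve())
```

Call it with a short, clean WAV recording of the target voice, e.g. `clone_voice("Hello there!", "my_voice_sample.wav")`.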