{"id":8730,"library":"tts","title":"Coqui TTS","description":"Coqui TTS is a deep learning toolkit for Text-to-Speech synthesis, providing state-of-the-art models and training utilities. It's actively maintained with frequent releases, currently at version `0.22.0`, and supports Python versions from 3.9 to 3.11. It's used for generating high-quality synthetic speech from text, supporting various languages and speaker styles.","status":"active","version":"0.22.0","language":"en","source_language":"en","source_url":"https://github.com/coqui-ai/TTS","tags":["text-to-speech","tts","deep-learning","audio","speech-synthesis","pytorch"],"install":[{"cmd":"pip install tts","lang":"bash","label":"Base installation"},{"cmd":"pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 && pip install tts","lang":"bash","label":"With CUDA 11.8 support (adjust index-url for your CUDA version)"}],"dependencies":[{"reason":"Core deep learning framework for model execution. Specific versions are critical for CUDA compatibility.","package":"torch","optional":false},{"reason":"Audio processing library, tightly coupled with torch. Specific versions are critical for CUDA compatibility.","package":"torchaudio","optional":false}],"imports":[{"symbol":"TTS","correct":"from TTS.api import TTS"}],"quickstart":{"code":"import torch\nfrom TTS.api import TTS\n\n# Determine device (CUDA if available, otherwise CPU)\ndevice = \"cuda\" if torch.cuda.is_available() else \"cpu\"\nprint(f\"Using device: {device}\")\n\n# Initialize TTS with a common English model (will download if not available),\n# then move it to the chosen device\ntry:\n    tts = TTS(model_name=\"tts_models/en/ljspeech/tacotron2-DDC\").to(device)\n\n    # Generate speech and save to file\n    text_to_synthesize = \"Hello, this is a test from the Coqui TTS library.\"\n    output_filepath = \"output_audio.wav\"\n    tts.tts_to_file(text=text_to_synthesize, file_path=output_filepath)\n    print(f\"Speech synthesized to {output_filepath}\")\n\nexcept Exception as e:\n    print(f\"An error occurred: {e}\")\n    print(\"Please ensure you have installed TTS and its dependencies correctly.\")\n    print(\"For GPU support, install torch with CUDA first (e.g., pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118)\")","lang":"python","description":"This quickstart demonstrates how to initialize a TTS model (e.g., a Tacotron2 model for English) and synthesize speech to an audio file, automatically detecting and utilizing a GPU if available. It includes basic error handling and hints for common issues."},"warnings":[{"fix":"Migrate your code to use `from TTS.api import TTS` and load models by their string identifiers (e.g., `tts = TTS(model_name='tts_models/en/ljspeech/tacotron2-DDC')`).","message":"The primary API for model inference shifted significantly around versions 0.20.0-0.21.0. Older approaches that involved directly importing and instantiating model classes (e.g., `from TTS.vocoder.models.wavernn import WaveRNN`) are largely superseded by the unified `TTS` class from `TTS.api`. While some direct imports might still function, the recommended and supported way to load and use models is via `TTS.api.TTS(model_name='...')`.","severity":"breaking","affected_versions":"Before ~0.20.0"},{"fix":"Ensure you have adequate hardware resources (GPU with sufficient VRAM). Consider using smaller models or CPU inference if hardware is limited (though CPU will be much slower for large models).","message":"Models like XTTS v2 are highly resource-intensive, requiring substantial GPU VRAM (e.g., 10GB+) and system RAM (16GB+). Running these models on CPU or under-resourced GPUs can lead to `CUDA out of memory` errors or extremely slow inference speeds.","severity":"gotcha","affected_versions":"All versions with XTTS models"},{"fix":"Install `espeak-ng` and `ffmpeg` via your system's package manager (e.g., `sudo apt-get install espeak-ng ffmpeg` on Debian/Ubuntu, `brew install espeak-ng ffmpeg` on macOS).","message":"Many multilingual and advanced TTS models rely on external system-level dependencies like `espeak-ng` and `ffmpeg` for phonemization and audio processing. These are not installed by `pip` and must be manually installed on your operating system.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Explicitly install `torch` and `torchaudio` with the correct CUDA index-url *before* installing `tts`. For example, `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118`. Always refer to PyTorch's official installation guide for your specific CUDA version.","message":"Achieving GPU acceleration requires careful management of `torch`, `torchaudio`, and CUDA toolkit versions. Installing `tts` via `pip` usually pulls in CPU versions of `torch` and `torchaudio` if GPU-enabled versions are not pre-installed. Mismatched versions can lead to `CUDA not available` or runtime errors.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Install the package using `pip install tts`. Verify the correct Python environment is activated before running your script.","cause":"The `tts` package is not installed in the current Python environment, or the environment where it was installed is not active.","error":"ModuleNotFoundError: No module named 'TTS'"},{"fix":"Use a smaller model, reduce batch size (if applicable), or switch to CPU inference (by moving the model to CPU with `tts.to('cpu')` or initializing with `gpu=False`). Ensure no other GPU-intensive applications are running. Consider upgrading your GPU or offloading parts of the model if supported.","cause":"The GPU lacks sufficient memory to process the current operation, often due to a large model (like XTTS v2) or high batch size.","error":"RuntimeError: CUDA out of memory. Tried to allocate X GiB (GPU 0; Y GiB total capacity; Z GiB already allocated; W GiB free; P MiB reserved in total by PyTorch)"},{"fix":"Install `espeak-ng` on your operating system. For Debian/Ubuntu-based systems: `sudo apt-get install espeak-ng`. For macOS: `brew install espeak-ng`.","cause":"The `espeak-ng` command-line tool, used by many TTS models for phonemization, is not installed or not discoverable in the system's PATH.","error":"FileNotFoundError: [Errno 2] No such file or directory: 'espeak-ng'"},{"fix":"Verify the capabilities of the specific model loaded. If it's a single-speaker model, these attributes are not available. For multi-speaker models like XTTS v2, ensure you are passing `speaker_wav` and `language` as arguments to `tts_to_file`.","cause":"You are attempting to access multi-speaker-specific attributes (like `speakers` or `languages`) on a `TTS` instance initialized with a single-speaker model, or a model that does not expose these properties directly.","error":"AttributeError: 'TTS' object has no attribute 'speakers'"}]}