Kokoro Text-to-Speech (TTS)
Kokoro is a Python library for Text-to-Speech (TTS) synthesis, leveraging ONNX models for efficient audio generation. It provides a straightforward API to convert text into spoken audio. As of version 0.9.4, it targets Python 3.10-3.12 and is under active development, with releases occurring as new features or bug fixes are integrated.
Common errors
- ModuleNotFoundError: No module named 'onnxruntime'
  cause: The ONNX Runtime library is not installed, or the script is running in a different environment (for example, another virtualenv) from the one it was installed into.
  fix: Install `onnxruntime` (CPU) or `onnxruntime-gpu` (GPU). `pip install kokoro` or `pip install kokoro[gpu]` normally pulls it in, but some environments need a manual `pip install onnxruntime`.
- FileNotFoundError: [Errno 2] No such file or directory: 'path/to/your_model.onnx'
  cause: The ONNX model file or its config file does not exist at the given path, either because the model assets have not been downloaded or because the path is wrong.
  fix: Download the `.onnx` model and `.json` config file from `huggingface.co/hexgrad/kokoro_models`, and check that `model_path` and `config_path` in your code point to where they were saved.
- RuntimeError: Failed to find provider 'CUDAExecutionProvider'
  cause: The GPU (CUDA) backend was requested, but `onnxruntime-gpu` is not installed, or the NVIDIA driver and CUDA Toolkit are missing or incompatible.
  fix: If you intend to use the GPU, install with `pip install kokoro[gpu]`, and verify that your NVIDIA driver and CUDA Toolkit versions are compatible with the installed `onnxruntime-gpu`.
- AttributeError: 'TextToSpeech' object has no attribute 'config'
  cause: The `TextToSpeech` object failed to load its configuration, typically because the config file is invalid, or the library's internal structure changed in a newer release.
  fix: Make sure `config_path` points to a valid, well-formed JSON configuration file that matches your ONNX model. After upgrading, check the library's changelog for breaking changes to the `TextToSpeech` API.
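Most of the errors above can be caught before a `TextToSpeech` object is created. The sketch below is a hypothetical preflight helper (`preflight_check` is not part of Kokoro) that uses only the standard library plus, when ONNX Runtime is installed, `onnxruntime.get_available_providers()` from ONNX Runtime's public Python API. It assumes the config is a plain JSON file, as described above, and only checks that the file parses, not that it matches the model.

```python
import importlib
import importlib.util
import json
import os


def preflight_check(model_path, config_path, want_gpu=False):
    """Return a list of human-readable problems; an empty list means none were found.

    Hypothetical helper for illustration, not part of the Kokoro API.
    """
    problems = []

    # ModuleNotFoundError: is onnxruntime importable at all?
    if importlib.util.find_spec("onnxruntime") is None:
        problems.append("onnxruntime is not installed (pip install onnxruntime or onnxruntime-gpu)")
    elif want_gpu:
        # 'CUDAExecutionProvider' error: check which providers this build offers.
        # A listed provider can still fail at session creation if CUDA libraries are missing.
        ort = importlib.import_module("onnxruntime")
        providers = ort.get_available_providers()
        if "CUDAExecutionProvider" not in providers:
            problems.append(f"CUDAExecutionProvider not available; found {providers}")

    # FileNotFoundError: are the downloaded assets where we expect them?
    for label, path in (("model", model_path), ("config", config_path)):
        if not os.path.isfile(path):
            problems.append(f"{label} file not found: {path}")

    # Missing 'config' attribute: make sure the config at least parses as JSON.
    if os.path.isfile(config_path):
        try:
            with open(config_path, encoding="utf-8") as f:
                json.load(f)
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            problems.append(f"config is not valid JSON: {exc}")

    return problems
```

Call it with your model and config paths and print each returned problem before constructing `TextToSpeech`; pass `want_gpu=True` only when you installed the GPU build.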
Warnings
- gotcha Model and configuration files are NOT included in the Kokoro package. Users MUST manually download an ONNX model (`.onnx`) and its corresponding config file (`.json`) from the official Hugging Face repository (e.g., `huggingface.co/hexgrad/kokoro_models`) before synthesis.
- breaking The `onnxruntime` dependency must be installed differently for CPU and GPU use. `pip install kokoro` provides CPU-only support; GPU acceleration requires `pip install kokoro[gpu]` plus a compatible CUDA setup. Mismatched `onnxruntime` versions, having both `onnxruntime` and `onnxruntime-gpu` installed in the same environment, or trying to use the GPU without the `[gpu]` extra can lead to errors such as "Failed to find provider 'CUDAExecutionProvider'".
- gotcha Kokoro strictly requires Python versions 3.10, 3.11, or 3.12. Using unsupported versions (e.g., Python 3.9 or 3.13) will result in installation failures or runtime errors due to dependency constraints.
- gotcha Synthesizing long texts or using very large models can lead to high memory (RAM/VRAM) consumption, potentially causing out-of-memory errors on systems with limited resources.
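One common way to keep memory bounded is to synthesize long input in pieces and concatenate the results. The sketch below handles only the text side: a hypothetical `split_text` helper that breaks input at sentence boundaries into chunks of at most `max_chars` characters. The 400-character default is an arbitrary assumption, not a Kokoro limit; tune it for your model and hardware.

```python
import re


def split_text(text, max_chars=400):
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    # Split after '.', '!' or '?' followed by whitespace, keeping the punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than max_chars is hard-split, preferably at a space.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars
            piece, sentence = sentence[:cut].rstrip(), sentence[cut:].lstrip()
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece)
        if not sentence:
            continue
        # Greedily pack whole sentences into the current chunk.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized in turn and the resulting audio concatenated (for example with `numpy.concatenate`), assuming `synthesize` returns a NumPy array, as the quickstart's use of scipy's `write` suggests.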
Install
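- pip install kokoro   (CPU only)
- pip install "kokoro[gpu]"   (GPU; the quotes stop shells such as zsh from treating the brackets as a glob pattern)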
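Because Kokoro supports only Python 3.10 to 3.12, it can save time to confirm the interpreter version before running either command. A minimal sketch using only `sys.version_info`; the supported range is copied from the warning above and may change in later releases, and `python_is_supported` is a hypothetical helper name.

```python
import sys

# Supported range as stated in this document for Kokoro 0.9.4:
# minimum inclusive, maximum exclusive (so 3.10, 3.11 and 3.12 pass).
MIN_VERSION = (3, 10)
MAX_VERSION = (3, 13)


def python_is_supported(version_info=sys.version_info):
    """Return True if the (major, minor, ...) version falls in the supported range."""
    return MIN_VERSION <= tuple(version_info[:2]) < MAX_VERSION


current = ".".join(map(str, sys.version_info[:3]))
if python_is_supported():
    print(f"Python {current} is supported.")
else:
    print(f"Python {current} is outside 3.10-3.12; use a virtual environment with a supported interpreter.")
```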
Imports
- TextToSpeech
from kokoro.tts import TextToSpeech
Quickstart
import os
import sys

from scipy.io.wavfile import write  # requires scipy (pip install scipy)

from kokoro.tts import TextToSpeech

# IMPORTANT: Model assets are NOT included in the package.
# Download a model and config from https://huggingface.co/hexgrad/kokoro_models
# For example, 'hexgrad/kokoro_models/tree/main/vits/vctk_ljs'
# Placeholder paths: REPLACE with the actual paths to your downloaded files,
# or set the KOKORO_MODEL_PATH and KOKORO_CONFIG_PATH environment variables.
model_path = os.environ.get('KOKORO_MODEL_PATH', 'path/to/your_model.onnx')
config_path = os.environ.get('KOKORO_CONFIG_PATH', 'path/to/your_config.json')

if not os.path.exists(model_path) or not os.path.exists(config_path):
    print("Error: model or config file not found.", file=sys.stderr)
    print("Download them from https://huggingface.co/hexgrad/kokoro_models", file=sys.stderr)
    print("and set KOKORO_MODEL_PATH and KOKORO_CONFIG_PATH, or update the paths in this script.", file=sys.stderr)
    sys.exit(1)

try:
    tts = TextToSpeech(model_path=model_path, config_path=config_path)
    audio = tts.synthesize("Hello, this is a test from the Kokoro library.")

    # Save the generated audio, using the sampling rate from the loaded config
    sampling_rate = tts.config.sampling_rate
    output_filename = "kokoro_output.wav"
    write(output_filename, sampling_rate, audio)
    print(f"Audio saved to {output_filename}")
except Exception as e:
    print(f"An error occurred during TTS synthesis: {e}", file=sys.stderr)
    print("Ensure model_path and config_path are correct and the ONNX runtime is properly installed.", file=sys.stderr)
    sys.exit(1)
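scipy's `write` stores a float array as a 32-bit float WAV, which some tools do not handle. If you prefer 16-bit PCM, or want to drop the scipy dependency, the standard-library `wave` module can write the file instead. The sketch below assumes the synthesized audio is a sequence of mono float samples in the range [-1.0, 1.0] (an assumption; check what `synthesize` actually returns for your model). Both `write_pcm16_wav` and `join_with_silence` are hypothetical helper names; the second is only useful if you synthesize long text in several pieces.

```python
import struct
import wave


def join_with_silence(chunks, sampling_rate, pause_seconds=0.2):
    """Concatenate per-chunk sample sequences, inserting a short silence between them."""
    silence = [0.0] * int(sampling_rate * pause_seconds)
    out = []
    for i, chunk in enumerate(chunks):
        if i:
            out.extend(silence)
        out.extend(chunk)
    return out


def write_pcm16_wav(path, samples, sampling_rate):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for x in samples:
        # Clip to the valid range, then scale to the signed 16-bit integer range.
        x = max(-1.0, min(1.0, float(x)))
        frames += struct.pack("<h", int(round(x * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(int(sampling_rate))
        wav.writeframes(bytes(frames))
```

Under the same assumption about `audio`, `write_pcm16_wav("kokoro_output.wav", audio, tts.config.sampling_rate)` could replace the scipy `write` call in the quickstart.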