Kokoro Text-to-Speech (TTS)

0.9.4 · active · verified Thu Apr 16

Kokoro is a Python library for text-to-speech (TTS) synthesis that runs ONNX models to convert text into spoken audio through a small API. Version 0.9.4 targets Python 3.10 to 3.12. The project is under active development, and releases are published as new features and bug fixes are integrated.

Common errors

Warnings

Install

Imports

Quickstart

This quickstart initializes the `TextToSpeech` model, synthesizes one sentence, and saves the result as a WAV file using `scipy`. Model assets are not bundled with the package, so first download an ONNX model file (`.onnx`) and its corresponding configuration file (`.json`) from the official Hugging Face repository. Then point `KOKORO_MODEL_PATH` and `KOKORO_CONFIG_PATH` at those files, or edit the placeholder paths in the script, before running it.

import os
from kokoro.tts import TextToSpeech
from scipy.io.wavfile import write

# IMPORTANT: Model assets are NOT included in the package.
# Download a model and config from https://huggingface.co/hexgrad/kokoro_models
# For example, 'hexgrad/kokoro_models/tree/main/vits/vctk_ljs'

# Placeholder paths - REPLACE with actual paths to your downloaded files
model_path = os.environ.get('KOKORO_MODEL_PATH', 'path/to/your_model.onnx')
config_path = os.environ.get('KOKORO_CONFIG_PATH', 'path/to/your_config.json')

if not os.path.exists(model_path) or not os.path.exists(config_path):
    # Stop early with instructions instead of failing inside the model loader.
    print("Error: model or config file not found.")
    print("Download them from https://huggingface.co/hexgrad/kokoro_models")
    print("Then set KOKORO_MODEL_PATH and KOKORO_CONFIG_PATH, or update the paths in this script.")
    raise SystemExit(1)

try:
    tts = TextToSpeech(model_path=model_path, config_path=config_path)
    audio = tts.synthesize("Hello, this is a test from the Kokoro library.")

    # Save the generated audio
    sampling_rate = tts.config.sampling_rate # Access sampling_rate from the loaded config
    output_filename = "kokoro_output.wav"
    write(output_filename, sampling_rate, audio)
    print(f"Audio saved to {output_filename}")
except Exception as e:
    print(f"An error occurred during TTS synthesis: {e}")
    print("Ensure your model_path and config_path are correct and the ONNX runtime is properly installed.")
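If `scipy` is not available, the save step can be approximated with the standard-library `wave` module. The sketch below is not part of Kokoro: `write_pcm16_wav` is a hypothetical helper, and it assumes the synthesized audio is a mono sequence of floats in the range -1.0 to 1.0, which the source does not confirm. Check the dtype and shape of what `synthesize` returns before relying on it.

```python
import math
import struct
import wave


def write_pcm16_wav(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Hypothetical helper for illustration; returns the number of samples written.
    """
    frames = bytearray()
    for sample in samples:
        # Clamp out-of-range values so they cannot overflow int16.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono (assumption)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return len(frames) // 2


if __name__ == "__main__":
    # Stand-in for synthesized audio: one second of a 440 Hz tone.
    rate = 22050  # hypothetical rate; use the value from your model config
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
    count = write_pcm16_wav("tone_demo.wav", tone, rate)
    print(f"Wrote {count} samples")
```

In the quickstart, this would stand in for the `write(output_filename, sampling_rate, audio)` call, with the sampling rate still read from the loaded config.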
