kokoro-onnx: TTS with Kokoro and ONNX Runtime

0.5.0 · active · verified Thu Apr 16

kokoro-onnx is a Python library for text-to-speech (TTS) that runs the Kokoro neural TTS model on ONNX Runtime. It targets efficient, near real-time synthesis across a range of hardware, including macOS on Apple Silicon. The current version is 0.5.0, and the project is actively maintained, with regular updates to models and features.

Quickstart

This quickstart initializes the `Kokoro` class, generates speech from text, and saves the output to a WAV file. The model files are not downloaded by the script: before running it, download `kokoro-v1.0.onnx` and `voices-v1.0.bin` from the official GitHub releases. The script looks for both files in the current working directory unless the `KOKORO_MODEL_PATH` and `KOKORO_VOICES_PATH` environment variables point elsewhere.
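
A bare file name such as `kokoro-v1.0.onnx` is resolved against the current working directory, which is not necessarily the directory that holds the script. The following is a minimal sketch of one way to make the lookup explicit. `resolve_model_files` is a hypothetical helper, not part of kokoro-onnx; it assumes the same two environment variables and default file names used in this quickstart.

```python
import os
from pathlib import Path

# Default file names, as used in this quickstart.
DEFAULT_MODEL = "kokoro-v1.0.onnx"
DEFAULT_VOICES = "voices-v1.0.bin"


def resolve_model_files(base_dir, env=None):
    """Return (model_path, voices_path, missing).

    Hypothetical helper, not part of kokoro-onnx. A non-empty
    environment override wins; otherwise the default file name is
    looked up inside base_dir. `missing` lists the paths that do
    not exist as regular files.
    """
    env = os.environ if env is None else env
    base = Path(base_dir)
    model = Path(env.get("KOKORO_MODEL_PATH") or base / DEFAULT_MODEL)
    voices = Path(env.get("KOKORO_VOICES_PATH") or base / DEFAULT_VOICES)
    missing = [str(p) for p in (model, voices) if not p.is_file()]
    return model, voices, missing
```

Inside a script, passing `Path(__file__).resolve().parent` as `base_dir` would look next to the script itself, while `Path.cwd()` would reproduce the working-directory behavior. A non-empty `missing` list can then be reported before any model loading is attempted.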

import os
import soundfile as sf
from kokoro_onnx import Kokoro

# --- IMPORTANT: Download model files first ---
# Download 'kokoro-v1.0.onnx' and 'voices-v1.0.bin' from:
# https://github.com/thewh1teagle/kokoro-onnx/releases/tag/model-files-v1.0
# Place them in the current working directory, or specify full paths.
# -----------------------------------------------

MODEL_PATH = os.environ.get('KOKORO_MODEL_PATH', 'kokoro-v1.0.onnx')
VOICES_PATH = os.environ.get('KOKORO_VOICES_PATH', 'voices-v1.0.bin')

# Ensure model files exist before proceeding
if not os.path.exists(MODEL_PATH) or not os.path.exists(VOICES_PATH):
    # SystemExit with a string argument prints it to stderr and exits with status 1
    raise SystemExit(
        f"Error: Model files not found. Please download '{MODEL_PATH}' and '{VOICES_PATH}' "
        "from https://github.com/thewh1teagle/kokoro-onnx/releases/tag/model-files-v1.0 "
        "and place them in the current directory or set KOKORO_MODEL_PATH/KOKORO_VOICES_PATH."
    )

try:
    # Initialize Kokoro with model and voice files
    kokoro = Kokoro(MODEL_PATH, VOICES_PATH)

    # Text to synthesize
    text = "Hello, this is a test from kokoro-onnx. How are you today?"

    # Generate speech with an explicitly chosen voice.
    # Available voice names can be listed via kokoro.get_voices()
    samples, sample_rate = kokoro.create(text, voice='af_alloy')

    # Save the audio to a WAV file
    output_filename = "audio.wav"
    sf.write(output_filename, samples, sample_rate)

    print(f"Speech generated and saved to {output_filename}")
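except Exception as e:
    # A missing 'onnxruntime' or 'soundfile' package fails at import time,
    # before this block, so errors caught here usually point at the model files.
    print(f"An error occurred: {e}")
    print("Ensure the model files are correct.")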
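
The quickstart saves audio with `soundfile`. Where that dependency is unavailable, the standard-library `wave` module can write the same kind of file. The sketch below is a swapped-in alternative, not the library's own method. `write_wav_pcm16` is a hypothetical helper, and it assumes the samples form a mono sequence of floats in the range [-1.0, 1.0], which should be checked against the array that `kokoro.create` returns.

```python
import struct
import wave


def write_wav_pcm16(path, samples, sample_rate):
    """Write mono float samples as a 16-bit PCM WAV file.

    Hypothetical helper, not part of kokoro-onnx. Assumes `samples`
    is an iterable of floats nominally in [-1.0, 1.0]; values outside
    that range are clipped rather than allowed to wrap around.
    """
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))
        # Scale to the signed 16-bit range and pack little-endian.
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)                 # mono
        wav.setsampwidth(2)                 # 2 bytes per sample = 16-bit
        wav.setframerate(int(sample_rate))
        wav.writeframes(bytes(frames))
```

With the values from the quickstart, the call would be `write_wav_pcm16("audio.wav", samples, sample_rate)`. The per-sample loop favors clarity over speed; for long clips a vectorized conversion would be a better fit.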
