{"id":9061,"library":"kokoro-onnx","title":"kokoro-onnx: TTS with Kokoro and ONNX Runtime","description":"kokoro-onnx is a Python library that provides text-to-speech (TTS) using the Kokoro neural TTS model and ONNX Runtime. It targets efficient, near real-time synthesis on a range of hardware, including macOS on Apple Silicon. The library is currently at version 0.5.0 and is actively maintained, with regular updates to models and features.","status":"active","version":"0.5.0","language":"en","source_language":"en","source_url":"https://github.com/thewh1teagle/kokoro-onnx","tags":["tts","text-to-speech","onnx","onnxruntime","ai","machine-learning","audio"],"install":[{"cmd":"pip install -U kokoro-onnx","lang":"bash","label":"Latest stable version"}],"dependencies":[{"reason":"Core dependency for running ONNX models. The CPU build is installed by default; GPU builds (e.g., onnxruntime-gpu) can be installed separately.","package":"onnxruntime","optional":false},{"reason":"Array handling for the generated audio samples.","package":"numpy","optional":false},{"reason":"Used by the quickstart to save generated audio as WAV files. Not installed with kokoro-onnx; install it separately (`pip install soundfile`).","package":"soundfile","optional":true},{"reason":"Grapheme-to-phoneme conversion of the input text.","package":"phonemizer-fork","optional":false},{"reason":"Supplies the espeak-ng library and data used by the phonemizer for grapheme-to-phoneme conversion.","package":"espeakng-loader","optional":false}],"imports":[{"symbol":"Kokoro","correct":"from kokoro_onnx import Kokoro"}],"quickstart":{"code":"import os\nimport soundfile as sf\nfrom kokoro_onnx import Kokoro\n\n# --- IMPORTANT: Download model files first ---\n# Download 'kokoro-v1.0.onnx' and 'voices-v1.0.bin' from:\n# https://github.com/thewh1teagle/kokoro-onnx/releases/tag/model-files-v1.0\n# Place them in the same directory as this script, or specify full paths.\n# -----------------------------------------------\n\nMODEL_PATH = 
os.environ.get('KOKORO_MODEL_PATH', 'kokoro-v1.0.onnx')\nVOICES_PATH = os.environ.get('KOKORO_VOICES_PATH', 'voices-v1.0.bin')\n\n# Ensure model files exist before proceeding\nif not os.path.exists(MODEL_PATH) or not os.path.exists(VOICES_PATH):\n    print(f\"Error: Model files not found. Please download '{MODEL_PATH}' and '{VOICES_PATH}'\")\n    print(\"from https://github.com/thewh1teagle/kokoro-onnx/releases/tag/model-files-v1.0\")\n    print(\"and place them in the current directory or set KOKORO_MODEL_PATH/KOKORO_VOICES_PATH.\")\n    # SystemExit works in scripts and embedded interpreters; the bare exit() helper is meant for the REPL\n    raise SystemExit(1)\n\ntry:\n    # Initialize Kokoro with model and voice files\n    kokoro = Kokoro(MODEL_PATH, VOICES_PATH)\n\n    # Text to synthesize\n    text = \"Hello, this is a test from kokoro-onnx. How are you today?\"\n\n    # Generate speech with an explicitly chosen voice from the voices file\n    # You can list available voices via kokoro.get_voices()\n    samples, sample_rate = kokoro.create(text, voice='af_alloy')\n\n    # Save the audio to a WAV file\n    output_filename = \"audio.wav\"\n    sf.write(output_filename, samples, sample_rate)\n\n    print(f\"Speech generated and saved to {output_filename}\")\nexcept Exception as e:\n    print(f\"An error occurred: {e}\")\n    print(\"Ensure 'onnxruntime' and 'soundfile' are installed and model files are correct.\")","lang":"python","description":"This quickstart initializes the `Kokoro` class, generates speech from text, and saves the output to a WAV file. 
You must first download the `kokoro-v1.0.onnx` and `voices-v1.0.bin` model files manually from the official GitHub releases and either place them in the same directory as your script or provide their full paths."},"warnings":[{"fix":"Download `kokoro-v1.0.onnx` and `voices-v1.0.bin` from the latest model-files release (e.g., `thewh1teagle/kokoro-onnx/releases/tag/model-files-v1.0`) and make them accessible to the application, either in the script directory or via explicit paths.","message":"Both model files (`.onnx` and `.bin`) are mandatory. If either file is missing or its path is wrong, the application cannot start and exits with an error.","severity":"breaking","affected_versions":"All versions"},{"fix":"On Windows, fall back to the CPU execution provider. On Linux, use `onnxruntime-gpu` and make sure CUDA and cuDNN are set up correctly.","message":"Users have reported issues with GPU acceleration using the DirectML execution provider on Windows, often resulting in 'Non-zero status code returned while running ConvTranspose node' errors. The CUDA execution provider on Linux (e.g., under WSL) reportedly works better.","severity":"gotcha","affected_versions":"All versions where DirectML is used"},{"fix":"Monitor memory usage when generating long texts. Where feasible, process the text in shorter segments, or restart the process periodically in critical applications.","message":"A memory leak has been reported when synthesizing longer sentences: memory is not released after synthesis. 
This might be an upstream model issue, but it still affects `kokoro-onnx` users.","severity":"gotcha","affected_versions":"0.4.6 and potentially later versions"},{"fix":"Refer to the integration's latest documentation or examples for the recommended way to configure voice and other parameters, favoring a `settings` object if available.","message":"In related `kokoro-onnx` integrations (e.g., `pipecat-ai`'s `KokoroTTSService`), direct constructor parameters such as `voice_id` and `params` are being deprecated in favor of a `settings` object (e.g., `settings=KokoroTTSService.Settings(...)`). The change is in the integration's constructor rather than in `kokoro-onnx` itself.","severity":"deprecated","affected_versions":"Integrations built on `kokoro-onnx`; potentially future `kokoro-onnx` versions."}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Download `kokoro-v1.0.onnx` and `voices-v1.0.bin` from `https://github.com/thewh1teagle/kokoro-onnx/releases/tag/model-files-v1.0` and place them in your script's directory, or explicitly provide their full paths to the `Kokoro` constructor.","cause":"The quickstart's file check did not find the required ONNX model or voice data file at the expected location.","error":"Error: Model files not found. Please download 'kokoro-v1.0.onnx' and 'voices-v1.0.bin' from ..."},{"fix":"On Windows, run on the CPU execution provider by installing the CPU `onnxruntime` package rather than `onnxruntime-directml`. On Linux, ensure CUDA is properly configured for `onnxruntime-gpu`.","cause":"This error often occurs when using the DirectML execution provider for GPU acceleration on Windows due to operator incompatibilities.","error":"Non-zero status code returned while running ConvTranspose node. Name:'/N.1/pool/ConvTranspose' Status Message: ... The parameter is incorrect."},{"fix":"Ensure all required Visual C++ redistributables are installed. 
If the error persists, create a fresh virtual environment and reinstall `onnxruntime` and `kokoro-onnx`. Also check the system `PATH` for conflicting DLLs.","cause":"This is a common Windows-specific issue with the `onnx` or `onnxruntime` installation, often indicating missing Visual C++ redistributables or conflicts in the environment path.","error":"DLL load failed while importing onnx_cpp2py_export: A dynamic link library (DLL) initialization routine failed."},{"fix":"Verify that the input is a non-empty string containing meaningful characters. Check for earlier error messages that might indicate problems with model loading or parameters.","cause":"The input text provided for speech generation was invalid, empty, or caused an internal processing error.","error":"ERROR: the text-to-speech generation did not return audio. Make sure you have a valid text string."}]}