kesha-voice-kit

stdio

Local-first voice toolkit: speech-to-text (STT; 25 languages, ~19x faster than Whisper on Apple Silicon via CoreML, with an ONNX fallback), text-to-speech (TTS; Kokoro, Vosk-TTS, and 180 macOS voices, with SSML), voice activity detection (VAD), and language detection (107 languages). Rust engine, OpenClaw skill. No cloud, no API keys.

Tools · 4

kesha: Transcribe audio files to text. Supports multiple output formats (plain text, transcript, JSON, TOON), language detection, timestamps, and speaker diarization.
kesha say: Convert text to speech and output audio as WAV, OGG/Opus, or FLAC. Supports English (Kokoro) and Russian (Vosk-TTS) with automatic language routing.
kesha install: Download and install engine models, including optional VAD, TTS, and diarization models.
kesha status: Show information about the installed speech-to-text backend.

Links

GitHub: github.com/drakulavich/kesha-voice-kit

★ 38 GitHub stars