kesha-voice-kit
Local-first voice toolkit: STT (25 langs, ~19x faster than Whisper on Apple Silicon via CoreML, ONNX fallback), TTS (Kokoro + Vosk-TTS + 180 macOS voices, SSML), VAD, language detection (107 langs). Rust engine, OpenClaw skill. No cloud, no API keys.
Tools · 4
- kesha: Transcribe audio files to text, with multiple output formats (plain text, transcript, JSON, TOON), language detection, timestamps, and speaker diarization.
- kesha say: Convert text to speech and write audio as WAV, OGG/Opus, or FLAC. Supports English (Kokoro) and Russian (Vosk-TTS) with automatic language routing.
- kesha install: Download and install engine models, including optional components such as the VAD, TTS, and diarization models.
- kesha status: Show information about the installed speech-to-text backend.
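The four tools above are subcommands of a single `kesha` binary, so a script can drive them by building argument lists. The following is a minimal sketch, not the project's documented interface: only the subcommand names (`say`, `install`, `status`) come from the list above. The helpers `build_command` and `run_if_installed` are hypothetical, and passing an audio path as a bare positional argument is an assumption.

```python
import shutil
import subprocess

# Subcommand names are taken from the tool list; all argument shapes are assumed.
SUBCOMMANDS = {"say", "install", "status"}


def build_command(subcommand=None, *args):
    """Return an argv list for the kesha CLI without executing it.

    subcommand=None stands for the bare `kesha` transcription tool.
    """
    if subcommand is not None and subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand}")
    argv = ["kesha"]
    if subcommand is not None:
        argv.append(subcommand)
    argv.extend(args)
    return argv


def run_if_installed(argv):
    """Run argv only if its binary is on PATH; return None otherwise."""
    if shutil.which(argv[0]) is None:
        return None
    return subprocess.run(argv, capture_output=True, text=True, check=False)


if __name__ == "__main__":
    result = run_if_installed(build_command("status"))
    print("kesha not found on PATH" if result is None else result.stdout)
```

Keeping command construction separate from execution lets the argument lists be checked on a machine where the engine is not installed. Consult the project's own help output for the real flags before relying on any argument shape shown here.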
Links
★ 38 GitHub stars