{"id":7852,"library":"vosk","title":"Vosk: Offline Speech Recognition API","description":"Vosk is an offline, open-source speech recognition toolkit based on Kaldi. It provides Python bindings for performing speech-to-text conversion for over 20 languages and dialects, supporting continuous large vocabulary transcription. It is designed to run efficiently on various devices, including Raspberry Pi, and ensures privacy as audio data is processed locally. The current version is 0.3.45, with active development and frequent releases based on its GitHub activity.","status":"active","version":"0.3.45","language":"en","source_language":"en","source_url":"https://github.com/alphacep/vosk-api","tags":["speech-recognition","offline","kaldi","audio-processing","voice-ai"],"install":[{"cmd":"pip install vosk","lang":"bash","label":"Core library"},{"cmd":"pip install vosk pyaudio","lang":"bash","label":"With microphone support (PyAudio optional)"}],"dependencies":[{"reason":"Vosk requires Python 3.x.","package":"Python","optional":false},{"reason":"Needed for real-time audio input from a microphone. May require system-level dependencies (e.g., PortAudio).","package":"PyAudio","optional":true},{"reason":"Useful for converting audio files to the required format (16kHz, 16-bit PCM, mono WAV) before processing with Vosk.","package":"FFmpeg","optional":true}],"imports":[{"symbol":"Model","correct":"from vosk import Model"},{"symbol":"KaldiRecognizer","correct":"from vosk import KaldiRecognizer"},{"note":"The common import pattern is `from vosk import Model, KaldiRecognizer` or `import vosk`. Direct import from `vosk_api` is incorrect.","wrong":"from vosk_api import Model","symbol":"Vosk imports as vosk.*","correct":"import vosk"}],"quickstart":{"code":"import os\nimport wave\nfrom vosk import Model, KaldiRecognizer\n\n# --- IMPORTANT: Download a Vosk model ---\n# 1. Visit https://alphacephei.com/vosk/models\n# 2. Download a small model (e.g., vosk-model-small-en-us-0.22.zip)\n# 3. 
Unzip it into a directory. For this example, let's assume it's in a 'model' folder\n#    adjacent to your script, e.g., 'your_project/model/vosk-model-small-en-us-0.22'\n\nMODEL_PATH = \"model/vosk-model-small-en-us-0.22\"  # Adjust this path to your downloaded model\nAUDIO_FILE = \"test.wav\"  # Ensure you have a WAV file (16kHz, 16-bit PCM, mono)\n\nif not os.path.exists(MODEL_PATH):\n    print(f\"Error: Vosk model not found at {MODEL_PATH}\")\n    print(\"Please download a model from https://alphacephei.com/vosk/models and unzip it into the specified path.\")\n    exit(1)\n\n# Load the Vosk model\nmodel = Model(MODEL_PATH)\n\n# Open the audio file (a missing file raises OSError, a malformed one wave.Error)\ntry:\n    wf = wave.open(AUDIO_FILE, \"rb\")\nexcept (wave.Error, OSError) as e:\n    print(f\"Error opening audio file {AUDIO_FILE}: {e}\")\n    print(\"Please ensure the audio file exists and is a valid WAV.\")\n    exit(1)\n\n# Check that the audio is mono, 16-bit, uncompressed PCM\nif wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != \"NONE\":\n    print(\"Audio file must be MONO, 16-bit PCM, uncompressed WAV.\")\n    print(\"Consider using ffmpeg to convert: ffmpeg -i input.mp3 -ar 16000 -ac 1 -acodec pcm_s16le output.wav\")\n    exit(1)\n\n# Initialize the KaldiRecognizer with the model and the file's actual sample rate.\n# The rate passed here MUST match the audio (usually 16000 for Vosk models).\nrec = KaldiRecognizer(model, wf.getframerate())\n\n# Process audio data in chunks\nprint(\"Transcribing...\")\nwhile True:\n    data = wf.readframes(4000)  # Read 4000 frames (0.25 seconds of 16kHz audio)\n    if len(data) == 0:\n        break\n    if rec.AcceptWaveform(data):\n        # A complete utterance was recognized; Result() returns it as a JSON string\n        result = rec.Result()\n        print(result)\n\n# Get final result for any remaining audio\nfinal_result = rec.FinalResult()\nprint(final_result)\n\nprint(\"Transcription complete.\")","lang":"python","description":"This quickstart demonstrates how to set up Vosk for transcribing a WAV audio file. 
It involves downloading a pre-trained language model, loading it into a `Model` object, initializing a `KaldiRecognizer` with the model and the audio's sample rate, and then feeding audio data in chunks for recognition. The audio file must be a 16kHz, 16-bit PCM, mono WAV."},"warnings":[{"fix":"If your code reads per-word timings from the recognition result, call `SetWords(True)` on the `KaldiRecognizer` explicitly, as the default behavior may have changed.","message":"Vosk version 0.3.30 introduced an API change regarding word times, making them optional. Code relying on the previous implicit behavior might need adjustment.","severity":"breaking","affected_versions":"0.3.30 and later"},{"fix":"Ensure your audio input (file or microphone stream) matches the model's expected format (e.g., 16kHz sample rate, 16-bit PCM, mono). Use tools like FFmpeg for conversion if necessary: `ffmpeg -i input.mp3 -ar 16000 -ac 1 -acodec pcm_s16le output.wav`. The sample rate passed to `KaldiRecognizer` MUST match the audio's actual sample rate.","message":"An incorrect audio format (e.g., stereo, wrong sample rate, compressed) is a common cause of poor recognition or the `Failed to process waveform` error. Vosk models typically expect mono, 16-bit PCM, uncompressed WAV audio, usually at a 16kHz sample rate.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always download and unzip a Vosk model from the official website (alphacephei.com/vosk/models). Provide the absolute path to the unzipped model directory when initializing `vosk.Model()`. If using `model_name`, Vosk might look in a cache directory; for custom paths, pass the full folder path directly to `Model()`.","message":"Model files must be downloaded separately and their path correctly specified. 
Using relative paths can lead to errors if the script's working directory changes, and errors also occur if the `model_name` argument is used incorrectly.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Ensure `vosk` is installed in your active Python environment: `pip install vosk`. If you have multiple Python versions, verify `pip show vosk` points to the correct installation. Using virtual environments (`venv` or `conda`) is highly recommended to isolate dependencies.","cause":"The Vosk library is not installed in the currently active Python environment, or there are multiple Python installations causing conflicts.","error":"ModuleNotFoundError: No module named 'vosk'"},{"fix":"Verify that the path provided to `vosk.Model()` points directly to the *unzipped* model directory (e.g., `vosk-model-small-en-us-0.22`), not the zip file itself. Ensure all model subdirectories and files are present within this path. Use an absolute path for robustness.","cause":"The `vosk.Model()` constructor could not find the necessary model files in the specified directory. This often happens due to an incorrect path, an unexpected folder structure after unzipping (such as an extra nested directory), or an incomplete model download.","error":"ERROR (VoskAPI:Model():model.cc:122) Folder 'model' does not contain model files. Make sure you specified the model path properly in Model constructor. Exception: Failed to create a model."},{"fix":"Check the sample rate of your audio file/stream and ensure it matches the `sample_rate` argument provided to `vosk.KaldiRecognizer()`. Additionally, confirm the audio is mono, 16-bit PCM, and uncompressed WAV. Convert the audio if necessary using `ffmpeg`.","cause":"The audio data being fed to `recognizer.AcceptWaveform()` does not match the expected format or sample rate of the loaded Vosk model. 
This is commonly due to a mismatch in sample rates between the audio source and the `KaldiRecognizer` initialization.","error":"Exception: Failed to process waveform"},{"fix":"Double-check the audio format (mono, 16-bit PCM, 16kHz WAV is standard) and ensure the `KaldiRecognizer` is initialized with the correct sample rate. Verify the audio actually contains speech and is not too quiet. Try a different Vosk model for the target language. Ensure you are calling `rec.FinalResult()` at the end of processing to retrieve any pending transcription.","cause":"This usually indicates that Vosk is not detecting any speech or is unable to process the audio effectively. Common causes include incorrect audio format, extremely low volume, a model not suited for the language or accent, or an incorrect sample rate.","error":"{ \"text\" : \"\" } (Vosk returns empty transcription)"}]}