{"id":5551,"library":"webrtcvad","title":"WebRTC Voice Activity Detector","description":"webrtcvad is a Python interface to the Google WebRTC Voice Activity Detector (VAD). It classifies short segments of audio as voiced or unvoiced, which is useful for telephony and speech recognition. The current version is 2.0.10; releases have historically been infrequent and as-needed, incorporating upstream WebRTC VAD changes or bug fixes.","status":"active","version":"2.0.10","language":"en","source_language":"en","source_url":"https://github.com/wiseman/py-webrtcvad","tags":["audio","voice activity detection","VAD","speech processing","real-time"],"install":[{"cmd":"pip install webrtcvad","lang":"bash","label":"Install stable version"}],"dependencies":[],"imports":[{"symbol":"Vad","correct":"import webrtcvad\nvad = webrtcvad.Vad()"}],"quickstart":{"code":"import webrtcvad\nimport struct\n\n# WebRTC VAD requires 16-bit mono PCM audio at specific sample rates\n# (8000, 16000, 32000, or 48000 Hz) and frame durations (10, 20, or 30 ms).\nsample_rate = 16000  # Hz\nframe_duration_ms = 30  # ms\nbytes_per_sample = 2  # 16-bit audio\n\n# Calculate frame size in bytes and samples\nframe_size_bytes = int(sample_rate * (frame_duration_ms / 1000.0) * bytes_per_sample)\nsamples_per_frame = frame_size_bytes // bytes_per_sample\n\n# Create a VAD instance with an aggressiveness mode (0-3)\n# 0: least aggressive, 3: most aggressive\nvad = webrtcvad.Vad(3)\n\n# Create a silent audio frame (16-bit mono PCM)\nsilence_frame = b'\\x00\\x00' * samples_per_frame\n\n# Create a mock speech-like frame (a square wave for demonstration).\n# In a real application, this would come from an audio input.\namplitude = 10000  # max 32767 for 16-bit audio\nspeech_frame = b''\nfor i in range(samples_per_frame):\n    # Square wave: +amplitude for 15 samples, then -amplitude for 15\n    value = amplitude if i % 30 < 15 else -amplitude\n    speech_frame += struct.pack('<h', value)\n\nprint(f\"Processing frame of {frame_duration_ms} ms at {sample_rate} Hz\")\n\n# Test with silence\nis_speech_silence = vad.is_speech(silence_frame, sample_rate)\nprint(f\"Silence frame contains speech: {is_speech_silence}\")\n\n# Test with speech-like audio\nis_speech_mock = vad.is_speech(speech_frame, sample_rate)\nprint(f\"Mock speech frame contains speech: {is_speech_mock}\")\n\n# You can also set the mode after initialization\nvad.set_mode(1)\nprint(\"VAD aggressiveness set to 1.\")\n","lang":"python","description":"Initializes the VAD, sets its aggressiveness mode, and demonstrates classifying silence and a mock speech segment. It highlights the strict audio format requirements: 16-bit mono PCM at 8000, 16000, 32000, or 48000 Hz, with frame durations of 10, 20, or 30 ms."},"warnings":[{"fix":"Ensure your audio input (e.g., from a microphone or file) is pre-processed to match these specifications before passing it to `vad.is_speech()`.","message":"The WebRTC VAD has strict audio input requirements: 16-bit, mono PCM audio, sampled at 8000, 16000, 32000, or 48000 Hz. Frames must be exactly 10, 20, or 30 ms in duration.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Consider using `pip install webrtcvad-wheels` instead, a fork specifically designed to provide pre-compiled binary wheels for easier installation across various platforms; its API is identical. If sticking with `webrtcvad`, ensure you have a C/C++ compiler and up-to-date Python development headers installed.","message":"The `webrtcvad` package (wiseman/py-webrtcvad) can be difficult to install on some platforms due to its C/C++ dependencies and lack of pre-built wheels for all Python versions/OS combinations.","severity":"gotcha","affected_versions":"All versions, particularly on Windows or less common Linux/macOS configurations."},{"fix":"Ensure you are using at least `webrtcvad` 2.0.10. For the most robust memory handling, consider migrating to `webrtcvad-wheels` and using its latest version.","message":"Versions prior to 2.0.10 leaked memory in the `is_speech()` method. The fix landed in 2.0.10, and later versions of the `webrtcvad-wheels` fork (e.g., 2.0.13) address further memory leak issues.","severity":"gotcha","affected_versions":"Prior to 2.0.10; some lingering issues in 2.0.10 were addressed in later `webrtcvad-wheels` versions."},{"fix":"For applications requiring higher accuracy in challenging audio environments, or more nuanced classification beyond simple speech/no-speech, consider integrating more advanced VAD algorithms or machine learning models.","message":"The WebRTC VAD is a simple, real-time oriented model and may produce false positives for non-speech sounds (e.g., music, birdsong) or false negatives in very noisy environments, even at high aggressiveness settings.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-13T00:00:00.000Z","next_check":"2026-07-12T00:00:00.000Z"}