{"id":8774,"library":"webrtcvad-wheels","title":"WebRTC Voice Activity Detector (VAD) with Binary Wheels","description":"webrtcvad-wheels is a Python interface to the Google WebRTC Voice Activity Detector (VAD). It is a fork of the original `py-webrtcvad` project, maintained to provide pre-compiled binary wheels for Windows, macOS, and Linux (including architectures such as ARM), so that installation does not require local compilation. The project is active; version 2.0.14 is current and targets telephony and speech processing applications.","status":"active","version":"2.0.14","language":"en","source_language":"en","source_url":"https://github.com/daanzu/py-webrtcvad-wheels","tags":["voice activity detection","VAD","audio processing","speech recognition","WebRTC","real-time audio"],"install":[{"cmd":"pip install webrtcvad-wheels","lang":"bash","label":"Install stable version"}],"dependencies":[],"imports":[{"note":"The distribution is named `webrtcvad-wheels`, but the module it installs is `webrtcvad`. `Vad` is a class defined in that module.","wrong":"import webrtcvad_wheels","symbol":"Vad","correct":"import webrtcvad\nvad = webrtcvad.Vad()"}],"quickstart":{"code":"import webrtcvad\n\n# Create a VAD object\nvad = webrtcvad.Vad()\n\n# Set aggressiveness mode (0-3): 0 is least aggressive about filtering out non-speech, 3 is most aggressive\nvad.set_mode(1)\n\n# Define audio parameters\nsample_rate = 16000 # Must be 8000, 16000, 32000, or 48000 Hz\nframe_duration_ms = 10 # Must be 10, 20, or 30 ms\n\n# Generate a silent frame (16-bit mono PCM)\nframe_size_bytes = int(sample_rate * frame_duration_ms / 1000) * 2 # 2 bytes per 16-bit sample\nsilent_frame = b'\\x00\\x00' * (frame_size_bytes // 2)\n\n# Check if the frame contains speech\nis_speech = vad.is_speech(silent_frame, sample_rate)\nprint(f\"Contains speech: {is_speech}\")\n\n# Same check with the most aggressive mode (3), set via the constructor\nvad_aggressive = webrtcvad.Vad(3)\nis_speech_aggressive = 
vad_aggressive.is_speech(silent_frame, sample_rate)\nprint(f\"Contains speech (aggressive mode): {is_speech_aggressive}\")","lang":"python","description":"Initializes the VAD, sets its aggressiveness, and checks a silent audio frame for speech. The audio must be 16-bit mono PCM at a supported sample rate and frame duration."},"warnings":[{"fix":"Pre-process your audio frames to meet these specifications before passing them to `vad.is_speech()`. Calculate the frame size precisely: `bytes_per_frame = int(sample_rate * frame_duration_ms / 1000) * 2`.","message":"The WebRTC VAD has strict requirements for audio input: it must be 16-bit mono PCM, with a sample rate of 8000, 16000, 32000, or 48000 Hz, and each frame must be exactly 10, 20, or 30 ms long. An unsupported sample rate or frame length causes `vad.is_speech()` to raise an error.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Upgrade to `webrtcvad-wheels` version 2.0.13 or newer (`pip install --upgrade webrtcvad-wheels`) to pick up the memory leak fix.","message":"Versions of `webrtcvad-wheels` prior to 2.0.13 had a known memory leak when constructing `Vad` objects repeatedly. In long-running applications that create many `Vad` objects, this could cause steadily growing memory use and eventual crashes.","severity":"breaking","affected_versions":"< 2.0.13"},{"fix":"Always use `pip install webrtcvad-wheels` to get the pre-compiled binary wheels, which generally resolves compilation issues.","message":"Many users mistakenly try to install the original, non-wheel `webrtcvad` package, which often fails to compile because C/C++ development tools are missing (e.g., Visual C++ Build Tools on Windows, `gcc` on Linux).","severity":"gotcha","affected_versions":"All versions when trying to install the original `webrtcvad`"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Install `webrtcvad-wheels` instead: `pip install webrtcvad-wheels`. 
This package provides pre-compiled binaries (wheels) and avoids the need for local compilation.","cause":"Attempting to install the original `webrtcvad` package, which requires local compilation with C/C++ development tools that are often missing or misconfigured on the system.","error":"Failed building wheel for webrtcvad / error: command 'gcc' failed: No such file or directory"},{"fix":"Resample your audio to one of the supported rates (8 kHz, 16 kHz, 32 kHz, or 48 kHz) before passing it to the VAD. Libraries such as `scipy.signal` (e.g. `resample_poly`) or `pydub` (`set_frame_rate`) can do the resampling.","cause":"The audio data passed to `vad.is_speech()` has a sample rate that is not supported by the WebRTC VAD. Only 8000, 16000, 32000, or 48000 Hz are allowed.","error":"webrtcvad.Error: Invalid sample rate: XXXXX"},{"fix":"Chunk your audio data into segments of exactly 10 ms, 20 ms, or 30 ms. Calculate the number of bytes per frame with `int(sample_rate * frame_duration_ms / 1000) * 2`.","cause":"The duration of the audio frame passed to `vad.is_speech()` is not one of the required values of 10, 20, or 30 ms.","error":"ValueError: frame length must be 10, 20 or 30 ms"}]}