WebRTC Voice Activity Detector (VAD) with Binary Wheels
`webrtcvad-wheels` is a Python interface to the Google WebRTC Voice Activity Detector (VAD). It is a fork of the original `py-webrtcvad` project, maintained to provide pre-compiled binary wheels for Windows, macOS, and Linux (including ARM architectures), so installation does not require a local compiler. The project is actively maintained; the current version is 2.0.14, and it targets telephony and speech processing applications.
Common errors
-
Failed building wheel for webrtcvad / error: command 'gcc' failed: No such file or directory
cause Attempting to install the original `webrtcvad` package, which requires local compilation with C/C++ development tools that are often missing or misconfigured on the system.
fix Install `webrtcvad-wheels` instead: `pip install webrtcvad-wheels`. This package provides pre-compiled binaries (wheels) and avoids the need for local compilation.
-
webrtcvad.Error: Invalid sample rate: XXXXX
cause The audio data passed to `vad.is_speech()` has a sample rate that is not supported by the WebRTC VAD. Only 8000, 16000, 32000, or 48000 Hz are allowed.
fix Resample your audio to one of the supported rates (8 kHz, 16 kHz, 32 kHz, or 48 kHz) before passing it to the VAD. Libraries such as `scipy.signal` or `pydub` can do the resampling.
-
ValueError: frame length must be 10, 20 or 30 ms
cause The duration of the audio frame passed to `vad.is_speech()` is not exactly 10, 20, or 30 ms.
fix Chunk your audio into exact 10 ms, 20 ms, or 30 ms segments. For 16-bit mono PCM, the number of bytes per frame is `(sample_rate * frame_duration_ms / 1000) * 2`, since each sample takes 2 bytes.
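The byte-count formula above can be turned into a small chunking helper. This is a minimal sketch, assuming raw 16-bit mono PCM bytes as input; `frame_bytes` and `split_frames` are hypothetical helper names and are not part of `webrtcvad`. Trailing bytes that do not fill a whole frame are dropped.

```python
SUPPORTED_RATES = (8000, 16000, 32000, 48000)
SUPPORTED_FRAME_MS = (10, 20, 30)
BYTES_PER_SAMPLE = 2  # 16-bit PCM


def frame_bytes(sample_rate, frame_duration_ms):
    """Return the byte length of one frame of 16-bit mono PCM."""
    if sample_rate not in SUPPORTED_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    if frame_duration_ms not in SUPPORTED_FRAME_MS:
        raise ValueError(f"unsupported frame duration: {frame_duration_ms} ms")
    return sample_rate * frame_duration_ms // 1000 * BYTES_PER_SAMPLE


def split_frames(pcm, sample_rate, frame_duration_ms):
    """Yield consecutive whole frames; an incomplete tail is discarded."""
    size = frame_bytes(sample_rate, frame_duration_ms)
    for start in range(0, len(pcm) - size + 1, size):
        yield pcm[start:start + size]
```

Each yielded frame should then be suitable for `vad.is_speech(frame, sample_rate)`, provided the same `sample_rate` is used for both calls.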
Warnings
- gotcha The WebRTC VAD has strict requirements for audio input: it must be 16-bit mono PCM, with a sample rate of 8000, 16000, 32000, or 48000 Hz, and frames must be exactly 10, 20, or 30 ms in duration. Mismatching these parameters will lead to errors.
- breaking Versions of `webrtcvad-wheels` before 2.0.13 had a known memory leak when `Vad` objects were constructed repeatedly. In long-running applications that create many `Vad` instances, this could degrade performance or cause crashes.
- gotcha Many users mistakenly try to install the older, non-wheel `webrtcvad` package, which often fails to compile due to missing C/C++ development tools (e.g., Visual C++ Build Tools on Windows, `gcc` on Linux).
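The input requirements listed above can be checked before any audio reaches the VAD. The following is a minimal sketch using only the standard-library `wave` module; `check_wav_for_vad` is a hypothetical helper name, and it assumes the audio is stored as an uncompressed WAV file.

```python
import wave

SUPPORTED_RATES = (8000, 16000, 32000, 48000)


def check_wav_for_vad(path):
    """Return a list of format problems; an empty list means the file looks usable."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        if wav.getframerate() not in SUPPORTED_RATES:
            problems.append(f"unsupported sample rate: {wav.getframerate()} Hz")
    return problems
```

Returning a list rather than raising on the first mismatch lets a caller report every problem at once, for example a stereo file that is also sampled at 44100 Hz.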
Install
-
pip install webrtcvad-wheels
Imports
- Vad
from webrtcvad import Vad
import webrtcvad
vad = webrtcvad.Vad()
Quickstart
import webrtcvad
# Create a VAD object
vad = webrtcvad.Vad()
# Set aggressiveness mode (0-3): 0 is least aggressive about filtering out non-speech, 3 is most aggressive
vad.set_mode(1)
# Define audio parameters
sample_rate = 16000 # Must be 8000, 16000, 32000, or 48000 Hz
frame_duration_ms = 10 # Must be 10, 20, or 30 ms
# Generate a silent frame (16-bit mono PCM)
frame_size_bytes = int(sample_rate * frame_duration_ms / 1000) * 2 # 2 bytes per 16-bit sample
silent_frame = b'\x00\x00' * (frame_size_bytes // 2)
# Check if the frame contains speech
is_speech = vad.is_speech(silent_frame, sample_rate)
print(f"Contains speech: {is_speech}")
# Example with the most aggressive mode (3), passed directly to the constructor
vad_aggressive = webrtcvad.Vad(3)
is_speech_aggressive = vad_aggressive.is_speech(silent_frame, sample_rate)
print(f"Contains speech (aggressive mode): {is_speech_aggressive}")
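The quickstart classifies a single frame; in practice the per-frame results are often merged into time ranges. The following is a minimal sketch of that step, assuming `decisions` is a sequence of booleans collected from successive `vad.is_speech()` calls on equal-length frames. `speech_ranges` is a hypothetical helper name and is not part of `webrtcvad`.

```python
def speech_ranges(decisions, frame_duration_ms):
    """Merge consecutive True decisions into (start_ms, end_ms) ranges."""
    decisions = list(decisions)
    ranges = []
    start = None  # start time of the speech run currently being tracked
    for index, voiced in enumerate(decisions):
        if voiced and start is None:
            start = index * frame_duration_ms
        elif not voiced and start is not None:
            ranges.append((start, index * frame_duration_ms))
            start = None
    if start is not None:
        # Speech continued through the final frame
        ranges.append((start, len(decisions) * frame_duration_ms))
    return ranges
```

For example, `speech_ranges([False, True, True, False, True], 10)` returns `[(10, 30), (40, 50)]`.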