WebRTC Voice Activity Detector (VAD) with Binary Wheels

2.0.14 · active · verified Thu Apr 16

webrtcvad-wheels is a Python interface to the WebRTC Voice Activity Detector (VAD) developed by Google. It is a fork of the original `py-webrtcvad` project, maintained specifically to publish pre-compiled binary wheels for Windows, macOS, Linux, and specific architectures such as ARM, so that installation does not require compiling the C extension. The project is actively maintained. Version 2.0.14 classifies short frames of audio as speech or non-speech for telephony and speech-processing applications.

Install

pip install webrtcvad-wheels

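Imports

import webrtcvad

The distribution is named webrtcvad-wheels, but the module it installs is imported as webrtcvad, as in the quickstart below.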

Quickstart

Initializes the VAD, sets its aggressiveness, and checks a silent audio frame for speech. The audio must be 16-bit mono PCM sampled at 8000, 16000, 32000, or 48000 Hz, and each frame passed to `is_speech` must be exactly 10, 20, or 30 ms long.
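The frame-size arithmetic used in the quickstart (samples per frame = sample rate * frame duration, then 2 bytes per 16-bit sample) can be wrapped in a small validating helper that fails early instead of handing the VAD a frame it will reject. This is a sketch: `frame_size_bytes` and the constants are illustrative names, not part of the webrtcvad API.

```python
# Illustrative constants (not part of webrtcvad): the accepted values
# listed in the quickstart comments.
VALID_SAMPLE_RATES = (8000, 16000, 32000, 48000)  # Hz
VALID_FRAME_MS = (10, 20, 30)                      # milliseconds
BYTES_PER_SAMPLE = 2                               # 16-bit PCM


def frame_size_bytes(sample_rate: int, frame_ms: int) -> int:
    """Return the byte length of one mono 16-bit PCM frame, or raise ValueError."""
    if sample_rate not in VALID_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate} Hz")
    if frame_ms not in VALID_FRAME_MS:
        raise ValueError(f"unsupported frame duration: {frame_ms} ms")
    samples_per_frame = sample_rate * frame_ms // 1000
    return samples_per_frame * BYTES_PER_SAMPLE


if __name__ == "__main__":
    # Print the byte length for every valid rate/duration combination:
    # 8000 [160, 320, 480]
    # 16000 [320, 640, 960]
    # 32000 [640, 1280, 1920]
    # 48000 [960, 1920, 2880]
    for rate in VALID_SAMPLE_RATES:
        print(rate, [frame_size_bytes(rate, ms) for ms in VALID_FRAME_MS])
```

For the quickstart's 16000 Hz and 10 ms, this gives 160 samples, or 320 bytes per frame.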

import webrtcvad

# Create a VAD object
vad = webrtcvad.Vad()

# Set aggressiveness mode (0-3): 0 is the least aggressive about filtering out non-speech, 3 is the most aggressive
vad.set_mode(1)

# Define audio parameters
sample_rate = 16000 # Must be 8000, 16000, 32000, or 48000 Hz
frame_duration_ms = 10 # Must be 10, 20, or 30 ms

# Generate a silent frame (16-bit mono PCM)
frame_size_bytes = int(sample_rate * frame_duration_ms / 1000) * 2 # 2 bytes per 16-bit sample
silent_frame = b'\x00\x00' * (frame_size_bytes // 2)

# Check if the frame contains speech
is_speech = vad.is_speech(silent_frame, sample_rate)
print(f"Contains speech: {is_speech}")

# Example with the most aggressive mode (3), passed directly to the constructor
vad_aggressive = webrtcvad.Vad(3)
is_speech_aggressive = vad_aggressive.is_speech(silent_frame, sample_rate)
print(f"Contains speech (aggressive mode): {is_speech_aggressive}")
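The quickstart feeds a synthetic frame. For recorded audio, one approach is to read a WAV file, cut it into fixed-length frames, and merge consecutive speech frames into time segments. The sketch below assumes a 16-bit mono WAV at one of the supported rates and uses only the standard library `wave` module. It takes the classifier as a parameter, so `vad.is_speech` can be passed in, or a stand-in for testing. The helpers `read_frames` and `speech_segments` are hypothetical names, not part of webrtcvad.

```python
import wave
from typing import Callable, List, Tuple

SUPPORTED_RATES = (8000, 16000, 32000, 48000)  # Hz, as in the quickstart


def read_frames(wav_source, frame_ms: int = 30) -> Tuple[int, List[Tuple[float, bytes]]]:
    """Hypothetical helper: read a 16-bit mono WAV (path or binary file object)
    and cut it into VAD-sized frames.

    Returns (sample_rate, [(start_time_s, frame_bytes), ...]). A trailing partial
    frame is dropped, since the VAD only accepts 10, 20, or 30 ms frames.
    """
    if frame_ms not in (10, 20, 30):
        raise ValueError(f"unsupported frame duration: {frame_ms} ms")
    with wave.open(wav_source, "rb") as wf:
        if wf.getnchannels() != 1:
            raise ValueError("audio must be mono")
        if wf.getsampwidth() != 2:
            raise ValueError("audio must be 16-bit PCM")
        rate = wf.getframerate()
        if rate not in SUPPORTED_RATES:
            raise ValueError(f"unsupported sample rate: {rate} Hz")
        pcm = wf.readframes(wf.getnframes())

    frame_bytes = rate * frame_ms // 1000 * 2  # samples per frame * 2 bytes
    frames = []
    for offset in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        start_s = offset / 2 / rate  # byte offset -> sample index -> seconds
        frames.append((start_s, pcm[offset:offset + frame_bytes]))
    return rate, frames


def speech_segments(
    frames: List[Tuple[float, bytes]],
    rate: int,
    frame_ms: int,
    is_speech: Callable[[bytes, int], bool],
) -> List[Tuple[float, float]]:
    """Hypothetical helper: merge runs of consecutive speech frames
    into (start_s, end_s) segments."""
    segments = []
    seg_start = None
    for start_s, frame in frames:
        if is_speech(frame, rate):
            if seg_start is None:
                seg_start = start_s  # a new speech run begins
        elif seg_start is not None:
            segments.append((seg_start, start_s))  # run ended at this frame
            seg_start = None
    if seg_start is not None:
        # Audio ended mid-speech: close the run at the end of the last frame.
        segments.append((seg_start, frames[-1][0] + frame_ms / 1000))
    return segments
```

With the real detector, a call might look like `rate, frames = read_frames("speech.wav", 30)` followed by `speech_segments(frames, rate, 30, webrtcvad.Vad(2).is_speech)`, where `speech.wav` is a placeholder path. No smoothing is applied, so a single misclassified frame can split a segment or create a spurious one. Keeping a short ring buffer of recent decisions is a common way to smooth this.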
