pyvad

0.2.0 · active · verified Thu Apr 16

pyvad is a Python wrapper around the `py-webrtcvad` library for trimming speech clips from audio. It provides a simplified interface to Voice Activity Detection (VAD), letting users identify and extract voiced segments from audio data. The current version is 0.2.0, released in July 2022; releases are infrequent.

Quickstart

This quickstart uses `pyvad` to run Voice Activity Detection on a simulated audio clip and trim its silence. It builds a sample array with one second of a test tone between two one-second stretches of silence, then applies the `vad` and `trim` functions. `vad` returns an array marking voiced (1) and unvoiced (0) portions of the signal; `trim` locates the start and end of the speech region so that leading and trailing silence can be removed.

import numpy as np
from pyvad import vad, trim

# Simulate audio data: 1 second of silence, 1 second of 'speech', 1 second of silence
fs = 16000 # Sample rate in Hz (WebRTC VAD supported rate)
duration_speech = 1.0 # seconds
duration_silence = 1.0 # seconds

# Generate a simple sine wave as a stand-in for 'speech'
# (a pure tone is not real speech, so the VAD may not classify it as voiced)
t = np.linspace(0, duration_speech, int(fs * duration_speech), endpoint=False)
speech_data = 0.5 * np.sin(2 * np.pi * 440 * t) # 440 Hz sine wave

# Generate silence
silence_data = np.zeros(int(fs * duration_silence))

# Combine speech and silence
audio_data = np.concatenate((silence_data, speech_data, silence_data)).astype(np.float32)

print(f"Audio data shape: {audio_data.shape}, Sample rate: {fs} Hz")

# Perform Voice Activity Detection
vact = vad(audio_data, fs)
print(f"Voice activity array shape: {vact.shape}")
# vact will contain 1s for voiced segments, 0s for unvoiced

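# Locate the speech region and slice it out.
# Assumption: in recent pyvad releases, trim returns the (start, end) sample
# indices rather than the trimmed audio; verify against your installed version.
edges = trim(audio_data, fs)
if edges is None or edges[0] == edges[1]:
    print("No voiced region detected")
else:
    start_idx, end_idx = edges
    trimmed_audio = audio_data[start_idx:end_idx]
    print(f"Trimmed audio shape: {trimmed_audio.shape}")
    print(f"Original audio length: {len(audio_data) / fs:.2f}s")
    print(f"Trimmed audio from {start_idx / fs:.2f}s to {end_idx / fs:.2f}s, total {len(trimmed_audio) / fs:.2f}s")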
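The voice activity array can also be turned into a list of voiced segments, which is useful when a clip contains more than one stretch of speech. The following is a minimal sketch using only NumPy. `voiced_segments` is a hypothetical helper, not part of `pyvad`, and it assumes the activity array holds one 0/1 value per sample, as the comment in the quickstart describes. A hand-written array stands in for the output of `vad` so the example runs without audio.

```python
import numpy as np


def voiced_segments(vact):
    """Return (start, end) index pairs for each run of 1s in vact.

    Hypothetical helper (not part of pyvad). The end index is exclusive,
    so data[start:end] selects exactly the voiced samples of one run.
    """
    flags = np.asarray(vact).astype(bool).ravel()
    # Pad with False on both sides so runs touching either edge are closed.
    padded = np.concatenate(([False], flags, [False]))
    changes = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(changes == 1)   # 0 -> 1 transitions
    ends = np.flatnonzero(changes == -1)    # 1 -> 0 transitions
    return list(zip(starts.tolist(), ends.tolist()))


# Hand-written activity array standing in for the output of vad()
example = np.array([0, 0, 1, 1, 1, 0, 1, 1], dtype=np.float32)
segments = voiced_segments(example)
print(segments)
```

Dividing each index by the sample rate converts a segment to seconds, in the same way the quickstart reports the trimmed region.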
