LiveKit Silero Plugin
The livekit-plugins-silero library provides a Voice Activity Detection (VAD) plugin for the LiveKit Agent Framework. It uses the Silero VAD model to distinguish speech from silence, which matters for natural turn-taking in voice AI applications and for reducing Speech-to-Text (STT) resource usage. The current version is 1.5.2; it is released together with the LiveKit Agents framework, which ships new versions frequently.
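To make the speech-versus-silence decision concrete, here is a toy sketch that groups per-frame speech scores into segments. It is not the plugin's implementation: Silero uses a neural network to produce the scores, and the function name `detect_speech` and its parameters are hypothetical, chosen only for illustration.

```python
def detect_speech(scores, threshold=0.5, min_silence_frames=2):
    """Group frames scoring >= threshold into (start, end) speech segments.

    `end` is exclusive. A segment closes only after `min_silence_frames`
    consecutive quiet frames, so short pauses do not split an utterance.
    """
    segments = []
    start = None  # index of the first frame of the open segment, if any
    quiet = 0     # consecutive quiet frames seen inside the open segment
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_silence_frames:
                # Speech ended at the first quiet frame of this run.
                segments.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:
        # Input ended while a segment was still open.
        segments.append((start, len(scores) - quiet))
    return segments


if __name__ == "__main__":
    print(detect_speech([0.1, 0.9, 0.8, 0.2, 0.9, 0.1, 0.1, 0.7]))
```

The single quiet frame at index 3 is bridged, while the two quiet frames at indices 5 and 6 end the first segment. This is the kind of end-of-speech signal an agent can use to decide when the user has finished a turn.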
Warnings
- breaking: LiveKit Agents v1.5.0 changed how turn handling (including VAD settings) is configured. Old keyword arguments such as `min_endpointing_delay` and `allow_interruptions` on `AgentSession` are deprecated and will be removed in v2.0. Migrate to the new `TurnHandlingOptions` dictionary.
- gotcha: The Silero VAD model weights are not bundled with the package and must be downloaded separately before first use; otherwise loading fails at runtime.
- gotcha: `silero.VAD.load()` is a blocking call. Calling it inside each agent session's entrypoint slows startup for every new job; load it once per process instead (see the `prewarm` function in the Quickstart).
- gotcha: Silero VAD runs on the CPU by default. To use GPU acceleration (e.g., with `onnxruntime-gpu`), pass `force_cpu=False` when loading and make sure the GPU environment is configured correctly. Installing `onnxruntime-gpu` alone may not be enough to guarantee GPU use.
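One way to act on the GPU warning is to check which ONNX Runtime execution providers are available before choosing `force_cpu`. The sketch below assumes a CUDA setup (provider name `CUDAExecutionProvider`); the helper names `should_force_cpu` and `load_vad` are hypothetical, and passing `force_cpu=False` still does not by itself guarantee that inference runs on the GPU.

```python
def should_force_cpu(available_providers):
    """Return True unless ONNX Runtime reports a CUDA execution provider."""
    return "CUDAExecutionProvider" not in available_providers


def load_vad():
    """Load Silero VAD, leaving GPU use possible only when CUDA is reported."""
    # Imported lazily so should_force_cpu() works without these packages.
    import onnxruntime
    from livekit.plugins import silero

    providers = onnxruntime.get_available_providers()
    force_cpu = should_force_cpu(providers)
    print(f"ONNX Runtime providers: {providers}; force_cpu={force_cpu}")
    return silero.VAD.load(force_cpu=force_cpu)
```

Calling `load_vad()` from the per-process prewarm function keeps the blocking load out of each job's entrypoint.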
Install
- `pip install livekit-plugins-silero`
- `pip install "livekit-agents[silero]"`
- `python -m livekit.agents.cli download-files`
Imports
- silero: `from livekit.plugins import silero`
- VAD: `from livekit.plugins.silero.vad import VAD`
Quickstart
import asyncio
import os

from livekit.agents import Agent, AgentServer, AgentSession, JobContext, JobProcess, cli
from livekit.plugins import silero

# LiveKit credentials are read from environment variables:
# LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET

server = AgentServer()


def prewarm(proc: JobProcess):
    # Load the VAD model once per process for faster job startup
    print("Prewarming Silero VAD model...")
    proc.userdata["vad"] = silero.VAD.load()
    print("Silero VAD model prewarmed.")


server.setup_fnc = prewarm


@server.rtc_session(agent_name="my-silero-agent")
async def my_agent(ctx: JobContext):
    print(f"Agent received job {ctx.job.id}")
    await ctx.connect()

    # Retrieve the prewarmed VAD instance
    vad = ctx.proc.userdata["vad"]

    # Initialize AgentSession with Silero VAD (components are keyword arguments)
    session = AgentSession(
        vad=vad,
        # Other components like STT, TTS, LLM would go here
        # stt=...,
        # tts=...,
        # llm=...,
    )

    try:
        # start() needs the agent to run and the room to join;
        # the instructions string is a placeholder
        await session.start(
            agent=Agent(instructions="You are a helpful voice assistant."),
            room=ctx.room,
        )
        print("Agent session started with Silero VAD.")
        # Keep the agent running, e.g., for a conversation loop
        await asyncio.sleep(600)  # Keep alive for 10 minutes
    finally:
        await session.aclose()
        print("Agent session ended.")


if __name__ == "__main__":
    # Important: Download model weights before first run:
    # python -m livekit.agents.cli download-files
    # Set placeholder credentials if none are present in the environment;
    # replace them with real values before connecting to a server
    os.environ.setdefault('LIVEKIT_URL', 'wss://your-livekit-server.livekit.cloud')
    os.environ.setdefault('LIVEKIT_API_KEY', 'SK_XXXXX')
    os.environ.setdefault('LIVEKIT_API_SECRET', 'YOUR_SECRET')
    cli.run_app(server)
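The quickstart falls back to placeholder credentials, which will not authenticate against a real server. A small check can fail fast with a clearer message instead. This is a minimal sketch using only the standard library; the helper name `missing_livekit_vars` is hypothetical, and the variable names are the three listed in the quickstart comments.

```python
import os

REQUIRED_VARS = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")


def missing_livekit_vars(env=None):
    """Return the required variable names that are unset or empty in `env`."""
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_livekit_vars()
    if missing:
        raise SystemExit(f"Missing LiveKit credentials: {', '.join(missing)}")
    print("All LiveKit credentials are set.")
```

Running this check before `cli.run_app(server)` turns a confusing connection failure into an explicit configuration error.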