LiveKit Turn Detector Plugin
livekit-plugins-turn-detector provides end-of-utterance detection for LiveKit Agents, using a machine-learning model to distinguish genuine interruptions from incidental background noise. It is an integral part of the adaptive interruption handling introduced in LiveKit Agents v1.5.0. The current version is 1.5.2; releases typically ship alongside major livekit-agents updates.
Warnings
- gotcha Starting with `livekit-agents` 1.5.0, adaptive interruption handling, powered by this plugin, is enabled by default. This significantly changes the default VAD (Voice Activity Detection) behavior and might override custom VAD configurations if not explicitly managed. Users upgrading from older `livekit-agents` versions should be aware of this behavioral shift.
- gotcha The plugin has significant dependencies, notably `transformers[torch]`. This leads to a large installation size and introduces `torch` as a dependency, which can have performance implications and specific hardware requirements (e.g., GPU for faster inference).
- gotcha Versions of `livekit-plugins-turn-detector` prior to 1.5.1 had a stricter upper bound on the `transformers` dependency. This could lead to dependency conflicts if other libraries in your project required a newer or different `transformers` version.
- gotcha This plugin is specifically designed to work within the LiveKit Agents ecosystem. While the underlying ML model might be general-purpose, the `TurnDetector` class and its event handling are tightly integrated with LiveKit's audio stream processing and agent lifecycle.
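If you suspect the pre-1.5.1 `transformers` pinning issue described above, you can inspect the installed version at runtime. This is a minimal, hypothetical helper: the upper bound shown is illustrative only; consult the plugin's package metadata for the real constraint.

```python
from importlib import metadata


def parse_version(v: str) -> tuple:
    # Minimal parser for "X.Y.Z"-style versions (no pre-release handling).
    return tuple(int(p) for p in v.split(".")[:3] if p.isdigit())


def check_transformers_bound(max_exclusive: str = "4.41.0") -> bool:
    """Return True if the installed transformers version is below the
    (illustrative) upper bound that pre-1.5.1 plugin releases enforced."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return False  # transformers not installed at all
    return parse_version(installed) < parse_version(max_exclusive)
```

Upgrading to `livekit-plugins-turn-detector` 1.5.1 or later avoids the conflict entirely.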
Install
```shell
pip install livekit-plugins-turn-detector
```
Imports
- TurnDetector
from livekit.plugins.turn_detector import TurnDetector
- TurnStarted
from livekit.plugins.turn_detector import TurnStarted
- TurnFinished
from livekit.plugins.turn_detector import TurnFinished
- AudioFrame
from livekit.agents.utils import AudioFrame
Quickstart
```python
import asyncio

import numpy as np

from livekit.agents.utils import AudioFrame
from livekit.plugins.turn_detector import TurnDetector, TurnStarted, TurnFinished


async def quickstart_turn_detector():
    print("Initializing TurnDetector...")
    # TurnDetector.create() is an async factory method
    detector = await TurnDetector.create()

    # Simulate an audio stream (e.g., 16 kHz mono audio)
    sample_rate = 16000
    num_silent_frames = 50   # 500 ms of silence (50 * 10 ms frames)
    num_speech_frames = 100  # 1 second of speech
    frame_size = int(sample_rate * 0.01)  # samples per 10 ms frame

    async def simulate_audio():
        # Leading silence
        for _ in range(num_silent_frames):
            frame = AudioFrame(np.zeros(frame_size, dtype=np.int16), sample_rate, 1)
            await detector.push_frame(frame)
            await asyncio.sleep(0.01)  # simulate real-time pacing
        print("Simulated silence.")

        # Speech (simulated as a non-zero 440 Hz sine wave)
        for i in range(num_speech_frames):
            t = np.linspace(0, 0.01, frame_size, endpoint=False)
            sine_wave = (np.sin(2 * np.pi * 440 * t) * 1000).astype(np.int16)
            frame = AudioFrame(sine_wave, sample_rate, 1)
            await detector.push_frame(frame)
            if i == 0:
                print("Simulating speech...")
            await asyncio.sleep(0.01)

        # Post-speech silence
        for _ in range(num_silent_frames):
            frame = AudioFrame(np.zeros(frame_size, dtype=np.int16), sample_rate, 1)
            await detector.push_frame(frame)
            await asyncio.sleep(0.01)
        print("Simulated post-speech silence. Closing detector.")

        # Signal end of stream
        await detector.flush()

    async def process_events():
        # Consume events from the detector as they are emitted
        async for event in detector.detect_turns():
            if isinstance(event, TurnStarted):
                print(f"Turn Started at timestamp {event.timestamp}")
            elif isinstance(event, TurnFinished):
                print(f"Turn Finished at timestamp {event.timestamp}, duration: {event.duration}s")

    # Run the audio producer and the event consumer concurrently
    await asyncio.gather(simulate_audio(), process_events())
    print("Quickstart finished.")


if __name__ == "__main__":
    asyncio.run(quickstart_turn_detector())
```
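As the event loop in the Quickstart grows, `isinstance` chains become unwieldy. One option is routing events to handlers by type. This is a generic dispatch pattern, not part of the plugin API; the `TurnStarted`/`TurnFinished` dataclasses below are stand-ins so the sketch runs without the plugin installed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Type


# Stand-in event types mirroring the fields used in the Quickstart;
# in real code, import these from livekit.plugins.turn_detector instead.
@dataclass
class TurnStarted:
    timestamp: float


@dataclass
class TurnFinished:
    timestamp: float
    duration: float


class TurnEventRouter:
    """Dispatch detector events to handlers registered per event type."""

    def __init__(self) -> None:
        self._handlers: Dict[Type, Callable] = {}

    def on(self, event_type: Type, handler: Callable) -> None:
        self._handlers[event_type] = handler

    def dispatch(self, event) -> None:
        handler = self._handlers.get(type(event))
        if handler is not None:
            handler(event)
```

Inside `process_events`, each event from `detector.detect_turns()` would then simply be passed to `router.dispatch(event)`.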