LiveKit Deepgram Plugin
livekit-plugins-deepgram is an Agent Framework plugin that integrates Deepgram's Speech-to-Text (STT) and Text-to-Speech (TTS) services into LiveKit agents. Currently at version 1.5.2, it is part of the actively developed LiveKit Agents ecosystem and receives frequent updates and new features.
Warnings
- breaking LiveKit Agents v1.5.0 introduced a new `TurnHandlingOptions` API. Old keyword arguments like `min_endpointing_delay` and `allow_interruptions` in `AgentSession` are deprecated and will be removed in v2.0. This affects how turn detection and interruption handling are configured for agents using Deepgram STT.
- gotcha When using Deepgram plugins directly (not via LiveKit Inference), a Deepgram API key is required. This key must be provided either as an argument to the plugin constructor (e.g., `deepgram.STT(api_key='...')`) or, more commonly, by setting the `DEEPGRAM_API_KEY` environment variable. Failure to provide it will result in authentication errors.
- gotcha Specific combinations of Deepgram's `nova-3-general` model with certain non-English languages (e.g., Spanish, French) and parameters like `endpointing=false` and `vad_events=true` can lead to `WSServerHandshakeError: 400`. While this might be a Deepgram API-side issue, it manifests when using the LiveKit plugin.
- deprecated The `keywords` parameter for Deepgram STT is deprecated and should be replaced with `keyterm` for improved recognition accuracy, especially when using Nova-3 models. Using `keywords` may still work but is not recommended and might be removed in future versions.
- gotcha Even when using Deepgram Flux (e.g., `turn_detection='stt'`) for advanced turn detection, it's recommended to still include a Voice Activity Detection (VAD) plugin like Silero. Flux handles turn detection, but a separate VAD is crucial for responsive interruption handling, allowing the agent to detect when a user speaks over the agent's response.
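The `keywords` → `keyterm` migration noted above comes down to how the parameters are encoded on Deepgram's streaming endpoint: Nova-3 keyterm prompting uses repeated `keyterm=` query parameters, while the older `keywords=` parameter took `word:boost` pairs. A minimal sketch of the encoding (`build_listen_url` is an illustrative helper, not part of the plugin):

```python
from urllib.parse import urlencode

def build_listen_url(model: str, keyterms: list[str]) -> str:
    """Illustrative only: sketch of Nova-3 keyterm prompting on the listen URL.

    Deepgram's streaming endpoint accepts repeated keyterm= query parameters;
    the deprecated keywords= parameter took word:boost pairs instead.
    """
    params = [("model", model)] + [("keyterm", term) for term in keyterms]
    return "wss://api.deepgram.com/v1/listen?" + urlencode(params)

print(build_listen_url("nova-3-general", ["LiveKit", "Deepgram"]))
# wss://api.deepgram.com/v1/listen?model=nova-3-general&keyterm=LiveKit&keyterm=Deepgram
```

With the plugin itself you only pass the terms to the STT constructor; the plugin builds the query string for you.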
Install
-
pip install livekit-plugins-deepgram
Imports
- deepgram
from livekit.plugins import deepgram
- STT
from livekit.plugins.deepgram import STT
- TTS
from livekit.plugins.deepgram import TTS
Quickstart
import asyncio
import os

from livekit.agents import AgentSession
from livekit.plugins import deepgram, openai, silero

# Set DEEPGRAM_API_KEY and OPENAI_API_KEY in the environment before running;
# the plugins read them automatically (or pass api_key=... to each constructor).

async def main():
    # Deepgram for STT and TTS, OpenAI for the LLM, Silero for VAD
    deepgram_stt = deepgram.STT(model='nova-2')
    deepgram_tts = deepgram.TTS(model='aura-2-asteria-en')
    openai_llm = openai.LLM(model='gpt-4o-mini')
    silero_vad = silero.VAD.load()

    session = AgentSession(
        stt=deepgram_stt,
        llm=openai_llm,
        tts=deepgram_tts,
        vad=silero_vad,
    )
    print("AgentSession configured with Deepgram STT/TTS.")
    # In a real agent, the session is started from a worker entrypoint,
    # e.g. await session.start(agent=..., room=ctx.room)

    # The TTS plugin can also be used directly to synthesize speech
    async for chunk in deepgram_tts.synthesize('Hello from LiveKit and Deepgram!'):
        print(f"Received audio frame: {chunk.frame.samples_per_channel} samples")

if __name__ == '__main__':
    asyncio.run(main())