LiveKit Agents AWS Plugin
livekit-plugins-aws is a plugin for LiveKit Agents that integrates Amazon Web Services into real-time voice applications. It exposes Amazon Polly for Text-to-Speech (TTS) and Amazon Transcribe for Speech-to-Text (STT) within the LiveKit Agents framework, so developers can build voice AI agents on AWS backend services. The library is currently at version 1.5.4 and is part of the `livekit/agents` monorepo, which is actively developed and released frequently.
Common errors
- ModuleNotFoundError: No module named 'boto3'
  cause: The `boto3` AWS SDK for Python, which `livekit-plugins-aws` depends on, is not installed in your environment.
  fix: Install the AWS plugin with `pip install livekit-plugins-aws`, or install `livekit-agents` with the `aws` extra: `pip install livekit-agents[aws]`.
- botocore.exceptions.ClientError: An error occurred (InvalidClientTokenId) when calling the GetCallerIdentity operation: The security token included in the request is invalid.
  cause: Your AWS credentials (access key, secret key, or session token) are incorrect, missing, or expired, so AWS rejects the request before it reaches Polly or Transcribe.
  fix: Verify your `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_REGION` environment variables (boto3 itself reads the region from `AWS_DEFAULT_REGION`, and temporary credentials also need `AWS_SESSION_TOKEN`). Confirm that your IAM user or role has the necessary permissions. If you use a profile, also check `~/.aws/credentials`.
- Agent responds too quickly or prematurely, or TTS/STT costs are higher than expected after an upgrade.
  cause: LiveKit Agents versions 1.5.0+ enable 'preemptive generation' by default, meaning LLM and TTS inference may start before the user has finished speaking, which increases concurrency and potentially cost.
  fix: If this behavior is not desired, disable it by passing `preemptive_generation=False` when you construct the session, for example `session = AgentSession(..., preemptive_generation=False)`.
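The credential checks described above can be automated before the agent starts. The following is a minimal sketch, not part of the plugin: `missing_aws_env` and `whoami` are hypothetical helper names, and the check assumes credentials are supplied through environment variables (a profile in `~/.aws/credentials` would pass the `whoami` call even when the variables are unset).

```python
import os

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
REGION_VARS = ("AWS_REGION", "AWS_DEFAULT_REGION")


def missing_aws_env(env=None):
    """Return the names of AWS settings that are absent or empty in `env`."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    # Either region variable is accepted by this check.
    if not any(env.get(name) for name in REGION_VARS):
        missing.append("AWS_REGION")
    return missing


def whoami():
    """Ask AWS STS which identity the current credentials map to.

    Needs network access and boto3; raises botocore's ClientError
    (e.g. InvalidClientTokenId) when the credentials are rejected.
    """
    import boto3  # imported lazily so the env check works without boto3

    return boto3.client("sts").get_caller_identity()["Arn"]
```

Calling `missing_aws_env()` with no arguments inspects the real environment; if it returns an empty list, `whoami()` performs the same GetCallerIdentity call that appears in the error message, which makes it a quick way to reproduce the failure outside the agent.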
Warnings
- breaking LiveKit Agents versions 1.5.0 and above enable 'preemptive generation' by default. This changes when LLM/TTS inference starts, which can affect latency, cost, and conversational flow. This was refined further in 1.5.4.
- gotcha AWS credentials must be correctly configured for `livekit-plugins-aws` to access services like Polly and Transcribe. Common issues include missing environment variables or incorrect IAM permissions.
- gotcha The `livekit-plugins-aws` package is part of the `livekit/agents` monorepo and is tightly coupled to `livekit-agents` and `boto3`. Version mismatches between these packages can lead to unexpected behavior or `ModuleNotFoundError`.
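To spot a version mismatch or a missing dependency quickly, the installed versions can be listed from Python. This is a sketch using only the standard library; `installed_versions` and `report` are hypothetical helper names, and the package list is an assumption based on the dependencies named above.

```python
from importlib.metadata import PackageNotFoundError, version

PACKAGES = ("livekit-agents", "livekit-plugins-aws", "boto3")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def report(versions):
    """Format one line per package for logging or printing."""
    return [f"{name}: {ver or 'NOT INSTALLED'}" for name, ver in versions.items()]
```

Printing `"\n".join(report(installed_versions()))` at startup gives a record to compare against the release notes when behavior changes after an upgrade.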
Install
- pip install livekit-plugins-aws
- pip install livekit-agents[aws,openai]
Imports
- STT
from livekit.plugins import aws
aws_stt = aws.STT()
- TTS
from livekit.plugins import aws
aws_tts = aws.TTS()
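- VAD
The AWS plugin does not appear to export a VAD class. Voice activity detection normally comes from a separate plugin such as `livekit-plugins-silero`:
from livekit.plugins import silero
vad = silero.VAD.load()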
Quickstart
import logging

from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import aws, openai, silero

# --- Environment Variables Needed ---
# LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION (e.g., us-east-1)
# OPENAI_API_KEY (if using OpenAI LLM)
# Also requires livekit-plugins-openai and livekit-plugins-silero.
# ----------------------------------
# NOTE: this follows the AgentSession API of livekit-agents 1.x; verify the
# names against the version you have installed.

logger = logging.getLogger("aws-voice-agent")


class MyAWSVoiceAgent(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful voice assistant.")


async def entrypoint(ctx: JobContext) -> None:
    await ctx.connect()
    logger.info("Agent connected to room: %s", ctx.room.name)

    # AWS STT and TTS pick up credentials and region from
    # environment variables or default AWS config (~/.aws/credentials).
    session = AgentSession(
        stt=aws.STT(),  # Uses Amazon Transcribe
        llm=openai.LLM(),  # Example: using OpenAI for the LLM
        tts=aws.TTS(),  # Uses Amazon Polly
        vad=silero.VAD.load(),  # Recommended for robust voice interaction
        # preemptive_generation=False,  # Uncomment to disable the 1.5.0 default
    )

    # The session runs the STT -> LLM -> TTS loop itself, so no manual
    # per-turn loop is needed.
    await session.start(room=ctx.room, agent=MyAWSVoiceAgent())
    await session.generate_reply(instructions="Greet the user and offer help.")


if __name__ == "__main__":
    # The worker reads LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET
    # from the environment.
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))