LiveKit Agents AWS Plugin

1.5.4 · active · verified Fri Apr 17

livekit-plugins-aws is a plugin for LiveKit Agents that integrates Amazon Web Services into real-time voice applications. It lets agents built with the LiveKit Agents framework use AWS services such as Amazon Transcribe for Speech-to-Text (STT) and Amazon Polly for Text-to-Speech (TTS), so developers can build voice AI agents on AWS backend services. The library is currently at version 1.5.4 and is released from the `livekit/agents` monorepo, which is under active development and releases frequently.
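
As a rough mental model, a voice agent built on these services handles each user turn in three steps: speech to text (Amazon Transcribe), text to a reply (an LLM), and reply to speech (Amazon Polly). The sketch below only illustrates that ordering. `fake_stt`, `fake_llm`, `fake_tts` and `handle_turn` are hypothetical stand-ins that call no AWS, OpenAI or LiveKit API. In the real framework this loop is run for you, with streaming, turn detection and interruptions, rather than written by hand.

```python
import asyncio


# Hypothetical stand-ins: none of these call AWS, OpenAI or LiveKit.
# They only model the order in which a voice agent chains STT -> LLM -> TTS.
async def fake_stt(audio: bytes) -> str:
    """Stand-in for Amazon Transcribe (aws.STT): audio in, text out."""
    return audio.decode("utf-8")


async def fake_llm(prompt: str) -> str:
    """Stand-in for the LLM that writes the agent's reply."""
    return f"You said: {prompt}"


async def fake_tts(text: str) -> bytes:
    """Stand-in for Amazon Polly (aws.TTS): text in, audio out."""
    return text.encode("utf-8")


async def handle_turn(user_audio: bytes) -> bytes:
    transcript = await fake_stt(user_audio)  # speech -> text
    reply = await fake_llm(transcript)  # text -> reply text
    return await fake_tts(reply)  # reply text -> speech


if __name__ == "__main__":
    print(asyncio.run(handle_turn(b"hello")))  # prints b'You said: hello'
```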

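Install

pip install livekit-plugins-aws

The quickstart below also uses the OpenAI and Silero plugins:

pip install livekit-plugins-openai livekit-plugins-silero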

Imports

from livekit.plugins import aws
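
Before running the quickstart, it can help to confirm that the AWS plugin, and the OpenAI and Silero plugins the quickstart also imports, resolve in the current environment. This is a minimal sketch using only the standard library's `importlib.util.find_spec`; `missing_modules` is a hypothetical helper, not part of LiveKit.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the module names from `names` that cannot be found."""
    missing = []
    for name in names:
        try:
            # find_spec returns None when the module itself is absent.
            if importlib.util.find_spec(name) is None:
                missing.append(name)
        except ModuleNotFoundError:
            # For dotted names, find_spec imports the parent package first
            # and raises if that parent is not installed.
            missing.append(name)
    return missing


if __name__ == "__main__":
    needed = [
        "livekit.agents",
        "livekit.plugins.aws",
        "livekit.plugins.openai",
        "livekit.plugins.silero",
    ]
    print(missing_modules(needed) or "all modules found")
```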

Quickstart

This quickstart sets up a basic LiveKit Agent that uses Amazon Transcribe for Speech-to-Text (STT) and Amazon Polly for Text-to-Speech (TTS). It assumes your LiveKit server credentials and AWS credentials are configured via environment variables (e.g., `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`). Replies are generated by an OpenAI LLM through the `livekit-plugins-openai` package (set `OPENAI_API_KEY`) and spoken back with Amazon Polly; Silero handles voice activity detection. After setting the environment variables, start the worker in development mode with `python your_script_name.py dev`.
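
Because the quickstart depends on several environment variables, a small preflight check can report which ones are missing before the worker starts. This is a sketch, not part of LiveKit: `REQUIRED_ENV_VARS` and `missing_env_vars` are hypothetical names, and since AWS keys may instead come from `~/.aws/credentials`, missing `AWS_*` entries are a hint rather than a hard failure.

```python
import os
from collections.abc import Mapping

# Variables listed in the quickstart: LiveKit server, AWS and OpenAI settings.
REQUIRED_ENV_VARS = [
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
    "OPENAI_API_KEY",
]


def missing_env_vars(required: list[str], environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or empty, in input order."""
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(REQUIRED_ENV_VARS)
    print("Missing:", ", ".join(missing) if missing else "none")
```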

from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import aws, openai, silero

# --- Environment Variables Needed ---
# LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET (read by the worker CLI)
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION (e.g., us-east-1)
# OPENAI_API_KEY (for the OpenAI LLM)
# ------------------------------------


class MyAWSVoiceAgent(Agent):
    def __init__(self) -> None:
        # The Agent carries the LLM instructions; the STT/LLM/TTS/VAD
        # components are configured on the AgentSession in entrypoint().
        super().__init__(instructions="You are a helpful voice assistant. Keep answers brief.")


async def entrypoint(ctx: JobContext):
    await ctx.connect()
    print(f"Agent connected to room: {ctx.room.name}")

    # AWS STT and TTS pick up credentials and region from
    # environment variables or default AWS config (~/.aws/credentials).
    session = AgentSession(
        stt=aws.STT(),  # Uses Amazon Transcribe
        llm=openai.LLM(),  # Example: using OpenAI for the LLM
        tts=aws.TTS(),  # Uses Amazon Polly
        vad=silero.VAD.load(),  # Recommended for robust voice interaction
        # preemptive_generation=False,  # Set to False to disable the 1.5.0 default
    )

    # Log each final user transcript produced by Amazon Transcribe.
    @session.on("user_input_transcribed")
    def on_user_input(ev):
        if ev.is_final:
            print(f"User (via AWS Transcribe): {ev.transcript}")

    # Log each assistant reply as it is added to the chat history (spoken via Polly).
    @session.on("conversation_item_added")
    def on_item_added(ev):
        if ev.item.role == "assistant":
            print(f"Agent (via AWS Polly): {ev.item.text_content}")

    async def on_shutdown():
        print("Agent session ended.")

    ctx.add_shutdown_callback(on_shutdown)

    # The session runs the listen -> transcribe -> generate -> speak loop itself,
    # so no manual per-turn loop is needed.
    await session.start(room=ctx.room, agent=MyAWSVoiceAgent())
    print("Agent session started with AWS STT/TTS. Waiting for user input...")


if __name__ == "__main__":
    # LIVEKIT_URL, LIVEKIT_API_KEY and LIVEKIT_API_SECRET are read from the environment.
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
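
The comment in the quickstart says `aws.STT()` and `aws.TTS()` read credentials from environment variables or from `~/.aws/credentials`. The sketch below models that two-step lookup in plain Python, which can help when checking which source would supply keys. It is a simplified assumption: the real AWS SDK credential chain consults more sources (for example SSO, container and instance roles), and `resolve_aws_credentials` is a hypothetical helper, not part of livekit-plugins-aws.

```python
import configparser
import os
from collections.abc import Mapping
from pathlib import Path
from typing import Optional


def resolve_aws_credentials(
    environ: Mapping[str, str], credentials_ini: str, profile: str = "default"
) -> Optional[tuple[str, str]]:
    """Simplified lookup: environment variables first, then an INI profile."""
    # 1) Environment variables win when both keys are set.
    key = environ.get("AWS_ACCESS_KEY_ID")
    secret = environ.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return key, secret
    # 2) Otherwise fall back to the named profile in the credentials file.
    parser = configparser.ConfigParser()
    parser.read_string(credentials_ini)
    if parser.has_section(profile):
        section = parser[profile]
        key = section.get("aws_access_key_id")
        secret = section.get("aws_secret_access_key")
        if key and secret:
            return key, secret
    return None


if __name__ == "__main__":
    path = Path.home() / ".aws" / "credentials"
    ini_text = path.read_text() if path.exists() else ""
    found = resolve_aws_credentials(
        os.environ, ini_text, profile=os.environ.get("AWS_PROFILE", "default")
    )
    # Report only whether credentials were found; never print the secret itself.
    print("AWS credentials found" if found else "No AWS credentials found")
```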
