LiveKit Agents Plugin for Cartesia
livekit-plugins-cartesia is a Python plugin for LiveKit Agents that provides Text-to-Speech (TTS) and Speech-to-Text (STT) through Cartesia's AI services. It connects directly to Cartesia's API with your own API key instead of going through LiveKit Inference, so you manage billing through your own Cartesia account and can use custom Cartesia voices. Use it when building real-time conversational AI applications that need Cartesia's voice synthesis and transcription.
Warnings
- breaking LiveKit Agents 1.5.0 introduced significant changes to the `TurnHandlingOptions` API. Old keyword arguments for endpointing and interruption (e.g., `min_endpointing_delay`, `allow_interruptions`) are deprecated and will be removed in future versions. Agents should be updated to use the new dictionary-based `turn_handling` parameter.
- gotcha The `livekit-plugins-cartesia` plugin requires a Cartesia API key. This key must be provided explicitly to the `TTS` and `STT` constructors or set as the `CARTESIA_API_KEY` environment variable. Without it, Cartesia services will fail to authenticate.
- gotcha As of livekit-agents 1.4.4, the default Cartesia TTS model is Sonic 3. If your application relied on the previous default without setting `model` explicitly, the synthesized voice may change after you upgrade LiveKit Agents (and, with it, this plugin). Pass `model` to the `TTS` constructor to keep a specific model.
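The two gotchas above can be handled in one place when constructing the TTS. The sketch below uses a hypothetical helper, `cartesia_tts_kwargs` (not part of the plugin), that resolves the key as the warning describes (an explicit argument first, then `CARTESIA_API_KEY`), fails fast when neither is set, and pins the TTS model. The `"sonic-2"` default is an assumption standing in for whichever model you used before the default changed; substitute your own.

```python
import os
from typing import Mapping, Optional

# Assumption: "sonic-2" stands in for the model you relied on before the
# default changed to Sonic 3; replace it with the model your voices target.
PINNED_TTS_MODEL = "sonic-2"


def cartesia_tts_kwargs(
    api_key: Optional[str] = None,
    model: str = PINNED_TTS_MODEL,
    env: Mapping[str, str] = os.environ,
) -> dict:
    """Hypothetical helper that builds keyword arguments for cartesia.TTS.

    An explicit api_key takes precedence; otherwise CARTESIA_API_KEY from
    the environment is used. Raises immediately if no key is available,
    rather than letting authentication fail later at runtime.
    """
    key = api_key or env.get("CARTESIA_API_KEY")
    if not key:
        raise RuntimeError(
            "Cartesia API key missing: pass api_key or set CARTESIA_API_KEY"
        )
    # Passing the model explicitly keeps the voice stable across upgrades.
    return {"api_key": key, "model": model}
```

You would then construct the plugin with `cartesia.TTS(**cartesia_tts_kwargs())`. Returning plain keyword arguments keeps the pinned model in a single spot that is easy to review when upgrading.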
Install
pip install livekit-plugins-cartesia
Imports
- cartesia
from livekit.plugins import cartesia
- TTS
from livekit.plugins.cartesia import TTS
- STT
from livekit.plugins.cartesia import STT
Quickstart
import os

from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, cli
from livekit.plugins import cartesia


async def entrypoint(ctx: JobContext):
    # Connect to the LiveKit room assigned to this job
    await ctx.connect()

    # Both plugins also fall back to CARTESIA_API_KEY if api_key is omitted
    api_key = os.environ["CARTESIA_API_KEY"]
    session = AgentSession(
        stt=cartesia.STT(api_key=api_key),
        tts=cartesia.TTS(api_key=api_key),
        # Other agent components like the LLM (llm=...) and VAD (vad=...) go here;
        # without an LLM the agent can speak but cannot generate replies.
    )
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful voice assistant."),
    )

    # Greet the user through Cartesia TTS. No sleep loop is needed: the session
    # keeps handling the conversation after the entrypoint returns.
    await session.say("Hello! I am a Cartesia-powered voice agent. How can I help you today?")


if __name__ == "__main__":
    # Required environment variables:
    #   LiveKit:  LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET
    #   Cartesia: CARTESIA_API_KEY
    # Run a development worker with: python agent.py dev  (if saved as agent.py)
    # This quickstart illustrates plugin usage, not a full deployment.
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
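Rather than substituting placeholder credentials, a worker can check its configuration before starting and exit with a clear message. This is a minimal sketch; `missing_env_vars` and `check_env_or_exit` are hypothetical helpers, and the variable list is taken from the quickstart's comments.

```python
import os
import sys
from typing import Iterable, List, Mapping

# Variables the quickstart expects, per its comments.
REQUIRED_ENV_VARS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "CARTESIA_API_KEY",
)


def missing_env_vars(
    names: Iterable[str] = REQUIRED_ENV_VARS,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the required variables that are unset or empty, in order."""
    return [name for name in names if not env.get(name)]


def check_env_or_exit(env: Mapping[str, str] = os.environ) -> None:
    """Exit with a readable message instead of starting with bad credentials."""
    missing = missing_env_vars(env=env)
    if missing:
        sys.exit("Missing environment variables: " + ", ".join(missing))
```

Call `check_env_or_exit()` at the top of the `if __name__ == "__main__":` block, before handing control to the LiveKit CLI. Failing at startup surfaces a misconfiguration immediately instead of as an authentication error once a job arrives.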