PipeCat AI

0.0.108 · active · verified Mon Apr 13

PipeCat AI is an open-source framework designed for building real-time voice and multimodal AI assistants. It provides a modular pipeline architecture for integrating various services like Speech-to-Text (STT), Large Language Models (LLM), Text-to-Speech (TTS), and real-time transports (e.g., Daily.co). It's currently in pre-1.0 development, with frequent updates introducing new features and services.

Warnings

Install

Imports

Quickstart

This quickstart sets up a basic voice AI assistant using Daily.co for real-time communication, OpenAI's GPT-4o for language understanding, and OpenAI's TTS-1 for speech synthesis. It demonstrates the core pipeline concept: user speech input is processed by an LLM, the LLM's text response is converted to speech, and then output to the user.

import asyncio
import os

from pipecat.frames.frames import AudioFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.services.vad import VADService
from pipecat.transports.services.daily import DailyParams, DailyTransport, DailyTransportOptions
from pipecat.services.llm import LLMService
from pipecat.services.tts import TTSService

async def main():
    # Make sure to set environment variables for DAILY_URL and OPENAI_API_KEY
    # e.g., export DAILY_URL="https://example.daily.co/YOUR_ROOM" 
    #       export OPENAI_API_KEY="sk-proj-..."
    
    daily_url = os.environ.get("DAILY_URL", "")
    openai_api_key = os.environ.get("OPENAI_API_KEY", "")

    if not daily_url or not openai_api_key:
        print("Please set DAILY_URL and OPENAI_API_KEY environment variables.")
        return

    # Setup your services (Daily, VAD, LLM, TTS)
    transport = DailyTransport(
        daily_url,
        DailyTransportOptions(
            lang="en",
            vad_enabled=True,
            mic_enabled=True,
            speaker_enabled=True,
            vad_service=VADService(),
        ),
    )
    llm = LLMService(
        api_key=openai_api_key,
        model="gpt-4o",
    )
    tts = TTSService(
        api_key=openai_api_key,
        model="tts-1",
        voice="alloy",
    )

    # Define your pipeline: User audio -> LLM text -> TTS audio -> Bot audio
    pipeline = Pipeline([
        transport.input(),        # User input (audio) from Daily
        llm,                      # LLM processes user text
        tts,                      # TTS generates audio from LLM text
        transport.output(),       # Bot output (audio) to Daily
    ])

    runner = PipelineRunner()

    print("Starting PipeCat AI assistant. Join the Daily room specified by DAILY_URL.")
    await runner.run(pipeline)

if __name__ == "__main__":
    asyncio.run(main())

view raw JSON →