LiteLLM

1.81.15 · active · verified Sat Feb 28

Unified Python SDK and proxy gateway for calling 100+ LLM APIs in OpenAI-compatible format. Single interface for OpenAI, Anthropic, Bedrock, VertexAI, Groq, Mistral, Cohere, HuggingFace, vLLM, and more. Supports /chat/completions, /embeddings, /images, /audio, /rerank, streaming, async, cost tracking, fallbacks, and load balancing. Also ships as a deployable proxy server (AI Gateway) with budget controls, virtual keys, and logging. Releases multiple times per week — version numbers are high (v1.81+) due to frequent patch releases. NOT related to the 'litellm' PyPI stub that existed before BerriAI claimed the name.

Warnings

Install
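LiteLLM installs from PyPI; the proxy server (AI Gateway) ships as an optional extra:

```shell
# SDK only
pip install litellm

# SDK + proxy server (AI Gateway)
pip install 'litellm[proxy]'
```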

Imports

Quickstart

Model strings use 'provider/model-name' format (e.g. 'anthropic/claude-sonnet-4-20250514', 'openai/gpt-4o'). Without the provider prefix, LiteLLM infers the provider from the model name — but explicit prefixes are safer. API keys are read from environment variables named per provider (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY).

import os
from litellm import completion

# Set provider API keys as env vars
os.environ["OPENAI_API_KEY"] = "..."
os.environ["ANTHROPIC_API_KEY"] = "..."

# OpenAI
response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

# Anthropic — same interface, different model string
response = completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Streaming
for chunk in completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Count to 5"}],
    stream=True
):
    print(chunk.choices[0].delta.content or "", end="")

# Async
import asyncio
from litellm import acompletion

async def main():
    response = await acompletion(
        model="anthropic/claude-sonnet-4-20250514",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)

asyncio.run(main())

# Cost tracking — completion_cost reads model and token usage from the response
from litellm import completion_cost
cost = completion_cost(completion_response=response)
print(f"Cost: ${cost:.6f}")
