LiteLLM
Unified Python SDK and proxy gateway for calling 100+ LLM APIs in OpenAI-compatible format. Single interface for OpenAI, Anthropic, Bedrock, VertexAI, Groq, Mistral, Cohere, HuggingFace, vLLM, and more. Supports /chat/completions, /embeddings, /images, /audio, /rerank, streaming, async, cost tracking, fallbacks, and load balancing. Also ships as a deployable proxy server (AI Gateway) with budget controls, virtual keys, and logging. Releases multiple times per week — version numbers are high (v1.81+) due to frequent patch releases. NOT related to the 'litellm' PyPI stub that existed before BerriAI claimed the name.
Warnings
- breaking LiteLLM releases multiple times per week. Minor and patch versions introduce behavioral changes — response format normalization, new provider routing logic, cost calculation updates — without necessarily bumping the major version. Unpinned installs in production can silently change behavior overnight.
- breaking A known OOM (Out of Memory) issue on Kubernetes was introduced in a September 2025 release and caused proxy startup failures. It was patched in a subsequent release, but unpinned installs picked up the affected version automatically.
- gotcha Model string format matters. 'gpt-4o' (no prefix) works but LiteLLM must infer the provider. 'openai/gpt-4o' (with prefix) is explicit and more reliable, especially when multiple providers are configured. Ambiguous model names can be silently routed to the wrong provider.
- gotcha pip install litellm pulls in a large dependency tree (openai, anthropic, httpx, pydantic, tiktoken, and more). Cold install in CI can take 2-4 minutes. Docker images are significantly faster for repeated deploys.
- gotcha Cost tracking (completion_cost(), token_counter()) depends on LiteLLM's internal pricing database. Prices for new or custom models may lag behind actual provider pricing by days to weeks. Do not rely on LiteLLM cost estimates for billing-critical applications without cross-referencing provider invoices.
- gotcha litellm[proxy] requires additional system dependencies (prisma CLI for DB migrations) that are not installed automatically. Running litellm --use_prisma_migrate without prisma installed raises a confusing error.
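The model-string gotcha above can be guarded against in code. A minimal sketch, in which the helper name and the provider set are illustrative (not a LiteLLM API):

```python
# Hypothetical guard: reject ambiguous model strings before they reach completion().
# The provider set is illustrative; LiteLLM itself supports many more providers.
KNOWN_PROVIDERS = {"openai", "anthropic", "bedrock", "vertex_ai", "groq", "mistral"}

def require_explicit_provider(model: str) -> str:
    """Return model unchanged only if it has an explicit '<provider>/<model>' prefix."""
    provider, sep, name = model.partition("/")
    if not sep or provider not in KNOWN_PROVIDERS or not name:
        raise ValueError(
            f"ambiguous model string {model!r}; use an explicit '<provider>/<model>' form"
        )
    return model
```

Calling this before every completion() makes silent misrouting into a loud error at the call site.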
Install
- pip install litellm
- pip install 'litellm[proxy]'
- pip install 'litellm[caching]'
- litellm --model gpt-4o
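Given the release cadence noted in the warnings, pin an exact version in production. A sketch of a requirements entry; the version number here is illustrative only:

```
# requirements.txt -- pin an exact version; pick one you have tested
litellm==1.81.0
```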
Imports
- completion
from litellm import completion
- acompletion
from litellm import acompletion
Quickstart
import os
from litellm import completion
# Set provider API keys as env vars
os.environ["OPENAI_API_KEY"] = "..."
os.environ["ANTHROPIC_API_KEY"] = "..."
# OpenAI
response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
# Anthropic — same interface, different model string
response = completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello!"}]
)
# Streaming
for chunk in completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Count to 5"}],
    stream=True
):
    print(chunk.choices[0].delta.content or "", end="")
# Async
import asyncio
from litellm import acompletion
async def main():
    response = await acompletion(
        model="anthropic/claude-sonnet-4-20250514",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)

asyncio.run(main())
# Cost tracking
from litellm import completion_cost
cost = completion_cost(completion_response=response)
print(f"Cost: ${cost}")
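The pricing-lag gotcha in the warnings is why cross-checking matters. A minimal sketch of the idea behind completion_cost, using a hypothetical static price table (real prices live in LiteLLM's bundled pricing database and change over time):

```python
# Hypothetical (input, output) USD prices per 1M tokens; illustrative only.
PRICES = {"openai/gpt-4o": (2.50, 10.00)}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Rough per-request cost from a static price table."""
    price_in, price_out = PRICES[model]
    return (prompt_tokens * price_in + completion_tokens * price_out) / 1_000_000

# estimate_cost("openai/gpt-4o", 1000, 500) == 0.0075 with the table above
```

Comparing an estimate like this against provider invoices is a cheap sanity check before trusting any cost figure for billing.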