LiteLLM
litellm 1.81.15 · verified 2026-05-12 · python · auth: no · install: verified · quickstart: verified
Unified Python SDK and proxy gateway for calling 100+ LLM APIs in OpenAI-compatible format. Single interface for OpenAI, Anthropic, Bedrock, VertexAI, Groq, Mistral, Cohere, HuggingFace, vLLM, and more. Supports /chat/completions, /embeddings, /images, /audio, /rerank, streaming, async, cost tracking, fallbacks, and load balancing. Also ships as a deployable proxy server (AI Gateway) with budget controls, virtual keys, and logging. Releases multiple times per week — version numbers are high (v1.81+) due to frequent patch releases. NOT related to the 'litellm' PyPI stub that existed before BerriAI claimed the name.
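Load balancing and fallbacks are exposed through litellm.Router. A minimal sketch, assuming two deployments registered under the same alias; the Azure deployment name, api_base, and keys below are placeholders:

from litellm import Router

# Two deployments share the alias "gpt-4o"; the Router spreads requests across
# them and can fail over between them. All credentials below are placeholders.
router = Router(
    model_list=[
        {
            "model_name": "gpt-4o",
            "litellm_params": {"model": "openai/gpt-4o"},
        },
        {
            "model_name": "gpt-4o",
            "litellm_params": {
                "model": "azure/my-gpt-4o-deployment",  # hypothetical Azure deployment
                "api_base": "https://example.openai.azure.com",
                "api_key": "...",
            },
        },
    ]
)

response = router.completion(
    model="gpt-4o",  # the alias, not a provider-specific model string
    messages=[{"role": "user", "content": "Hello!"}],
)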
pip install litellm

Common errors
error ModuleNotFoundError: No module named 'litellm' ↓
cause The `litellm` library has not been installed in your Python environment or there is an issue with your Python path.
fix
Install the library using pip:
pip install litellm, or pip install 'litellm[proxy]' if you need the proxy server components.
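To confirm the package landed in the interpreter you are actually running, a quick check using the standard library:

import importlib.metadata

import litellm  # raises ModuleNotFoundError if the install went to a different environment
print(importlib.metadata.version("litellm"))  # e.g. 1.81.15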

error litellm.exceptions.AuthenticationError: AuthenticationError: OpenAIException - The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable ↓
cause The API key for the specified LLM provider (e.g., OpenAI, Anthropic) is missing or incorrectly configured, either as an environment variable or passed directly to the `completion` call.
fix
Set the API key as an environment variable (e.g., export OPENAI_API_KEY="sk-..." for OpenAI) or pass it directly to the litellm.completion function: litellm.completion(model='gpt-3.5-turbo', messages=messages, api_key='sk-YOUR_API_KEY').
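Both options side by side; the keys are placeholders:

import os
from litellm import completion

# Option 1: environment variable, picked up automatically for OpenAI models
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder
response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Option 2: pass the key explicitly for this call only
response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    api_key="sk-...",  # placeholder
)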

error litellm.BadRequestError: LLM Provider NOT provided. Pass in the LLM provider you are trying to call. ↓
cause LiteLLM cannot infer the target LLM provider from the `model` string you provided, or the provider is not explicitly set.
fix
Ensure your model string explicitly includes the provider prefix (e.g., 'openai/gpt-3.5-turbo', 'anthropic/claude-2', 'gemini/gemini-pro'), or define custom mappings via litellm.model_alias_map.
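A sketch of both approaches; 'my-default-model' is a made-up alias for illustration:

import litellm
from litellm import completion

# Explicit provider prefix: nothing to infer
response = completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Custom mapping: route a local alias to a real provider/model string
litellm.model_alias_map = {"my-default-model": "openai/gpt-4o"}
response = completion(
    model="my-default-model",
    messages=[{"role": "user", "content": "Hello!"}],
)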

error litellm.BadRequestError: 'messages' is a required parameter ↓
cause The `messages` parameter, which holds the conversation history for chat completions, is either missing from your `litellm.completion` call or is not a correctly formatted list of message dictionaries.
fix
Ensure messages is a list of dictionaries, where each dictionary has at least a 'role' (e.g., 'user', 'assistant', 'system') and a 'content' field. Example: litellm.completion(model='gpt-4', messages=[{'role': 'user', 'content': 'Hello!'}]).
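A well-formed messages list, for reference:

from litellm import completion

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Hello!"},
]
response = completion(model="openai/gpt-4o", messages=messages)
print(response.choices[0].message.content)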

error litellm.BadRequestError: OpenAIException - Unexpected role 'user' after role 'tool' ... Invalid request parameters ↓
cause This error occurs when the sequence of 'role' values in your `messages` list does not follow the expected conversational turn-taking, especially after a 'tool' message. The model expects an 'assistant' response after a 'tool' output before a new 'user' message.
fix
Ensure that after a message with role: 'tool', the next message has role: 'assistant' to acknowledge or act upon the tool's output, before another role: 'user' message is sent. The typical flow is user -> assistant -> tool -> assistant -> user.
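A sketch of the expected ordering; the tool call ID, function name, and result are made up. The key point is that the 'tool' message answers the assistant's tool_calls, and the assistant replies again before the next 'user' turn:

messages = [
    {"role": "user", "content": "What is the weather in Paris?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",  # made-up ID
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
        }],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": '{"temp_c": 18}'},  # tool result
    {"role": "assistant", "content": "It is 18°C in Paris."},  # assistant acts on the tool output
    {"role": "user", "content": "Thanks, and in Rome?"},  # only now a new user turn
]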

Warnings

breaking LiteLLM releases multiple times per week. Minor and patch versions introduce behavioral changes — response format normalization, new provider routing logic, cost calculation updates — without necessarily bumping the major version. Unpinned installs in production can silently change behavior overnight. ↓
fix Pin to a specific version in production: pip install litellm==1.81.15. Use -stable tagged Docker images for the proxy.
breaking A known OOM (Out of Memory) issue on Kubernetes was introduced in a September 2025 release and caused proxy startup failures. The issue was patched in a subsequent release but affected users who were on that specific release without pinning. ↓
fix Always pin litellm versions. Check the release notes at docs.litellm.ai/release_notes before upgrading in production proxy deployments.
gotcha Model string format matters. 'gpt-4o' (no prefix) works but LiteLLM must infer the provider. 'openai/gpt-4o' (with prefix) is explicit and more reliable, especially when multiple providers are configured. Ambiguous model names can be silently routed to the wrong provider. ↓
fix Always use 'provider/model-name' format: 'anthropic/claude-sonnet-4-20250514', 'openai/gpt-4o', 'bedrock/amazon.titan-embed-text-v1'.
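The same explicit form works across endpoints; a sketch with a chat model and an embedding model (the Bedrock call assumes AWS credentials are already configured in the environment):

from litellm import completion, embedding

# Chat: the provider prefix removes routing ambiguity
response = completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Embeddings: same convention, different provider
emb = embedding(
    model="bedrock/amazon.titan-embed-text-v1",
    input=["some text to embed"],
)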
gotcha pip install litellm pulls in a large dependency tree (openai, anthropic, httpx, pydantic, tiktoken, and more). Cold install in CI can take 2-4 minutes. Docker images are significantly faster for repeated deploys. ↓
fix Use the official BerriAI Docker image for proxy deployments. For SDK use in CI, cache pip dependencies between runs.
gotcha Cost tracking (completion_cost(), token_counter()) depends on LiteLLM's internal pricing database. Prices for new or custom models may lag behind actual provider pricing by days to weeks. Do not rely on LiteLLM cost estimates for billing-critical applications without cross-referencing provider invoices. ↓
fix Treat LiteLLM cost estimates as approximations. For billing-critical use, validate against provider usage dashboards monthly.
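A sketch of the two helpers; treat the outputs as estimates from LiteLLM's bundled price map, not invoice-grade figures:

from litellm import completion, completion_cost, token_counter

messages = [{"role": "user", "content": "Hello!"}]

# Estimate prompt size before sending
n_tokens = token_counter(model="openai/gpt-4o", messages=messages)

response = completion(model="openai/gpt-4o", messages=messages)

# Estimated spend for this call, from LiteLLM's internal pricing database
cost = completion_cost(completion_response=response)
print(n_tokens, f"${cost:.6f}")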
gotcha litellm[proxy] requires additional system dependencies (prisma CLI for DB migrations) that are not installed automatically. Running litellm --use_prisma_migrate without prisma installed raises a confusing error. ↓
fix Install litellm-proxy-extras separately: pip install litellm-proxy-extras. Or use the Docker image which has all dependencies pre-installed.
breaking Installing `litellm[proxy]` in minimal environments like Alpine can fail if the Rust `cargo` build toolchain is not pre-installed. A dependency, `pyroscope-io`, requires `cargo` to build its wheel, leading to an `[Errno 2] No such file or directory: 'cargo'` error. ↓
fix Before installing `litellm[proxy]`, ensure the Rust `cargo` toolchain is available in your environment. For Alpine, install it using `apk add rust cargo`. For other systems, refer to Rust installation guides. Alternatively, use the official BerriAI Docker images which come with all necessary build dependencies pre-installed.
Install

pip install 'litellm[proxy]'
pip install 'litellm[caching]'
litellm --model gpt-4o

Install compatibility (verified; last tested 2026-05-12)
| python | os / libc | variant | wheel | install | import | disk |
|--------|-----------|---------|-------|---------|--------|------|
| 3.9 | alpine (musl) | caching | - | - | 7.21s | 210.6M |
| 3.9 | alpine (musl) | proxy | - | - | - | - |
| 3.9 | alpine (musl) | litellm | - | - | 7.29s | 210.2M |
| 3.9 | slim (glibc) | caching | - | - | 6.40s | 194M |
| 3.9 | slim (glibc) | proxy | - | - | 7.79s | 295M |
| 3.9 | slim (glibc) | litellm | - | - | 6.44s | 193M |
| 3.10 | alpine (musl) | caching | - | - | 7.80s | 211.3M |
| 3.10 | alpine (musl) | proxy | - | - | - | - |
| 3.10 | alpine (musl) | litellm | - | - | 7.74s | 210.9M |
| 3.10 | slim (glibc) | caching | - | - | 5.63s | 194M |
| 3.10 | slim (glibc) | proxy | - | - | 6.80s | 474M |
| 3.10 | slim (glibc) | litellm | - | - | 5.63s | 194M |
| 3.11 | alpine (musl) | caching | - | - | 10.07s | 228.0M |
| 3.11 | alpine (musl) | proxy | - | - | - | - |
| 3.11 | alpine (musl) | litellm | - | - | 10.02s | 227.6M |
| 3.11 | slim (glibc) | caching | - | - | 8.45s | 211M |
| 3.11 | slim (glibc) | proxy | - | - | 10.33s | 499M |
| 3.11 | slim (glibc) | litellm | - | - | 8.46s | 211M |
| 3.12 | alpine (musl) | caching | - | - | 8.13s | 216.5M |
| 3.12 | alpine (musl) | proxy | - | - | - | - |
| 3.12 | alpine (musl) | litellm | - | - | 7.93s | 216.1M |
| 3.12 | slim (glibc) | caching | - | - | 8.39s | 200M |
| 3.12 | slim (glibc) | proxy | - | - | 9.78s | 489M |
| 3.12 | slim (glibc) | litellm | - | - | 8.04s | 199M |
| 3.13 | alpine (musl) | caching | - | - | 7.58s | 216.3M |
| 3.13 | alpine (musl) | proxy | - | - | - | - |
| 3.13 | alpine (musl) | litellm | - | - | 7.68s | 215.9M |
| 3.13 | slim (glibc) | caching | - | - | 7.74s | 199M |
| 3.13 | slim (glibc) | proxy | - | - | 9.54s | 488M |
| 3.13 | slim (glibc) | litellm | - | - | 7.80s | 199M |
Imports
- completion
  wrong:   import litellm; litellm.Completion()
  correct: from litellm import completion
- acompletion
  correct: from litellm import acompletion
Quickstart (verified; last tested 2026-05-12)
import os
from litellm import completion
# Set provider API keys as env vars
os.environ["OPENAI_API_KEY"] = "..."
os.environ["ANTHROPIC_API_KEY"] = "..."
# OpenAI
response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

# Anthropic: same interface, different model string
response = completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Streaming
for chunk in completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Count to 5"}],
    stream=True
):
    print(chunk.choices[0].delta.content or "", end="")

# Async
import asyncio
from litellm import acompletion

async def main():
    response = await acompletion(
        model="anthropic/claude-sonnet-4-20250514",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)

asyncio.run(main())
# Cost tracking
from litellm import completion_cost
cost = completion_cost(completion_response=response)
print(f"Cost: ${cost}")