LiteLLM
Unified Python SDK and proxy gateway for calling 100+ LLM APIs in OpenAI-compatible format. Single interface for OpenAI, Anthropic, Bedrock, VertexAI, Groq, Mistral, Cohere, HuggingFace, vLLM, and more. Supports /chat/completions, /embeddings, /images, /audio, /rerank, streaming, async, cost tracking, fallbacks, and load balancing. Also ships as a deployable proxy server (AI Gateway) with budget controls, virtual keys, and logging. Releases multiple times per week — version numbers are high (v1.81+) due to frequent patch releases. NOT related to the 'litellm' PyPI stub that existed before BerriAI claimed the name.
Common errors
- ModuleNotFoundError: No module named 'litellm'
  cause: The `litellm` library is not installed in the active Python environment, or the Python path is misconfigured.
  fix: Install with pip: `pip install litellm`, or `pip install 'litellm[proxy]'` if you need the proxy server components.
- litellm.exceptions.AuthenticationError: AuthenticationError: OpenAIException - The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
  cause: The API key for the target provider (OpenAI, Anthropic, etc.) is missing or misconfigured, whether as an environment variable or passed directly to the `completion` call.
  fix: Set the key as an environment variable (e.g., `export OPENAI_API_KEY="sk-..."` for OpenAI) or pass it directly: `litellm.completion(model='gpt-3.5-turbo', messages=messages, api_key='sk-YOUR_API_KEY')`.
- litellm.BadRequestError: LLM Provider NOT provided. Pass in the LLM provider you are trying to call.
  cause: LiteLLM cannot infer the target provider from the `model` string, and no provider was set explicitly.
  fix: Include the provider prefix in the model string (e.g., 'openai/gpt-3.5-turbo', 'anthropic/claude-2', 'gemini/gemini-pro'), or register custom names via `litellm.model_alias_map`.
- litellm.BadRequestError: 'messages' is a required parameter
  cause: The `messages` parameter, which carries the conversation history for chat completions, is missing from the `litellm.completion` call or is not a correctly formatted list of message dictionaries.
  fix: Pass `messages` as a list of dicts, each with at least a 'role' ('user', 'assistant', 'system') and a 'content' field. Example: `litellm.completion(model='gpt-4', messages=[{'role': 'user', 'content': 'Hello!'}])`.
- litellm.BadRequestError: OpenAIException - Unexpected role 'user' after role 'tool' ... Invalid request parameters
  cause: The 'role' sequence in `messages` violates expected turn-taking: after a 'tool' message, the model expects an 'assistant' response before the next 'user' message.
  fix: After a `role: 'tool'` message, insert the `role: 'assistant'` message that acts on the tool output before sending another `role: 'user'` message. The typical flow is `user` -> `assistant` -> `tool` -> `assistant` -> `user`.
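The turn-taking rule above can be sketched as a concrete `messages` list. This is a minimal illustration of the OpenAI-style tool-call schema that LiteLLM normalizes providers to; the tool name, call ID, and payloads are made up for the example.

```python
import json

# Valid turn order for tool use: user -> assistant(tool_calls) -> tool
# -> assistant -> user. IDs and the get_weather tool are illustrative.
messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",                    # illustrative call ID
            "type": "function",
            "function": {
                "name": "get_weather",         # hypothetical tool
                "arguments": json.dumps({"city": "Paris"}),
            },
        }],
    },
    # Tool output echoes the tool_call_id it answers:
    {"role": "tool", "tool_call_id": "call_1", "content": '{"temp_c": 18}'},
    # Assistant must act on the tool output BEFORE the next user turn:
    {"role": "assistant", "content": "It's 18 degrees C in Paris."},
    {"role": "user", "content": "Thanks!"},
]

roles = [m["role"] for m in messages]
# Verify no 'user' message directly follows a 'tool' message:
assert all(not (a == "tool" and b == "user") for a, b in zip(roles, roles[1:]))
```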
Warnings
- breaking LiteLLM releases multiple times per week. Minor and patch versions introduce behavioral changes — response format normalization, new provider routing logic, cost calculation updates — without necessarily bumping the major version. Unpinned installs in production can silently change behavior overnight.
- breaking A known OOM (Out of Memory) issue on Kubernetes was introduced in a September 2025 release and caused proxy startup failures. The issue was patched in a subsequent release but affected users who were on that specific release without pinning.
- gotcha Model string format matters. 'gpt-4o' (no prefix) works but LiteLLM must infer the provider. 'openai/gpt-4o' (with prefix) is explicit and more reliable, especially when multiple providers are configured. Ambiguous model names can be silently routed to the wrong provider.
- gotcha pip install litellm pulls in a large dependency tree (openai, anthropic, httpx, pydantic, tiktoken, and more). Cold install in CI can take 2-4 minutes. Docker images are significantly faster for repeated deploys.
- gotcha Cost tracking (completion_cost(), token_counter()) depends on LiteLLM's internal pricing database. Prices for new or custom models may lag behind actual provider pricing by days to weeks. Do not rely on LiteLLM cost estimates for billing-critical applications without cross-referencing provider invoices.
- gotcha litellm[proxy] requires additional system dependencies (prisma CLI for DB migrations) that are not installed automatically. Running litellm --use_prisma_migrate without prisma installed raises a confusing error.
- breaking Installing `litellm[proxy]` in minimal environments like Alpine can fail if the Rust `cargo` build toolchain is not pre-installed. A dependency, `pyroscope-io`, requires `cargo` to build its wheel, leading to an `[Errno 2] No such file or directory: 'cargo'` error.
Install
- pip install litellm
- pip install 'litellm[proxy]'
- pip install 'litellm[caching]'
- litellm --model gpt-4o
Imports
- completion
import litellm; litellm.completion(...)
from litellm import completion
- acompletion
from litellm import acompletion
Quickstart
import os
from litellm import completion
# Set provider API keys as env vars
os.environ["OPENAI_API_KEY"] = "..."
os.environ["ANTHROPIC_API_KEY"] = "..."
# OpenAI
response = completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
# Anthropic — same interface, different model string
response = completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello!"}]
)
# Streaming
for chunk in completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Count to 5"}],
    stream=True
):
    print(chunk.choices[0].delta.content or "", end="")
# Async
import asyncio
from litellm import acompletion
async def main():
    response = await acompletion(
        model="anthropic/claude-sonnet-4-20250514",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)

asyncio.run(main())
# Cost tracking
from litellm import completion_cost
cost = completion_cost(completion_response=response)
print(f"Cost: ${cost}")
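Since the description mentions fallbacks, here is a minimal sketch of manual fallback across providers. It assumes only that the injected call function raises on failure; the stub stands in for `litellm.completion` so nothing hits a real API. In real code you would catch litellm's OpenAI-style exceptions (e.g. `litellm.exceptions.RateLimitError`) rather than bare `Exception`.

```python
def complete_with_fallback(models, messages, call_fn):
    """Try each model in order; return (model, result) from the first success."""
    last_err = None
    for model in models:
        try:
            return model, call_fn(model=model, messages=messages)
        except Exception as err:  # real code: catch specific litellm exceptions
            last_err = err
    raise last_err

# Stub standing in for litellm.completion (simulates one provider failing):
def flaky(model, messages):
    if model.startswith("openai/"):
        raise RuntimeError("rate limited")  # simulated provider failure
    return f"ok from {model}"

used, result = complete_with_fallback(
    ["openai/gpt-4o", "anthropic/claude-sonnet-4-20250514"],
    [{"role": "user", "content": "Hello!"}],
    flaky,
)
print(used)  # the second model, since the first raised
```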