Llama Stack
0.7.1 · verified Fri May 01
Open-source, OpenAI-compatible API server with pluggable providers for any model and any infrastructure. Current version 0.7.1; requires Python >= 3.12. Releases follow a rapid cadence (multiple minor versions per month).
pip install llama-stack

Common errors
error ModuleNotFoundError: No module named 'llama_stack_client'
cause The client library is not installed as a dependency of the server package.
fix
pip install llama-stack-client
error AttributeError: module 'llama_stack' has no attribute '...'
cause Attempting to import from the server package when the symbol is in the client package or does not exist.
fix
Verify the correct import path: most client classes are under llama_stack_client, not llama_stack.
error llama_stack_client.api_error.ApiError: 404 Not Found - The requested endpoint does not exist.
cause Using an old API endpoint that was removed or renamed (e.g., fine_tuning).
fix
Check the changelog for the version you upgraded to and update to the new endpoint paths.
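To know which changelog entries apply, it helps to print the installed server and client versions first. This sketch uses only the standard library; the helper name is ad hoc, not part of llama-stack.

```python
from importlib.metadata import version, PackageNotFoundError

def dist_version(name: str):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None

# Check both the server and the client package before reading the changelog.
for dist in ("llama-stack", "llama-stack-client"):
    v = dist_version(dist)
    print(dist, v if v else "not installed")
```

Compare the printed versions against the changelog sections for the release you upgraded across.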
Warnings
breaking In v0.7.0 the fine_tuning API was removed entirely. Any code using fine_tuning endpoints or client methods will break.
fix Remove fine_tuning API usage; use external training libraries if needed.
breaking In v0.6.0 numerous post-training API endpoints were renamed/restructured for consistency. Old endpoint paths no longer work.
fix Update API calls to match the new consistent naming scheme documented in the changelog.
deprecated The Agents API is deprecated in favor of the new Responses API (introduced v0.5.0). The Agents endpoint may be removed in a future release.
fix Migrate from Agents to Responses API as shown in the migration guide.
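As a rough sketch of the migration, an Agents-style turn collapses into a single Responses-style request. The mapping below is illustrative only: the helper name is hypothetical, and the field names follow the OpenAI Responses convention that llama-stack mirrors; consult the migration guide for the exact client calls in your version.

```python
def turn_to_responses_request(model: str, user_message: str) -> dict:
    """Map an old Agents-style turn into a Responses-style request body.
    Hypothetical helper; field names follow the OpenAI Responses shape."""
    return {
        "model": model,
        "input": [{"role": "user", "content": user_message}],
    }

payload = turn_to_responses_request("Meta-Llama-3.1-8B-Instruct", "Hello!")
print(payload["model"])             # Meta-Llama-3.1-8B-Instruct
print(payload["input"][0]["role"])  # user
```

Instead of creating an agent, opening a session, and posting a turn, the Responses API takes one request body like this per interaction.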
gotcha The llama-stack package and llama-stack-client are separate PyPI packages. Installing one does NOT install the other.
fix Install both with: pip install llama-stack llama-stack-client
Install
pip install "llama-stack[starter]"

Imports
- LlamaStackClient
  wrong:   from llama_stack import LlamaStackClient
  correct: from llama_stack_client import LlamaStackClient
- Stack
  wrong:   from llama_stack import Stack
  correct: from llama_stack import LlamaStackAsLibraryClient
Quickstart
import os
from llama_stack_client import LlamaStackClient
client = LlamaStackClient(
    base_url=os.environ.get("LLAMA_STACK_BASE_URL", "http://localhost:8321"),
    api_key=os.environ.get("LLAMA_STACK_API_KEY", ""),
)
# List available models
models = client.models.list()
print([m.identifier for m in models])
# Send a chat completion
response = client.chat.completions.create(
    model_id="Meta-Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(response.choices[0].message.content)