Llama Stack

Version 0.7.1 · verified Fri May 01 · auth: none · Python

Open-source, OpenAI-compatible API server with pluggable providers for any model and any infrastructure. The current version is 0.7.1 and requires Python >= 3.12. Releases follow a rapid cadence (multiple minor versions per month).

pip install llama-stack
error ModuleNotFoundError: No module named 'llama_stack_client'
cause The client library is not installed as a dependency of the server package.
fix
pip install llama-stack-client
error AttributeError: module 'llama_stack' has no attribute '...'
cause Attempting to import from the server package when the symbol is in the client package or does not exist.
fix
Verify the correct import path: most client classes are under llama_stack_client, not llama_stack.
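One way to avoid the wrong-package import entirely is to check which package is actually installed before importing. A minimal sketch, using only the standard library (`find_spec` does not import either package):

```python
import importlib.util

def client_available() -> bool:
    """True if the llama_stack_client package can be imported."""
    return importlib.util.find_spec("llama_stack_client") is not None

if client_available():
    from llama_stack_client import LlamaStackClient  # client classes live here
else:
    print("Client missing - run: pip install llama-stack-client")
```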
error llama_stack_client.api_error.ApiError: 404 Not Found - The requested endpoint does not exist.
cause Using an old API endpoint that was removed or renamed (e.g., fine_tuning).
fix
Check the changelog for the version you upgraded to and update to the new endpoint paths.
breaking In v0.7.0 the fine_tuning API was removed entirely. Any code using fine_tuning endpoints or client methods will break.
fix Remove fine_tuning API usage; use external training libraries if needed.
breaking In v0.6.0 numerous post-training API endpoints were renamed/restructured for consistency. Old endpoint paths no longer work.
fix Update API calls to match the new consistent naming scheme documented in the changelog.
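If you call endpoints by path, a small compatibility shim can absorb the renames in one place. The paths below are hypothetical placeholders, not taken from the changelog; substitute the real old/new pairs documented for v0.6.0:

```python
# Hypothetical old -> new endpoint paths (illustrative only; consult the
# v0.6.0 changelog for the actual renames).
RENAMED_ENDPOINTS = {
    "/post_training/job/status": "/post-training/job/status",  # hypothetical
    "/post_training/jobs": "/post-training/jobs",              # hypothetical
}

def migrate_path(path: str) -> str:
    """Map a pre-0.6.0 endpoint path to its renamed form, else pass through."""
    return RENAMED_ENDPOINTS.get(path, path)
```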
deprecated The Agents API is deprecated in favor of the new Responses API (introduced v0.5.0). The Agents endpoint may be removed in a future release.
fix Migrate from Agents to Responses API as shown in the migration guide.
gotcha The llama-stack package and llama-stack-client are separate PyPI packages. Installing one does NOT install the other.
fix Install both with: pip install llama-stack llama-stack-client
pip install "llama-stack[starter]"  # quote the extras so shells like zsh don't glob the brackets

Initialize the Llama Stack client and run a basic chat completion. Requires a running Llama Stack server (default: http://localhost:8321).

import os
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(
    base_url=os.environ.get("LLAMA_STACK_BASE_URL", "http://localhost:8321"),
    api_key=os.environ.get("LLAMA_STACK_API_KEY", "")
)

# List available models
models = client.models.list()
print([m.identifier for m in models])

# Send a chat completion
response = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",  # OpenAI-compatible surface takes "model"
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)
print(response.choices[0].message.content)
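The message list above can be factored into a small helper so the payload shape lives in one place. Plain Python, no server needed:

```python
def build_messages(user_text: str,
                   system_prompt: str = "You are a helpful assistant.") -> list[dict]:
    """Build the chat messages payload used in the example above."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]
```

Then pass it as `messages=build_messages("Hello!")` in the `chat.completions.create` call.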