Llama Stack

Version 0.7.1 · verified Fri May 01 · auth: none · Python

Open-source, OpenAI-compatible API server with pluggable providers for any model and any infrastructure. The current version is 0.7.1 and requires Python >= 3.12. Releases follow a rapid cadence (multiple minor versions per month).

pip install llama-stack
error ModuleNotFoundError: No module named 'llama_stack_client'
cause The client library is not installed as a dependency of the server package.
fix
pip install llama-stack-client
error AttributeError: module 'llama_stack' has no attribute '...'
cause Attempting to import from the server package when the symbol is in the client package or does not exist.
fix
Verify the correct import path: most client classes are under llama_stack_client, not llama_stack.
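One way to avoid the wrong-package import entirely is to check which package is actually installed before importing. A minimal sketch, using only the standard library (`find_spec` does not import either package):

```python
import importlib.util

def client_available() -> bool:
    """True if the llama_stack_client package can be imported."""
    return importlib.util.find_spec("llama_stack_client") is not None

if client_available():
    from llama_stack_client import LlamaStackClient  # client classes live here
else:
    print("Client missing - run: pip install llama-stack-client")
```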
error llama_stack_client.api_error.ApiError: 404 Not Found - The requested endpoint does not exist.
cause Using an old API endpoint that was removed or renamed (e.g., fine_tuning).
fix
Check the changelog for the version you upgraded to and update to the new endpoint paths.
breaking In v0.7.0 the fine_tuning API was removed entirely. Any code using fine_tuning endpoints or client methods will break.
fix Remove fine_tuning API usage; use external training libraries if needed.
breaking In v0.6.0 numerous post-training API endpoints were renamed/restructured for consistency. Old endpoint paths no longer work.
fix Update API calls to match the new consistent naming scheme documented in the changelog.
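If you call endpoints by path, a small compatibility shim can absorb the renames in one place. The paths below are hypothetical placeholders, not taken from the changelog; substitute the real old/new pairs documented for v0.6.0:

```python
# Hypothetical old -> new endpoint paths (illustrative only; consult the
# v0.6.0 changelog for the actual renames).
RENAMED_ENDPOINTS = {
    "/post_training/job/status": "/post-training/job/status",  # hypothetical
    "/post_training/jobs": "/post-training/jobs",              # hypothetical
}

def migrate_path(path: str) -> str:
    """Map a pre-0.6.0 endpoint path to its renamed form, else pass through."""
    return RENAMED_ENDPOINTS.get(path, path)
```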
deprecated The Agents API is deprecated in favor of the new Responses API (introduced v0.5.0). The Agents endpoint may be removed in a future release.
fix Migrate from Agents to Responses API as shown in the migration guide.
gotcha The llama-stack package and llama-stack-client are separate PyPI packages. Installing one does NOT install the other.
fix Install both with: pip install llama-stack llama-stack-client
pip install "llama-stack[starter]"  # quote the extras so shells like zsh don't glob the brackets

Initialize the Llama Stack client and run a basic chat completion. Requires a running Llama Stack server (default: http://localhost:8321).

import os
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(
    base_url=os.environ.get("LLAMA_STACK_BASE_URL", "http://localhost:8321"),
    api_key=os.environ.get("LLAMA_STACK_API_KEY", "")
)

# List available models
models = client.models.list()
print([m.identifier for m in models])

# Send a chat completion
response = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",  # OpenAI-compatible surface takes "model"
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)
print(response.choices[0].message.content)
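The message list above can be factored into a small helper so the payload shape lives in one place. Plain Python, no server needed:

```python
def build_messages(user_text: str,
                   system_prompt: str = "You are a helpful assistant.") -> list[dict]:
    """Build the chat messages payload used in the example above."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]
```

Then pass it as `messages=build_messages("Hello!")` in the `chat.completions.create` call.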