Llama Stack Client Python Library
The official Python library for the Llama Stack API, providing convenient access to its REST API. It includes comprehensive type definitions for request parameters and response fields, and offers both synchronous and asynchronous clients. The library is generated using Stainless and is designed for Python 3.12+ applications. It is currently in active alpha development, with frequent releases.
Warnings
- breaking The `agents` API was renamed to the `responses` API in `v0.7.0-alpha.1`. Code using `client.agents` will no longer work.
- breaking Breaking changes were introduced to the `GET /chat/completions/{completion_id}` and `/files/{file_id}` endpoints in `v0.7.0-alpha.1` to eliminate conformance issues.
- breaking Consistency improvements were made to post-training API endpoints in `v0.6.1-alpha.1`, which may involve API surface changes.
- gotcha The `llama-stack-client` library is in alpha (`--pre`) release status: breaking changes and API instability are expected across minor versions, so pin exact versions.
- gotcha This library is a client for the Llama Stack API. It requires a separate Llama Stack server instance to be running and accessible. This client library does not include the server itself.
- gotcha For authentication, the client primarily relies on the `LLAMA_STACK_CLIENT_API_KEY` environment variable. If this is not set, API calls may fail or use a dummy key.
- gotcha The Responses API, a central feature for server-side agentic orchestration, is still under active development. While usable, some parts of its OpenAI-compatible implementation may still be unimplemented.
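Because the client may silently fall back to a dummy key (see the authentication gotcha above), it can help to fail fast instead. Below is a minimal stdlib sketch of such a check; `resolve_api_key` is a hypothetical helper, not part of `llama-stack-client`:

```python
import os


def resolve_api_key(env_var: str = "LLAMA_STACK_CLIENT_API_KEY") -> str:
    """Return the API key from the environment, raising if it is unset.

    Hypothetical helper -- not part of llama-stack-client. It avoids
    sending a placeholder key that the server would reject later.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it before constructing the client."
        )
    return key
```

The result can then be passed as `api_key=` when constructing `LlamaStackClient`.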
Install
pip install --pre llama-stack-client
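Given the alpha churn noted in the warnings, pinning the exact pre-release in `requirements.txt` avoids surprise breakage. The version below is illustrative (the PEP 440 form of `v0.7.0-alpha.1`); pin whatever version you have tested against:

```
llama-stack-client==0.7.0a1
```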
Imports
- LlamaStackClient
from llama_stack_client import LlamaStackClient
Quickstart
import os
from llama_stack_client import LlamaStackClient
# Ensure a Llama Stack server is running, e.g., locally at http://localhost:8321.
# Authentication typically uses the LLAMA_STACK_CLIENT_API_KEY environment variable.
# Example: export LLAMA_STACK_CLIENT_API_KEY="your_api_key"
# You can also set the base URL via LLAMA_STACK_BASE_URL environment variable.
client = LlamaStackClient(
    base_url=os.environ.get("LLAMA_STACK_BASE_URL", "http://localhost:8321"),
    api_key=os.environ.get("LLAMA_STACK_CLIENT_API_KEY", "dummy_key_for_testing_if_not_set"),
)

try:
    # List available models
    models = client.models.list()
    print("Available models:", [model.id for model in models.data])

    # Perform simple inference using the Responses API
    if models.data:
        response = client.responses.create(
            model=models.data[0].id,  # Use the first available model
            input="Write a haiku about coding.",
        )
        print("\nHaiku from Llama Stack:", response.output_text)
    else:
        print("\nNo models found on the Llama Stack server.")
except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure your Llama Stack server is running and accessible (e.g., via Docker), and the API key is set correctly.")