Llama Stack Client Python Library

0.7.2 · active · verified Wed Apr 15

The official Python library for the Llama Stack API. It provides convenient access to the REST API from Python 3.12+ applications, includes comprehensive type definitions for request parameters and response fields, and offers both synchronous and asynchronous clients. The library is generated using Stainless and is currently in active alpha development, with frequent releases.

Warnings

This library is in active alpha development with frequent releases; expect breaking changes between versions.

Install

Imports

Quickstart

This quickstart demonstrates how to initialize the LlamaStackClient and perform a basic model listing and inference request. It assumes a Llama Stack server is already running and accessible, for instance, locally via Docker.

import os
from llama_stack_client import LlamaStackClient

# Ensure a Llama Stack server is running, e.g., locally at http://localhost:8321.
# Authentication typically uses the LLAMA_STACK_CLIENT_API_KEY environment variable.
# Example: export LLAMA_STACK_CLIENT_API_KEY="your_api_key"
# You can also set the base URL via LLAMA_STACK_BASE_URL environment variable.
client = LlamaStackClient(
    base_url=os.environ.get("LLAMA_STACK_BASE_URL", "http://localhost:8321"),
    api_key=os.environ.get("LLAMA_STACK_CLIENT_API_KEY", "dummy_key_for_testing_if_not_set"),
)

try:
    # List available models
    models = client.models.list()
    print("Available models:", [model.id for model in models.data])

    # Perform simple inference using the Responses API
    if models.data:
        response = client.responses.create(
            model=models.data[0].id, # Use the first available model
            input="Write a haiku about coding.",
        )
        print("\nHaiku from Llama Stack:", response.output_text)
    else:
        print("\nNo models found on the Llama Stack server.")

except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure your Llama Stack server is running and accessible (e.g., via Docker), and the API key is set correctly.")
