HUD Python SDK (Evaluations and RL Environments)

0.5.35 · active · verified Thu Apr 16

HUD is a platform for building Reinforcement Learning (RL) environments for AI agents. It lets users define agent-callable tools, write evaluation scenarios, run evaluations at scale, and train models on the results. The Python SDK (version 0.5.35 at the time of verification) is under active development, with new releases often shipping several times a month.

Quickstart

This quickstart demonstrates defining a HUD environment with a custom tool and a scenario, then running an AI agent against that scenario using `hud.eval()`. Ensure your `HUD_API_KEY` environment variable is set for authentication.

import os

import hud  # needed for hud.eval() below
from hud import Environment
from hud.agents import create_agent

# Ensure your HUD_API_KEY is set as an environment variable
# Example: export HUD_API_KEY="your_api_key_here"
hud_api_key = os.environ.get('HUD_API_KEY', '')

# Define an environment
env = Environment("my-first-env")

# Define a tool the agent can call
@env.tool()
def add(a: int, b: int) -> int:
    """Adds two numbers."""
    return a + b

@env.scenario("sum-check")
async def sum_check(num1: int, num2: int):
    # Prompt the agent to use the 'add' tool
    answer = yield f"What is the sum of {num1} and {num2}? Use the 'add' tool."
    correct = num1 + num2
    # Score the agent's answer
    yield 1.0 if str(correct) in str(answer) else 0.0

async def run_evaluation():
    # Create a task for the scenario
    task = env("sum-check", num1=5, num2=7)

    # Create an agent (e.g., using a model via HUD's gateway)
    agent = create_agent("gpt-4o") # or "claude-sonnet-4-5", etc.

    print(f"Running task: {task.scenario_slug} with {agent.model}")
    async with hud.eval(task) as ctx:
        result = await agent.run(ctx)
        print(f"Agent response: {result.response}")
        print(f"Reward: {result.reward}")

if __name__ == "__main__":
    import asyncio
    if not hud_api_key:
        print("Error: HUD_API_KEY environment variable not set. Please set it to run the quickstart.")
    else:
        asyncio.run(run_evaluation())
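The scenario above is an async generator with two `yield`s: the first hands out the prompt, the value sent back in becomes `answer`, and the second yields the reward. The plain-Python sketch below drives that protocol by hand so it can be run without an API key. It assumes HUD drives scenarios roughly this way; `drive_scenario` and `always_twelve` are hypothetical stand-ins, not HUD's runtime or a real agent.

```python
import asyncio
from typing import AsyncGenerator, Callable


# Same shape as the quickstart scenario: first yield is the prompt,
# the value sent back in is the agent's answer, second yield is the reward.
async def sum_check(num1: int, num2: int) -> AsyncGenerator[object, str]:
    answer = yield f"What is the sum of {num1} and {num2}? Use the 'add' tool."
    correct = num1 + num2
    yield 1.0 if str(correct) in str(answer) else 0.0


async def drive_scenario(
    scenario: AsyncGenerator[object, str],
    fake_agent: Callable[[str], str],
) -> float:
    """Hypothetical driver illustrating the generator protocol (not HUD internals)."""
    prompt = await scenario.asend(None)      # start the generator: runs to the first yield
    answer = fake_agent(str(prompt))         # stand-in for a real agent run
    reward = await scenario.asend(answer)    # resume with the answer, receive the reward
    await scenario.aclose()                  # finalize the suspended generator
    return float(reward)


def always_twelve(prompt: str) -> str:
    # Hypothetical agent stub that ignores the prompt.
    return "The answer is 12."


if __name__ == "__main__":
    print(asyncio.run(drive_scenario(sum_check(5, 7), always_twelve)))  # 1.0
```

Keeping setup, prompt, and grading inside one generator means the grading code can see the same local state (here `num1` and `num2`) that produced the prompt.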

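The reward line in `sum_check` uses a substring test, so for 5 + 7 an answer such as "112" would also score 1.0. A stricter check is sketched below as a hypothetical helper, `exact_sum_reward` (not part of the HUD SDK), which compares the expected sum against the last integer in the answer. Treating the last number as the final answer is a heuristic assumption.

```python
import re


def exact_sum_reward(answer: object, correct: int) -> float:
    """Hypothetical grader: 1.0 if the last integer in the answer equals `correct`."""
    # Find all (optionally negative) integers in the answer text.
    numbers = re.findall(r"-?\d+", str(answer))
    if not numbers:
        return 0.0
    # Heuristic: treat the last number mentioned as the agent's final answer.
    return 1.0 if int(numbers[-1]) == correct else 0.0
```

Inside the scenario it would replace the final line as `yield exact_sum_reward(answer, correct)`.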