Harbor (Agent Evaluation Framework)

0.3.0 · active · verified Thu Apr 16

Harbor is an open-source framework for evaluating and optimizing AI agents and language models in sandboxed environments. It supports creating and running benchmarks, so users can assess arbitrary agents and models. The current release is 0.3.0, and the library is under active development with a focus on robust, scalable agent evaluation. No release cadence is published, though updates appear to be regular.

Quickstart

Harbor is primarily driven from the command line. A typical quickstart uses the `harbor` CLI to run evaluations against defined datasets and agents inside Dockerized environments. The Python snippet below invokes a basic `harbor` command programmatically: it prints the help text for `harbor run` rather than performing an evaluation. For a full evaluation, define tasks, agents, and environments as described in the official Harbor documentation.

import subprocess

# Harbor is primarily CLI-driven. This snippet only checks that the `harbor`
# CLI can be invoked from Python; it does not run an evaluation.
# Prerequisites: Docker is running and `harbor` is installed and on your PATH.

try:
    # `harbor run --help` prints usage text and needs no dataset or agent.
    # A real run passes a dataset and an agent; see the official docs.
    print('Attempting to run a basic harbor CLI command...')
    result = subprocess.run(
        ['harbor', 'run', '--help'],
        capture_output=True, text=True, check=True
    )
    print("Harbor CLI --help output:\n", result.stdout)
except FileNotFoundError:
    # Raised when the `harbor` executable cannot be found.
    print("Error: 'harbor' command not found. Ensure Harbor is installed and in your PATH.")
except subprocess.CalledProcessError as e:
    # Raised because check=True and the command exited with a non-zero status.
    print(f"Error running Harbor CLI: {e}")
    print(f"Stdout: {e.stdout}")
    print(f"Stderr: {e.stderr}")
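
The quickstart assumes that Docker is running and that `harbor` is on your PATH. A small pre-flight check can make those assumptions explicit before any evaluation is attempted. The sketch below is not part of Harbor: `missing_tools` and `docker_daemon_reachable` are hypothetical helper names, and using `docker info` as a liveness probe is an assumption about what "Docker is running" should mean here.

```python
import shutil
import subprocess
from typing import Callable, Iterable, List, Optional


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names in `tools` that cannot be found on PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [name for name in tools if which(name) is None]


def docker_daemon_reachable(timeout: float = 10.0) -> bool:
    """Return True if `docker info` exits with status 0.

    Assumption: a zero exit status from `docker info` is treated as
    "the Docker daemon is running". Any failure to launch the command,
    or a timeout, is reported as False.
    """
    try:
        result = subprocess.run(
            ["docker", "info"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    missing = missing_tools(["harbor", "docker"])
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    elif not docker_daemon_reachable():
        print("Docker is installed, but the daemon did not respond.")
    else:
        print("Prerequisites look fine; try `harbor run --help` next.")
```

Checking PATH first keeps the failure message specific: a missing executable and an unreachable Docker daemon call for different fixes, so the sketch reports them separately.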
