Harbor (Agent Evaluation Framework)
Harbor is an open-source framework (version 0.3.0 at the time of writing) for evaluating and optimizing AI agents and language models in sandboxed environments. It provides tools for building and running benchmarks against arbitrary agents and models, with an emphasis on robust, scalable evaluation. The project is actively developed; no release cadence is documented, though updates appear regularly.
Common errors
- ModuleNotFoundError: No module named 'harbor'
  cause: The `harbor` package is not installed or not accessible in the current Python environment, or a different package with a similar name (like `harborapi`) was installed instead.
  fix: Install the agent evaluation framework with `pip install harbor` or `uv tool install harbor`.
- docker: command not found
  cause: The Docker command-line client, which Harbor needs to provision evaluation environments, is not installed or not on your system's PATH.
  fix: Install Docker Desktop (macOS/Windows) or Docker Engine (Linux) and make sure it is on your PATH. Verify the installation with `docker --version`.
- Error: 'harbor' command not found. Ensure Harbor is installed and in your PATH.
  cause: The `harbor` CLI tool, which ships with the Python package, is not on your shell's PATH, or the installation was incomplete.
  fix: Verify that `pip install harbor` completed successfully. If you use a virtual environment, make sure it is activated. Check that your PATH includes the directory where Python installs scripts (e.g., `~/.local/bin`, or a virtual environment's `bin` directory, `Scripts` on Windows).
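The three errors above share a root cause: something Harbor needs cannot be found in the current environment. A minimal sketch of a preflight check follows, using only the Python standard library. The helper name `preflight` and the check labels are illustrative and not part of Harbor.

```python
import importlib.util
import shutil


def preflight(module="harbor", commands=("harbor", "docker")):
    """Return a mapping of check name -> bool without raising.

    `preflight` is a hypothetical helper for this page, not a Harbor API.
    """
    # find_spec returns None when a top-level module cannot be imported.
    results = {f"module:{module}": importlib.util.find_spec(module) is not None}
    for command in commands:
        # shutil.which returns None when the executable is not on PATH.
        results[f"command:{command}"] = shutil.which(command) is not None
    return results


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{'ok' if ok else 'MISSING':8}{name}")
```

A `MISSING` entry for `module:harbor` corresponds to the `ModuleNotFoundError` case, while `command:harbor` and `command:docker` correspond to the two PATH-related errors.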
Warnings
- gotcha Harbor (the agent evaluation framework) shares its name with the Harbor Container Registry. Make sure you install and use the `harbor` package for agent evaluation, not one of the container registry's client libraries (e.g., `harborapi`, `harbor-cli`, `harbor-api-client`).
- gotcha Running evaluations with Harbor requires Docker to be installed and running on your system, as it uses sandboxed Docker environments. Lack of a running Docker daemon will prevent evaluations from executing.
- gotcha The primary interaction pattern for Harbor is via its Command Line Interface (CLI). While it's a Python library, many core functionalities, especially running evaluations, are designed to be invoked through `harbor` CLI commands rather than direct Python imports of every component.
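Having the Docker client installed does not mean a daemon is running. One way to check before starting an evaluation is to call `docker info`, which exits with a non-zero status when the client cannot reach a daemon. The sketch below assumes that behaviour; `docker_daemon_running` is a hypothetical helper, not something Harbor provides.

```python
import subprocess


def docker_daemon_running(docker_cmd="docker", timeout=10):
    """Return True if `docker info` succeeds, i.e. the client can reach a daemon.

    Hypothetical helper for illustration; not part of Harbor.
    """
    try:
        completed = subprocess.run(
            [docker_cmd, "info"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Client missing or not executable, or no answer within the timeout.
        return False
    return completed.returncode == 0


if __name__ == "__main__":
    if docker_daemon_running():
        print("Docker daemon is reachable.")
    else:
        print("Docker daemon is not reachable; start Docker before running evaluations.")
```

Returning a boolean rather than raising lets a wrapper script decide whether to stop early or print setup instructions.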
Install
- pip install harbor
- uv tool install harbor
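After installing, it can help to confirm which version of the `harbor` distribution ended up in the active environment, for example to compare it with 0.3.0, the version this page describes. The sketch below uses the standard library's `importlib.metadata`; the helper name `installed_version` is illustrative. With `uv tool install`, the package lives in an isolated tool environment, so this check would only see it when run with that environment's interpreter.

```python
from importlib import metadata


def installed_version(distribution="harbor"):
    """Return the installed version string, or None if the distribution is absent.

    Hypothetical helper for illustration; not part of Harbor.
    """
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("The 'harbor' distribution is not installed in this environment.")
    else:
        print(f"harbor {version} is installed.")
```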
Imports
- Harbor
import harbor
Quickstart
import os
import subprocess

# Harbor is primarily CLI-driven, so this example shells out to the `harbor` command.
# Ensure Docker is running and `harbor` is installed first.

# Create a placeholder task file. It is not passed to the CLI below; a real
# evaluation would define a dataset and an agent instead.
task_content = "print('Hello from Harbor evaluation!')"
with open('hello_task.py', 'w') as f:
    f.write(task_content)
print('Created hello_task.py')

try:
    # `harbor run --help` only prints usage text. The official quickstart uses
    # `harbor run` with a dataset and an agent; see the official docs for a full eval.
    print('Attempting to run a basic harbor CLI command...')
    result = subprocess.run(
        ['harbor', 'run', '--help'],
        capture_output=True, text=True, check=True
    )
    print("Harbor CLI --help output:\n", result.stdout)
except FileNotFoundError:
    print("Error: 'harbor' command not found. Ensure Harbor is installed and in your PATH.")
except subprocess.CalledProcessError as e:
    print(f"Error running Harbor CLI: {e}")
    print(f"Stdout: {e.stdout}")
    print(f"Stderr: {e.stderr}")
finally:
    # Clean up the placeholder task file
    if os.path.exists('hello_task.py'):
        os.remove('hello_task.py')
        print('Cleaned up hello_task.py')