Inspect AI

0.3.201 · active · verified Sat Mar 28

Inspect AI is an open-source framework for large language model (LLM) evaluations, developed by the UK AI Security Institute. It provides tools for prompt engineering, tool usage, multi-turn dialogue, and model-graded evaluation. The library is actively maintained with frequent releases, often several per month, keeping it current for evaluating frontier models.

Warnings

Install
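Inspect AI is distributed on PyPI. A typical install, plus the provider package used by the quickstart below:

```shell
# Install the framework
pip install inspect-ai

# Install the OpenAI provider package used in the quickstart
pip install openai
```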

Imports

Quickstart

This 'Hello World' example defines a simple evaluation task: it instructs a model to reply with 'Hello World' and uses an exact-match scorer to verify the output. To run it, save the code as a Python file (e.g., `hello_eval.py`), ensure the `openai` package is installed, set your `OPENAI_API_KEY` environment variable, and execute it via the `inspect eval` command-line interface.

from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.solver import generate
from inspect_ai.scorer import exact

@task
def hello_world():
    return Task(
        dataset=[
            Sample(input="Just reply with Hello World", target="Hello World"),
        ],
        solver=[generate()],
        scorer=exact(),
    )

# To run this, save it as a Python file (e.g., hello_eval.py)
# and execute from your terminal:
# export OPENAI_API_KEY=your_openai_api_key  # Or set in .env file
# inspect eval hello_eval.py --model openai/gpt-4o

# For programmatic use (less common than the inspect eval CLI), the key can
# also be set in-process before running:
# import os
# os.environ.setdefault('OPENAI_API_KEY', 'sk-...')
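Beyond a single run, the `inspect` CLI supports iterating on evals and reviewing results. A sketch of a typical workflow (the exact flags are a non-authoritative sketch; consult `inspect eval --help` for your installed version):

```shell
# Run the eval against a hosted model (requires OPENAI_API_KEY)
inspect eval hello_eval.py --model openai/gpt-4o

# While iterating on larger datasets, limit the number of samples evaluated
inspect eval hello_eval.py --model openai/gpt-4o --limit 1

# Browse the resulting eval logs in Inspect's local web viewer
inspect view
```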
