Inspect AI
Inspect AI is an open-source framework for large language model (LLM) evaluations, developed by the UK AI Security Institute. It provides robust tools for prompt engineering, tool usage, multi-turn dialogue, and model-graded evaluation. The library is actively maintained with frequent releases, often multiple times a month, keeping it compatible with and up to date for evaluating frontier models.
Warnings
- gotcha Using the local sandbox (`--sandbox local`) without an 'outer sandbox' (e.g., a container or VM) is explicitly warned against: tools execute directly on the client system, so model-generated code runs with your permissions.
- gotcha Tool usage (e.g., `bash()`, `python()`) is not supported by every LLM model provider. Tools are executed on the client machine (or its sandbox), not within the model provider's environment.
- gotcha API keys for model providers (e.g., OpenAI, Anthropic, Google) must be correctly configured, typically as environment variables (e.g., `OPENAI_API_KEY`). Evaluations will fail if these are missing or incorrect.
- gotcha By default, raw model API request/response logs are only captured and displayed when an error occurs, which can obscure debugging of successful but unexpected model behavior.
- breaking Compatibility with external LLM client libraries (e.g., `openai`, `anthropic`, `mistralai`) frequently requires specific minimum versions due to upstream breaking changes in those packages. For example, `openai` v1.104.1 became a minimum required version due to type changes and web search action renames.
Install
-
pip install inspect-ai
-
pip install inspect-ai openai
Imports
- Task
from inspect_ai import Task
- task
from inspect_ai import task
- Sample
from inspect_ai.dataset import Sample
- generate
from inspect_ai.solver import generate
- exact
from inspect_ai.scorer import exact
Quickstart
import os

from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.solver import generate
from inspect_ai.scorer import exact

@task
def hello_world():
    return Task(
        dataset=[
            Sample(input="Just reply with Hello World", target="Hello World"),
        ],
        solver=[generate()],
        scorer=exact(),
    )

# To run this, save it as a Python file (e.g., hello_eval.py)
# and execute from your terminal:
#   export OPENAI_API_KEY=your_openai_api_key  # or set it in a .env file
#   inspect eval hello_eval.py --model openai/gpt-4o

# Setting the key programmatically (less common than the inspect eval CLI):
# os.environ['OPENAI_API_KEY'] = os.environ.get('OPENAI_API_KEY', 'sk-...')
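The `exact()` scorer in the quickstart checks whether the model's output matches the target. A rough plain-Python illustration of exact-match scoring; this is a simplified sketch of the idea, not the library's actual implementation:

```python
def exact_match(output: str, target: str) -> bool:
    """Simplified sketch of exact-match scoring: the stripped model
    output must equal the stripped target string exactly."""
    return output.strip() == target.strip()

# For the quickstart sample, a completion of "Hello World" would
# score as correct; "Hello, World!" would not.
```

Inspect's real scorers return structured `Score` objects rather than booleans, and `exact()` applies its own normalization, so treat this only as an intuition for how a sample is graded.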