{"id":644,"library":"inspect-ai","title":"Inspect AI","description":"Inspect AI is an open-source framework for large language model (LLM) evaluations, developed by the UK AI Security Institute. It provides robust tools for prompt engineering, integrating tool usage, managing multi-turn dialogues, and conducting model-graded evaluations. The library is actively maintained with frequent releases, often multiple times a month, ensuring up-to-date compatibility and features for evaluating frontier models.","status":"active","version":"0.3.201","language":"python","source_language":"en","source_url":"https://github.com/UKGovernmentBEIS/inspect_ai","tags":["LLM","evaluation","AI","framework"],"install":[{"cmd":"pip install inspect-ai","lang":"bash","label":"Install core library"},{"cmd":"pip install inspect-ai openai","lang":"bash","label":"Install with OpenAI support"}],"dependencies":[{"reason":"Minimum Python version required.","package":"python>=3.10","optional":false},{"reason":"Used for data validation and settings management.","package":"pydantic","optional":false},{"reason":"Asynchronous HTTP client for API interactions.","package":"httpx","optional":false},{"reason":"Asynchronous I/O backend.","package":"anyio","optional":false},{"reason":"Commonly used for loading environment variables like API keys.","package":"python-dotenv","optional":true},{"reason":"Required for evaluating OpenAI models.","package":"openai","optional":true},{"reason":"Required for evaluating Anthropic models.","package":"anthropic","optional":true},{"reason":"Required for evaluating Google Gemini models.","package":"google-genai","optional":true}],"imports":[{"symbol":"Task","correct":"from inspect_ai import Task"},{"symbol":"task","correct":"from inspect_ai import task"},{"symbol":"Sample","correct":"from inspect_ai.dataset import Sample"},{"symbol":"generate","correct":"from inspect_ai.solver import generate"},{"symbol":"exact","correct":"from inspect_ai.scorer import 
exact"}],"quickstart":{"code":"import os\nfrom inspect_ai import Task, task\nfrom inspect_ai.dataset import Sample\nfrom inspect_ai.solver import generate\nfrom inspect_ai.scorer import exact\n\n@task\ndef hello_world():\n    return Task(\n        dataset=[\n            Sample(input=\"Just reply with Hello World\", target=\"Hello World\"),\n        ],\n        solver=[generate()],\n        scorer=exact(),\n    )\n\n# To run this, save it as a Python file (e.g., hello_eval.py)\n# and execute from your terminal:\n# export OPENAI_API_KEY=your_openai_api_key  # Or set in .env file\n# inspect eval hello_eval.py --model openai/gpt-4o\n\n# Example of setting the key for programmatic use (less common for inspect eval CLI)\n# os.environ['OPENAI_API_KEY'] = os.environ.get('OPENAI_API_KEY', 'sk-...') \n","lang":"python","description":"This 'Hello World' example defines a simple evaluation task. It instructs a model to reply with 'Hello World' and uses an exact match scorer to verify the output. To run it, save the code as a Python file (e.g., `hello_eval.py`), ensure the `openai` package is installed, and set your `OPENAI_API_KEY` environment variable. Then run it with the `inspect eval` command-line interface."},"warnings":[{"fix":"Only use `--sandbox local` when the entire evaluation is already contained within a secure, isolated environment (e.g., Docker, Kubernetes). For sensitive operations, consider more robust sandbox options or carefully review tool definitions.","message":"Using the local tool environment (`--sandbox local`) without an 'outer sandbox' is explicitly discouraged because tools execute directly on the client system, which is a security risk.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Consult the Inspect AI documentation for 'Model Providers' to verify which models support tool use before designing evaluations that rely on them. 
Be aware of the execution context of tools.","message":"Tool usage (e.g., `bash()`, `python()`) is not supported by every LLM model provider. Tools are executed on the client machine, not within the model's environment.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure the relevant API key is set as an environment variable before running evaluations. For example: `export OPENAI_API_KEY=your-key`. You may also use `.env` files with `python-dotenv`.","message":"API keys for model providers (e.g., OpenAI, Anthropic, Google) must be correctly configured, typically as environment variables (e.g., `OPENAI_API_KEY`). Evaluations will fail if these are missing or incorrect.","severity":"gotcha","affected_versions":"All versions"},{"fix":"To enable comprehensive logging of all model API calls, use the `--log-model-api` command-line option when running `inspect eval`.","message":"By default, raw model API request/response logs are only captured and displayed when an error occurs. This makes it hard to debug calls that succeed but behave unexpectedly.","severity":"gotcha","affected_versions":"All versions before 0.3.184 (default changed); users may still revert to the old behavior."},{"fix":"If you encounter errors related to model API calls or types, ensure your `inspect-ai` installation is up-to-date and check the `inspect-ai` changelog for notes on required versions of specific model provider packages. Upgrade those packages as necessary.","message":"Compatibility with external LLM client libraries (e.g., `openai`, `anthropic`, `mistralai`) frequently requires specific minimum versions due to upstream breaking changes in those packages. For example, `openai` v1.104.1 became the minimum required version due to type changes and web search action renames.","severity":"breaking","affected_versions":"Various, depending on the specific external library and its updates. 
Observed for `openai` around 0.3.127-0.3.128 and `mistralai` around 0.3.191."},{"fix":"Ensure your Python environment is version 3.10 or newer before installing `inspect-ai`. For example, use `python:3.10-slim` or a later version in your environment configuration.","message":"Recent versions of `inspect-ai` (0.3.10 and newer) require Python 3.10 or a more recent version. Attempting to install in an older Python environment will result in errors like 'Requires-Python >=3.10' and 'No matching distribution found for inspect-ai'.","severity":"breaking","affected_versions":"0.3.10+"}],"env_vars":null,"last_verified":"2026-05-12T17:12:28.405Z","next_check":"2026-06-26T00:00:00.000Z","problems":[{"fix":"Update your custom scorer or evaluation code to pass the `multiple_correct` argument to `parse_answers()`, or consult the `inspect-ai` changelog for the exact API changes and recommended migration path for your version.","cause":"This error occurs due to a breaking change in a recent `inspect-ai` update (around commit e4a551f), where the `parse_answers()` function now requires a new `multiple_correct` argument, affecting older evaluation implementations like the Personality eval.","error":"TypeError: parse_answers() missing 1 required positional argument: 'multiple_correct'"},{"fix":"Install the library using pip: `pip install inspect-ai`. If using a virtual environment, ensure it is activated. In VS Code, verify the correct Python interpreter with `inspect-ai` installed is selected for your workspace.","cause":"This error indicates that the `inspect-ai` package is not installed in your Python environment, or your Python interpreter cannot find the installed package.","error":"ModuleNotFoundError: No module named 'inspect_ai'"},{"fix":"Ensure your OpenAI API key is correctly set as an environment variable (e.g., `export OPENAI_API_KEY=your-api-key`) or in a `.env` file that `inspect-ai` can load. 
Check the official documentation for other providers you are using.","cause":"This common error occurs when `inspect-ai` attempts to use an OpenAI model but the `OPENAI_API_KEY` environment variable is missing or incorrect, or the key has insufficient permissions. Similar errors can occur with other model providers if their respective API keys are not properly configured.","error":"openai.AuthenticationError: Incorrect API key provided"},{"fix":"Adjust the limits for your evaluation using `inspect eval` CLI options (e.g., `--message-limit`, `--token-limit`, `--time-limit`, `--cost-limit`) or programmatically within your task definition, or refine your prompt or solver to be more concise.","cause":"This error is raised by `inspect-ai` when an evaluation task or sample exceeds a configured limit, such as maximum messages, tokens, time, or cost. These limits are typically set to prevent runaway LLM usage.","error":"inspect_ai.limit.LimitExceededError"},{"fix":"Update `inspect-ai` and its dependencies to their latest versions, as newer versions are likely to use `inspect.signature` instead of `getargspec`. If the call is in your own code, refactor it to use `inspect.signature` for argument inspection. `inspect-evals` officially supports Python 3.11 and 3.12, so ensure your environment is up-to-date.","cause":"This error arises on Python 3.11 and newer because the long-deprecated `inspect.getargspec` function was removed in Python 3.11. 
It often indicates that `inspect-ai` or one of its dependencies is using an outdated method to inspect function arguments, or a user might mistakenly try to use the built-in `inspect` module with deprecated functions while expecting `inspect-ai` functionality.","error":"AttributeError: module 'inspect' has no attribute 'getargspec'"}],"ecosystem":"pypi","meta_description":null,"install_score":75,"install_tag":"reviewed","quickstart_score":70,"quickstart_tag":"verified","pypi_latest":"0.3.220","install_checks":{"last_tested":"2026-05-12","tag":"reviewed","tag_description":"minor failures on some runtimes or slightly older test data","results":[{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":null,"import_time_s":5.99,"mem_mb":63.6,"disk_size":"282.9M"},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":null,"import_time_s":5.89,"mem_mb":63.6,"disk_size":"300.3M"},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":5.18,"mem_mb":63.1,"disk_size":"281.8M"},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":5.13,"mem_mb":63.1,"disk_size":"297.1M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":22.4,"import_time_s":4.47,"mem_mb":63.6,"disk_size":"276M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 
","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":24.4,"import_time_s":4.69,"mem_mb":63.6,"disk_size":"293M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":4.09,"mem_mb":63.1,"disk_size":"275M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":4.04,"mem_mb":63.1,"disk_size":"290M"},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":null,"import_time_s":6.79,"mem_mb":69.3,"disk_size":"302.6M"},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":null,"import_time_s":6.75,"mem_mb":69.3,"disk_size":"321.0M"},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":7.14,"mem_mb":68.7,"disk_size":"301.5M"},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":6.99,"mem_mb":68.7,"disk_size":"317.6M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":19.4,"import_time_s":6.23,"mem_mb":69.3,"disk_size":"296M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 
","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":21.1,"import_time_s":6.29,"mem_mb":69.3,"disk_size":"314M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":5.55,"mem_mb":68.7,"disk_size":"295M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":5.49,"mem_mb":68.7,"disk_size":"310M"},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":null,"import_time_s":6.57,"mem_mb":67.8,"disk_size":"288.4M"},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":null,"import_time_s":6.47,"mem_mb":67.8,"disk_size":"306.5M"},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":6.65,"mem_mb":67.2,"disk_size":"287.2M"},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":6.4,"mem_mb":67.2,"disk_size":"303.1M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":16.4,"import_time_s":6.42,"mem_mb":67.8,"disk_size":"284M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 
","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":17.3,"import_time_s":6.42,"mem_mb":67.8,"disk_size":"302M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":6.43,"mem_mb":67.2,"disk_size":"283M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":6.39,"mem_mb":67.2,"disk_size":"299M"},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":null,"import_time_s":6.05,"mem_mb":68.7,"disk_size":"288.0M"},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":null,"import_time_s":6.1,"mem_mb":68.7,"disk_size":"306.1M"},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":6,"mem_mb":68.1,"disk_size":"286.8M"},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":6.09,"mem_mb":68.1,"disk_size":"302.7M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 ","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":16.6,"import_time_s":6.13,"mem_mb":68.7,"disk_size":"284M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 
","exit_code":0,"wheel_type":"wheel","failure_reason":null,"install_time_s":17.2,"import_time_s":5.97,"mem_mb":68.7,"disk_size":"301M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":5.99,"mem_mb":68.1,"disk_size":"283M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":6.27,"mem_mb":68.1,"disk_size":"298M"},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":" $EXIT -eq 0 ","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 ","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":1.9,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":" $EXIT -eq 0 
","exit_code":1,"wheel_type":null,"failure_reason":"build_error","install_time_s":1.8,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null}]},"quickstart_checks":{"last_tested":"2026-04-24","tag":"verified","tag_description":"quickstart runs on critical runtimes, recently tested","results":[{"runtime":"python:3.10-alpine","exit_code":0},{"runtime":"python:3.10-slim","exit_code":0},{"runtime":"python:3.11-alpine","exit_code":0},{"runtime":"python:3.11-slim","exit_code":0},{"runtime":"python:3.12-alpine","exit_code":0},{"runtime":"python:3.12-slim","exit_code":0},{"runtime":"python:3.13-alpine","exit_code":0},{"runtime":"python:3.13-slim","exit_code":0},{"runtime":"python:3.9-alpine","exit_code":1},{"runtime":"python:3.9-slim","exit_code":1}]}}