{"id":1737,"library":"swebench","title":"SWE-bench","description":"The official SWE-bench package (current version 4.1.0) provides a benchmark for evaluating large language models (LLMs) on software engineering tasks. It focuses on automatically testing model-generated code fixes against real-world software bugs and is actively developed with frequent updates, often involving significant changes between major versions.","status":"active","version":"4.1.0","language":"en","source_language":"en","source_url":"https://github.com/swe-bench/swe-bench","tags":["LLM","benchmark","software-engineering","evaluation","AI","NLP"],"install":[{"cmd":"pip install swebench","lang":"bash","label":"Install core library"}],"dependencies":[{"reason":"Required for setting up task-specific environments, especially for running evaluations on diverse software projects.","package":"conda","optional":false},{"reason":"Required for executing task evaluations in isolated and consistent environments.","package":"docker","optional":false}],"imports":[{"symbol":"get_tasks","correct":"from swebench import get_tasks"},{"symbol":"SWEBenchRunner","correct":"from swebench.harness.runner import SWEBenchRunner"},{"symbol":"ModelEngine","correct":"from swebench.harness.engine_wrappers import ModelEngine"}],"quickstart":{"code":"import os\nfrom swebench import get_tasks\n\n# --- Quickstart: Accessing SWE-bench data ---\n# Note: SWE-bench data must be downloaded separately using the CLI:\n# `swebench download`\n# This command typically creates a 'data' directory in your current working directory.\n# Adjust data_path if your data is located elsewhere (e.g., specific split like lite).\ndata_path = os.path.join(os.getcwd(), 'data', 'default_swebench_tasks.json')\n# You might also want to use 'lite_swebench_tasks.json' for the smaller lite split.\n\ntasks = []\ntry:\n    # Attempt to load tasks from the specified path\n    tasks = get_tasks(data_path=data_path)\n    print(f\"Successfully loaded {len(tasks)} 
tasks from {data_path}\")\n    if tasks:\n        print(\"\\nExample task structure (first task):\")\n        # Print a subset of a task's keys for brevity\n        first_task = tasks[0]\n        for key in ['repo', 'pull_request', 'instance_id', 'problem_statement', 'base_commit']:\n            if key in first_task:\n                # Coerce to str so non-string fields can be truncated safely\n                value = str(first_task[key])\n                print(f\"  {key}: {value[:100]}{'...' if len(value) > 100 else ''}\")\nexcept FileNotFoundError:\n    print(f\"Error: Data file not found at {data_path}.\")\n    print(\"Please ensure you have run `swebench download` in your terminal.\")\n    print(\"Or specify the correct path to your downloaded SWE-bench JSON data.\")\nexcept Exception as e:\n    print(f\"An unexpected error occurred: {e}\")\n\n# --- Further steps (beyond this quickstart): ---\n# For running a full evaluation, you would typically initialize a `SWEBenchRunner`\n# and integrate a `ModelEngine` to test your LLM's code generation.\n# This process heavily relies on pre-installed 'conda' and 'docker' for\n# environment creation and isolated task execution.","lang":"python","description":"This quickstart demonstrates how to programmatically load SWE-bench tasks after downloading the dataset using the `swebench download` CLI command. It prints basic information about the loaded tasks or guides the user if the data isn't found. Full evaluation with `SWEBenchRunner` and `ModelEngine` requires `conda` and `docker`."},"warnings":[{"fix":"Consult the official GitHub repository's release notes for v4.0.0 and updated documentation on setting up and running Docker-based evaluations.","message":"SWE-bench v4.0.0 introduced significant breaking changes related to how Docker environments are specified and managed. If upgrading from earlier versions (e.g., v3.x), review the new Docker integration patterns.","severity":"breaking","affected_versions":">=4.0.0"},{"fix":"Refer to the v3.0.0 release notes and updated examples on environment specification. 
You may need to update your task data or evaluation scripts to align with the new structure.","message":"SWE-bench v3.0.0 included a major refactor with breaking changes to how environments are specified and built for task evaluation. Code relying on older environment configuration schemas will likely fail.","severity":"breaking","affected_versions":">=3.0.0"},{"fix":"Ensure `conda` (or miniconda/anaconda) and `docker` are installed and working before attempting to run `SWEBenchRunner` evaluations. Consult their respective installation guides.","message":"While `pip install swebench` installs the core library, running actual task evaluations (which involves building and testing code environments) strictly requires `conda` and `docker` to be pre-installed and properly configured on your system.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Run `swebench download` in your terminal to fetch the dataset. By default, it creates a 'data' directory in your current working directory. Always check the path when calling `get_tasks`.","message":"The SWE-bench benchmark dataset itself is not included with the `pip` package. It must be separately downloaded using the `swebench download` CLI command before you can programmatically access tasks using `get_tasks`.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}