SWE-bench

library 4.1.0 · python
verified May 20, 2026

The official SWE-bench package (current version 4.1.0) is a benchmark for evaluating large language models (LLMs) on software engineering tasks. It automatically tests model-generated code fixes against real-world software bugs. The package is actively developed and updated frequently, and major versions often introduce significant changes.
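
To make "model-generated code fixes" concrete, here is a minimal sketch of preparing a predictions file for evaluation. It uses only the Python standard library and does not call the SWE-bench package. The three key names (`instance_id`, `model_name_or_path`, `model_patch`) follow the predictions format as recalled from the project README and should be checked against the 4.1.0 documentation. The helper `write_predictions`, the instance ID, the model name, and the patch text are hypothetical placeholders.

```python
import json
from pathlib import Path

# Assumption: these are the keys the evaluation harness expects per prediction.
REQUIRED_KEYS = {"instance_id", "model_name_or_path", "model_patch"}


def write_predictions(predictions, path):
    """Write predictions as JSONL (one JSON object per line).

    Returns the number of records written. Raises ValueError if a record
    lacks one of the assumed required keys.
    """
    lines = []
    for record in predictions:
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"prediction is missing keys: {sorted(missing)}")
        lines.append(json.dumps(record))
    Path(path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)


# Hypothetical record: the ID, model name, and patch are placeholders,
# not a real benchmark instance or a real fix.
example_predictions = [
    {
        "instance_id": "example__repo-1",
        "model_name_or_path": "my-model",
        "model_patch": "diff --git a/pkg/mod.py b/pkg/mod.py\n...",
    }
]

if __name__ == "__main__":
    count = write_predictions(example_predictions, "predictions.jsonl")
    print(f"wrote {count} prediction(s)")
```

A file like this is what the evaluation step consumes. In the project README, that step is run as the `swebench.harness.run_evaluation` module, but its exact flags may differ between major versions, so consult the documentation for the installed release.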
