SWE-bench
The official SWE-bench package (current version 4.1.0) is a benchmark for evaluating large language models (LLMs) on software engineering tasks. It automatically tests model-generated code fixes against real-world software bugs. The package is actively developed and updated frequently, and major versions often introduce significant changes.
Resources
package pypi.org/project/swebench/
API endpoints
full doc /v1/registry/swebench
install /v1/registry/swebench/install
compatibility /v1/registry/swebench/compatibility
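
The endpoint paths above are relative, and this listing does not name the host that serves them. As a minimal sketch, assuming the caller supplies the registry's base URL (the `endpoint_url` helper and the `registry.example` host below are hypothetical, introduced only for illustration), the paths could be resolved into full URLs like this:

```python
from urllib.parse import urljoin

# Endpoint paths exactly as listed above. The host is NOT given in this
# listing, so the base URL must be supplied by the caller.
ENDPOINTS = {
    "full doc": "/v1/registry/swebench",
    "install": "/v1/registry/swebench/install",
    "compatibility": "/v1/registry/swebench/compatibility",
}


def endpoint_url(base_url: str, name: str) -> str:
    """Join a caller-supplied base URL with one of the listed paths.

    `endpoint_url` is a hypothetical helper, not part of any published API.
    """
    # Normalise slashes so the path is appended to the base instead of
    # replacing any path prefix the base URL already carries.
    return urljoin(base_url.rstrip("/") + "/", ENDPOINTS[name].lstrip("/"))


if __name__ == "__main__":
    # "https://registry.example" is a placeholder host, not the real one.
    for name in ENDPOINTS:
        print(name, endpoint_url("https://registry.example", name))
```

Keeping the paths in one mapping means a change of host or API version only touches a single place.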