HumanEval Benchmark for Code Generation

JSON →
library 1.0.3 ·python
verified May 22, 2026

HumanEval is a benchmark developed by OpenAI for assessing the code generation capabilities of Large Language Models (LLMs). It comprises 164 hand-written Python programming problems, each with a function signature, docstring, and comprehensive unit tests, designed to evaluate functional correctness. The library uses the `pass@k` metric for evaluation. The current version is 1.0.3, released on July 24, 2023. As a benchmark dataset and evaluation harness, it has an infrequent release cadence, with updates typically driven by new research or significant improvements to the benchmark itself.

total hits 21
actors 9 distinct systems
last hit 2d ago Bingbot
ByteDance
8
GPTBot
2
Script
2
OAI-SearchBot
2
ChatGPT-User
1
Search engines
2

top countries 🇸🇬 Singapore · 🇺🇸 United States · 🇫🇷 France · 🇩🇪 Germany · 🇨🇦 Canada