HumanEval Benchmark for Code Generation
HumanEval is a benchmark developed by OpenAI for assessing the code generation capabilities of Large Language Models (LLMs). It comprises 164 hand-written Python programming problems, each with a function signature, docstring, and unit tests, and is designed to evaluate functional correctness rather than textual similarity to a reference solution. Results are reported with the `pass@k` metric: the probability that at least one of k generated samples for a problem passes its unit tests. The current version is 1.0.3, released on July 24, 2023. As a benchmark dataset and evaluation harness, it has an infrequent release cadence, with updates typically driven by new research or significant improvements to the benchmark itself.
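To make the `pass@k` metric concrete, the following is a minimal sketch of the unbiased estimator commonly used with HumanEval, 1 - C(n-c, k) / C(n, k), where n is the number of samples generated for a problem and c is the number that passed. The function names `pass_at_k` and `mean_pass_at_k` are illustrative assumptions, not the package's API.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: total samples generated for the problem
    c: number of those samples that passed all unit tests
    k: sample budget being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that k samples drawn without replacement all fail.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds (n, c) pairs (hypothetical input shape)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, a problem with 4 samples of which 1 passed gives `pass_at_k(4, 1, 2) == 0.5`. The benchmark score is the mean of this per-problem estimate over all 164 problems.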
Resources
API endpoints
full doc /v1/registry/human-eval
install /v1/registry/human-eval/install
compatibility /v1/registry/human-eval/compatibility