HumanEval Benchmark for Code Generation
HumanEval is a benchmark developed by OpenAI for assessing the code generation capabilities of Large Language Models (LLMs). It comprises 164 hand-written Python programming problems, each with a function signature, docstring, and comprehensive unit tests, designed to evaluate functional correctness. Evaluation uses the `pass@k` metric: the probability that at least one of k sampled completions per problem passes the unit tests. The current version is 1.0.3, released on July 24, 2023. As a benchmark dataset and evaluation harness, it has an infrequent release cadence, with updates typically driven by new research or significant improvements to the benchmark itself.
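The unbiased `pass@k` estimator from the HumanEval paper can be computed with the standard library alone. A minimal sketch (the `pass_at_k` name is my own; the library exposes an equivalent NumPy-based estimator):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generated samples of which c are
    correct, passes the unit tests."""
    if n - c < k:
        # Every size-k subset must contain at least one correct sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 200 samples per task, 10 of which pass the tests
print(pass_at_k(200, 10, 1))  # 0.05 (the raw fraction correct)
```

With k=1 the estimator reduces to the plain fraction of correct samples, which is why single-sample runs only report `pass@1`.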
Warnings
- breaking Executing model-generated code carries significant security risks. The `execution.py` module in the `human-eval` library therefore ships with the actual code-execution call commented out; you must read the safety notice in that file and uncomment the call yourself, and you should only run evaluation inside a robust sandbox (e.g. a container or VM).
- gotcha Base language models (not instruction-tuned) may produce repetitive or malformed completions that distort benchmark scores. This is particularly common when sampling through chat completion APIs, which also tend to wrap code in markdown fences and explanatory prose that must be stripped before evaluation.
- gotcha The HumanEval benchmark is susceptible to contamination: its problems or near-identical solutions may have been part of an LLM's training data, leading to artificially inflated scores.
- gotcha Evaluating a large number of samples or using the `--test-details` flag for `evaluate_functional_correctness` can be computationally intensive and slow.
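For the chat-API gotcha above, a common mitigation is to post-process completions before evaluation: unwrap markdown fences and truncate at tokens that begin unrelated top-level code. A minimal sketch (the `STOP_SEQUENCES` list and `clean_completion` helper are illustrative, not part of `human-eval`):

```python
import re

# Tokens that typically mark the start of code beyond the requested function
STOP_SEQUENCES = ["\nclass ", "\ndef ", "\nif __name__", "\nprint("]

def clean_completion(text: str) -> str:
    # If the model wrapped its answer in a markdown fence, keep only the code
    match = re.search(r"```(?:python)?\n(.*?)```", text, re.DOTALL)
    if match:
        text = match.group(1)
    # Truncate at the first stop sequence (start of a new top-level block)
    for stop in STOP_SEQUENCES:
        idx = text.find(stop)
        if idx != -1:
            text = text[:idx]
    return text.rstrip("\n")

raw = "Here you go:\n```python\n    return a + b\n\ndef unrelated():\n    pass\n```"
print(repr(clean_completion(raw)))  # '    return a + b'
```

Which stop sequences are appropriate depends on whether your model emits the function body alone or restates the whole function definition.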
Install
- From PyPI:
pip install human-eval
- From source (editable install):
git clone https://github.com/openai/human-eval.git
cd human-eval
pip install -e .
Imports
- read_problems
from human_eval.data import read_problems
- write_jsonl
from human_eval.data import write_jsonl
- evaluate_functional_correctness (command-line)
evaluate_functional_correctness samples.jsonl
Quickstart
from human_eval.data import write_jsonl, read_problems

def generate_one_completion(prompt: str) -> str:
    """Placeholder for your LLM's code generation function.

    Replace this with actual API calls to your LLM. It receives a problem
    prompt (function signature plus docstring) and should return a string
    of generated code that completes it.
    """
    # Example: simple dummy completions for demonstration
    if "def multiply" in prompt:
        return "def multiply(a, b):\n    return a * b"
    elif "def add" in prompt:
        return "def add(a, b):\n    return a + b"
    else:
        return "def solution():\n    pass  # Your LLM-generated code here"

# 1. Read the 164 HumanEval problems
problems = read_problems()

# 2. Generate completions for each problem
num_samples_per_task = 1  # quick demonstration; typically 100+ for a robust pass@k estimate
samples = []
for task_id in problems:
    prompt = problems[task_id]["prompt"]
    for _ in range(num_samples_per_task):
        completion = generate_one_completion(prompt)
        samples.append(dict(task_id=task_id, completion=completion))

# 3. Save the generated samples to a JSON Lines file
samples_filepath = "samples.jsonl"
write_jsonl(samples_filepath, samples)
print(f"Generated samples saved to {samples_filepath}")

# 4. Evaluate functional correctness (run as a separate command-line step)
print("\nTo evaluate, run the following from your terminal (after installing human-eval):")
print(f"$ evaluate_functional_correctness {samples_filepath}")
print("\nWARNING: This command executes untrusted model-generated code. Run it only inside a robust security sandbox.")

# Example output from evaluate_functional_correctness (run separately):
# {'pass@1': ...}  # pass@10 / pass@100 require at least 10 / 100 samples per task
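Besides printing the aggregate scores, `evaluate_functional_correctness` writes a per-sample results file alongside the input (e.g. `samples.jsonl_results.jsonl`). A sketch of inspecting it with the standard library, using synthetic records whose field names are assumed from the library's output format (notably the per-sample `passed` flag and `result` message):

```python
import json
import os
import tempfile

# Synthetic records mimicking the per-sample results file (illustrative only)
records = [
    {"task_id": "HumanEval/0", "completion": "...", "result": "passed", "passed": True},
    {"task_id": "HumanEval/1", "completion": "...", "result": "failed: AssertionError", "passed": False},
]

path = os.path.join(tempfile.mkdtemp(), "samples.jsonl_results.jsonl")
with open(path, "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Read the results back and tally which samples passed their unit tests
with open(path) as f:
    results = [json.loads(line) for line in f]

passed = sum(r["passed"] for r in results)
print(f"{passed}/{len(results)} samples passed")  # 1/2 samples passed
```

Grouping the per-sample records by `task_id` lets you see which specific problems a model fails, which the aggregate pass@k numbers hide.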