LM Evaluation Harness
LM Evaluation Harness (lm-eval) is a comprehensive framework for evaluating language models on a wide range of benchmarks and tasks. It supports various model backends (HuggingFace, vLLM, SGLang, etc.) and provides a standardized way to compare model performance. The current version is 0.4.11, and it maintains a rapid release cadence with frequent minor updates and occasional breaking changes.
Warnings
- breaking The base `pip install lm_eval` no longer includes model backends (e.g., HuggingFace/PyTorch stack) by default. These must now be installed explicitly.
- breaking Python 3.10 or newer is now the minimum required version.
- breaking Chat template delimiter handling changed, particularly affecting multiple-choice tasks. This might alter how prompts are constructed for models expecting specific chat formats.
- gotcha Task versions can change between releases. Results from a previous task version may not be directly comparable with results from an updated version.
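The task-version gotcha above can be checked programmatically: result dumps from recent lm-eval releases include a `versions` mapping (task name → task version). A minimal sketch for flagging version drift between two runs, using hypothetical sample data rather than real output:

```python
# Sketch: flag task-version drift between two lm-eval result dumps.
# Assumes each results dict contains a "versions" mapping of
# task name -> task version, as in recent lm-eval releases.
def version_drift(old_results: dict, new_results: dict) -> dict:
    old_v = old_results.get("versions", {})
    new_v = new_results.get("versions", {})
    return {
        task: (old_v.get(task), new_v.get(task))
        for task in set(old_v) | set(new_v)
        if old_v.get(task) != new_v.get(task)
    }

# Hypothetical sample data for illustration.
run_a = {"versions": {"hellaswag": 1, "arc_easy": 1}}
run_b = {"versions": {"hellaswag": 2, "arc_easy": 1}}
print(version_drift(run_a, run_b))  # -> {'hellaswag': (1, 2)}
```

Any task appearing in the drift dict should not be compared score-for-score across the two runs.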
Install
pip install "lm-eval[main]"
pip install lm-eval            # Core only
pip install "lm-eval[hf]"      # Add HuggingFace backend
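Given the rapid release cadence and occasional breaking changes noted above, pinning the exact version is worth considering. A sketch combining an extra with a version pin (0.4.11 is the version current per the overview; substitute the release you validated against):

```shell
# Pin both the package version and the backend extra so upgrades are
# deliberate; extras and version specifiers combine with standard pip syntax.
pip install "lm-eval[hf]==0.4.11"
```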
Imports
- simple_evaluate (high-level entry point)
from lm_eval import simple_evaluate
- tasks.get_task_dict
from lm_eval import tasks
- evaluator.evaluate (lower-level; expects a constructed LM and a task dict)
from lm_eval import evaluator
- models.huggingface.HFLM (HuggingFace backend model class)
from lm_eval.models.huggingface import HFLM
Quickstart
import lm_eval

# NOTE: This quickstart uses a tiny model on CPU so it runs quickly.
# For real evaluations, use a GPU and a larger model.
# Requires the HuggingFace backend: pip install "lm-eval[hf]"
model_name = "sshleifer/tiny-gpt2"  # replace with your model
# simple_evaluate constructs the model, loads the tasks, and evaluates.
results = lm_eval.simple_evaluate(
    model="hf",                          # HuggingFace backend
    model_args=f"pretrained={model_name}",
    tasks=["hellaswag"],
    num_fewshot=0,                       # zero-shot
    batch_size=1,
    device="cpu",                        # or "cuda:0" for GPU
    limit=10,                            # cap samples for a quick smoke test
)
print(results["results"]["hellaswag"])
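Persisting the returned dict to disk is a common next step. A minimal sketch using a stand-in results dict (the real object nests per-task metrics under results["results"], and may contain values the json module cannot serialize directly, hence default=str):

```python
import json

# Stand-in for an lm-eval results dict; metric keys like "acc,none"
# mirror the 0.4.x naming but the exact schema may vary by release.
results = {"results": {"hellaswag": {"acc,none": 0.25, "acc_norm,none": 0.3}}}

with open("results.json", "w") as f:
    # default=str coerces any non-serializable values (dtypes, Paths, ...)
    json.dump(results, f, indent=2, default=str)

print(json.dumps(results["results"]["hellaswag"], sort_keys=True))
```

Loading the file back with json.load gives a plain dict suitable for diffing or plotting across runs.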