LM Evaluation Harness

library 0.4.11 · python
verified May 22, 2026

LM Evaluation Harness (lm-eval) is a framework for evaluating language models across a wide range of benchmarks and tasks. It supports several model backends (HuggingFace, vLLM, SGLang, etc.) and provides a standardized way to compare model performance. The current version is 0.4.11; the project releases frequently, with regular minor updates and occasional breaking changes.
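To illustrate what comparing backends in a standardized way can look like, here is a minimal sketch that assembles two command-line invocations differing only in the backend. The helper `build_lm_eval_command` is hypothetical and not part of the library; the flags (`--model`, `--model_args`, `--tasks`, `--batch_size`, `--num_fewshot`, `--output_path`) and the `gpt2` / `hellaswag` choices are assumptions based on the 0.4.x command line, so check them against the installed version, since breaking changes do occur.

```python
import shlex


def build_lm_eval_command(backend, model_args, tasks, batch_size=8,
                          num_fewshot=None, output_path=None):
    """Hypothetical helper: build an lm_eval argument list (does not run it)."""
    cmd = [
        "lm_eval",
        "--model", backend,
        # model_args is passed as comma-separated key=value pairs
        "--model_args", ",".join(f"{k}={v}" for k, v in model_args.items()),
        # several tasks can be given as a comma-separated list
        "--tasks", ",".join(tasks),
        "--batch_size", str(batch_size),
    ]
    if num_fewshot is not None:
        cmd += ["--num_fewshot", str(num_fewshot)]
    if output_path is not None:
        cmd += ["--output_path", output_path]
    return cmd


# Same model and task, two backends: only the --model value changes.
hf_cmd = build_lm_eval_command("hf", {"pretrained": "gpt2"}, ["hellaswag"])
vllm_cmd = build_lm_eval_command("vllm", {"pretrained": "gpt2"}, ["hellaswag"])

print(shlex.join(hf_cmd))
print(shlex.join(vllm_cmd))
```

The resulting lists can be passed to `subprocess.run` once the package and the chosen backend are installed. Keeping every argument except the backend identical is what makes the two runs comparable.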
