Open-source Evaluators for LLM Agents

library 0.0.9 · python
verified May 25, 2026

Agentevals is an open-source Python library from LangChain for evaluating the performance of Large Language Model (LLM) agents. It provides a framework for defining custom agents, several types of evaluators (e.g., code execution, human feedback), and structured scenarios for consistent testing. The library is in early development (v0.0.9), so its features and APIs are likely to change between releases.
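
To make the idea of an evaluator run against fixed scenarios concrete, here is a minimal sketch in plain Python. It does not use the agentevals API, which this listing does not show. Every name in it (Scenario, EvalResult, tool_sequence_evaluator, toy_agent) is hypothetical and chosen for illustration, and the "agent" is assumed to be any callable that returns the ordered list of tool names it invoked.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Scenario:
    """A fixed test case: a prompt plus the tool calls the agent is expected to make."""
    prompt: str
    expected_tools: list[str]


@dataclass
class EvalResult:
    """Outcome of one evaluator on one scenario."""
    key: str
    score: bool
    comment: str


# Assumption: an agent is a callable that maps a prompt to the ordered
# list of tool names it invoked while answering.
Agent = Callable[[str], list[str]]


def tool_sequence_evaluator(agent: Agent, scenario: Scenario) -> EvalResult:
    """Pass only if the agent called exactly the expected tools, in order."""
    actual = agent(scenario.prompt)
    passed = actual == scenario.expected_tools
    comment = (
        "tool calls match"
        if passed
        else f"expected {scenario.expected_tools}, got {actual}"
    )
    return EvalResult(key="tool_sequence_match", score=passed, comment=comment)


def toy_agent(prompt: str) -> list[str]:
    """Stand-in for a real LLM agent: calls a weather tool when asked about weather."""
    return ["get_weather"] if "weather" in prompt.lower() else []


# The same scenarios are reused on every run so results stay comparable.
scenarios = [
    Scenario("What's the weather in Hanoi?", ["get_weather"]),
    Scenario("Say hello.", []),
    Scenario("Weather in Berlin, then email it to me.", ["get_weather", "send_email"]),
]

results = [tool_sequence_evaluator(toy_agent, s) for s in scenarios]
for scenario, result in zip(scenarios, results):
    print(f"{result.key}: {result.score} ({result.comment}) <- {scenario.prompt!r}")
```

Keeping the scenario list fixed and returning a structured result per scenario is what makes runs comparable across agent versions. A real evaluator would typically compare richer traces (arguments, intermediate messages) rather than tool names alone.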
