Open-source Evaluators for LLM Agents

0.0.9 · active · verified Thu Apr 16

Agentevals is an open-source Python library from Microsoft for evaluating the performance of Large Language Model (LLM) agents. It provides a framework for defining custom agents, a set of evaluator types (e.g., code execution, human feedback), and structured scenarios so that tests run consistently and repeatably. The library is in early development (v0.0.9); expect frequent releases and treat its features and APIs as subject to change.

Common errors

Warnings

Install
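
A sketch of a typical installation, assuming the package is published on PyPI under the name `agentevals`; pinning the version is advisable while the project is pre-1.0:

```shell
# Install from PyPI; pin the 0.0.x release since pre-1.0 APIs may change
pip install "agentevals==0.0.9"
```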

Imports

Quickstart

This quickstart demonstrates how to define a custom LLM agent, instantiate an evaluator, create a scenario with input data, and run an evaluation. The output shows a JSON representation of the evaluation results.

import json
from agentevals.agents import CustomAgent
from agentevals.evaluators import CodeExecutionEvaluator
from agentevals.scenarios import HumanFeedbackScenario

# 1. Define your custom agent by inheriting from CustomAgent
#    and implementing the `run` method.
class MySimpleAgent(CustomAgent):
    def run(self, input_data: dict) -> dict:
        task = input_data.get("task", "no task specified")
        # Simulate an agent processing a task and returning an output
        if "math problem" in task:
            return {"output": "I processed a math problem!"}
        return {"output": f"Agent processed task: '{task}'"}

# 2. Instantiate your agent
agent_instance = MySimpleAgent(name="my-eval-agent")

# 3. Instantiate an evaluator, associating it with your agent.
#    CodeExecutionEvaluator is one type; others exist in `agentevals.evaluators`.
evaluator = CodeExecutionEvaluator(agent=agent_instance, max_iterations=1)

# 4. Define a scenario that provides input for your agent.
#    HumanFeedbackScenario is one type; others exist in `agentevals.scenarios`.
scenario_data = {
    "task": "Solve a simple math problem"
}
evaluation_scenario = HumanFeedbackScenario(
    scenario_id="math_scenario_1",
    input_data=scenario_data,
    # expected_output is optional and its usage depends on the specific evaluator.
    expected_output={"result": "Solution to math problem"}
)

# 5. Run the evaluation
results = evaluator.evaluate(scenario=evaluation_scenario)

# Print the structured results
print(json.dumps(results, indent=2))
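
The schema of the `results` object isn't documented here, so a consumer shouldn't assume specific fields. Below is a minimal sketch of inspecting a hypothetical result payload after the JSON round-trip the quickstart performs; the `scenario_id`, `passed`, and `score` keys are illustrative assumptions, not a documented contract:

```python
import json

# Hypothetical evaluation payload -- field names are illustrative only.
results = {
    "scenario_id": "math_scenario_1",
    "passed": True,
    "score": 0.92,
    "details": {"iterations": 1},
}

# Round-trip through JSON, as the quickstart's final print() does.
serialized = json.dumps(results, indent=2)
decoded = json.loads(serialized)

# Defensive access: use .get() rather than assuming optional keys exist.
score = decoded.get("score")
print(f"scenario={decoded['scenario_id']} "
      f"passed={decoded.get('passed')} score={score}")
```

Because evaluator output may gain or lose fields between 0.0.x releases, defensive `.get()` access keeps downstream tooling from breaking on schema changes.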
