Open-source Evaluators for LLM Agents
Agentevals is an open-source Python library from LangChain that helps developers evaluate the performance of Large Language Model (LLM) agents. It provides a framework for defining custom agents, various types of evaluators (e.g., code execution, human feedback), and structured scenarios for consistent testing. The library is in early development (v0.0.9), so its features and APIs are expected to change with regular updates.
Resources
API endpoints
full doc /v1/registry/agentevals
install /v1/registry/agentevals/install
compatibility /v1/registry/agentevals/compatibility