Iris

official · stdio

MCP-native agent evaluation and observability server with trace logging, output quality evaluation, cost tracking, 12 built-in eval rules, a real-time dashboard, and PII detection.

Install

npx @iris-eval/mcp-server

Tools · 9

log_trace: Log an agent execution with spans, tool calls, token usage, and cost.
evaluate_output: Score output quality against completeness, relevance, safety, and cost rules (heuristic, deterministic, free).
get_traces: Query stored traces with filtering, pagination, and time-range support.
list_rules: Enumerate deployed custom eval rules (read-only).
deploy_rule: Register a new custom eval rule so it fires on every evaluate_output call of that category.
delete_rule: Remove a deployed custom rule (destructive, idempotent).
delete_trace: Remove a single stored trace by ID (destructive, tenant-scoped).
evaluate_with_llm_judge: Semantic eval via an LLM (Anthropic or OpenAI). Five templates: accuracy, helpfulness, safety, correctness, faithfulness. Cost-capped, with per-eval pricing disclosed. Bring your own API key (BYOK).
verify_citations: Extract citations from output, fetch the sources through an SSRF-guarded, domain-allowlisted resolver, and use an LLM judge to check whether each source actually supports the cited claim. Outbound HTTP is opt-in. Same BYOK requirement as evaluate_with_llm_judge.
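
MCP clients invoke tools like the nine above by sending a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a request by hand and rejects names that are not in the list. The helper `buildToolCall` and the `output` argument field are hypothetical, chosen for illustration. The real input schema of each tool should be read from the server's `tools/list` response.

```typescript
// Tool names copied from the listing above.
const IRIS_TOOLS = [
  "log_trace",
  "evaluate_output",
  "get_traces",
  "list_rules",
  "deploy_rule",
  "delete_rule",
  "delete_trace",
  "evaluate_with_llm_judge",
  "verify_citations",
] as const;

type IrisTool = (typeof IRIS_TOOLS)[number];

// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: IrisTool; arguments: Record<string, unknown> };
}

let nextId = 1;

// Hypothetical helper: validates the tool name, then wraps the
// arguments in a JSON-RPC envelope with a fresh request id.
function buildToolCall(
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  if ((IRIS_TOOLS as readonly string[]).indexOf(name) === -1) {
    throw new Error(`Unknown Iris tool: ${name}`);
  }
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name: name as IrisTool, arguments: args },
  };
}

// ASSUMPTION: "output" is an illustrative field name, not the
// documented schema of evaluate_output.
const request = buildToolCall("evaluate_output", {
  output: "Paris is the capital of France.",
});

console.log(JSON.stringify(request));
```

In practice an MCP client library handles this framing. The point of the sketch is only to show what travels over the stdio transport when a tool is called.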

Environment variables

IRIS_ANTHROPIC_API_KEY
IRIS_OPENAI_API_KEY
IRIS_PORT
IRIS_HOST
IRIS_DASHBOARD_PORT
IRIS_API_KEY
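
One way to wire the install command and these variables together is to generate a launch entry for an MCP client. The sketch below assumes the `mcpServers` configuration shape that several MCP clients accept (check your client's documentation). The helper `buildIrisEntry`, the server key `iris`, and the placeholder values are hypothetical. Only variables named in the list above are forwarded, and no meaning is assumed for any of them.

```typescript
// Variable names copied from the list above.
const IRIS_ENV_VARS = [
  "IRIS_ANTHROPIC_API_KEY",
  "IRIS_OPENAI_API_KEY",
  "IRIS_PORT",
  "IRIS_HOST",
  "IRIS_DASHBOARD_PORT",
  "IRIS_API_KEY",
] as const;

interface ServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

// Hypothetical helper: copies only the listed IRIS_* variables that are
// set and non-empty, and pairs them with the documented npx command.
function buildIrisEntry(
  source: Record<string, string | undefined>
): ServerEntry {
  const env: Record<string, string> = {};
  for (const key of IRIS_ENV_VARS) {
    const value = source[key];
    if (value !== undefined && value !== "") {
      env[key] = value;
    }
  }
  return { command: "npx", args: ["@iris-eval/mcp-server"], env };
}

// ASSUMPTION: placeholder values for illustration only. In a real
// script you would pass your process environment instead.
const config = {
  mcpServers: {
    iris: buildIrisEntry({
      IRIS_ANTHROPIC_API_KEY: "sk-placeholder",
      HOME: "/home/me", // not an IRIS_* variable, so it is dropped
    }),
  },
};

console.log(JSON.stringify(config, null, 2));
```

Filtering through an allowlist keeps unrelated secrets in the parent environment from leaking into the server process.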

Links

GitHub: github.com/iris-eval/mcp-server

★ 6 GitHub stars