Iris
MCP-native agent evaluation and observability server with trace logging, output quality evaluation, cost tracking, 12 built-in eval rules, a real-time dashboard, and PII detection
Install
npx @iris-eval/mcp-server
Tools · 9
- log_trace Log an agent execution with spans, tool calls, token usage, and cost
- evaluate_output Score output quality against completeness, relevance, safety, and cost rules (heuristic, deterministic, free)
- get_traces Query stored traces with filtering, pagination, and time-range support
- list_rules Enumerate deployed custom eval rules (read-only)
- deploy_rule Register a new custom eval rule so it fires on every evaluate_output of that category
- delete_rule Remove a deployed custom rule (destructive, idempotent)
- delete_trace Remove a single stored trace by ID (destructive, tenant-scoped)
- evaluate_with_llm_judge Semantic eval via LLM (Anthropic or OpenAI). Five templates: accuracy, helpfulness, safety, correctness, faithfulness. Cost-capped, per-eval pricing disclosed. Bring your own API key.
- verify_citations Extract citations from output, fetch the sources through an SSRF-guarded, domain-allowlisted resolver, and use an LLM judge to check whether each source actually supports the cited claim. Outbound HTTP is opt-in. Same bring-your-own-key (BYOK) requirement as evaluate_with_llm_judge.
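Because the server speaks MCP, a client can confirm that these nine tools are exposed by sending the protocol's standard `tools/list` JSON-RPC request. The sketch below builds that request and compares a response against the names listed above. The helper names `build_tools_list_request` and `missing_tools` are illustrative, not part of Iris, and the input schema of each tool is not covered here.

```python
import json

# Tool names exactly as they appear in the list above.
EXPECTED_TOOLS = {
    "log_trace", "evaluate_output", "get_traces", "list_rules", "deploy_rule",
    "delete_rule", "delete_trace", "evaluate_with_llm_judge", "verify_citations",
}


def build_tools_list_request(request_id: int = 1) -> str:
    """Serialize an MCP `tools/list` request as a JSON-RPC 2.0 message."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})


def missing_tools(response_text: str) -> set:
    """Return the expected tool names that a `tools/list` response does not advertise."""
    response = json.loads(response_text)
    advertised = {tool["name"] for tool in response["result"]["tools"]}
    return EXPECTED_TOOLS - advertised
```

An empty set from `missing_tools` means every listed tool was advertised; how the request is transported (stdio or HTTP) depends on the client.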
Environment variables
- IRIS_ANTHROPIC_API_KEY
- IRIS_OPENAI_API_KEY
- IRIS_PORT
- IRIS_HOST
- IRIS_DASHBOARD_PORT
- IRIS_API_KEY
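Many MCP clients launch servers from a JSON config containing an `mcpServers` map. As a minimal sketch, assuming that client convention, the snippet below generates such an entry from the install command and the variable names above. The `iris` key, the `iris_server_config` helper, and the placeholder value are assumptions for illustration; the meaning and defaults of each variable are not documented here, so none are filled in.

```python
import json

# Variable names as listed above; anything else is rejected to catch typos.
KNOWN_ENV_VARS = {
    "IRIS_ANTHROPIC_API_KEY", "IRIS_OPENAI_API_KEY", "IRIS_PORT",
    "IRIS_HOST", "IRIS_DASHBOARD_PORT", "IRIS_API_KEY",
}


def iris_server_config(env: dict) -> dict:
    """Build an `mcpServers` entry that starts the server via npx (hypothetical helper)."""
    unknown = set(env) - KNOWN_ENV_VARS
    if unknown:
        raise ValueError(f"unrecognized variables: {sorted(unknown)}")
    return {
        "mcpServers": {
            "iris": {  # the key name is arbitrary and chosen for this example
                "command": "npx",
                "args": ["@iris-eval/mcp-server"],
                "env": dict(env),
            }
        }
    }


if __name__ == "__main__":
    # Placeholder value only; supply a real key from your own secret store.
    config = iris_server_config({"IRIS_ANTHROPIC_API_KEY": "<your-anthropic-key>"})
    print(json.dumps(config, indent=2))
```

Where the resulting JSON belongs depends on the client, so check its documentation for the config file location.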
Links
★ 6 GitHub stars