conKurrence

JSON →

AI evaluation toolkit — measure inter-rater agreement (Fleiss' κ, Kendall's W) across multiple LLM providers

npx conkurrence