Azure AI Evaluation SDK for Python

1.16.5 · active · verified Wed Apr 15

The Azure AI Evaluation SDK for Python provides tools to quantitatively measure the performance of generative AI applications. It offers built-in and custom evaluators for mathematical, AI-assisted quality, and safety metrics, enabling comprehensive insights into application capabilities and limitations. This library is actively developed, with recent releases focusing on bug fixes and new features, maintaining a regular release cadence as part of the broader Azure SDK for Python.

Warnings

Install
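The package is published on PyPI; install it with pip (optionally inside a virtual environment):

```shell
pip install azure-ai-evaluation
```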

Imports
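The core entry points used in this quickstart are the `evaluate` function and the built-in evaluator classes. A few other built-in evaluators (e.g. `CoherenceEvaluator`, `FluencyEvaluator`) follow the same construction pattern:

```python
import os

# Batch entry point plus a sample of built-in AI-assisted evaluators
from azure.ai.evaluation import (
    evaluate,
    RelevanceEvaluator,
    CoherenceEvaluator,
    FluencyEvaluator,
)
```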

Quickstart

This quickstart demonstrates how to initialize a `RelevanceEvaluator` with Azure OpenAI model configuration using environment variables. It outlines how to prepare data for evaluation and mentions the `evaluate` function for batch processing, with optional integration for logging results to an Azure AI Project. Ensure your Azure OpenAI endpoint, API key, and deployment name are set as environment variables.

import os
from azure.ai.evaluation import evaluate, RelevanceEvaluator

# Ensure environment variables are set for Azure OpenAI
# AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_KEY, AZURE_OPENAI_DEPLOYMENT

model_config = {
    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT", ""),
    "api_key": os.environ.get("AZURE_OPENAI_KEY", ""),
    "azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT", ""),
}

# Example for a simple AI-assisted quality evaluation
relevance_evaluator = RelevanceEvaluator(model_config=model_config)

# For a single-turn query/response evaluation
# result = relevance_evaluator(
#     query="What is the capital of Japan?",
#     response="Tokyo is the capital of Japan."
# )

# For evaluating a dataset
data_for_evaluation = [
    {"id": "1", "query": "What is the capital of France?", "response": "Paris.", "context": "France is a country in Europe. Its capital is Paris."},
    {"id": "2", "query": "Who painted the Mona Lisa?", "response": "Leonardo da Vinci.", "context": "Leonardo da Vinci was an Italian polymath."}
]

# Use the `evaluate` function for batch evaluation over a dataset.
# `data` takes a path to a JSONL file (one JSON record per line), and
# `evaluators` takes a dict mapping a metric prefix to each evaluator.
# import json
# with open("evaluation_data.jsonl", "w") as f:
#     for row in data_for_evaluation:
#         f.write(json.dumps(row) + "\n")

# Ensure you have a configured Azure AI Project if logging results to AI Studio
# azure_ai_project = {
#     "subscription_id": os.environ.get("AZURE_SUBSCRIPTION_ID", ""),
#     "resource_group_name": os.environ.get("AZURE_RESOURCE_GROUP", ""),
#     "project_name": os.environ.get("AZURE_AI_PROJECT_NAME", ""),
# }

# results = evaluate(
#     data="evaluation_data.jsonl",
#     evaluators={"relevance": relevance_evaluator},
#     # azure_ai_project=azure_ai_project,  # Uncomment to log to AI Studio
# )

print("Evaluators initialized. Ready for evaluation.")
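Besides the built-in evaluators, `evaluate` accepts custom evaluators: any callable whose keyword arguments match your dataset columns and which returns a dict of metrics. A minimal sketch (the class name and metric key below are illustrative, not part of the SDK):

```python
class ExactMatchEvaluator:
    """Custom evaluator: any callable returning a dict of metrics works."""

    def __call__(self, *, response: str, ground_truth: str) -> dict:
        # Case-insensitive exact match, scored as 1.0 or 0.0
        match = response.strip().lower() == ground_truth.strip().lower()
        return {"exact_match": 1.0 if match else 0.0}


exact_match = ExactMatchEvaluator()
result = exact_match(response="Paris.", ground_truth="paris.")
print(result)  # {'exact_match': 1.0}
```

A custom evaluator like this can be passed alongside built-in ones, e.g. `evaluators={"relevance": relevance_evaluator, "exact_match": exact_match}`, provided the dataset rows contain the columns its signature requires.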
