{"id":6531,"library":"azure-ai-evaluation","title":"Azure AI Evaluation SDK for Python","description":"The Azure AI Evaluation SDK for Python provides tools to quantitatively measure the performance of generative AI applications. It offers built-in and custom evaluators for mathematical, AI-assisted quality, and safety metrics, enabling comprehensive insights into application capabilities and limitations. This library is actively developed, with recent releases focusing on bug fixes and new features, maintaining a regular release cadence as part of the broader Azure SDK for Python.","status":"active","version":"1.16.5","language":"en","source_language":"en","source_url":"https://github.com/Azure/azure-sdk-for-python","tags":["Azure","AI","Evaluation","Generative AI","LLM","Quality","Safety"],"install":[{"cmd":"pip install azure-ai-evaluation","lang":"bash","label":"Install latest version"}],"dependencies":[{"reason":"Commonly used for managing environment variables in local development for Azure OpenAI credentials.","package":"python-dotenv","optional":true},{"reason":"Required for using AI-assisted evaluators and for tracking results in the Azure AI Studio.","package":"Azure AI Foundry Project or Azure OpenAI","optional":true}],"imports":[{"note":"Main function to run evaluations.","symbol":"evaluate","correct":"from azure.ai.evaluation import evaluate"},{"note":"Example of a built-in AI-assisted quality evaluator.","symbol":"RelevanceEvaluator","correct":"from azure.ai.evaluation import RelevanceEvaluator"},{"note":"Example of a built-in NLP metric evaluator.","symbol":"BleuScoreEvaluator","correct":"from azure.ai.evaluation import BleuScoreEvaluator"},{"note":"Example of a built-in risk and safety evaluator.","symbol":"ViolenceEvaluator","correct":"from azure.ai.evaluation import ViolenceEvaluator"}],"quickstart":{"code":"import os\nfrom azure.ai.evaluation import evaluate, RelevanceEvaluator\n\n# Ensure environment variables are set for Azure OpenAI\n# 
AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_KEY, AZURE_OPENAI_DEPLOYMENT\n\nmodel_config = {\n    \"azure_endpoint\": os.environ.get(\"AZURE_OPENAI_ENDPOINT\", \"\"),\n    \"api_key\": os.environ.get(\"AZURE_OPENAI_KEY\", \"\"),\n    \"azure_deployment\": os.environ.get(\"AZURE_OPENAI_DEPLOYMENT\", \"\"),\n}\n\n# Example of a built-in AI-assisted quality evaluator\nrelevance_evaluator = RelevanceEvaluator(model_config=model_config)\n\n# For a single query/response evaluation, call the evaluator directly\n# result = relevance_evaluator(\n#     query=\"What is the capital of Japan?\",\n#     response=\"Tokyo is the capital of Japan.\"\n# )\n\n# For evaluating a dataset: `evaluate` expects `data` as a path to a JSONL file\nimport json\n\ndata_for_evaluation = [\n    {\"id\": \"1\", \"query\": \"What is the capital of France?\", \"response\": \"Paris.\", \"context\": \"France is a country in Europe. Its capital is Paris.\"},\n    {\"id\": \"2\", \"query\": \"Who painted the Mona Lisa?\", \"response\": \"Leonardo da Vinci.\", \"context\": \"Leonardo da Vinci was an Italian polymath.\"}\n]\n\nwith open(\"eval_data.jsonl\", \"w\") as f:\n    for row in data_for_evaluation:\n        f.write(json.dumps(row) + \"\\n\")\n\n# Use the `evaluate` function for batch evaluation on a dataset;\n# `evaluators` is a dict mapping evaluator names to evaluator instances.\n# Ensure you have a configured Azure AI Project if logging results to AI Studio\n# azure_ai_project = {\n#     \"subscription_id\": os.environ.get(\"AZURE_SUBSCRIPTION_ID\", \"\"),\n#     \"resource_group_name\": os.environ.get(\"AZURE_RESOURCE_GROUP\", \"\"),\n#     \"project_name\": os.environ.get(\"AZURE_AI_PROJECT_NAME\", \"\"),\n# }\n\n# results = evaluate(\n#     data=\"eval_data.jsonl\",\n#     evaluators={\"relevance\": relevance_evaluator},\n#     # azure_ai_project=azure_ai_project  # Uncomment to log to AI Studio\n# )\n\nprint(\"Evaluators initialized. Ready for evaluation.\")\n","lang":"python","description":"This quickstart demonstrates how to initialize a `RelevanceEvaluator` with an Azure OpenAI model configuration built from environment variables. It writes sample records to a JSONL file and shows the `evaluate` function for batch processing, which takes a path to a JSONL file and a dict of named evaluators, with optional logging of results to an Azure AI Project. 
Ensure your Azure OpenAI endpoint, API key, and deployment name are set as environment variables."},"warnings":[{"fix":"Update the environment variable name and adjust inputs for the specified evaluators according to the latest SDK documentation.","message":"Environment variable `PF_EVALS_BATCH_USE_ASYNC` was renamed to `AI_EVALS_BATCH_USE_ASYNC`. Input requirements for `RetrievalEvaluator`, `RelevanceEvaluator`, and `FluencyEvaluator` have changed.","severity":"breaking","affected_versions":"Potentially from v1.16.x or recent beta versions; check the changelog for the precise version."},{"fix":"Review and update custom grader logic, ensuring correct function signatures (`def grade(sample, item) -> float:`) and valid templating variables. Refer to the Azure AI Evaluation SDK troubleshooting guide for compatible OpenAI package versions or necessary adaptations.","message":"A breaking change in the OpenAI Python package (e.g., removal of `eval_string_check_grader` in v1.78.0) can cause compatibility issues and silent failures (returning zero scores) with Azure AI Evaluation SDK's custom graders like `AzureOpenAIPythonGrader`.","severity":"breaking","affected_versions":"OpenAI Python package >= 1.78.0, affecting Azure AI Evaluation SDK versions depending on when the breaking change was introduced."},{"fix":"Verify Azure OpenAI deployment capacity, confirm `DefaultAzureCredential` is correctly set up with the 'Azure AI User' role on the Foundry project, validate dataset JSONL format and field mappings, and implement retry logic with exponential backoff for rate limit errors.","message":"Evaluations can get stuck in 'Starting' or 'Running' state due to insufficient Azure OpenAI model capacity/quota, misconfigured authentication/permissions (e.g., missing 'Azure AI User' role for `DefaultAzureCredential`), incorrect dataset/mapping, or hitting rate limits.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Utilize project-level 
resources for evaluators in Azure AI Foundry (v2 SDK), defining them once for reuse across multiple agents and datasets. This separates how quality is measured from what is being evaluated.","message":"Embedding evaluation configuration directly within evaluation scripts can lead to 'configuration drift,' where different parts of the system measure metrics inconsistently, making historical comparisons unreliable.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Update your code to use the new environment variable `AI_EVALS_BATCH_USE_ASYNC` and remove the `[remote]` extra from installation commands.","message":"The environment variable `PF_EVALS_BATCH_USE_ASYNC` was deprecated and renamed. The `[remote]` extra for installation has been removed, as it is no longer needed when tracking results in Azure AI Studio.","severity":"deprecated","affected_versions":"Recent versions up to 1.16.x; check the changelog for when the rename took effect."},{"fix":"Upgrade to version 1.16.5 or later to mitigate this security vulnerability.","message":"Fixed Jinja2 Server-Side Template Injection (SSTI) vulnerability (CWE-1336) by replacing `jinja2.Template` with `jinja2.sandbox.SandboxedEnvironment` across all template rendering paths.","severity":"breaking","affected_versions":"<=1.16.4"}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z","problems":[{"fix":"Install the package using pip: 'pip install azure-ai-evaluation'.","cause":"The 'azure-ai-evaluation' package is not installed in the Python environment.","error":"ModuleNotFoundError: No module named 'azure.ai.evaluation'"},{"fix":"Refer to the official documentation to find the correct class or function to import for your use case.","cause":"The 'EvaluationClient' class does not exist in the 'azure.ai.evaluation' module.","error":"ImportError: cannot import name 'EvaluationClient' from 'azure.ai.evaluation'"},{"fix":"Replace 'max_tokens' with 'max_completion_tokens' in your code when configuring the 
model parameters.","cause":"The 'max_tokens' parameter is not supported by the specified model; it requires 'max_completion_tokens' instead.","error":"Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead"},{"fix":"Install the `openai` package, pinning it to a version known to work with your `azure-ai-evaluation` SDK version (e.g., `openai<=1.77.0` for older SDK releases), or update `azure-ai-evaluation` to its latest version to ensure compatibility with newer `openai` releases.","cause":"The `openai` package is missing from the environment, either because it was never installed or because a dependency conflict between `azure-ai-evaluation` and `openai` (especially after breaking changes in `openai` 1.78.0 and above) caused it to be removed during resolution.","error":"ModuleNotFoundError: No module named 'openai'"},{"fix":"Update your code to use the new generic evaluator names, such as `ViolenceEvaluator` instead of `ViolenceMultimodalEvaluator`.","cause":"Specific multimodal evaluator classes like `ViolenceMultimodalEvaluator` were removed in recent versions of the Azure AI Evaluation SDK (e.g., v1.3.0 and later) and replaced by generic counterparts (e.g., `ViolenceEvaluator`).","error":"ImportError: cannot import name 'ViolenceMultimodalEvaluator'"}]}