{"id":5114,"library":"arize-phoenix-evals","title":"Arize Phoenix Evals","description":"Phoenix Evals provides lightweight, composable building blocks for writing and running evaluations on LLM applications. It offers tools for determining relevance, toxicity, hallucination detection, and more. The library is actively developed, with version 3.0.0 being the current release, and features frequent updates as part of the broader Arize Phoenix ecosystem.","status":"active","version":"3.0.0","language":"en","source_language":"en","source_url":"https://github.com/Arize-ai/phoenix","tags":["LLM","evaluations","MLOps","AI/ML","observability"],"install":[{"cmd":"pip install arize-phoenix-evals","lang":"bash","label":"Core Evals"},{"cmd":"pip install 'arize-phoenix-evals>=2.0.0' openai","lang":"bash","label":"Evals with OpenAI LLM"}],"dependencies":[{"reason":"Required for LLM-based evaluators using OpenAI models.","package":"openai","optional":true},{"reason":"Used for interacting with a running Phoenix session (e.g., logging evaluation results or fetching trace data).","package":"arize-phoenix-client","optional":false}],"imports":[{"symbol":"create_classifier","correct":"from phoenix.evals import create_classifier"},{"note":"LLM class is specifically located in the phoenix.evals.llm submodule.","wrong":"from phoenix.evals import LLM","symbol":"LLM","correct":"from phoenix.evals.llm import LLM"},{"symbol":"evaluate_dataframe","correct":"from phoenix.evals import evaluate_dataframe"}],"quickstart":{"code":"import os\nfrom phoenix.evals import create_classifier\nfrom phoenix.evals.llm import LLM\n\n# Set your OpenAI API key from environment variable\nos.environ[\"OPENAI_API_KEY\"] = os.environ.get('OPENAI_API_KEY', 'sk-your-openai-key') # Replace with actual key or ensure env var is set\n\n# Create an LLM instance (ensure OPENAI_API_KEY is set in environment)\nllm = LLM(provider=\"openai\", model=\"gpt-4o\")\n\n# Create a custom classification evaluator\nevaluator = 
create_classifier(\n    name=\"helpfulness\",\n    prompt_template=\"Rate the response to the user query as helpful or not:\\n\\nQuery: {input}\\nResponse: {output}\",\n    llm=llm,\n    choices={\"helpful\": 1.0, \"not_helpful\": 0.0},\n)\n\n# Simple evaluation on a single record\nscores = evaluator.evaluate({\"input\": \"How do I reset the device?\", \"output\": \"Go to settings > reset.\"})\nprint(f\"Simple evaluation score: {scores[0].score}, label: {scores[0].label}\")\n\n# Evaluation with input mapping for nested data\nscores_nested = evaluator.evaluate(\n    {\"data\": {\"query\": \"How do I restart the app?\", \"response\": \"Close and reopen the application.\"}},\n    input_mapping={\"input\": \"data.query\", \"output\": \"data.response\"}\n)\nprint(f\"Nested evaluation score: {scores_nested[0].score}, label: {scores_nested[0].label}\")","lang":"python","description":"This quickstart demonstrates how to set up an LLM-based classification evaluator using the `arize-phoenix-evals` library with an OpenAI model. It covers defining an evaluator with a prompt template and performing evaluations on both simple and nested input data, showcasing input mapping."},"warnings":[{"fix":"Migrate to the new evaluation APIs and client-side experiments module. Refer to the official Phoenix migration guide for detailed instructions. If interacting with the Phoenix server directly, use the annotations API instead of `/v1/evaluations`.","message":"Version 3.0.0 of `arize-phoenix-evals` (and Phoenix v14.0.0) deprecates and removes the 'evals 1.0' module and the legacy experiments module. The `/v1/evaluations` REST endpoint has also been removed from the Phoenix server.","severity":"breaking","affected_versions":">=3.0.0"},{"fix":"Update your client instantiation from `import phoenix as px; client = px.Client(endpoint=...)` to `from phoenix.client import Client; client = Client(base_url=...)`. 
The `endpoint` parameter is now `base_url`.","message":"The legacy `phoenix.session.client.Client` (accessed as `px.Client()`) has been removed in Phoenix v14.0.0. All client interactions now go through `arize-phoenix-client`.","severity":"breaking","affected_versions":"arize-phoenix-evals>=3.0.0 (due to dependency on Phoenix v14.0.0+)"},{"fix":"Install the required LLM SDK, for example: `pip install 'openai>=1.0.0'` (quote the specifier so the shell does not treat `>=` as a redirect).","message":"When using LLM-based evaluators, you must separately install the SDK for your chosen LLM vendor (e.g., `openai` for OpenAI models, `langchain` for LangChain integrations). `arize-phoenix-evals` does not bundle these dependencies.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Pass structured data directly to the evaluator; the library serializes it automatically.","message":"Starting with `arize-phoenix-evals` 2.12.0, evaluators automatically JSON-serialize structured data (dicts, lists) passed as template variable values. Manually `str()`-ing complex objects is no longer necessary and can result in incorrectly rendered prompts.","severity":"gotcha","affected_versions":">=2.12.0"}],"env_vars":null,"last_verified":"2026-04-13T00:00:00.000Z","next_check":"2026-07-12T00:00:00.000Z"}