{"id":2468,"library":"deepeval","title":"DeepEval","description":"DeepEval is an LLM evaluation framework that helps developers evaluate any LLM workflow, from simple prompt chains to complex multi-step agents. It provides a suite of metrics covering evaluation aspects such as relevancy, faithfulness, hallucination, and agentic task completion. Currently at version 3.9.6, the library maintains a frequent release cadence, often introducing new metrics, test case types, and developer experience improvements.","status":"active","version":"3.9.6","language":"en","source_language":"en","source_url":"https://github.com/confident-ai/deepeval","tags":["LLM evaluation","AI agents","metrics","observability","testing","rag"],"install":[{"cmd":"pip install deepeval","lang":"bash","label":"Install DeepEval"}],"dependencies":[],"imports":[{"symbol":"evaluate","correct":"from deepeval import evaluate"},{"symbol":"LLMTestCase","correct":"from deepeval.test_case import LLMTestCase"},{"note":"For conversational test cases, DeepEval v3.0.8+ requires a `list[Turn]` instead of a `list[LLMTestCase]`.","wrong":"list[LLMTestCase]","symbol":"Turn","correct":"from deepeval.test_case import Turn"},{"note":"Most metrics follow this import pattern.","symbol":"AnswerRelevancyMetric","correct":"from deepeval.metrics import AnswerRelevancyMetric"}],"quickstart":{"code":"import os\nfrom deepeval import evaluate\nfrom deepeval.test_case import LLMTestCase\nfrom deepeval.metrics import AnswerRelevancyMetric\n\n# Most metrics require an LLM to run. Set your API key (e.g., OpenAI)\n# as an environment variable before running:\n# os.environ[\"OPENAI_API_KEY\"] = \"your_openai_api_key_here\"\n\n# Define a simple single-turn test case\ntest_case = LLMTestCase(\n    input=\"What is the capital of France?\",\n    actual_output=\"Paris is the capital of France.\",\n    expected_output=\"Paris\",\n    context=[\"France is a country in Western Europe. Its capital is Paris.\"],\n    retrieval_context=[\"Paris is known for the Eiffel Tower.\"]\n)\n\n# Initialize a metric; many metrics accept extra parameters or a custom LLM model\nanswer_relevancy_metric = AnswerRelevancyMetric(threshold=0.7)\n\n# evaluate() is synchronous; it manages concurrent metric execution internally\nresult = evaluate(test_cases=[test_case], metrics=[answer_relevancy_metric])\n\nprint(\"Evaluation Results:\")\nfor test_result in result.test_results:\n    print(f\"  Input: {test_result.input}\")\n    print(f\"  Actual Output: {test_result.actual_output}\")\n    for m in test_result.metrics_data:\n        print(f\"  Metric: {m.name}, Score: {m.score}, Pass: {m.success}\")\n","lang":"python","description":"This quickstart demonstrates how to define a basic text-based `LLMTestCase`, initialize an `AnswerRelevancyMetric`, and run an evaluation with `deepeval.evaluate`. Set an LLM API key (e.g., `OPENAI_API_KEY`) as an environment variable before running, as most metrics rely on an external LLM for their evaluation logic."},"warnings":[{"fix":"Migrate your conversational test cases from a list of `LLMTestCase` objects to a list of `Turn` objects, where each `Turn` has a `role` (e.g., 'user', 'assistant') and a `content` attribute.","message":"Breaking change in v3.0.8: Conversational test cases must now use a `list[Turn]` instead of a `list[LLMTestCase]`.","severity":"breaking","affected_versions":">=3.0.8"},{"fix":"Review the official DeepEval v3.0 migration guide and documentation. Existing evaluation setups for multi-step or agentic workflows may require substantial refactoring due to the introduction of 'component-level granularity'.","message":"Major API overhaul in v3.0: Significant changes for defining complex LLM workflows and agents.","severity":"breaking","affected_versions":">=3.0.0"},{"fix":"Select the appropriate `TestCase` type for your evaluation: `LLMTestCase` for single-turn text, `Turn` for conversational turns, `MLLMTestCase` for multimodal inputs, and `ArenaTestCase` for pairwise comparisons.","message":"DeepEval provides multiple `TestCase` types (`LLMTestCase`, `MLLMTestCase`, `ArenaTestCase`, `Turn`). Using the wrong `TestCase` type for a given evaluation scenario (e.g., `LLMTestCase` for multi-turn conversations after v3.0.8) is a common error.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure the necessary LLM API key is configured as an environment variable (e.g., `os.environ[\"OPENAI_API_KEY\"] = \"...\"`) or passed explicitly via the `model` parameter when initializing metrics or running `evaluate`.","message":"Most DeepEval metrics rely on an underlying Large Language Model (LLM) for their evaluation logic, requiring an API key (e.g., `OPENAI_API_KEY`, `COHERE_API_KEY`) to be set.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-10T00:00:00.000Z","next_check":"2026-07-09T00:00:00.000Z"}