DeepEval

3.9.6 · active · verified Fri Apr 10

DeepEval is an LLM evaluation framework that helps developers evaluate any LLM workflow, from simple prompt chains to complex multi-step agents. It provides a suite of metrics for various evaluation aspects like relevancy, faithfulness, hallucination, and agentic task completion. Currently at version 3.9.6, the library maintains a frequent release cadence, often introducing new metrics, test case types, and developer experience improvements.

Warnings

Install
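
The Install section of this snapshot is empty; DeepEval is distributed on PyPI, so the standard installation is:

```shell
pip install -U deepeval
```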

Imports

Quickstart

This quickstart demonstrates how to define a basic text-based `LLMTestCase`, initialize an `AnswerRelevancyMetric`, and run an evaluation with `deepeval.evaluate`. Note that `evaluate` is called synchronously; it handles metric concurrency internally. Remember to set an LLM API key (e.g., `OPENAI_API_KEY`) as an environment variable, since most metrics rely on an external LLM for their evaluation logic.

import os

from deepeval import evaluate
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

# Configure your LLM API key (e.g., OpenAI).
# Most metrics require an LLM to run; set the key as an environment variable,
# or uncomment the line below with your actual key:
# os.environ["OPENAI_API_KEY"] = "your_openai_api_key_here"

# Define a simple LLM test case
test_case = LLMTestCase(
    input="What is the capital of France?",
    actual_output="Paris is the capital of France.",
    expected_output="Paris",
    context=["France is a country in Western Europe. Its capital is Paris."],
    retrieval_context=["Paris is known for the Eiffel Tower."]
)

# Initialize a metric, e.g., AnswerRelevancyMetric.
# Some metrics accept additional parameters or a specific LLM model.
answer_relevancy_metric = AnswerRelevancyMetric(threshold=0.7)

# Run the evaluation. evaluate() is synchronous (it runs metrics
# concurrently under the hood) and returns an EvaluationResult.
results = evaluate([test_case], metrics=[answer_relevancy_metric])

print("Evaluation Results:")
for result in results.test_results:
    print(f"  Input: {result.input}")
    print(f"  Actual Output: {result.actual_output}")
    for m in result.metrics_data:
        print(f"  Metric: {m.name}, Score: {m.score}, Pass: {m.success}")
