Ragas

0.4.3 · active · verified Wed Mar 25

RAG evaluation framework that scores faithfulness, answer relevancy, context precision/recall, and more. Current version: 0.4.3 (Mar 2026), still pre-1.0. v0.2 was a major breaking change from v0.1: metrics became class instances initialized with an LLM wrapper, evaluate() now takes an EvaluationDataset instead of a HuggingFace Dataset, answer_relevancy was renamed to ResponseRelevancy, and sample fields were renamed (question→user_input, answer→response, contexts→retrieved_contexts). The legacy API still works but is deprecated and will be removed in v1.0.
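
For contrast, here is a minimal sketch of the legacy v0.1 call (the '...' values are placeholders); the Quickstart below shows the v0.2+ replacement:

# v0.1 style (deprecated, removed in v1.0): metric singletons + HuggingFace Dataset
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

legacy = Dataset.from_dict({
    'question': ['...'],    # now: user_input
    'answer': ['...'],      # now: response
    'contexts': [['...']],  # now: retrieved_contexts
})
result = evaluate(legacy, metrics=[faithfulness, answer_relevancy])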

Quickstart

Ragas v0.2+ RAG evaluation with EvaluationDataset and class-based metrics.

# pip install ragas langchain-openai
from ragas import EvaluationDataset, SingleTurnSample, evaluate
from ragas.metrics import Faithfulness, ResponseRelevancy, LLMContextRecall
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
import os

os.environ['OPENAI_API_KEY'] = 'your-key'

# Wrap the evaluator LLM; ResponseRelevancy also scores via embedding
# similarity, so it needs an embeddings wrapper as well
llm = LangchainLLMWrapper(ChatOpenAI(model='gpt-4o-mini'))
embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

samples = [
    SingleTurnSample(
        user_input='What is the capital of France?',
        response='The capital of France is Paris.',
        retrieved_contexts=['Paris is the capital and most populous city of France.'],
        reference='Paris'  # ground truth, required by LLMContextRecall
    )
]

dataset = EvaluationDataset(samples=samples)

result = evaluate(
    dataset,
    metrics=[
        Faithfulness(llm=llm),
        ResponseRelevancy(llm=llm, embeddings=embeddings),
        LLMContextRecall(llm=llm)
    ]
)
print(result)
# {'faithfulness': 1.0, 'response_relevancy': 0.97, 'context_recall': 1.0}
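
The result object also holds per-sample scores. A quick way to inspect them, assuming pandas is installed (to_pandas() is part of the evaluate result; the selected columns follow the field and metric names above):

df = result.to_pandas()  # one row per sample: input fields plus one column per metric
print(df[['user_input', 'faithfulness', 'context_recall']])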
