pytrec-eval-terrier
pytrec-eval-terrier provides Python bindings for common Information Retrieval evaluation measures, leveraging the highly optimized `trec_eval` C library. It simplifies the process of evaluating ranking performance for search systems. The current version is 0.5.10, and releases occur periodically, often tied to Python version support or minor bug fixes.
Warnings
- breaking The minimum supported Python version is now 3.8. Users on Python 3.7 or older will encounter errors or build failures with versions 0.5.7 and newer.
- gotcha The PyPI package name is `pytrec-eval-terrier`, but the Python module to import is `pytrec_eval`.
- gotcha When installing on systems without pre-built wheels (e.g., specific Linux distributions, older Python versions, or ARM architectures), compilation from source may require a C/C++ compiler (like GCC or Clang) and Python development headers (`python3-dev` or similar).
- gotcha Input QRELs and runs must adhere to specific dictionary formats: `{query_id: {doc_id: score}}`. Incorrectly structured inputs will lead to evaluation errors or unexpected results.
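Since the nested-dict format is a common stumbling block, here is a minimal sketch of converting standard TREC-format qrels and run lines into that shape. The field layouts assumed here follow the usual TREC conventions (qrels: `qid iter docid rel`; run: `qid Q0 docid rank score tag`); the helper names are illustrative, not part of the library.

```python
# Sketch: build the {query_id: {doc_id: score}} dicts pytrec_eval expects
# from TREC-format text lines. Helper names here are made up for illustration.

def parse_qrel_lines(lines):
    # qrels line: "qid iteration docid relevance" -> int relevance
    qrels = {}
    for line in lines:
        qid, _iteration, docid, rel = line.split()
        qrels.setdefault(qid, {})[docid] = int(rel)
    return qrels

def parse_run_lines(lines):
    # run line: "qid Q0 docid rank score tag" -> float score
    run = {}
    for line in lines:
        qid, _q0, docid, _rank, score, _tag = line.split()
        run.setdefault(qid, {})[docid] = float(score)
    return run

qrels = parse_qrel_lines(["q1 0 d1 1", "q1 0 d2 0"])
run = parse_run_lines(["q1 Q0 d1 1 0.9 myrun"])
print(qrels)  # {'q1': {'d1': 1, 'd2': 0}}
print(run)    # {'q1': {'d1': 0.9}}
```

Note that relevance judgments in qrels are integers, while run scores are floats; mixing these up can silently skew graded measures like nDCG.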
Install
pip install pytrec-eval-terrier
Imports
- pytrec_eval
import pytrec_eval
- RelevanceEvaluator
from pytrec_eval import RelevanceEvaluator
Quickstart
import pytrec_eval
# Example QRELS (Query Relevance Judgments)
# Format: {query_id: {doc_id: relevance_score}}
qrels = {
'q1': {'d1': 1, 'd2': 0, 'd3': 1},
'q2': {'d4': 1, 'd5': 0}
}
# Example RUNS (System Rankings)
# Format: {query_id: {doc_id: score}}
runs = {
'q1': {'d1': 0.9, 'd2': 0.8, 'd4': 0.7},
'q2': {'d4': 0.95, 'd6': 0.85}
}
# Define measures to evaluate
measures = pytrec_eval.supported_measures
# Or a specific set: measures = {'map', 'ndcg_cut.10', 'recip_rank'}
# Instantiate the evaluator
evaluator = pytrec_eval.RelevanceEvaluator(qrels, measures)
# Evaluate the runs
results = evaluator.evaluate(runs)
# Print results for a specific query and measure
print(f"MAP for q1: {results['q1']['map']:.4f}")
print(f"NDCG@10 for q2: {results['q2']['ndcg_cut_10']:.4f}")
# Average a measure across all queries
avg_map = pytrec_eval.compute_aggregated_measure('map', [q['map'] for q in results.values()])
print(f"Average MAP: {avg_map:.4f}")
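For most measures the aggregation is a plain macro average over queries, which you can also compute yourself without a library helper. A sketch, using made-up per-query values shaped like the evaluator's output:

```python
from statistics import mean

# Hypothetical per-query results, shaped like the dict
# returned by RelevanceEvaluator.evaluate()
results = {
    'q1': {'map': 0.80, 'recip_rank': 1.00},
    'q2': {'map': 0.50, 'recip_rank': 0.50},
}

# Macro-average each measure across queries
measures = next(iter(results.values())).keys()
avg = {m: mean(q[m] for q in results.values()) for m in measures}
print(avg)
```

Macro averaging weights every query equally regardless of how many judged documents it has, which matches standard trec_eval reporting; count-style measures (e.g. `num_ret`) are conventionally summed instead.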