ROUGE Score

0.1.2 · active · verified Fri Apr 10

The `rouge-score` library is a pure Python implementation of the ROUGE metric, designed to closely replicate the results of the original ROUGE-1.5.5 Perl script. It supports ROUGE-N, ROUGE-L (sentence-level and summary-level), text normalization, and optional Porter stemming. The current version is 0.1.2, released in July 2022. Releases are infrequent, but the package is stable, maintained by Google, and widely used for evaluating text generation tasks such as summarization.
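To make ROUGE-N concrete, here is a minimal pure-Python sketch of the clipped n-gram overlap behind it. It is an illustration only, not the library's implementation: the helper names `ngram_counts` and `rouge_n` are hypothetical, and tokenization is assumed to be a naive lowercase whitespace split with no stemming.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n=1):
    """Return (precision, recall, f1) from clipped n-gram overlap.

    Hypothetical helper for illustration; it uses a naive lowercase
    whitespace tokenizer rather than the library's tokenizer.
    """
    ref = ngram_counts(reference.lower().split(), n)
    cand = ngram_counts(candidate.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref & cand).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


p, r, f = rouge_n("the cat sat on the mat", "the cat lay on the mat", n=1)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

Precision divides the overlap by the number of candidate n-grams, and recall divides it by the number of reference n-grams, so swapping the two arguments swaps the two values.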

Warnings

Install

Imports

Quickstart

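This example initializes `RougeScorer` for four common ROUGE types (ROUGE-1, ROUGE-2, ROUGE-L, ROUGE-Lsum) with Porter stemming enabled, then computes and prints precision, recall, and F1 for a reference and a candidate text. Note that `score()` takes the reference (target) first and the candidate (prediction) second. The Porter stemmer used by `use_stemmer=True` ships with `nltk` and needs no extra data download. NLTK sentence tokenizer data (such as 'punkt') is only needed if `split_summaries=True` is passed, which sentence-splits the text for `rougeLsum`. Without that option, `rougeLsum` treats newlines as sentence boundaries, so single-line inputs like the ones below are scored as one sentence.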

from rouge_score import rouge_scorer

# Initialize the scorer with desired ROUGE types and optional stemming
scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL', 'rougeLsum'], use_stemmer=True)

# Define the reference (target) and candidate (prediction) summaries
reference_summary = "The quick brown fox jumps over the lazy dog. It's a sunny day."
candidate_summary = "A quick brown fox leaps over a sleeping dog. The weather is nice."

# Calculate scores
scores = scorer.score(reference_summary, candidate_summary)

# Print the results for each ROUGE type (precision, recall, f-measure)
for key, value in scores.items():
    print(f"{key}:")
    print(f"  Precision: {value.precision:.4f}")
    print(f"  Recall: {value.recall:.4f}")
    print(f"  F1 Score: {value.fmeasure:.4f}")
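
The `precision`, `recall`, and `fmeasure` values reported for `rougeL` are based on the longest common subsequence (LCS) between the reference and candidate token sequences. The sketch below illustrates that idea in plain Python. It is not the library's code: `lcs_length` and `rouge_l` are hypothetical names, and tokens are assumed to come from a plain lowercase whitespace split with no stemming, so its numbers will not match `RougeScorer` exactly.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # prev[j] holds the LCS length of the processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """Return (precision, recall, f1) based on LCS length.

    Hypothetical helper for illustration only.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    lcs = lcs_length(ref, cand)
    precision = lcs / len(cand) if cand else 0.0
    recall = lcs / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if lcs else 0.0
    return precision, recall, f1


p, r, f = rouge_l("the cat sat on the mat", "the cat lay on the mat")
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

Unlike ROUGE-2, the LCS does not require matched tokens to be adjacent, only to appear in the same order, which is why ROUGE-L rewards candidates that preserve sentence structure even when individual words are replaced.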
