ROUGE Metric

1.0.1 · active · verified Thu Apr 16

A fast Python implementation of the full ROUGE metrics for automatic summarization evaluation, with an optional wrapper around the official ROUGE-1.5.5.pl Perl script. It supports the common ROUGE variants (ROUGE-N, ROUGE-L, ROUGE-W, ROUGE-S, ROUGE-SU) as well as multi-reference evaluation, and is actively maintained.
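To make concrete what the ROUGE-N variant measures, here is a minimal, self-contained sketch of clipped n-gram overlap (an illustration of the metric's definition, not this library's implementation):

```python
from collections import Counter

def rouge_n(hyp_tokens, ref_tokens, n=1):
    """Minimal ROUGE-N sketch: clipped n-gram overlap recall/precision/F1."""
    hyp_ngrams = Counter(tuple(hyp_tokens[i:i + n])
                         for i in range(len(hyp_tokens) - n + 1))
    ref_ngrams = Counter(tuple(ref_tokens[i:i + n])
                         for i in range(len(ref_tokens) - n + 1))
    # Multiset intersection clips each match at the smaller count
    overlap = sum((hyp_ngrams & ref_ngrams).values())
    recall = overlap / max(sum(ref_ngrams.values()), 1)
    precision = overlap / max(sum(hyp_ngrams.values()), 1)
    f1 = 2 * recall * precision / (recall + precision) if recall + precision else 0.0
    return {'r': recall, 'p': precision, 'f': f1}

print(rouge_n('the cat sat on the mat'.split(),
              'the cat was on the mat'.split(), n=1))
```

With the sentence pair above, five of six unigrams match after clipping, so recall, precision, and F1 all come out to 5/6.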

Install
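The package is distributed on PyPI (assuming a standard pip setup):

```shell
pip install rouge-metric
```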

Imports

Quickstart

This quickstart shows how to score a hypothesis against a reference with the pure-Python scorer, first from raw strings and then from pre-tokenized input with multiple references per hypothesis. The output includes ROUGE-N, ROUGE-L, ROUGE-W, ROUGE-S, and ROUGE-SU scores.

from rouge_metric import PyRouge

# evaluate() expects a list of hypothesis strings and, for each
# hypothesis, a list of reference strings.
hypotheses = ['The cat sat on the mat.']
references = [['The cat was on the mat.']]

rouge = PyRouge(rouge_n=(1, 2), rouge_l=True, rouge_w=True,
                rouge_s=True, rouge_su=True, rouge_su_dist=4)
scores = rouge.evaluate(hypotheses, references)

print(scores)

# Example with multiple references and pre-tokenized input:
# one token list per hypothesis, and a list of reference token
# lists per hypothesis.
hyp_tokens = [['the', 'cat', 'sat', 'on', 'the', 'mat']]
ref_tokens = [[['the', 'cat', 'was', 'on', 'the', 'mat'],
               ['a', 'feline', 'was', 'resting', 'on', 'the', 'rug']]]

scores_multi_ref = rouge.evaluate_tokenized(hyp_tokens, ref_tokens)
print(scores_multi_ref)
