Python ROUGE Score Implementation
The `rouge` library provides a full, native Python implementation of the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metric, used to evaluate automatic text summarization and machine translation. Unlike some other ROUGE packages, it is not a wrapper around the original Perl script. The current stable version is 1.0.1; new releases appear periodically with features and fixes.
Warnings
- gotcha This `rouge` library (pltrdy/rouge) is a native Python implementation and explicitly states that its results may differ slightly from the 'official' ROUGE-1.5.5 Perl script. Small discrepancies against the Perl script, or against another Python ROUGE implementation such as Google's `rouge-score`, are therefore expected behavior for this package. If exact replication of the Perl ROUGE-1.5.5 numbers is critical, this package is not the right tool.
- gotcha There are multiple Python libraries for ROUGE, notably `rouge` (this package, from pltrdy) and `rouge-score` (from Google Research). They have different import paths and API interfaces, and confusing the two is a common mistake.
- gotcha The library is language-agnostic and expects tokenized input, meaning strings whose tokens are already separated by whitespace. For best results, especially with non-English texts or specific NLP tasks, pre-process and tokenize the hypothesis and reference texts before passing them to the `get_scores` method.
- deprecated Versions before 0.3 calculated ROUGE-L incorrectly for sequences containing multiple sentences. This was fixed in version 0.3.
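The tokenization gotcha above can be handled with a small pre-processing step. The following is a minimal sketch, not part of the library: `pretokenize` and `score_pretokenized` are hypothetical helper names, and the regex-based tokenizer is only an assumption that suits simple space-delimited languages. For other languages, substitute a tokenizer appropriate to the text.

```python
import re


def pretokenize(text: str) -> str:
    """Lowercase the text, split it into word and punctuation tokens,
    and rejoin the tokens with single spaces."""
    # \w+ matches runs of word characters; [^\w\s] matches one punctuation mark.
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    return " ".join(tokens)


def score_pretokenized(hypothesis: str, reference: str):
    """Hypothetical convenience wrapper: tokenize both texts, then score them."""
    # Imported lazily so pretokenize() can be used without the package installed.
    from rouge import Rouge  # requires `pip install rouge`

    return Rouge().get_scores(pretokenize(hypothesis), pretokenize(reference))


print(pretokenize("The cat,  sat!"))
```

Applying the same pre-processing to both hypothesis and reference matters more than the exact tokenizer chosen, since the scores are based on token overlap between the two.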
Install
pip install rouge
Imports
- Rouge
from rouge import Rouge
Quickstart
from rouge import Rouge

hypothesis = "the cat sat on the mat"
reference = "the cat is on the mat"

rouge = Rouge()
scores = rouge.get_scores(hypothesis, reference)
print(scores)
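For a single hypothesis/reference pair, `get_scores` returns a list holding one dictionary, keyed by `rouge-1`, `rouge-2`, and `rouge-l`, each with recall (`r`), precision (`p`), and F1 (`f`) values. The sketch below shows one way to pull out the F1 values. `f1_by_metric` is a hypothetical helper, and the numbers in `example_scores` are made-up placeholders that only mimic the shape of the output, not real results.

```python
def f1_by_metric(scores):
    """Map each metric name to its F1 value for the first hypothesis/reference pair."""
    first_pair = scores[0]  # one dict per hypothesis/reference pair
    return {metric: values["f"] for metric, values in first_pair.items()}


# Placeholder values shaped like the return value of Rouge().get_scores(...);
# these are NOT real scores for any particular text.
example_scores = [
    {
        "rouge-1": {"r": 0.8, "p": 0.8, "f": 0.8},
        "rouge-2": {"r": 0.6, "p": 0.6, "f": 0.6},
        "rouge-l": {"r": 0.8, "p": 0.8, "f": 0.8},
    }
]

print(f1_by_metric(example_scores))
```

In real use, pass the `scores` value from the quickstart in place of `example_scores`.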