{"id":7689,"library":"rouge-metric","title":"ROUGE Metric","description":"A fast Python implementation of full ROUGE metrics for automatic summarization evaluation, also providing a Python wrapper for the official ROUGE-1.5.5.pl Perl script. It supports various ROUGE variants (N, L, W, S, SU) and multi-reference evaluation. The library is actively maintained with periodic updates.","status":"active","version":"1.0.1","language":"en","source_language":"en","source_url":"https://github.com/li-plus/rouge-metric","tags":["NLP","summarization","evaluation","metrics","text-generation"],"install":[{"cmd":"pip install rouge-metric","lang":"bash","label":"Install stable version from PyPI"},{"cmd":"pip install git+https://github.com/li-plus/rouge-metric.git@master","lang":"bash","label":"Install latest version from GitHub"}],"dependencies":[{"reason":"Required for using the PerlRouge wrapper, especially on Windows (e.g., Strawberry Perl).","package":"Perl","optional":true}],"imports":[{"note":"For the pure Python implementation of ROUGE.","symbol":"Rouge","correct":"from rouge_metric import Rouge"},{"note":"For the Python wrapper around the official ROUGE-1.5.5.pl Perl script.","symbol":"PerlRouge","correct":"from rouge_metric import PerlRouge"}],"quickstart":{"code":"from rouge_metric import Rouge\n\nhypothesis = 'The cat sat on the mat.'\nreference = 'The cat was on the mat.'\n\nrouge = Rouge()\nscores = rouge.evaluate(hypothesis, reference)\n\nprint(scores)\n\n# Example with multiple references (list of lists of tokens)\nhyp_tokens = ['the', 'cat', 'sat', 'on', 'the', 'mat']\nref1_tokens = ['the', 'cat', 'was', 'on', 'the', 'mat']\nref2_tokens = ['a', 'feline', 'was', 'resting', 'on', 'the', 'rug']\n\nscores_multi_ref = rouge.evaluate_from_tokens(hyp_tokens, [ref1_tokens, ref2_tokens])\nprint(scores_multi_ref)","lang":"python","description":"This quickstart demonstrates how to use the pure Python ROUGE implementation to evaluate a single hypothesis against a single 
reference string, and also how to evaluate tokenized input against multiple references. The output includes ROUGE-N, ROUGE-L, ROUGE-W, ROUGE-S, and ROUGE-SU scores."},"warnings":[{"fix":"Combine ROUGE scores with semantic similarity metrics (e.g., BERTScore) and qualitative human evaluation to get a more robust assessment of text quality.","message":"ROUGE metrics rely on n-gram overlap and do not capture semantic meaning or contextual understanding, so syntactically similar but semantically divergent texts can still score highly.","severity":"gotcha","affected_versions":"All versions"},{"fix":"If exact replication of ROUGE-1.5.5.pl multi-document scores is critical, use the `PerlRouge` wrapper; otherwise expect slight differences in multi-document scenarios with `PyRouge`.","message":"Multi-document evaluation results from the pure Python implementation (rouge_metric.PyRouge) may differ slightly from those produced by the official ROUGE-1.5.5.pl Perl script (accessed via rouge_metric.PerlRouge) because the Python implementation does not use bootstrap resampling.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Apply consistent preprocessing (tokenization, stemming, stopword removal, etc.) to both hypothesis and reference texts, and pass pre-tokenized input via `evaluate_tokenized` when custom tokenization is needed.","message":"The pure Python implementation (rouge_metric.PyRouge) applies only simple whitespace tokenization by default; richer preprocessing such as stemming and stopword removal is left to the client and can noticeably change scores if not handled consistently.","severity":"gotcha","affected_versions":"All versions"},{"fix":"For non-English text evaluation, use `rouge_metric.PyRouge` and ensure your input tokens are appropriately preprocessed for the target language. 
For English, `PerlRouge` can still be used for official ROUGE-1.5.5.pl compatibility.","message":"The `PerlRouge` wrapper, which calls the official ROUGE-1.5.5.pl script, is primarily intended for English corpora. For non-English summaries, use the pure Python implementation (`rouge_metric.PyRouge`).","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Install a Perl distribution (e.g., Strawberry Perl on Windows) and ensure its binary folder is on your system's PATH. Alternatively, use the pure Python implementation `rouge_metric.PyRouge` if exact Perl script compatibility is not strictly required.","cause":"This error typically occurs when using `rouge_metric.PerlRouge` on a system (especially Windows) where the Perl interpreter needed to run the ROUGE-1.5.5.pl script is not installed or not correctly added to the system's PATH.","error":"FileNotFoundError: [Errno 2] No such file or directory: 'ROUGE-1.5.5.pl'"},{"fix":"Run `pip install rouge-metric` in your terminal to install the library.","cause":"The `rouge-metric` package is not installed in your current Python environment.","error":"ModuleNotFoundError: No module named 'rouge_metric'"},{"fix":"Pass both the hypotheses and the references to the evaluation method, e.g., `rouge.evaluate(hypotheses, references)` (or token lists for `evaluate_tokenized`).","cause":"The `evaluate` method of the `PyRouge` class requires both hypotheses and references as positional arguments.","error":"TypeError: evaluate() missing 1 required positional argument: 'reference'"}]}