sacreBLEU

2.6.0 · active · verified Thu Apr 09

sacreBLEU is a Python library providing hassle-free computation of shareable, comparable, and reproducible BLEU, chrF, and TER scores for machine translation evaluation. The current version is 2.6.0. The project is actively maintained, with releases typically several times a year that add new test sets, tokenizers, or minor features.

Warnings

Install

Imports

Quickstart

Computes the corpus-level BLEU score for one hypothesis against two reference translations, then prints the numeric score and the formatted score string.

import sacrebleu

# Example hypothesis and reference sentences
hypothesis = "The cat sat on the mat."
references = [
    "The cat is on the mat.",
    "A cat sat on the mat."
]

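# Calculate corpus BLEU score
# Note: the first argument is a list of hypothesis sentences; the second is a
# list of reference streams, each holding one sentence per hypothesis.
# Two references for a single hypothesis are therefore two one-sentence streams.
bleu_score = sacrebleu.corpus_bleu([hypothesis], [[ref] for ref in references])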

print(f"BLEU score: {bleu_score.score:.2f}")
print(f"BLEU string: {bleu_score.format()}")
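
With more than one hypothesis, the shape of the references argument is easy to get wrong: `corpus_bleu` takes a list of reference streams, where each stream holds one sentence per hypothesis, in the same order. Data is often stored the other way round, with each hypothesis carrying its own list of references. The sketch below shows one way to convert between the two layouts using plain Python. The helper `to_reference_streams` and the second example sentence pair are hypothetical, not part of sacreBLEU, and the helper assumes every hypothesis has the same number of references.

```python
from typing import List, Sequence


def to_reference_streams(refs_per_hypothesis: Sequence[Sequence[str]]) -> List[List[str]]:
    """Transpose per-hypothesis references into reference streams.

    Input:  one inner list per hypothesis, holding that hypothesis's references.
    Output: one inner list per reference set, holding one sentence per hypothesis.
    (Hypothetical helper for illustration; not a sacreBLEU function.)
    """
    if not refs_per_hypothesis:
        return []
    # Assumption: every hypothesis has the same number of references.
    counts = {len(refs) for refs in refs_per_hypothesis}
    if len(counts) != 1:
        raise ValueError("every hypothesis needs the same number of references")
    # zip(*rows) transposes rows (hypotheses) into columns (reference streams).
    return [list(stream) for stream in zip(*refs_per_hypothesis)]


hypotheses = ["The cat sat on the mat.", "It is raining."]
refs_per_hypothesis = [
    ["The cat is on the mat.", "A cat sat on the mat."],  # references for hypothesis 1
    ["It rains.", "Rain is falling."],                     # references for hypothesis 2
]

streams = to_reference_streams(refs_per_hypothesis)
print(streams)
```

The resulting `streams` list has the layout that `sacrebleu.corpus_bleu(hypotheses, streams)` expects: two streams, each with as many sentences as there are hypotheses.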
