BERTScore

0.3.13 · active · verified Sat Apr 11

BERTScore is a Python library that provides a PyTorch implementation of the BERTScore metric for evaluating text generation. It scores a generated text against a reference by comparing the contextual embeddings of their tokens, taken from a pre-trained model such as BERT, instead of counting exact n-gram overlap as BLEU does. This lets it give credit to paraphrases that share meaning but not wording. The library is listed as active and is currently at version 0.3.13.
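To make the embedding-based comparison concrete, here is a minimal sketch of the greedy matching step on toy vectors. It is not the library's code: `cosine` and `greedy_match_scores` are hypothetical helper names, the 2-d vectors stand in for real contextual embeddings, and the optional IDF weighting and baseline rescaling are left out.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_match_scores(cand_vecs, ref_vecs):
    """Return (precision, recall, f1) from greedy token matching.

    Precision averages, over candidate tokens, the best similarity to any
    reference token. Recall does the same from the reference side.
    """
    precision = sum(
        max(cosine(c, r) for r in ref_vecs) for c in cand_vecs
    ) / len(cand_vecs)
    recall = sum(
        max(cosine(r, c) for c in cand_vecs) for r in ref_vecs
    ) / len(ref_vecs)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy "token embeddings" (assumed values, for illustration only).
cand_vecs = [[1.0, 0.0], [0.0, 1.0]]
ref_vecs = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]

p, r, f1 = greedy_match_scores(cand_vecs, ref_vecs)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

Every candidate token has an exact match in the reference, so precision is high, while the unmatched reference token pulls recall down. That asymmetry is why the metric reports both values along with their harmonic mean.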

Warnings

Install

Imports

Quickstart

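This example computes BERTScore precision, recall, and F1 between a list of candidate sentences and their references. Each entry in `refs` is itself a list, so a candidate can be compared against one or more references. The `lang` parameter gives the language of the texts, which the library uses to choose a default model, and `verbose=True` prints progress information during computation.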

from bert_score import score

cands = ["The cat sat on the mat.", "The dog ate the food."]
refs = [["The cat was on the mat."], ["A dog consumed the meal."]]

P, R, F1 = score(cands, refs, lang="en", verbose=True)

print(f"Precision: {P.mean().item():.3f}")
print(f"Recall: {R.mean().item():.3f}")
print(f"F1 Score: {F1.mean().item():.3f}")
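The quickstart prints only the mean of each score, but `P`, `R`, and `F1` are PyTorch tensors with one entry per candidate, so `F1.tolist()` should give per-sentence values. The sketch below shows one way to inspect them, assuming you pass it that list. `rank_by_f1` is a hypothetical helper, and the numbers are placeholders chosen for illustration, not real BERTScore output.

```python
def rank_by_f1(cands, f1_values):
    """Pair each candidate with its F1 and sort from lowest to highest."""
    pairs = list(zip(cands, f1_values))
    return sorted(pairs, key=lambda pair: pair[1])


cands = ["The cat sat on the mat.", "The dog ate the food."]

# Placeholder scores; in practice use F1.tolist() from the quickstart.
f1_values = [0.95, 0.91]

ranked = rank_by_f1(cands, f1_values)
for sentence, f1 in ranked:
    print(f"{f1:.3f}  {sentence}")
```

Sorting from lowest to highest puts the weakest candidates first, which is usually where manual review is most useful.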
