Texterrors WER/CER Scoring Tool

1.1.6 · active · verified Wed Apr 15

texterrors is a Python library and command-line tool for scoring ASR (Automatic Speech Recognition) or other transcription output against a reference. It reports Word Error Rate (WER) and Character Error Rate (CER), supports both standard and character-aware alignment, generates detailed error reports, and can color its output for inspection. It can also compare multiple hypothesis files, compute per-group metrics (e.g., per-speaker WER), evaluate keywords and OOV (Out-Of-Vocabulary) words, and report oracle WER and a simple entity accuracy. It aims to be easier to use, modify, and extend than older tools such as `sclite`. The current version is 1.1.6, and the project is actively maintained.

Warnings

Install

Imports

Quickstart

This quickstart uses the `align_texts` function to compute WER and CER programmatically. It takes the reference and hypothesis as lists of tokens and, with `return_detailed_json=True`, returns a detailed result containing the metrics and the token-level alignment. For colored output, per-group analysis, or simple entity accuracy, the `texterrors` command-line tool is usually the more convenient route.

from texterrors import align_texts

# Reference and hypothesis texts as lists of tokens
reference_tokens = ["this", "is", "a", "test", "sentence"]
hypothesis_tokens = ["this", "is", "test", "sentance"]

# Align the two token lists and collect the metrics.
# Colored output, group metrics, and entity accuracy are easier to get
# from the command-line interface.
result = align_texts(
    reference_tokens,
    hypothesis_tokens,
    ref_id="ref_utt_1",  # Optional: utterance ID
    hyp_id="hyp_utt_1",  # Optional: utterance ID
    return_detailed_json=True,  # Get comprehensive results
)

print(f"WER: {result['wer']:.2f}%")
print(f"CER: {result['cer']:.2f}%")
print(f"Substitutions: {result['substitutions']}")
print(f"Deletions: {result['deletions']}")
print(f"Insertions: {result['ins']}")

print("\nFirst 10 alignment details (token level):")
for item in result['aligned_tokens'][:10]:
    print(item)
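
As a sanity check on the numbers above, WER and CER can be worked out by hand with a plain Levenshtein edit distance. The sketch below does not use texterrors at all: `edit_distance` and `error_rate` are hypothetical helper names, the alignment is the standard one rather than the character-aware one, and the CER here counts the spaces between tokens, which is an assumption and may differ from how texterrors normalizes text.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Minimum number of substitutions, deletions, and insertions
    needed to turn `ref` into `hyp` (standard Levenshtein distance)."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference items so far
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (item missing from hyp)
                curr[j - 1] + 1,     # insertion (extra item in hyp)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence, hyp: Sequence) -> float:
    """Edit distance normalized by the reference length, as a fraction."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


reference_tokens = ["this", "is", "a", "test", "sentence"]
hypothesis_tokens = ["this", "is", "test", "sentance"]

# Word level: compare the token lists directly.
wer = error_rate(reference_tokens, hypothesis_tokens)

# Character level: compare the space-joined strings (assumption: spaces count).
cer = error_rate(" ".join(reference_tokens), " ".join(hypothesis_tokens))

print(f"WER: {wer:.2%}")
print(f"CER: {cer:.2%}")
```

For these two utterances the word-level distance is 2 (one deleted "a", one substitution of "sentence" by "sentance") over 5 reference words, so the WER is 40%. A library result that differs noticeably from this is usually explained by text normalization or by the alignment mode in use.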
