Texterrors WER/CER Scoring Tool
texterrors is a Python library and command-line tool for scoring ASR (Automatic Speech Recognition) or other transcription output against a reference. It computes Word Error Rate (WER) and Character Error Rate (CER), supports standard and character-aware alignment, generates detailed error reports, and can produce colored output for inspection. It also supports comparing multiple hypothesis files, per-group metrics (e.g., per-speaker WER), keyword and OOV (Out-Of-Vocabulary) evaluation, oracle WER, and simple entity accuracy. The goal is an alternative to older tools like `sclite` that is easy to use, modify, and extend. The current version is 1.1.6, and the project is actively maintained.
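To make the reported metric concrete, here is a minimal sketch of how WER is defined: the word-level edit distance between reference and hypothesis, normalized by the reference length. This illustrates the metric only; it is not how texterrors itself is implemented (its core lives in a compiled extension).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (r != h)))    # substitution / match
        prev = cur
    return prev[-1]

def wer(ref, hyp):
    """WER as a percentage: edit distance over reference length."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)

# One deletion ("a") and one substitution ("sentence" -> "sentance"):
print(wer("this is a test sentence".split(), "this is test sentance".split()))  # 40.0
```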
Warnings
- breaking The `weighted WER` feature was removed and replaced by `simple entity accuracy`. Code relying on `weighted WER` will break.
- breaking The extension module (handling core performance-critical parts) was migrated from `pybind11` to `nanobind` and the build system moved to `CMake/scikit-build-core`. This primarily affects users building `texterrors` from source or in complex C++/Python environments.
- gotcha Character-aware alignment, while a powerful feature, can result in a slightly higher WER than traditional tools (e.g., `sclite`) because of its more granular alignment strategy. Since 22.06.22, character-aware alignment is *off* by default.
- gotcha When using the command-line tool's colored output (`-c` flag), the output should be viewed with a pager like `less -R` to correctly interpret ANSI escape codes and display colors.
- gotcha The command-line tool expects specific input file formats. For files where each line starts with an utterance ID, the `--isark` flag is required. For CTM-like input including timing fields, `--isctm` is needed.
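The character-aware gotcha above can be illustrated with a small sketch (the function names here are illustrative, not the texterrors API): when several word-level alignments tie, character edit distance can break the tie so that visually similar words get paired, which sometimes changes the error counts relative to purely word-level tools.

```python
def char_dist(a, b):
    """Character-level Levenshtein distance between two words."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def best_match(hyp_word, ref_words):
    """Pick the reference word closest to hyp_word in characters."""
    return min(ref_words, key=lambda r: char_dist(hyp_word, r))

# At the word level, substituting "yellow" for either reference word costs
# the same; character distance prefers pairing it with "hello".
print(best_match("yellow", ["uhm", "hello"]))  # hello
```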
Install
```shell
pip install texterrors
```
Imports
- align_texts
```python
from texterrors import align_texts
```
Quickstart
```python
from texterrors import align_texts

# Reference and hypothesis texts as lists of tokens
reference_tokens = ["this", "is", "a", "test", "sentence"]
hypothesis_tokens = ["this", "is", "test", "sentance"]

# Perform alignment and get metrics.
# The return_detailed_json=True flag provides comprehensive results.
# For full features (colored output, group metrics, entity accuracy),
# the command-line interface is often more direct.
result = align_texts(
    reference_tokens,
    hypothesis_tokens,
    ref_id="ref_utt_1",  # optional utterance ID
    hyp_id="hyp_utt_1",  # optional utterance ID
    return_detailed_json=True,
)

print(f"WER: {result['wer']:.2f}%")
print(f"CER: {result['cer']:.2f}%")
print(f"Substitutions: {result['substitutions']}")
print(f"Deletions: {result['deletions']}")
print(f"Insertions: {result['ins']}")

print("\nFirst 10 alignment details (token level):")
for item in result['aligned_tokens'][:10]:
    print(item)
```