Jiwer
Jiwer is a simple and fast Python package for evaluating Automatic Speech Recognition (ASR) systems. It computes metrics such as Word Error Rate (WER), Match Error Rate (MER), Word Information Lost (WIL), Word Information Preserved (WIP), and Character Error Rate (CER). Minimum edit distances are computed with RapidFuzz, which is backed by C++, making jiwer faster than pure-Python implementations. The current version is 4.0.0, released in June 2025, and the project maintains an active development and release cadence.
Warnings
- breaking The functions `jiwer.compute_measures()` and `jiwer.visualize_measures()` were renamed in version 4.0.0. They are now `jiwer.process_words()` and `jiwer.visualize_alignment()` respectively. Additionally, `process_words` returns a `WordOutput` dataclass instead of a dictionary.
- breaking The behavior for handling empty reference sentences changed in version 4.0.0. Previously, an empty reference with an empty hypothesis could lead to undefined behavior (division by zero). As of 4.0.0, this scenario is explicitly defined to yield zero error, supporting evaluation for models hallucinating on silent audio.
- breaking The internal representation of alignment chunks changed in version 4.0.0. Alignments are now returned as a list of `jiwer.AlignmentChunk` dataclass objects, replacing the previous tuple-based format. This improves clarity and accessibility of alignment details.
- gotcha Word Error Rate (WER) can exceed 100% (or 1.0). This occurs when the total number of errors (substitutions, deletions, and insertions) is greater than the number of words in the reference text, often due to a high number of insertions by the ASR system.
- gotcha When `jiwer.wer()` or `jiwer.cer()` receives lists of reference and hypothesis sentences, it internally concatenates all sentences and computes a *single, global* error rate for the entire dataset, which is standard for corpus-level evaluation. It does not return individual error rates per sentence.
- gotcha Jiwer's default transforms only collapse extra whitespace and split sentences into words; they do not lowercase or remove punctuation. For robust and fair evaluation, it is crucial to apply consistent text normalization (e.g., lowercasing, punctuation removal, expansion of contractions) to both reference and hypothesis strings *before* calling jiwer functions, or to pass a custom transform.
Install
-
pip install jiwer
Imports
- wer
from jiwer import wer
- cer
from jiwer import cer
- process_words
import jiwer
output = jiwer.process_words(reference, hypothesis)
- process_characters
import jiwer
output = jiwer.process_characters(reference, hypothesis)
- Compose
from jiwer import Compose
Quickstart
import jiwer
# Calculate Word Error Rate (WER) for single strings
reference_single = "hello world"
hypothesis_single = "hello duck"
error_single = jiwer.wer(reference_single, hypothesis_single)
print(f"WER (single): {error_single}")
# Calculate WER for multiple sentences (lists of strings)
references_multiple = ["hello world", "i like monthy python"]
hypotheses_multiple = ["hello duck", "i like python"]
error_multiple = jiwer.wer(references_multiple, hypotheses_multiple)
print(f"WER (multiple): {error_multiple}")
# Get detailed output including alignments and all measures
output_details = jiwer.process_words(reference_single, hypothesis_single)
print(f"\nDetailed output WER: {output_details.wer}")
print(f"Detailed output MER: {output_details.mer}")
print(f"Alignments: {output_details.alignments}")