Kaldi Alignment Methods

0.9.3 · active · verified Thu Apr 16

kaldialign is a Python package that wraps Kaldi's core alignment and edit-distance functions. It incorporates Kaldi's C++ code directly via pybind11, so its Word Error Rate (WER) and alignment results match Kaldi's exactly, which addresses inconsistencies found in other Levenshtein implementations. The current version is 0.9.3; updates focus on performance and on broader platform compatibility through pre-built wheels.

Common errors

Warnings

Install

Imports

Quickstart

This quickstart demonstrates the core `align` and `edit_distance` functions, as well as the `bootstrap_wer_ci` function, which computes a confidence interval for the Word Error Rate. The `epsilon_symbol` passed to `align` marks insertions and deletions in the output, so it must be a symbol that occurs in neither sequence.
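
The epsilon requirement can be checked before calling `align`. The following is a minimal sketch in plain Python, not part of kaldialign: the helper name `pick_epsilon` and its candidate list are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Sequence


def pick_epsilon(
    ref: Sequence[str],
    hyp: Sequence[str],
    candidates: Iterable[str] = ("*", "<eps>", "<EPS>", "@@eps@@"),
) -> str:
    """Return the first candidate that occurs in neither sequence.

    Hypothetical helper, not a kaldialign function.
    """
    used = set(ref) | set(hyp)
    for cand in candidates:
        if cand not in used:
            return cand
    raise ValueError("every candidate epsilon symbol occurs in the data")


# '*' is a real token in this reference, so it cannot serve as epsilon.
ref_tokens = ["rate", "*", "out", "of", "five"]
hyp_tokens = ["rate", "out", "of", "five"]

eps = pick_epsilon(ref_tokens, hyp_tokens)
print(eps)  # <eps>
```

The returned symbol can then be passed as the third argument to `align`.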

from kaldialign import align, edit_distance, bootstrap_wer_ci

# Example 1: Basic Alignment
ref_seq = ['hello', 'world', 'this', 'is', 'a', 'test']
hyp_seq = ['hello', 'this', 'was', 'a', 'test']
epsilon_symbol = '*'

alignment = align(ref_seq, hyp_seq, epsilon_symbol)
print(f"Alignment: {alignment}")
# Expected: [('hello', 'hello'), ('world', '*'), ('this', 'this'), ('is', 'was'), ('a', 'a'), ('test', 'test')]

# Example 2: Edit Distance (Insertions, Deletions, Substitutions)
ed_results = edit_distance(ref_seq, hyp_seq)
print(f"Edit Distance: {ed_results}")
# Expected to include: 'ins': 0, 'del': 1, 'sub': 1, 'total': 2
# (one deletion, 'world'; one substitution, 'is' -> 'was')

# Example 3: Bootstrapping WER Confidence Intervals (since v0.8.0)
# ref and hyp should be lists of lists of words (representing sentences)
# E.g., [['sentence', 'one'], ['sentence', 'two']] and [['sentece', 'one'], ['sentence', 'too']]
ref_utterances = [['the', 'quick', 'brown', 'fox'], ['jumps', 'over', 'the', 'lazy', 'dog']]
hyp_utterances = [['the', 'quik', 'brown', 'fox'], ['jump', 'over', 'the', 'lazi', 'dog']]

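# The result is a dict; the project README shows the keys
# 'wer', 'ci95', 'ci95min' and 'ci95max'.
wer_ci = bootstrap_wer_ci(ref_utterances, hyp_utterances)
print(f"WER with 95% confidence interval: {wer_ci}")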
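
The list of `(ref, hyp)` pairs that `align` returns can be turned into per-operation counts and a WER. The sketch below is plain Python and does not import kaldialign. The `classify` helper is hypothetical, and the alignment is hand-written as the minimum-cost alignment of the quickstart's `ref_seq` and `hyp_seq`, so the block runs on its own.

```python
from typing import Dict, List, Tuple


def classify(alignment: List[Tuple[str, str]], eps: str) -> Dict[str, int]:
    """Count correct, substituted, inserted and deleted tokens.

    Hypothetical helper: each pair is (ref_token, hyp_token), and `eps`
    marks the side that has no token at that position.
    """
    counts = {"cor": 0, "sub": 0, "ins": 0, "del": 0}
    for ref_tok, hyp_tok in alignment:
        if ref_tok == eps:
            counts["ins"] += 1  # hypothesis has a token the reference lacks
        elif hyp_tok == eps:
            counts["del"] += 1  # reference token is missing from the hypothesis
        elif ref_tok == hyp_tok:
            counts["cor"] += 1
        else:
            counts["sub"] += 1
    return counts


# Hand-written minimum-cost alignment of the quickstart sequences.
alignment = [
    ("hello", "hello"),
    ("world", "*"),
    ("this", "this"),
    ("is", "was"),
    ("a", "a"),
    ("test", "test"),
]

counts = classify(alignment, "*")
errors = counts["sub"] + counts["ins"] + counts["del"]
ref_len = counts["cor"] + counts["sub"] + counts["del"]  # tokens in the reference
wer = errors / ref_len

print(counts)                                      # {'cor': 4, 'sub': 1, 'ins': 0, 'del': 1}
print(f"WER = {errors}/{ref_len} = {wer:.3f}")     # WER = 2/6 = 0.333
```

WER is computed here as the total number of errors divided by the reference length, so insertions add to the numerator but not to the denominator.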
