{"id":8253,"library":"kaldialign","title":"Kaldi Alignment Methods","description":"kaldialign is a Python package that wraps Kaldi's core alignment and edit distance functions. It incorporates Kaldi's C++ code directly via pybind11, so that Word Error Rate (WER) and alignment results match Kaldi's exactly, avoiding the discrepancies seen with other Levenshtein implementations. Currently at version 0.9.3, it receives regular updates focused on performance and on broader platform coverage through pre-built wheels.","status":"active","version":"0.9.3","language":"en","source_language":"en","source_url":"https://github.com/pzelasko/kaldialign","tags":["speech-recognition","kaldi","alignment","edit-distance","wer","pybind11"],"install":[{"cmd":"pip install kaldialign","lang":"bash","label":"Install latest version"}],"dependencies":[],"imports":[{"symbol":"align","correct":"from kaldialign import align"},{"symbol":"edit_distance","correct":"from kaldialign import edit_distance"},{"symbol":"bootstrap_wer_ci","correct":"from kaldialign import bootstrap_wer_ci"}],"quickstart":{"code":"from kaldialign import align, edit_distance, bootstrap_wer_ci\n\n# Example 1: Basic Alignment\nref_seq = ['hello', 'world', 'this', 'is', 'a', 'test']\nhyp_seq = ['hello', 'this', 'was', 'a', 'test']\nepsilon_symbol = '*'\n\nalignment = align(ref_seq, hyp_seq, epsilon_symbol)\nprint(f\"Alignment: {alignment}\")\n# Expected: [('hello', 'hello'), ('world', '*'), ('this', 'this'), ('is', 'was'), ('a', 'a'), ('test', 'test')]\n\n# Example 2: Edit Distance (Insertions, Deletions, Substitutions)\ned_results = edit_distance(ref_seq, hyp_seq)\nprint(f\"Edit Distance: {ed_results}\")\n# Expected counts: ins=0, del=1, sub=1, total=2 ('world' deleted, 'is' -> 'was' substituted)\n\n# Example 3: Bootstrapping WER Confidence Intervals (since v0.8.0)\n# ref and hyp should be lists of lists of words (representing sentences)\n# E.g., [['sentence', 'one'], ['sentence', 'two']] and [['sentece', 
'one'], ['sentence', 'too']]\nref_utterances = [['the', 'quick', 'brown', 'fox'], ['jumps', 'over', 'the', 'lazy', 'dog']]\nhyp_utterances = [['the', 'quik', 'brown', 'fox'], ['jump', 'over', 'the', 'lazi', 'dog']]\n\nwer_ci = bootstrap_wer_ci(ref_utterances, hyp_utterances)\nprint(f\"WER bootstrap estimate: {wer_ci}\")","lang":"python","description":"This quickstart demonstrates the core `align` and `edit_distance` functions, as well as the `bootstrap_wer_ci` function for estimating Word Error Rate confidence intervals. The `epsilon_symbol` passed to `align` marks insertions and deletions in the output, so it must be a symbol that does not occur in either sequence."},"warnings":[{"fix":"Choose an `epsilon` symbol (like `*` or `_`) that is known not to occur in your reference or hypothesis sequences.","message":"When using `align()`, the `epsilon` parameter must be a 'null' symbol (e.g., `'*'`) that is guaranteed not to appear among the input sequence elements (words/characters). If `epsilon` does occur in the sequences, real tokens become indistinguishable from insertion/deletion markers and the alignment can be incorrect.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Be aware that the implementation is now C++-based. Rely on the provided Python interface for its functionality.","message":"The `bootstrap_wer_ci` function, introduced in v0.8.0, has performed its core computation in C++ since v0.9.0, for a 15x speed improvement. The trade-off is that the bootstrapping logic can no longer be inspected or modified directly from Python.","severity":"gotcha","affected_versions":">=0.9.0"},{"fix":"If `sclite`-style scoring is required, ensure your function call includes `sclite_mode=True`, e.g., `edit_distance(ref, hyp, sclite_mode=True)`.","message":"To compute WER or alignments using `sclite`-style weights (insertion/deletion cost 3, substitution cost 4), you must explicitly pass `sclite_mode=True` to the `align()` or `edit_distance()` functions. 
The default behavior aligns with standard Kaldi weights.","severity":"gotcha","affected_versions":">=0.7.0"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"For most users, `pip install kaldialign` should download a pre-built wheel, avoiding compilation. If building from source is necessary, ensure you have CMake and a C++ compiler (like GCC or Clang) installed and accessible in your system's PATH.","cause":"This error typically occurs when attempting to install `kaldialign` from source without a properly configured C++ compiler and CMake build system. A specific bug related to `CMakeLists.txt` not being found was fixed in v0.4.0, but general build environment issues can still lead to this.","error":"CMake Error at CMakeLists.txt:..."},{"fix":"Try reinstalling the package in a clean virtual environment: `python -m venv .venv && source .venv/bin/activate && pip install --no-cache-dir kaldialign`. If errors persist, check the full installation log for compilation failures.","cause":"This indicates that the `kaldialign` Python package, or its underlying C++ extensions, were not correctly installed or are not accessible in your current Python environment. This can happen due to partial installations, conflicting packages, or environment issues.","error":"ImportError: cannot import name 'align' from 'kaldialign'"}]}