{"id":6470,"library":"texterrors","title":"Texterrors WER/CER Scoring Tool","description":"texterrors is a Python library and command-line tool that scores ASR (Automatic Speech Recognition) or other transcription output against a reference. It computes Word Error Rate (WER) and Character Error Rate (CER), supports standard and character-aware alignment, generates detailed error reports, and can produce colored output for inspection. It also supports comparing multiple hypothesis files, per-group metrics (e.g., per-speaker WER), keyword and OOV (out-of-vocabulary) evaluation, oracle WER, and simple entity accuracy. It aims to be an alternative to older tools such as `sclite` that is easy to use, modify, and extend. The current version is 1.1.6, and the project is actively maintained.","status":"active","version":"1.1.6","language":"en","source_language":"en","source_url":"https://github.com/RuABraun/texterrors","tags":["WER","CER","ASR","transcription","NLP","alignment","metrics","speech-recognition"],"install":[{"cmd":"pip install texterrors","lang":"bash","label":"Install latest version"}],"dependencies":[{"reason":"Required Python version.","package":"python","version":">=3.9"},{"reason":"Required for advanced regular expression operations. A past release explicitly fixed an issue with this dependency.","package":"regex","optional":false}],"imports":[{"note":"This is the primary function for programmatic text alignment and error rate calculation.","symbol":"align_texts","correct":"from texterrors import align_texts"}],"quickstart":{"code":"from texterrors import align_texts\n\n# Reference and hypothesis texts as lists of tokens\n# (the hypothesis drops \"a\" and misspells \"sentence\" on purpose)\nreference_tokens = [\"this\", \"is\", \"a\", \"test\", \"sentence\"]\nhypothesis_tokens = [\"this\", \"is\", \"test\", \"sentance\"]\n\n# Perform alignment and get metrics\n# The return_detailed_json=True flag provides comprehensive results.\n# For full features (like colored output, group metrics, entity 
accuracy),\n# the command-line interface is often more direct.\nresult = align_texts(\n    reference_tokens,\n    hypothesis_tokens,\n    ref_id=\"ref_utt_1\",  # Optional: reference utterance ID\n    hyp_id=\"hyp_utt_1\",  # Optional: hypothesis utterance ID\n    return_detailed_json=True,  # Get comprehensive results\n)\n\nprint(f\"WER: {result['wer']:.2f}%\")\nprint(f\"CER: {result['cer']:.2f}%\")\nprint(f\"Substitutions: {result['substitutions']}\")\nprint(f\"Deletions: {result['deletions']}\")\nprint(f\"Insertions: {result['ins']}\")\n\nprint(\"\\nFirst 10 alignment details (token level):\")\nfor item in result['aligned_tokens'][:10]:\n    print(item)","lang":"python","description":"This quickstart shows how to use the `align_texts` function to calculate WER and CER programmatically. It takes tokenized reference and hypothesis lists and returns a detailed, JSON-style result with WER, CER, error counts, and the token-level alignment. For features such as colored output, per-group analysis, or simple entity accuracy, the `texterrors` command-line tool is often more convenient and more directly supported."},"warnings":[{"fix":"Migrate any usage of `weighted WER` to the new `simple entity accuracy` feature where applicable, or remove the code that depends on it.","message":"The `weighted WER` feature was removed, and `simple entity accuracy` was introduced in its place. Code relying on `weighted WER` will break.","severity":"breaking","affected_versions":"<1.1.6"},{"fix":"If building from source, ensure your development environment has `nanobind` and `CMake` properly configured. For most users installing via `pip install texterrors`, pre-built wheels should handle this transparently.","message":"The extension module (which handles the performance-critical core) was migrated from `pybind11` to `nanobind`, and the build system moved to `CMake/scikit-build-core`. 
This primarily affects users building `texterrors` from source or in complex C++/Python environments.","severity":"breaking","affected_versions":"<1.1.6"},{"fix":"Be aware of this difference when comparing results. If precise parity with traditional tools is required, keep character-aware alignment disabled; if you enable it, expect the WER to differ slightly.","message":"Character-aware alignment, while a powerful feature, can result in a slightly higher WER compared to traditional tools (e.g., Kaldi's scoring tools or NIST SCTK's `sclite`) due to its more granular alignment strategy. Since the 22.06.22 update, character-aware alignment is *off* by default.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Pipe the output to `less -R` (e.g., `texterrors -c ref.txt hyp.txt | less -R`).","message":"When using the command-line tool's colored output (`-c` flag), view the output with a pager that interprets ANSI escape codes, such as `less -R`. Plain `less` shows the raw escape sequences instead of colors.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Check `texterrors -h` for the available input flags and use the one that matches your data format.","message":"The command-line tool expects specific input file formats. For files where each line starts with an utterance ID, the `--isark` flag is required. For CTM-like input that includes timing fields, `--isctm` is needed.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z"}