pyctcdecode

0.5.0 · active · verified Fri Apr 17

pyctcdecode is a Python library that provides a standalone beam search decoder for CTC (Connectionist Temporal Classification) models. It efficiently turns the per-frame token probabilities produced by a CTC acoustic model into text, and it can optionally score hypotheses with a KenLM n-gram language model to improve speech recognition accuracy. The current version is 0.5.0. The project is actively maintained, with releases adding features, improving performance, and fixing bugs.
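
Decoding CTC output means turning a (time steps × vocabulary) score matrix into text. Each frame votes for one token, consecutive repeats are merged, and blanks are removed. The simplest version of this is greedy (best-path) decoding, sketched below in plain Python as a hypothetical helper (`greedy_ctc_decode` is not part of pyctcdecode). A beam search decoder such as pyctcdecode's keeps several candidate prefixes instead of a single per-frame argmax. The sketch assumes the blank is the empty string `""`, the convention pyctcdecode uses for its labels.

```python
from typing import List, Optional, Sequence


def greedy_ctc_decode(
    log_probs: Sequence[Sequence[float]], labels: List[str], blank: str = ""
) -> str:
    """Hypothetical best-path CTC decoder: per-frame argmax, merge repeats, drop blanks."""
    # Index of the highest-scoring token at every time step.
    best_path = [max(range(len(frame)), key=lambda j: frame[j]) for frame in log_probs]
    chars: List[str] = []
    prev: Optional[int] = None
    for idx in best_path:
        # Emit a token only when it differs from the previous frame's token,
        # and never emit the blank itself.
        if idx != prev and labels[idx] != blank:
            chars.append(labels[idx])
        prev = idx
    return "".join(chars)


# Usage: one row of log-scores per frame; the argmax path is c c _ a a t _ _
labels = ["", "c", "a", "t"]
path = [1, 1, 0, 2, 2, 3, 0, 0]
frames = [[0.0 if j == i else -5.0 for j in range(len(labels))] for i in path]
print(greedy_ctc_decode(frames, labels))  # cat
```

Because repeats are merged before blanks are dropped, a blank between two identical tokens is what lets a doubled letter survive: the path `a _ a` decodes to "aa", while `a a a` decodes to "a".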

Install

pip install pyctcdecode

Imports

from pyctcdecode import build_ctcdecoder

Quickstart

This quickstart builds a `BeamSearchDecoderCTC` for a custom label list with `build_ctcdecoder` and decodes dummy CTC output. It shows basic usage without a language model. For real-world applications, adding a KenLM language model is recommended because it usually improves accuracy.

import numpy as np
from pyctcdecode import build_ctcdecoder

# Define your model's vocabulary in the same order as the columns of its output.
# pyctcdecode represents the CTC blank token as the empty string "", and its
# position must match the blank column of your model (first, in this example).
labels = [""] + list("abcdefghijklmnopqrstuvwxyz '")

# Create dummy CTC output for demonstration.
# In a real scenario, this would come from your acoustic model.
# Shape: (time_steps, len(labels)). The decoder expects log-probabilities,
# so random scores are normalized with a log-softmax over the vocabulary axis.
time_steps = 50
rng = np.random.default_rng(0)
scores = rng.standard_normal((time_steps, len(labels))).astype(np.float32)
logits = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))

# Build a BeamSearchDecoderCTC without a language model.
# Passing kenlm_model_path="..." (a KenLM .arpa or .bin file) adds LM scoring.
decoder = build_ctcdecoder(labels)

# decode() returns the single most probable transcription as a string;
# decode_beams() returns the full list of beams if you need alternatives.
decoded_text = decoder.decode(logits)

print(f"Decoded text (example): {decoded_text}")
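
To see what a beam search adds over picking the best token per frame, the following is a simplified CTC prefix beam search without a language model, written in plain Python. It is a hypothetical sketch of the textbook algorithm for intuition only, not pyctcdecode's implementation, which also applies language-model scoring and extra pruning. Each candidate prefix tracks two probabilities, one for paths ending in a blank and one for paths ending in a non-blank token, so that repeated tokens are merged correctly. The names `prefix_beam_search` and `log_add` are illustrative.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

NEG_INF = float("-inf")


def log_add(*xs: float) -> float:
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def prefix_beam_search(
    log_probs: Sequence[Sequence[float]],
    labels: List[str],
    beam_width: int = 8,
    blank: str = "",
) -> str:
    """Hypothetical CTC prefix beam search; returns the best collapsed transcription."""
    blank_idx = labels.index(blank)
    # Each prefix (tuple of label indices) maps to
    # (log P(prefix, path ends in blank), log P(prefix, path ends in non-blank)).
    beams: Dict[Tuple[int, ...], Tuple[float, float]] = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        nxt: Dict[Tuple[int, ...], Tuple[float, float]] = defaultdict(
            lambda: (NEG_INF, NEG_INF)
        )
        for prefix, (p_b, p_nb) in beams.items():
            for idx, lp in enumerate(frame):
                if idx == blank_idx:
                    # Emitting a blank keeps the prefix; the path now ends in blank.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (log_add(b, p_b + lp, p_nb + lp), nb)
                    continue
                extended = prefix + (idx,)
                b, nb = nxt[extended]
                if prefix and prefix[-1] == idx:
                    # A repeated label only starts a new symbol after a blank...
                    nxt[extended] = (b, log_add(nb, p_b + lp))
                    # ...otherwise it merges into the existing last symbol.
                    b2, nb2 = nxt[prefix]
                    nxt[prefix] = (b2, log_add(nb2, p_nb + lp))
                else:
                    nxt[extended] = (b, log_add(nb, p_b + lp, p_nb + lp))
        # Prune to the beam_width prefixes with the highest total probability.
        ranked = sorted(nxt.items(), key=lambda kv: log_add(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_width])
    best = max(beams, key=lambda p: log_add(*beams[p]))
    return "".join(labels[i] for i in best)


# Two frames where the blank has probability 0.6 and "a" has 0.4.
frames = [[math.log(0.6), math.log(0.4)]] * 2
print(prefix_beam_search(frames, ["", "a"]))  # a
```

In this example, greedy decoding picks the blank in both frames and returns an empty string, while the beam search sums the three paths that collapse to "a" (0.16 + 0.24 + 0.24 = 0.64, versus 0.36 for the empty transcription) and returns "a". Summing over all alignments of a prefix is the main reason beam search can beat best-path decoding even before a language model is added.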
