Kaldi Native Fbank
Kaldi-native-fbank is a Python library that provides a Kaldi-compatible online filter bank (fbank) feature extractor. It is designed to be efficient and has no external native dependencies, so it integrates across a range of architectures and operating systems. The library is actively maintained with frequent releases; the current stable version is 1.22.3.
Warnings
- gotcha Kaldi's Fbank features are typically in log space and might have different scaling or representation compared to filter bank features generated by other Python speech processing libraries (e.g., `python_speech_features`). Always verify the feature specifications if integrating with models trained on features from other sources.
- gotcha While `kaldi-native-fbank` itself is advertised as 'without external dependencies' (referring to native libraries), its Python usage examples, including the official ones, often utilize `torch` for waveform generation and comparison. If you're not using `torch`, ensure your audio data is converted to a standard Python list or NumPy array before passing it to methods like `accept_waveform`.
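If your audio comes from a 16-bit PCM WAV file rather than a `torch` tensor, one way to obtain a plain Python list is sketched below using only the standard library. The helper names `pcm16_to_floats` and `read_mono_wav` and the `scale` parameter are illustrative and not part of `kaldi-native-fbank`. Whether samples should be scaled to [-1, 1] or left in the int16 range depends on how the downstream model was trained, so treat the default as an assumption.

```python
import struct
import wave


def pcm16_to_floats(raw: bytes, scale: float = 1.0 / 32768.0) -> list:
    """Decode little-endian 16-bit PCM bytes into a list of Python floats.

    Hypothetical helper, not part of kaldi-native-fbank. `scale` defaults to
    mapping int16 values into [-1.0, 1.0); pass 1.0 to keep the int16 range.
    """
    count = len(raw) // 2  # two bytes per sample; a trailing odd byte is ignored
    ints = struct.unpack("<%dh" % count, raw[: count * 2])
    return [value * scale for value in ints]


def read_mono_wav(path: str):
    """Return (sampling_rate, samples) for a mono 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono 16-bit PCM")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    return rate, pcm16_to_floats(raw)
```

The returned list can then be passed as the second argument of `accept_waveform`, together with the sampling rate read from the file.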
Install
pip install kaldi-native-fbank
Imports
- kaldi_native_fbank
import kaldi_native_fbank as knf
Quickstart
import kaldi_native_fbank as knf
import torch
import numpy as np
# Configure Fbank options
opts = knf.FbankOptions()
opts.frame_opts.dither = 0.0
opts.mel_opts.num_bins = 80
opts.frame_opts.snip_edges = False
opts.mel_opts.debug_mel = False
sampling_rate = 16000
# Generate 10 seconds of random audio samples (simulating real audio)
# Using torch.randn for convenience, convert to list or numpy array for `accept_waveform`
samples_tensor = torch.randn(sampling_rate * 10)
samples = samples_tensor.tolist() # kaldi_native_fbank expects list or numpy array
# Initialize the online Fbank extractor
fbank_extractor = knf.OnlineFbank(opts)
# Process the waveform
fbank_extractor.accept_waveform(sampling_rate, samples)
# Retrieve the number of frames available
num_frames = fbank_extractor.num_frames_ready
print(f"Number of frames ready: {num_frames}")
# Retrieve and print the first frame
if num_frames > 0:
    first_frame = fbank_extractor.get_frame(0)
    print(f"Shape of the first frame: {first_frame.shape}")
    print(f"First frame (first 5 values): {first_frame[:5].round(decimals=4)}")
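The effect of `snip_edges` on the frame count can be estimated with simple arithmetic. The sketch below follows Kaldi's framing convention and assumes the default 25 ms window and 10 ms shift (400 and 160 samples at 16 kHz), which the Quickstart does not change. The function name `expected_num_frames` is illustrative and not part of the library. It models the count once all input has been flushed, so `num_frames_ready` may report slightly fewer frames while the extractor is still waiting for more audio.

```python
def expected_num_frames(num_samples, frame_length=400, frame_shift=160, snip_edges=False):
    """Estimate the number of frames for a fully flushed waveform.

    Hypothetical helper; the defaults assume 25 ms / 10 ms framing at 16 kHz.
    """
    if snip_edges:
        # Only frames that fit entirely inside the waveform are kept.
        if num_samples < frame_length:
            return 0
        return 1 + (num_samples - frame_length) // frame_shift
    # With snip_edges disabled, the count depends only on the frame shift.
    return (num_samples + frame_shift // 2) // frame_shift


print(expected_num_frames(16000 * 10))                   # 1000
print(expected_num_frames(16000 * 10, snip_edges=True))  # 998
```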