Python Speech Features

0.6 · active · verified Sat Apr 11

python-speech-features is a Python library for extracting common speech features used in Automatic Speech Recognition (ASR). It provides functions to compute Mel-Frequency Cepstral Coefficients (MFCCs), filterbank energies, log filterbank energies, and spectral subband centroids. The current stable version on PyPI is 0.6, released in 2017; the GitHub repository also carries a slightly newer v0.6.1 tag from 2020. Releases are infrequent, but the core functions remain widely used for basic speech feature extraction.
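To make the feature types listed above concrete, here is a minimal NumPy sketch of the standard pipeline: pre-emphasis, framing, power spectrum, a triangular Mel filterbank, log compression, and a DCT, plus spectral subband centroids. It is not the library's source code. The helper names (`hz_to_mel`, `frame_signal`, `mel_filterbank`, `dct_ii_ortho`, `speech_features`) are hypothetical, the parameter values are assumed to mirror the library's defaults, no window function is applied (a rectangular window), and refinements such as cepstral liftering are omitted, so the values will not match the library's output exactly.

```python
import numpy as np

# Hypothetical helpers: a sketch of the usual pipeline, not python_speech_features' code.

def hz_to_mel(hz):
    # Mel scale: mel = 2595 * log10(1 + hz / 700)
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def frame_signal(sig, frame_len, frame_step):
    # Split a 1-D signal into overlapping frames, zero-padding the tail
    # so the last partial frame is kept.
    n = len(sig)
    if n <= frame_len:
        num_frames = 1
    else:
        num_frames = 1 + int(np.ceil((n - frame_len) / frame_step))
    pad_len = (num_frames - 1) * frame_step + frame_len
    padded = np.concatenate([sig, np.zeros(pad_len - n)])
    idx = np.arange(frame_len)[None, :] + frame_step * np.arange(num_frames)[:, None]
    return padded[idx]

def mel_filterbank(nfilt, nfft, samplerate):
    # Triangular filters spaced evenly on the Mel scale from 0 Hz to Nyquist.
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(samplerate / 2), nfilt + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_points) / samplerate).astype(int)
    fb = np.zeros((nfilt, nfft // 2 + 1))
    for j in range(nfilt):
        left, center, right = bins[j], bins[j + 1], bins[j + 2]
        for i in range(left, center):        # rising edge
            fb[j, i] = (i - left) / (center - left)
        for i in range(center, right):       # falling edge
            fb[j, i] = (right - i) / (right - center)
    return fb

def dct_ii_ortho(x, n_out):
    # Orthonormal DCT-II along the last axis, keeping the first n_out coefficients.
    N = x.shape[-1]
    k = np.arange(n_out)[:, None]
    n = np.arange(N)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * N))
    scale = np.full((n_out, 1), np.sqrt(2.0 / N))
    scale[0, 0] = np.sqrt(1.0 / N)
    return x @ (basis * scale).T

def speech_features(sig, samplerate=16000, winlen=0.025, winstep=0.01,
                    nfilt=26, nfft=512, numcep=13, preemph=0.97):
    sig = np.asarray(sig, dtype=float)
    sig = np.append(sig[0], sig[1:] - preemph * sig[:-1])   # pre-emphasis
    frames = frame_signal(sig, int(round(winlen * samplerate)),
                          int(round(winstep * samplerate)))
    pspec = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft    # power spectrum per frame
    fb = mel_filterbank(nfilt, nfft, samplerate)
    fbank = pspec @ fb.T                                     # filterbank energies
    fbank = np.where(fbank == 0, np.finfo(float).eps, fbank) # avoid log(0)
    log_fbank = np.log(fbank)                                # log filterbank energies
    mfcc = dct_ii_ortho(log_fbank, numcep)                   # cepstral coefficients
    # Spectral subband centroid: power-weighted mean frequency within each filter.
    freqs = np.linspace(0, samplerate / 2, pspec.shape[1])
    ssc = ((pspec * freqs) @ fb.T) / fbank
    return fbank, log_fbank, mfcc, ssc

sr = 16000
t = np.arange(sr) / sr
sig = 0.5 * np.sin(2 * np.pi * 440 * t)   # 1 s, 440 Hz test tone
fbank, log_fbank, mfcc, ssc = speech_features(sig, sr)
print(fbank.shape, log_fbank.shape, mfcc.shape, ssc.shape)  # → (99, 26) (99, 26) (99, 13) (99, 26)
```

Every feature type is derived from the same framed power spectrum, which is why all four outputs share the frame axis (one row per 25 ms frame at a 10 ms step).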

Warnings

Install

pip install python_speech_features

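Imports

The module is imported as python_speech_features (with underscores):

from python_speech_features import mfcc, logfbank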

Quickstart

This quickstart generates a one-second 440 Hz sine tone, saves it as a 16-bit WAV file, reads it back, and uses `python-speech-features` to extract Mel-Frequency Cepstral Coefficients (MFCCs) and log Mel-filterbank energies. Audio file I/O is handled by `scipy.io.wavfile`.

import numpy as np
from scipy.io import wavfile
from python_speech_features import mfcc, logfbank
import os

# Create a dummy WAV file for demonstration
samplerate = 16000 # Hz
duration = 1 # seconds
f_hz = 440 # A4 note

# Sample times in seconds; excludes the endpoint so the spacing is exactly 1/samplerate
t = np.arange(int(samplerate * duration)) / samplerate
signal = 0.5 * np.sin(2 * np.pi * f_hz * t)

# Scale to 16-bit integer for WAV file
wav_signal = (signal * 32767).astype(np.int16)
dummy_wav_filename = 'dummy_audio.wav'
wavfile.write(dummy_wav_filename, samplerate, wav_signal)

# Read the audio file
(rate, sig) = wavfile.read(dummy_wav_filename)

# Compute MFCC features (by default 13 coefficients per 25 ms frame, 10 ms step)
mfcc_feat = mfcc(sig, rate)
print(f"MFCC features shape: {mfcc_feat.shape}")

# Compute log Mel-filterbank energies (by default 26 filters per frame)
fbank_feat = logfbank(sig, rate)
print(f"Log Filterbank features shape: {fbank_feat.shape}")

# Clean up the dummy file
os.remove(dummy_wav_filename)
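The shapes the quickstart prints follow from the framing parameters. Assuming the defaults of a 25 ms window and a 10 ms step, and assuming the signal is zero-padded so the last partial frame is kept rather than dropped, the frame count can be worked out as below. `num_frames` is a hypothetical helper for illustration, not a library function.

```python
import math

def num_frames(num_samples, samplerate, winlen=0.025, winstep=0.01):
    # Hypothetical helper: convert window length and step from seconds
    # to samples, rounding half up.
    frame_len = int(math.floor(winlen * samplerate + 0.5))
    frame_step = int(math.floor(winstep * samplerate + 0.5))
    if num_samples <= frame_len:
        return 1
    # Assumes the final partial frame is zero-padded and kept.
    return 1 + math.ceil((num_samples - frame_len) / frame_step)

# 1 s at 16 kHz: 400-sample frames, 160-sample step
# 1 + ceil((16000 - 400) / 160) = 1 + ceil(97.5) = 99
print(num_frames(16000, 16000))  # → 99
```

With the default 13 cepstral coefficients and 26 filters, this suggests shapes of (99, 13) and (99, 26) for the quickstart's MFCC and log-filterbank outputs.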
