{"id":4224,"library":"python-speech-features","title":"Python Speech Features","description":"python-speech-features is a Python library designed for extracting common speech features used in Automatic Speech Recognition (ASR). It provides functionalities to compute Mel-Frequency Cepstral Coefficients (MFCCs), filterbank energies, log filterbank energies, and spectral subband centroids. The current stable version on PyPI is 0.6, last released in 2017, with a slightly newer v0.6.1 tag on its GitHub repository from 2020. The project maintains a slow release cadence, but its core functionalities remain widely used for fundamental speech feature extraction.","status":"active","version":"0.6","language":"en","source_language":"en","source_url":"https://github.com/jameslyons/python_speech_features","tags":["speech processing","feature extraction","MFCC","audio","ASR"],"install":[{"cmd":"pip install python-speech-features","lang":"bash","label":"Install from PyPI"}],"dependencies":[{"reason":"Required for numerical operations and array manipulation of audio signals and features.","package":"numpy"},{"reason":"Required for scientific computing, particularly for I/O operations like reading WAV files (scipy.io.wavfile).","package":"scipy"}],"imports":[{"symbol":"mfcc","correct":"from python_speech_features import mfcc"},{"note":"Used to compute Mel-filterbank energy features.","symbol":"fbank","correct":"from python_speech_features import fbank"},{"note":"Used to compute log Mel-filterbank energy features.","symbol":"logfbank","correct":"from python_speech_features import logfbank"},{"note":"Used to compute Spectral Subband Centroid features.","symbol":"ssc","correct":"from python_speech_features import ssc"}],"quickstart":{"code":"import numpy as np\nfrom scipy.io import wavfile\nfrom python_speech_features import mfcc, logfbank\nimport os\n\n# Create a dummy WAV file for demonstration\nsamplerate = 16000 # Hz\nduration = 1 # seconds\nf_hz = 440 # A4 note\n\nt = 
np.linspace(0., duration, int(samplerate * duration))\nsignal = 0.5 * np.sin(2 * np.pi * f_hz * t)\n\n# Scale to 16-bit integer for WAV file\nwav_signal = (signal * 32767).astype(np.int16)\ndummy_wav_filename = 'dummy_audio.wav'\nwavfile.write(dummy_wav_filename, samplerate, wav_signal)\n\n# Read the audio file\n(rate, sig) = wavfile.read(dummy_wav_filename)\n\n# Compute MFCC features\nmfcc_feat = mfcc(sig, rate)\nprint(f\"MFCC features shape: {mfcc_feat.shape}\")\n\n# Compute Log Filterbank energies\nfbank_feat = logfbank(sig, rate)\nprint(f\"Log Filterbank features shape: {fbank_feat.shape}\")\n\n# Clean up the dummy file\nos.remove(dummy_wav_filename)\n","lang":"python","description":"This quickstart demonstrates how to generate a simple audio signal, save it as a WAV file, and then use `python-speech-features` to extract both Mel-Frequency Cepstral Coefficients (MFCCs) and log Mel-filterbank energies. It uses `scipy.io.wavfile` to handle audio file I/O."},"warnings":[{"fix":"For the very latest version, consider installing directly from the GitHub repository: `pip install git+https://github.com/jameslyons/python_speech_features.git`.","message":"The PyPI version (0.6, last updated Aug 2017) is older than the latest tag on GitHub (v0.6.1, Jan 2020). Users installing via `pip install python-speech-features` might not get the absolute latest code, which could have minor fixes or changes not yet reflected on PyPI.","severity":"gotcha","affected_versions":"<=0.6 on PyPI"},{"fix":"Explicitly convert signal data types (e.g., `signal.astype(np.float32)`) if required by other libraries. Consult the documentation of both `python-speech-features` and any other library being used for parameter details and expected outputs. Be mindful that `numcep` is effectively capped by `nfilt` in `python-speech-features`.","message":"When integrating with other audio processing libraries like `librosa`, be aware of data type expectations. 
`scipy.io.wavfile.read` returns `int16` samples for 16-bit PCM files, while some libraries expect floating-point input such as `float32`. Additionally, MFCC implementations differ between libraries: both `python-speech-features` and `librosa` compute a framed (short-time) Fourier transform, but their defaults for windowing, pre-emphasis, filterbank construction, and liftering differ, and `python-speech-features` returns an array shaped `(num_frames, numcep)` whereas `librosa` returns `(n_mfcc, num_frames)`. Seemingly identical parameters can therefore produce different output shapes or values.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure that `nfft` is greater than or equal to `winlen * samplerate`. For example, with `winlen=0.025` and `samplerate=16000` the frame length is 400 samples, so `nfft` should be at least 400, typically 512 (the next power of 2) or a larger power of 2 such as 1024.","message":"The warning `WARNING:root:frame length (X) is greater than FFT size` is logged when the frame length in samples, `winlen * samplerate` (window length in seconds times sampling rate in Hz), exceeds `nfft` (the FFT size).","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}