TorchLibrosa: PyTorch Implementation of Librosa
TorchLibrosa provides a PyTorch implementation of core `librosa` audio feature extraction functions, enabling GPU acceleration for tasks such as spectrogram and mel-spectrogram computation. This is particularly beneficial for deep learning pipelines that require faster feature generation on GPUs during training and evaluation. The library aims for numerical results almost identical to CPU-based `librosa` (difference less than 1e-5). The current version is 0.1.0, with an infrequent release cadence; the latest release was in February 2023.
Common errors
-
ModuleNotFoundError: No module named 'torch'
cause: `torchlibrosa` depends on `torch` but does not list it in its `install_requires`, so `pip` does not install it automatically.
fix: Install PyTorch separately: `pip install torch` (or the build that matches your CUDA setup).
-
TypeError: pad_center() takes 1 positional argument but 2 were given
cause: Reported with older versions (e.g., 0.0.9) when initializing `STFT` or `Spectrogram`. It points to an argument mismatch in an internal padding call, possibly caused by an incompatible `librosa` or `torch` version.
fix: Update `torchlibrosa` to version 0.1.0 and make sure `librosa` is `>=0.9.0`, as specified. If the error persists, try updating PyTorch to a recent stable version.
-
RuntimeError: mat1 and mat2 shapes cannot be multiplied
cause: A common PyTorch error, often raised when the input tensor passed to a `torchlibrosa` module (which uses matrix multiplications internally) does not have the expected shape.
fix: Check the input shape the module expects (e.g., `Spectrogram` expects `(batch_size, samples)`) and reshape the input to match, for example with `unsqueeze` or `permute`.
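The reshaping advice for the shape error can be sketched as a small helper. This is only an illustration: `as_batch` is a hypothetical name, not part of `torchlibrosa`, and the only assumption taken from the text above is that `Spectrogram` expects input shaped `(batch_size, samples)`.

```python
import torch


def as_batch(waveform: torch.Tensor) -> torch.Tensor:
    """Return `waveform` shaped (batch_size, samples).

    Hypothetical helper, not a torchlibrosa function.
    """
    if waveform.dim() == 1:
        # A single clip (samples,) becomes a batch of one: (1, samples)
        return waveform.unsqueeze(0)
    if waveform.dim() == 2:
        # Already (batch_size, samples)
        return waveform
    if waveform.dim() == 3 and waveform.shape[1] == 1:
        # Mono audio with a channel axis (batch, 1, samples): drop the channel
        return waveform.squeeze(1)
    raise ValueError(f"Unsupported waveform shape: {tuple(waveform.shape)}")


single_clip = torch.zeros(22050)
print(as_batch(single_clip).shape)
```

Raising on any other shape, rather than guessing, keeps a silent mix-up of channel and batch axes from reaching the feature extractor.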
Warnings
- gotcha PyTorch is a fundamental dependency for `torchlibrosa`, but it is not explicitly listed in the `install_requires` of the `setup.py`. This can lead to a `ModuleNotFoundError` if PyTorch is not installed separately.
- gotcha `torchlibrosa` targets features 'almost identical' to `librosa`, with a stated numerical difference of less than 1e-5. Results are therefore not bit-for-bit identical to `librosa` on CPU, and users who expect exact equality will see minor discrepancies.
- breaking Older versions of `torchlibrosa` (e.g., 0.0.9) had compatibility issues with specific PyTorch versions (e.g., `torch=1.10.0+cu111`), potentially leading to runtime errors related to internal function calls or argument mismatches.
Install
-
pip install torchlibrosa
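Because PyTorch is not installed automatically, it can help to confirm that both packages are importable before running anything else. The following is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper name introduced here for illustration.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(["torch", "torchlibrosa"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("torch and torchlibrosa are both importable")
```

This check only locates the packages and does not import them, so it stays fast and does not fail with `ModuleNotFoundError` itself.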
Imports
- Spectrogram
from torchlibrosa import Spectrogram
- LogmelFilterBank
from torchlibrosa import LogmelFilterBank
- STFT
from torchlibrosa import STFT
- ISTFT
from torchlibrosa import ISTFT
- torchlibrosa as tl
import torchlibrosa as tl
Quickstart
import torch
import torchlibrosa as tl
batch_size = 16
sample_rate = 22050
win_length = 2048
hop_length = 512
n_mels = 128
# Create a batch of dummy audio: one second per clip, uniform noise between -1 and 1
batch_audio = torch.empty(batch_size, sample_rate).uniform_(-1, 1)
# Instantiate a feature extractor using Sequential for a pipeline
feature_extractor = torch.nn.Sequential(
    tl.Spectrogram(
        hop_length=hop_length,
        win_length=win_length,
    ),
    tl.LogmelFilterBank(
        sr=sample_rate,
        n_mels=n_mels,
        is_log=False,  # default is True; False skips the log (dB) scaling
    ),
)
# Process the audio to get mel spectrograms (not log-scaled, because is_log=False)
batch_feature = feature_extractor(batch_audio)
print(f"Input audio shape: {batch_audio.shape}")
print(f"Output feature shape (batch_size, 1, time_steps, mel_bins): {batch_feature.shape}")
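The output shape printed by the quickstart can be estimated ahead of time with plain arithmetic. The sketch below assumes centered framing (`center=True`, which is understood to be the `Spectrogram` default), where the signal is padded by half a window on each side so the frame count depends only on the hop length. `expected_feature_shape` is a hypothetical helper, not part of the library.

```python
def expected_feature_shape(batch_size, samples, hop_length, n_mels):
    """Estimate the (batch_size, 1, time_steps, mel_bins) output shape.

    Assumes centered framing: the signal is padded by n_fft // 2 on each
    side, so the number of frames is samples // hop_length + 1.
    """
    time_steps = samples // hop_length + 1
    return (batch_size, 1, time_steps, n_mels)


# Values from the quickstart: 16 clips of 22050 samples, hop 512, 128 mel bins
print(expected_feature_shape(16, 22050, 512, 128))  # (16, 1, 44, 128)
```

If the extractor is built with `center=False`, the frame count also depends on the FFT size and this estimate no longer applies.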