TorchLibrosa: PyTorch Implementation of Librosa

0.1.0 · active · verified Thu Apr 16

TorchLibrosa provides a PyTorch implementation of core `librosa` audio feature extraction functions, enabling GPU acceleration for tasks such as spectrogram and mel-spectrogram computation. This is particularly beneficial for deep learning pipelines that require faster feature generation on GPUs during training and evaluation. The library aims for numerical results almost identical to CPU-based `librosa` (difference less than 1e-5). The current version is 0.1.0, with an infrequent release cadence; the latest release was in February 2023.

Common errors

Warnings

Install

Imports

Quickstart

This quickstart extracts mel spectrograms from a batch of audio signals using `torchlibrosa`'s `Spectrogram` and `LogmelFilterBank` modules, chained in a PyTorch `nn.Sequential`. Because the example passes `is_log=False`, the output is a linear-scale mel spectrogram; leaving `is_log` at its default of `True` yields log mel spectrograms instead.

import torch
import torchlibrosa as tl

batch_size = 16
sample_rate = 22050
win_length = 2048
hop_length = 512
n_mels = 128

# Create a batch of dummy audio: 16 clips of one second each, values in [-1, 1)
batch_audio = torch.empty(batch_size, sample_rate).uniform_(-1, 1)

# Instantiate a feature extractor using Sequential for a pipeline
feature_extractor = torch.nn.Sequential(
    tl.Spectrogram(
        hop_length=hop_length,
        win_length=win_length,
    ),
    tl.LogmelFilterBank(
        sr=sample_rate,
        n_mels=n_mels,
        is_log=False,  # Default is True; False keeps linear-scale mel values
    )
)

# Process the audio to get mel spectrograms (linear scale, since is_log=False)
batch_feature = feature_extractor(batch_audio)

print(f"Input audio shape: {batch_audio.shape}")
print(f"Output feature shape (batch_size, 1, time_steps, mel_bins): {batch_feature.shape}")
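
The size of the time dimension can be estimated before running the pipeline. The following is a minimal sketch, assuming `Spectrogram` keeps its default centered framing (`center=True`, as in `librosa.stft`), in which case the frame count is `1 + num_samples // hop_length`. The helper name `expected_time_steps` is hypothetical and not part of `torchlibrosa`.

```python
def expected_time_steps(num_samples: int, hop_length: int) -> int:
    """Frame count for a centered STFT (assumption: center=True padding)."""
    if num_samples <= 0 or hop_length <= 0:
        raise ValueError("num_samples and hop_length must be positive")
    # One frame at sample 0, then one more for every full hop that fits.
    return 1 + num_samples // hop_length


# Values from the quickstart: one second of audio at 22050 Hz, hop of 512 samples.
time_steps = expected_time_steps(num_samples=22050, hop_length=512)

# Layout follows the quickstart: (batch_size, 1, time_steps, mel_bins).
expected_shape = (16, 1, time_steps, 128)
print(expected_shape)
```

Under that assumption, the quickstart batch should come out with 44 time steps, which gives a quick sanity check against `batch_feature.shape`.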
