nnAudio

0.3.4 · active · verified Tue Apr 14

nnAudio is a GPU-accelerated audio processing toolbox built on PyTorch's 1D convolution layers. It specializes in generating spectrograms (STFT, Mel, CQT, and others) on the fly during deep learning training, using Fourier kernels that are differentiable and can optionally be made trainable. Because the kernels run on the GPU as convolutions, spectrogram computation is significantly faster than with traditional CPU-based libraries. The library is currently at version 0.3.4 and is actively maintained.
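The core idea can be sketched in plain NumPy (the variable names here are illustrative, not nnAudio's internals): one STFT frame is just a dot product between the signal and fixed cosine/sine kernels, which is exactly what a strided 1D convolution computes. Expressing the transform this way is what lets the kernels live on the GPU and receive gradients.

```python
import numpy as np

n_fft = 8
k = np.arange(n_fft)
freqs = np.arange(n_fft // 2 + 1)[:, None]           # real-input frequency bins
cos_kernels = np.cos(2 * np.pi * freqs * k / n_fft)  # real-part kernels
sin_kernels = -np.sin(2 * np.pi * freqs * k / n_fft) # imaginary-part kernels

frame = np.random.default_rng(0).standard_normal(n_fft)

# Applying the kernels to one frame = one stride of a 1D convolution
real = cos_kernels @ frame
imag = sin_kernels @ frame

# The result matches the FFT of the same frame
ref = np.fft.rfft(frame)
assert np.allclose(real, ref.real) and np.allclose(imag, ref.imag)
```

Stacking many such frames (one per hop) is precisely a `Conv1d` with `stride=hop_length`, which is how nnAudio maps the transform onto PyTorch.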

Install

pip install nnAudio

Imports

from nnAudio import features

Quickstart

This quickstart demonstrates how to create a dummy audio waveform, transfer it to the appropriate device (GPU if available), initialize an STFT layer using `nnAudio.features`, and generate a spectrogram. It highlights the typical workflow of using nnAudio as a PyTorch module.

import torch
import numpy as np
from nnAudio import features

# Simulate an audio waveform (e.g., from a .wav file)
sr = 16000 # Sample rate
duration = 1 # seconds
t = np.linspace(0, duration, int(sr * duration), endpoint=False)
# Simple sine wave at 440 Hz
song = (0.5 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

# nnAudio expects a batch dimension, so unsqueeze(0)
x = torch.tensor(song).unsqueeze(0)

# Move to GPU if available, otherwise CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
x = x.to(device)

# Initialize a STFT spectrogram layer
# Pass sample rate (sr) to the layer
spec_layer = features.STFT(n_fft=2048, hop_length=512, sr=sr).to(device)

# Feed-forward your waveform to get the spectrogram
spectrogram = spec_layer(x)

print(f"Input waveform shape: {x.shape}")
print(f"Output spectrogram shape: {spectrogram.shape}")
print(f"Spectrogram computed on device: {spectrogram.device}")
