{"id":6010,"library":"nnaudio","title":"nnAudio","description":"nnAudio is a GPU-accelerated audio processing toolbox built on PyTorch's 1D convolutional neural networks. It specializes in generating various spectrograms (STFT, Mel, CQT) on-the-fly during deep learning training, allowing for differentiable and trainable Fourier kernels. This approach significantly speeds up spectrogram computation compared to traditional CPU-based libraries. The library is currently at version 0.3.4 and follows an active, milestone-driven release cadence.","status":"active","version":"0.3.4","language":"en","source_language":"en","source_url":"https://github.com/KinWaiCheuk/nnAudio","tags":["audio processing","deep learning","pytorch","gpu","spectrogram","neural networks","signal processing"],"install":[{"cmd":"pip install nnaudio","lang":"bash","label":"PyPI (latest stable)"},{"cmd":"pip install git+https://github.com/KinWaiCheuk/nnAudio.git#subdirectory=Installation","lang":"bash","label":"GitHub (latest development)"}],"dependencies":[{"reason":"Fundamental for array operations.","package":"numpy","optional":false},{"reason":"Used for audio file I/O and signal processing utilities.","package":"scipy","optional":false},{"reason":"Primary deep learning backend for GPU-accelerated operations.","package":"torch","optional":false},{"reason":"Not a strict dependency: functionality such as mel filters is duplicated internally. Often used alongside nnAudio in audio workflows.","package":"librosa","optional":true}],"imports":[{"symbol":"features","correct":"from nnAudio import features"},{"note":"`nnAudio.Spectrogram` is being replaced by `nnAudio.features` as the primary module for spectrogram classes.","wrong":"from nnAudio.Spectrogram import STFT","symbol":"STFT","correct":"from nnAudio.features import STFT"},{"symbol":"MelSpectrogram","correct":"from nnAudio.features.mel import MelSpectrogram"}],"quickstart":{"code":"import torch\nimport numpy as np\nfrom nnAudio import features\n\n# Simulate an 
audio waveform (e.g., from a .wav file)\nsr = 16000 # Sample rate\nduration = 1 # seconds\nt = np.linspace(0, duration, int(sr * duration), endpoint=False)\n# Simple sine wave at 440 Hz\nsong = 0.5 * np.sin(2 * np.pi * 440 * t, dtype=np.float32)\n\n# Add a batch dimension so the input has shape (batch, samples)\nx = torch.tensor(song).unsqueeze(0)\n\n# Move to GPU if available, otherwise CPU\ndevice = 'cuda' if torch.cuda.is_available() else 'cpu'\nx = x.to(device)\n\n# Initialize an STFT spectrogram layer\n# Pass sample rate (sr) to the layer\nspec_layer = features.STFT(n_fft=2048, hop_length=512, sr=sr).to(device)\n\n# Feed-forward your waveform to get the spectrogram\nspectrogram = spec_layer(x)\n\nprint(f\"Input waveform shape: {x.shape}\")\nprint(f\"Output spectrogram shape: {spectrogram.shape}\")\n# Read the device from the output tensor; a non-trainable layer may expose no parameters\nprint(f\"Spectrogram computed on device: {spectrogram.device}\")","lang":"python","description":"This quickstart demonstrates how to create a dummy audio waveform, transfer it to the appropriate device (GPU if available), initialize an STFT layer using `nnAudio.features`, and generate a spectrogram. It highlights the typical workflow of using nnAudio as a PyTorch module."},"warnings":[{"fix":"Initialize the layer without the `device` argument, then call `.to(device)`: `spec_layer = features.STFT(...).to(device)`.","message":"The `device` argument for initializing spectrogram layers (e.g., `STFT(device='cuda')`) was removed in version 0.2.0. Layers must now be moved to the desired device using the PyTorch standard `.to(device)` method after initialization.","severity":"breaking","affected_versions":">=0.2.0"},{"fix":"Update import statements from `from nnAudio.Spectrogram import ...` to `from nnAudio.features import ...` (e.g., `from nnAudio.features import STFT`).","message":"The `nnAudio.Spectrogram` module path is being replaced by `nnAudio.features`. 
While `nnAudio.Spectrogram` might still function, `nnAudio.features` is the recommended and future-proof import path for all spectrogram classes.","severity":"deprecated","affected_versions":">=0.3.1"},{"fix":"Ensure your PyTorch installation is `torch >= 1.6.0`.","message":"For full functionality, including the Griffin-Lim inverse transform, PyTorch version 1.6.0 or higher is required. Using older PyTorch versions might limit certain features.","severity":"gotcha","affected_versions":"<1.6.0 (PyTorch)"},{"fix":"Users can generally avoid installing `librosa` unless explicitly needed for other parts of their audio pipeline.","message":"While `librosa` is a common audio library, `nnAudio` is designed to function without it as a strict dependency. Necessary mel filter functions are included internally to prevent forced `librosa` installation issues.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-14T00:00:00.000Z","next_check":"2026-07-13T00:00:00.000Z","problems":[]}