Asteroid Filterbanks
asteroid-filterbanks is a Python library that provides filterbank implementations for audio signal processing in deep learning, built on PyTorch. It is a modular toolkit aimed at researchers working on audio source separation. The library is maintained with a focus on PyTorch compatibility; the current version is 0.4.0.
Warnings
- breaking Version 0.4.0 drops support for PyTorch versions older than 1.8.0. Ensure your PyTorch installation is `torch>=1.8.0`.
- breaking Beginning with version 0.3.1, the library moved from `torch.rfft` (deprecated in PyTorch 1.7 and removed in 1.8) to the `torch.fft` module. This changes the output format, which now consistently returns complex numbers, and can break models trained with older versions or code that assumes the old format.
- gotcha The `Encoder` and `Decoder` classes are wrappers designed to augment `Filterbank` instances with encoding/decoding methods. They are not intended to be subclassed directly for implementing new filterbanks; instead, subclass `Filterbank` and then wrap it.
- gotcha If you intend the encoder or the decoder to be the pseudo-inverse of the other, request it explicitly: pass `who_is_pinv` (`"encoder"` or `"decoder"`) to `asteroid_filterbanks.enc_dec.make_enc_dec`, or set `is_pinv` when building `Encoder` and `Decoder` manually. Neither side is a pseudo-inverse by default.
- gotcha After a development installation (`pip install -e .`) or after installing `asteroid` (which pulls in `asteroid-filterbanks`), you may need to restart your Python runtime, especially in notebooks such as Jupyter or Colab, before `asteroid_filterbanks` modules import correctly.
- gotcha The `TorchSTFTFB` filterbank aims to replicate `torch.stft` behavior for ONNX compatibility, but it may not cover all options and edge cases of `torch.stft`. Be cautious when expecting perfect equivalence for all configurations.
Install
pip install asteroid-filterbanks
Imports
- Filterbank, Encoder, Decoder
from asteroid_filterbanks.enc_dec import Filterbank, Encoder, Decoder
- FreeFB, STFTFB, ParamSincFB, MelGramFB
from asteroid_filterbanks import FreeFB, STFTFB, ParamSincFB, MelGramFB
- PCEN
from asteroid_filterbanks.pcen import PCEN
- make_enc_dec
from asteroid_filterbanks.enc_dec import make_enc_dec
Quickstart
import torch
from asteroid_filterbanks.enc_dec import Encoder, Decoder
from asteroid_filterbanks import FreeFB
# Define parameters for the filterbank
n_filters = 256
kernel_size = 128
stride = 64
sample_rate = 16000
# 1. Instantiate a filterbank (e.g., a fully learnable Free Filterbank)
fb = FreeFB(n_filters=n_filters, kernel_size=kernel_size, stride=stride, sample_rate=sample_rate)
# 2. Wrap it with an Encoder to get the time-frequency representation
encoder = Encoder(fb)
# 3. Create a dummy waveform (batch, channels, time)
waveform = torch.randn(1, 1, sample_rate * 4) # 4 seconds of audio
# 4. Encode the waveform into a spectrogram-like representation
spec_like = encoder(waveform)
print(f"Input waveform shape: {waveform.shape}")
print(f"Encoded spectrogram-like shape: {spec_like.shape}")
# 5. Instantiate a Decoder (can use the same filterbank or a different one)
decoder = Decoder(fb) # For reconstruction, often use the same filterbank
# 6. Decode the spectrogram-like representation back to waveform
out_waveform = decoder(spec_like)
print(f"Decoded waveform shape: {out_waveform.shape}")
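The shapes printed by the quickstart can be anticipated with a little arithmetic. The sketch below assumes the encoder behaves like an unpadded strided 1-D convolution and the decoder like the matching transposed convolution; the helper names `n_frames` and `decoded_length` are hypothetical and not part of the library.

```python
def n_frames(n_samples, kernel_size, stride, padding=0):
    """Number of frames produced by a strided 1-D convolution."""
    return (n_samples + 2 * padding - kernel_size) // stride + 1


def decoded_length(frames, kernel_size, stride, padding=0, output_padding=0):
    """Number of samples produced by the matching transposed convolution."""
    return (frames - 1) * stride - 2 * padding + kernel_size + output_padding


# Same values as the quickstart: 4 seconds at 16 kHz
n_samples = 16000 * 4
frames = n_frames(n_samples, kernel_size=128, stride=64)
length = decoded_length(frames, kernel_size=128, stride=64)

print(frames)  # 999
print(length)  # 64000
```

Under these assumptions the encoded tensor would have shape (1, 256, 999) and the decoded waveform (1, 1, 64000). When the input length is not of the form kernel_size + k * stride, the trailing samples are dropped by the encoder and the decoded signal comes out shorter than the input.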