Torch Audiomentations
torch-audiomentations is a PyTorch library for audio data augmentation, designed for deep learning workflows. It offers fast, GPU-compatible transforms for batches of mono or multichannel audio, extending `nn.Module` for seamless integration into neural network pipelines. The library is currently at version 0.12.0 and is actively maintained with frequent releases.
Warnings
- breaking: Support for 1-dimensional and 2-dimensional audio tensors was removed. Only 3-dimensional audio tensors of shape (batch_size, num_channels, num_samples) are supported.
- deprecated: The default `torch.Tensor` output type is deprecated. An `ObjectDict` output type is available and is the recommended future-proof option. Support for `torch.Tensor` output will be removed in a future version.
- gotcha: Using `torch-audiomentations` in a multiprocessing context (e.g., with PyTorch's `DataLoader` and `num_workers > 0`) can lead to memory leaks.
- gotcha: Multi-GPU (DDP) setups are not officially supported due to testing limitations and may not work as expected.
- breaking: The `librosa` dependency was entirely removed in favor of `torchaudio`.
- breaking: The minimum `torchaudio` dependency was bumped from `>=0.7.0` to `>=0.9.0`.
Install
-
pip install torch-audiomentations
Imports
- Compose
from torch_audiomentations import Compose
- Gain
from torch_audiomentations import Gain
- PolarityInversion
from torch_audiomentations import PolarityInversion
Quickstart
import torch
from torch_audiomentations import Compose, Gain, PolarityInversion
# Initialize augmentation callable
apply_augmentation = Compose(
    transforms=[
        Gain(min_gain_in_db=-15.0, max_gain_in_db=5.0, p=0.5),
        PolarityInversion(p=0.5),
    ]
)
torch_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Make an example tensor with white noise.
# This tensor represents 8 audio snippets with 2 channels (stereo) and 2 seconds of 16 kHz audio.
audio_samples = torch.rand(size=(8, 2, 32000), dtype=torch.float32, device=torch_device) - 0.5
# Apply augmentation.
perturbed_audio_samples = apply_augmentation(audio_samples, sample_rate=16000)
print(f"Original audio shape: {audio_samples.shape}")
print(f"Perturbed audio shape: {perturbed_audio_samples.shape}")
print(f"Running on device: {torch_device}")