Torch Audiomentations

0.12.0 · active · verified Fri Apr 10

torch-audiomentations is a PyTorch library for audio data augmentation, designed for deep learning workflows. It offers fast, GPU-compatible transforms for batches of mono or multichannel audio. Each transform extends `nn.Module`, so augmentations integrate directly into neural network models and pipelines. The library is currently at version 0.12.0 and is actively maintained with frequent releases.

Install

pip install torch-audiomentations

Imports

from torch_audiomentations import Compose, Gain, PolarityInversion

Quickstart

This example applies a sequence of audio augmentations (Gain and PolarityInversion) to a batch of audio samples using `Compose`. It runs on the GPU when one is available and falls back to the CPU otherwise.

import torch
from torch_audiomentations import Compose, Gain, PolarityInversion

# Initialize augmentation callable
apply_augmentation = Compose(
    transforms=[
        Gain(min_gain_in_db=-15.0, max_gain_in_db=5.0, p=0.5),
        PolarityInversion(p=0.5)
    ]
)

torch_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Make an example tensor with white noise.
# This tensor represents 8 audio snippets with 2 channels (stereo) and 2 seconds of 16 kHz audio.
audio_samples = torch.rand(size=(8, 2, 32000), dtype=torch.float32, device=torch_device) - 0.5

# Apply augmentation.
perturbed_audio_samples = apply_augmentation(audio_samples, sample_rate=16000)

print(f"Original audio shape: {audio_samples.shape}")
print(f"Perturbed audio shape: {perturbed_audio_samples.shape}")
print(f"Running on device: {torch_device}")
