Audiomentations

0.43.1 · active · verified Tue Apr 14

Audiomentations is a Python library for audio data augmentation, inspired by `albumentations`. It provides a fast, easy-to-use API for applying randomized transformations to audio, which is useful for making machine learning and deep learning models more robust. It runs on CPU, supports both mono and multichannel audio, and integrates well into training pipelines for frameworks like TensorFlow/Keras or PyTorch. The library is actively maintained, with frequent releases.

Install

pip install audiomentations

Quickstart

This quickstart composes several common waveform-based augmentations and applies them to a dummy audio signal. The `Compose` object chains multiple transforms; each is applied independently with probability `p`, with parameters randomized within the given ranges on every call.

import numpy as np
from audiomentations import Compose, AddGaussianNoise, TimeStretch, PitchShift, Shift

sample_rate = 16000
# Generate 2 seconds of dummy audio (mono, float32, between -0.2 and 0.2)
samples = np.random.uniform(low=-0.2, high=0.2, size=(sample_rate * 2,)).astype(np.float32)

# Define an augmentation pipeline
augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
    PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
    Shift(p=0.5),
])

# Apply augmentation
augmented_samples = augment(samples=samples, sample_rate=sample_rate)

print(f"Original samples shape: {samples.shape}")
print(f"Augmented samples shape: {augmented_samples.shape}")
print(f"Original samples dtype: {samples.dtype}")
print(f"Augmented samples dtype: {augmented_samples.dtype}")
