Audiomentations
Audiomentations is a Python library for audio data augmentation, inspired by `albumentations`. It provides a fast and easy-to-use API for applying various transformations to audio data, useful for machine learning and deep learning tasks. It runs on CPU, supports both mono and multichannel audio, and integrates well into training pipelines for frameworks like TensorFlow/Keras or PyTorch. The library is actively maintained, with frequent releases, and is currently at version 0.43.1.
Warnings
- breaking Version 0.43.0 increased the minimum Python version to 3.10. Additionally, `LoudnessNormalization` now uses the `loudness` library (400% faster), and `Mp3Compression` deprecated the `pydub` backend in favor of `fast-mp3-augment`.
- breaking The `TimeMask` transform underwent significant changes in version 0.41.0. The `fade` parameter was removed, new parameters like `mask_location` were added, and default values for `min_band_part` and `max_band_part` were adjusted.
- breaking In version 0.24.0, `AddBackgroundNoise` introduced new parameters (`noise_rms`, `min_absolute_rms_in_db`, `max_absolute_rms_in_db`). If you were using `AddBackgroundNoise` with positional arguments in earlier versions, this could be a breaking change.
- gotcha Audiomentations expects input audio samples to be NumPy arrays of `float32` dtype with values in the range -1.0 to 1.0. Feeding other dtypes or out-of-range values can lead to unexpected behavior, clipping, or errors.
- gotcha Audiomentations is designed to run on CPU. For GPU-accelerated audio augmentation, especially within PyTorch training pipelines, consider using the `torch-audiomentations` library, which offers similar functionality optimized for GPU.
- gotcha As of v0.22.0, while most transforms support multichannel audio, `AddBackgroundNoise` and `AddShortNoises` have specific limitations or different handling for multichannel input compared to other transforms.
- deprecated In version 0.12.0, internal utility functions (e.g., `calculate_rms`) were no longer directly exposed under the top-level `audiomentations` namespace. They were moved to submodules.
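The input-format gotcha above is easy to trip over when audio arrives as integer PCM or as floats in some other scale. The following is a minimal sketch of one way to prepare such data, using only NumPy. The helper name `to_float32_audio` is hypothetical and not part of audiomentations, and peak-normalizing out-of-range floats is an assumption made for illustration (clipping would be another reasonable choice).

```python
import numpy as np


def to_float32_audio(samples: np.ndarray) -> np.ndarray:
    """Return a float32 copy of `samples` scaled into the -1.0 to 1.0 range.

    Hypothetical helper for illustration; not part of audiomentations.
    """
    if np.issubdtype(samples.dtype, np.signedinteger):
        # Signed integer PCM: divide by the magnitude of the most negative
        # representable value (32768 for int16).
        full_scale = -int(np.iinfo(samples.dtype).min)
        return samples.astype(np.float32) / np.float32(full_scale)
    if np.issubdtype(samples.dtype, np.floating):
        out = samples.astype(np.float32)
        peak = float(np.max(np.abs(out))) if out.size else 0.0
        if peak > 1.0:
            # Assumption: rescale by the peak instead of clipping.
            out = out / np.float32(peak)
        return out
    raise TypeError(f"Unsupported dtype for audio samples: {samples.dtype}")


pcm = np.array([-32768, 0, 16384, 32767], dtype=np.int16)
prepared = to_float32_audio(pcm)
print(prepared.dtype, prepared.min(), prepared.max())
```

Unsigned integer formats (such as 8-bit WAV data) are deliberately rejected here, because they need an offset as well as a scale factor.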
Install
- pip install audiomentations
- pip install audiomentations[extras]
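Because several of the warnings above depend on the installed version and on the Python version, it can help to check both after installing. This is a rough sketch using only the standard library. The helpers `installed_version` and `parse_version` are hypothetical names introduced here, and the version parsing is deliberately simplistic (it stops at the first non-numeric component).

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "audiomentations"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def parse_version(text: str) -> tuple:
    """Turn '0.43.1' into (0, 43, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


found = installed_version()
if found is None:
    print("audiomentations is not installed")
else:
    print(f"audiomentations {found} is installed")
    # Version 0.43.0 and later require Python 3.10 or newer.
    if parse_version(found) >= (0, 43, 0) and sys.version_info < (3, 10):
        print("Warning: this release expects Python 3.10 or newer")
```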
Imports
- Compose
from audiomentations import Compose
- AddGaussianNoise
from audiomentations import AddGaussianNoise
- TimeStretch
from audiomentations import TimeStretch
- PitchShift
from audiomentations import PitchShift
- Shift
from audiomentations import Shift
- SpecCompose
from audiomentations import SpecCompose
Quickstart
import numpy as np
from audiomentations import Compose, AddGaussianNoise, TimeStretch, PitchShift, Shift
sample_rate = 16000
# Generate 2 seconds of dummy audio (mono, float32, between -0.2 and 0.2)
samples = np.random.uniform(low=-0.2, high=0.2, size=(sample_rate * 2,)).astype(np.float32)
# Define an augmentation pipeline
augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
    PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
    Shift(p=0.5),
])
# Apply augmentation
augmented_samples = augment(samples=samples, sample_rate=sample_rate)
print(f"Original samples shape: {samples.shape}")
print(f"Augmented samples shape: {augmented_samples.shape}")
print(f"Original samples dtype: {samples.dtype}")
print(f"Augmented samples dtype: {augmented_samples.dtype}")