{"id":5854,"library":"audiomentations","title":"Audiomentations","description":"Audiomentations is a Python library for audio data augmentation, inspired by `albumentations`. It provides a fast and easy-to-use API for applying various transformations to audio data, useful for machine learning and deep learning tasks. It runs on CPU, supports both mono and multichannel audio, and integrates well into training pipelines for frameworks like TensorFlow/Keras or PyTorch. The library is actively maintained, with frequent releases, and is currently at version 0.43.1.","status":"active","version":"0.43.1","language":"en","source_language":"en","source_url":"https://github.com/iver56/audiomentations","tags":["audio","augmentation","machine learning","deep learning","sound"],"install":[{"cmd":"pip install audiomentations","lang":"bash","label":"Basic Install"},{"cmd":"pip install audiomentations[extras]","lang":"bash","label":"Install with all optional dependencies"}],"dependencies":[{"reason":"Required for `LoudnessNormalization` (since v0.43.0, replaces `pyloudnorm`).","package":"loudness","optional":true},{"reason":"Default backend for `Mp3Compression` (since v0.42.0, replaces `pydub`/`lameenc` for better performance).","package":"fast-mp3-augment","optional":true},{"reason":"For faster loading of 24-bit WAV files.","package":"wavio","optional":true},{"reason":"Required for `RoomSimulator` transform.","package":"pyroomacoustics","optional":true},{"reason":"Required for `Limiter` transform (since v0.42.0, replaces `cylimiter`).","package":"numpy-audio-limiter","optional":true}],"imports":[{"symbol":"Compose","correct":"from audiomentations import Compose"},{"symbol":"AddGaussianNoise","correct":"from audiomentations import AddGaussianNoise"},{"symbol":"TimeStretch","correct":"from audiomentations import TimeStretch"},{"symbol":"PitchShift","correct":"from audiomentations import PitchShift"},{"symbol":"Shift","correct":"from audiomentations import Shift"},{"note":"Used for 
spectrogram-based augmentations.","symbol":"SpecCompose","correct":"from audiomentations import SpecCompose"}],"quickstart":{"code":"import numpy as np\nfrom audiomentations import Compose, AddGaussianNoise, TimeStretch, PitchShift, Shift\n\nsample_rate = 16000\n# Generate 2 seconds of dummy audio (mono, float32, between -0.2 and 0.2)\nsamples = np.random.uniform(low=-0.2, high=0.2, size=(sample_rate * 2,)).astype(np.float32)\n\n# Define an augmentation pipeline\naugment = Compose([\n    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),\n    TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),\n    PitchShift(min_semitones=-4, max_semitones=4, p=0.5),\n    Shift(p=0.5),\n])\n\n# Apply augmentation\naugmented_samples = augment(samples=samples, sample_rate=sample_rate)\n\nprint(f\"Original samples shape: {samples.shape}\")\nprint(f\"Augmented samples shape: {augmented_samples.shape}\")\nprint(f\"Original samples dtype: {samples.dtype}\")\nprint(f\"Augmented samples dtype: {augmented_samples.dtype}\")","lang":"python","description":"This quickstart demonstrates how to create a composition of several common waveform-based audio augmentations and apply them to a dummy audio signal. The `Compose` object allows chaining multiple transformations, each with its own probability `p`."},"warnings":[{"fix":"Upgrade your Python environment to 3.10 or newer. If using `LoudnessNormalization` or `Mp3Compression`, install the new optional dependencies (`loudness` and `fast-mp3-augment`) and update any code relying on specific backend implementations or `pydub`.","message":"Version 0.43.0 increased the minimum Python version to 3.10. 
Additionally, `LoudnessNormalization` now uses the `loudness` library (400% faster), and `Mp3Compression` deprecated the `pydub` backend in favor of `fast-mp3-augment`.","severity":"breaking","affected_versions":">=0.43.0"},{"fix":"Review and update usage of `TimeMask` to reflect the removed `fade` parameter, new `mask_location` parameter, and changed default values.","message":"The `TimeMask` transform underwent significant changes in version 0.41.0. The `fade` parameter was removed, new parameters like `mask_location` were added, and default values for `min_band_part` and `max_band_part` were adjusted.","severity":"breaking","affected_versions":">=0.41.0"},{"fix":"Always use keyword arguments when instantiating `AddBackgroundNoise` to ensure future compatibility, especially when updating across minor versions.","message":"In version 0.24.0, `AddBackgroundNoise` introduced new parameters (`noise_rms`, `min_absolute_rms_in_db`, `max_absolute_rms_in_db`). If you were using `AddBackgroundNoise` with positional arguments in earlier versions, this could be a breaking change.","severity":"breaking","affected_versions":">=0.24.0"},{"fix":"Ensure your audio data is preprocessed to `np.float32` and normalized to the range `[-1.0, 1.0]` before passing it to `audiomentations` transforms. Use transforms like `Normalize` or `Clip` if needed.","message":"Audiomentations expects input audio samples to be NumPy arrays of `float32` dtype with values in the range `[-1.0, 1.0]`. Feeding other dtypes or out-of-range values can lead to unexpected behavior, clipping, or errors.","severity":"gotcha","affected_versions":"All versions"},{"fix":"For GPU acceleration, install `torch-audiomentations` (`pip install torch-audiomentations`) and adapt your code to use its PyTorch-compatible transforms.","message":"Audiomentations is designed to run on CPU. 
For GPU-accelerated audio augmentation, especially within PyTorch training pipelines, consider using the `torch-audiomentations` library, which offers similar functionality optimized for GPU.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Consult the official documentation for specific guidance on using `AddBackgroundNoise` and `AddShortNoises` with multichannel audio, or test thoroughly with your specific multichannel data.","message":"As of v0.22.0, while most transforms support multichannel audio, `AddBackgroundNoise` and `AddShortNoises` have specific limitations or different handling for multichannel input compared to other transforms.","severity":"gotcha","affected_versions":">=0.22.0"},{"fix":"If you were importing internal utility functions, update their import paths, e.g., `from audiomentations.core.utils import calculate_rms` instead of `from audiomentations import calculate_rms`.","message":"In version 0.12.0, internal utility functions (e.g., `calculate_rms`) were no longer directly exposed under the top-level `audiomentations` namespace. They were moved to submodules.","severity":"deprecated","affected_versions":">=0.12.0"}],"env_vars":null,"last_verified":"2026-04-14T00:00:00.000Z","next_check":"2026-07-13T00:00:00.000Z"}