Auraloss

0.4.0 · active · verified Fri Apr 17

Auraloss is a collection of audio-focused loss functions implemented in PyTorch, designed for tasks such as audio synthesis, source separation, and speech enhancement. It provides specialized losses, including Mel-spectrogram, multi-resolution STFT, and perceptual losses. The current stable version is 0.4.0; new releases appear periodically, typically after significant development milestones.

Common errors

Warnings

Install

Imports

Quickstart

This quickstart shows how to instantiate and use MultiResolutionSTFTLoss, a commonly used loss function in auraloss. It generates dummy audio tensors and computes the loss between them, which is the same call pattern most loss functions in the library follow. PyTorch must be installed; the example runs on the GPU when CUDA is available and falls back to the CPU otherwise.

import torch
from auraloss.freq import MultiResolutionSTFTLoss

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Dummy input/target tensors (10 seconds of mono audio at 16 kHz)
# Shape: (batch size B, channels C, samples S)
input_audio = torch.randn(2, 1, 160000, device=device)
target_audio = torch.randn(2, 1, 160000, device=device)

# Initialize the multi-resolution STFT loss.
# With no arguments, MultiResolutionSTFTLoss uses its built-in defaults:
# three STFT resolutions, each with its own window size and hop size.
mr_stft_loss = MultiResolutionSTFTLoss().to(device)

# Compute the loss
loss = mr_stft_loss(input_audio, target_audio)

print(f"Computed MR-STFT Loss: {loss.item()}")
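
To make "three STFT resolutions" more concrete, the sketch below estimates the spectrogram shape each resolution would produce for the 160000-sample input used above. It is an illustration, not auraloss code: the helper `stft_shape` is hypothetical, the FFT and hop sizes are assumed default values that should be checked against the installed version, and the formulas assume a centered, one-sided STFT (the default behavior of `torch.stft`).

```python
# Hypothetical helper, not part of auraloss: estimate the STFT output shape
# for one resolution, assuming a centered, one-sided STFT.
def stft_shape(n_samples, fft_size, hop_size):
    """Return (frequency_bins, frames) for a signal of n_samples samples."""
    freq_bins = fft_size // 2 + 1      # one-sided spectrum keeps fft_size/2 + 1 bins
    frames = 1 + n_samples // hop_size  # centered framing pads both ends of the signal
    return freq_bins, frames


# ASSUMED defaults for illustration only; verify against your auraloss version.
ASSUMED_FFT_SIZES = [1024, 2048, 512]
ASSUMED_HOP_SIZES = [120, 240, 50]

n_samples = 160000  # matches the quickstart tensors (10 s at 16 kHz)

shapes = [
    stft_shape(n_samples, fft_size, hop_size)
    for fft_size, hop_size in zip(ASSUMED_FFT_SIZES, ASSUMED_HOP_SIZES)
]

for fft_size, hop_size, (bins, frames) in zip(ASSUMED_FFT_SIZES, ASSUMED_HOP_SIZES, shapes):
    print(f"fft={fft_size:5d} hop={hop_size:4d} -> {bins} bins x {frames} frames")
```

Larger FFT sizes give finer frequency resolution with fewer frames, while smaller ones give finer time resolution. Comparing spectrograms at several such trade-offs is the motivation for a multi-resolution loss.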
