{"id":9501,"library":"auraloss","title":"Auraloss","description":"Auraloss is a collection of audio-focused loss functions implemented in PyTorch, designed for tasks such as audio synthesis, source separation, and speech enhancement. It provides specialized losses including multi-resolution STFT, Mel-scaled STFT, and perceptually weighted losses. The current stable release is 0.4.0, with new releases published periodically as development milestones are reached.","status":"active","version":"0.4.0","language":"en","source_language":"en","source_url":"https://github.com/csteinmetz1/auraloss","tags":["audio","loss-functions","pytorch","deep-learning","signal-processing","speech-enhancement","audio-synthesis"],"install":[{"cmd":"pip install auraloss","lang":"bash","label":"Stable release"},{"cmd":"pip install git+https://github.com/csteinmetz1/auraloss.git","lang":"bash","label":"Latest development version"}],"dependencies":[{"reason":"Core deep learning framework","package":"torch","optional":false},{"reason":"Audio processing utilities, e.g. Mel filterbank construction for Mel-scaled STFT losses","package":"torchaudio","optional":false}],"imports":[{"symbol":"MultiResolutionSTFTLoss","correct":"from auraloss.freq import MultiResolutionSTFTLoss"},{"symbol":"MelSTFTLoss","correct":"from auraloss.freq import MelSTFTLoss"},{"symbol":"FIRFilter","correct":"from auraloss.perceptual import FIRFilter"},{"symbol":"SpectralConvergenceLoss","correct":"from auraloss.freq import SpectralConvergenceLoss"}],"quickstart":{"code":"import torch\nfrom auraloss.freq import MultiResolutionSTFTLoss\n\ndevice = 'cuda' if torch.cuda.is_available() else 'cpu'\n\n# Dummy input/target tensors (e.g., 10 seconds of mono audio at 16 kHz)\n# Shape: (batch B, channels C, samples S)\ninput_audio = torch.randn(2, 1, 160000, device=device)\ntarget_audio = torch.randn(2, 1, 160000, device=device)\n\n# Initialize the multi-resolution STFT loss.\n# By default it combines several STFT resolutions with different\n# FFT sizes, window lengths, and hop sizes, following the defaults\n# recommended in the multi-resolution STFT loss literature.\nmr_stft_loss = MultiResolutionSTFTLoss().to(device)\n\n# Compute the loss\nloss = mr_stft_loss(input_audio, target_audio)\n\nprint(f\"Computed MR-STFT Loss: {loss.item()}\")","lang":"python","description":"This quickstart instantiates MultiResolutionSTFTLoss, one of the most commonly used loss functions in auraloss, and computes the loss between two dummy audio tensors. The same (input, target) calling convention applies to most loss functions in the library. The code falls back to CPU automatically if CUDA is unavailable."},"warnings":[{"fix":"Reshape your tensors to `(B, C, S)`: use `tensor.unsqueeze(1)` to add a channel dimension to mono `(B, S)` audio, or `tensor.view(B, C, S)` when the channel dimension has been flattened.","message":"auraloss loss functions expect input and target tensors to be 3-dimensional (batch, channels, samples). A common mistake is to pass 2D (batch, samples) or 1D (samples) tensors.","severity":"gotcha","affected_versions":">=0.1.0"},{"fix":"Move all tensors and the loss module to the target device: `input_audio = input_audio.to(device)`, `target_audio = target_audio.to(device)`, `loss_fn = loss_fn.to(device)`.","message":"Ensure input and target tensors are on the same device (CPU/GPU) as the loss function instance. Mismatched devices raise `RuntimeError: Expected all tensors to be on the same device`.","severity":"gotcha","affected_versions":">=0.1.0"},{"fix":"Ensure your audio tensors are `torch.float32` or `torch.float64`, casting with `tensor.to(dtype=torch.float32)` if necessary.","message":"The STFT-based losses (e.g., MultiResolutionSTFTLoss, MelSTFTLoss) rely on PyTorch's `torch.stft`, which expects floating-point inputs (typically `torch.float32` or `torch.float64`). Other dtypes such as `torch.float16` may raise errors or produce unexpected results.","severity":"gotcha","affected_versions":">=0.1.0"},{"fix":"Update your imports to the `auraloss.freq` module, e.g. `from auraloss.freq import STFTLoss, MultiResolutionSTFTLoss, MelSTFTLoss`. For most training setups, `MultiResolutionSTFTLoss` is preferred over the single-resolution `STFTLoss` because it is less sensitive to the choice of STFT parameters.","message":"Prior to v0.3.0, some loss functions were exposed under different module paths (e.g., `auraloss.loss.STFTLoss`, `auraloss.loss.MelSTFTLoss`). They were later consolidated under `auraloss.freq`.","severity":"breaking","affected_versions":"<0.3.0"}],"env_vars":null,"last_verified":"2026-04-17T00:00:00.000Z","next_check":"2026-07-16T00:00:00.000Z","problems":[{"fix":"Move all tensors and the loss module to the same device: `loss_fn = loss_fn.to(device)`, `input_audio = input_audio.to(device)`, `target_audio = target_audio.to(device)`.","cause":"The input or target audio tensors are on a different device (CPU/GPU) than the initialized auraloss module.","error":"RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0!"},{"fix":"For mono audio, add a channel dimension with `tensor.unsqueeze(1)`: `input_audio = input_audio.unsqueeze(1)`.","cause":"auraloss functions expect input audio in (batch, channels, samples) format but received a 2D tensor (e.g., batch, samples).","error":"ValueError: Expected input to be a 3D tensor, got 2D tensor"},{"fix":"Import and instantiate a specific loss class, then call the instance: `from auraloss.freq import MultiResolutionSTFTLoss; mr_loss = MultiResolutionSTFTLoss(); loss = mr_loss(input, target)`.","cause":"Attempting to call the auraloss module directly (e.g., `auraloss.freq(input, target)`) instead of an instantiated loss class.","error":"TypeError: 'module' object is not callable"}]}