TorchAudio
TorchAudio is an open-source library for audio and signal processing with PyTorch. It provides I/O functions, datasets, model implementations, and common building blocks for machine-learning audio tasks. Since versions 2.8/2.9 the project has been in a maintenance phase, trimming redundant APIs and focusing on core ML audio processing, but it still ships new releases (currently 2.11.0) in lockstep with PyTorch.
Warnings
- breaking Breaking API changes: Most APIs marked for removal were deprecated in TorchAudio 2.8 and removed in 2.9. Code that still uses them will raise `AttributeError` or `ImportError` after upgrading from older versions.
- gotcha `torchaudio.load()` and `torchaudio.save()` (since 2.9) delegate internally to the `torchcodec` library. The API remains compatible, but some parameters such as `normalize`, `buffer_size`, and `backend` are silently ignored. For full control and best performance, use `torchcodec.decoders.AudioDecoder` and `torchcodec.encoders.AudioEncoder` directly.
- breaking Strict PyTorch version compatibility: Each TorchAudio release is built against a specific PyTorch version. Mismatched `torch` and `torchaudio` versions lead to runtime errors, particularly in the C++ extensions.
- gotcha FFmpeg dependency for I/O: Audio loading and saving, particularly through the `torchcodec` backend, require FFmpeg to be installed and discoverable on your system. A missing or incompatible FFmpeg version can cause `RuntimeError` during audio processing.
Install
-
pip install torchaudio
Imports
- torchaudio
import torchaudio
- transforms
from torchaudio import transforms
- functional
from torchaudio import functional
Quickstart
import torch
import torchaudio
from torchaudio import transforms
# Create a dummy waveform (1 channel, 16000 samples at 16kHz)
# In a real scenario, you would load an audio file: waveform, sample_rate = torchaudio.load("path/to/audio.wav")
waveform = torch.randn(1, 16000)
sample_rate = 16000
# Define a MelSpectrogram transform
melspectrogram_transform = transforms.MelSpectrogram(sample_rate=sample_rate, n_mels=128)
# Apply the transform
melspectrogram = melspectrogram_transform(waveform)
print(f"Waveform shape: {waveform.shape}")
print(f"MelSpectrogram shape: {melspectrogram.shape}")
# Expected output:
# Waveform shape: torch.Size([1, 16000])
# MelSpectrogram shape: torch.Size([1, 128, 81]) — 81 frames from the defaults n_fft=400, hop_length=200 (frames = samples // hop_length + 1 with centering)