TorchAudio

2.11.0 · active · verified Sun Apr 05

TorchAudio is an open-source library for audio and signal processing with PyTorch. It provides functions, datasets, model implementations, and application components for machine learning tasks. The project entered a maintenance phase around version 2.8/2.9 to reduce redundancies and focus on ML audio processing, but it continues to release new versions (currently 2.11.0) in step with PyTorch releases.

Warnings

Install

Imports

Quickstart

This quickstart shows how to import TorchAudio, create a dummy audio waveform, and apply a common audio transform, `MelSpectrogram`. In practice, you would use `torchaudio.load` to read an actual audio file instead of generating random samples.

import torch
import torchaudio
from torchaudio import transforms

# Create a dummy waveform (1 channel, 16000 samples at 16kHz)
# In a real scenario, you would load an audio file: waveform, sample_rate = torchaudio.load("path/to/audio.wav")
waveform = torch.randn(1, 16000)
sample_rate = 16000

# Define a MelSpectrogram transform
melspectrogram_transform = transforms.MelSpectrogram(sample_rate=sample_rate, n_mels=128)

# Apply the transform
melspectrogram = melspectrogram_transform(waveform)

print(f"Waveform shape: {waveform.shape}")
print(f"MelSpectrogram shape: {melspectrogram.shape}")
# Expected output (with the default n_fft=400 and hop_length=200):
# Waveform shape: torch.Size([1, 16000])
# MelSpectrogram shape: torch.Size([1, 128, 81])
# The last dimension is the number of time frames; it changes with hop_length.
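
The last dimension of the mel spectrogram shape is the number of time frames. The sketch below shows how that count follows from the STFT framing arithmetic. The helper name `expected_num_frames` is hypothetical and not part of TorchAudio. The sketch assumes the transform keeps its defaults (`n_fft=400`, `hop_length` equal to half the window length, and centered framing), so check those against the version you have installed.

```python
def expected_num_frames(num_samples: int, hop_length: int, n_fft: int,
                        center: bool = True) -> int:
    """Number of STFT frames for a signal of `num_samples` samples.

    Hypothetical helper for illustration only (not a TorchAudio function).
    With centered framing the signal is padded by n_fft // 2 on both sides,
    so the count depends only on hop_length. Without centering, only full
    windows that fit inside the signal are counted.
    """
    if center:
        return 1 + num_samples // hop_length
    return 1 + (num_samples - n_fft) // hop_length


# Assumed defaults: n_fft=400, hop_length = n_fft // 2 = 200, center=True
n_fft = 400
hop_length = n_fft // 2
frames = expected_num_frames(16000, hop_length=hop_length, n_fft=n_fft)
print(f"Expected time frames: {frames}")
```

Passing a smaller `hop_length` to the transform yields more frames (finer time resolution) at the cost of a larger output tensor.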
