AudioLM


AudioLM is a PyTorch-based implementation of a language modeling approach to audio generation, capable of generating coherent audio continuations given a short prompt. Current version is 0.0.1.dev0, with irregular releases.

pip install audiolm

error: ModuleNotFoundError: No module named 'audiolm'
cause: The library is not installed, or was installed into a different environment than the one running Python.
fix: Run 'pip install audiolm' (or 'python -m pip install audiolm') in the correct environment.
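A quick way to confirm which interpreter pip must target is to inspect sys.executable from the same Python that raises the import error; a minimal stdlib-only sketch:

```python
import importlib.util
import sys

# The environment that must receive the install is the one running this
# interpreter; `python -m pip install audiolm` guarantees the match, whereas
# a bare `pip` on PATH may belong to a different environment.
print("Interpreter:", sys.executable)

# Probe for the package without importing it.
spec = importlib.util.find_spec("audiolm")
if spec is None:
    print("audiolm is NOT importable from this environment")
else:
    print("audiolm found at", spec.origin)
```

If the printed interpreter is not the one you expected (e.g. a system Python instead of your virtualenv), activate the right environment before reinstalling.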
error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu
cause: The input tensor is on the CPU while the model is on the GPU.
fix: Move input tensors to the model's device after loading, e.g. waveform = waveform.to('cuda').
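The same mismatch can be reproduced and fixed with any torch module; a minimal sketch using a plain Linear layer as a stand-in for the AudioLM model:

```python
import torch

# Any nn.Module reproduces the failure; a Linear layer stands in for AudioLM here.
model = torch.nn.Linear(4, 4)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)

# Freshly loaded tensors (e.g. from torchaudio.load) start on the CPU...
waveform = torch.randn(1, 4)
# ...so move them onto the model's device before the forward pass.
waveform = waveform.to(device)

out = model(waveform)  # no device-mismatch RuntimeError
```

Forgetting the `waveform.to(device)` line while the model sits on `cuda:0` raises exactly the error above.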
breaking: The library is in early development (0.0.1.dev0). The API is unstable and may change without notice; do not use it in production.
fix: Pin to a specific version or commit, and use the library only for experimentation.
gotcha: AudioLM requires significant GPU memory (16GB+). CPU inference is extremely slow and may run out of memory.
fix: Use a GPU with at least 16GB of VRAM; reduce max_new_tokens if you hit out-of-memory errors.
deprecated: The 'decode' function signature may change in future versions; the current version uses (model, waveform, sample_rate, ...).
fix: Refer to the GitHub README for the most up-to-date usage.
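One way to apply the max_new_tokens advice automatically is a retry wrapper that halves the budget after each out-of-memory failure; the helper and the generate callable below are hypothetical illustrations, not part of the audiolm API:

```python
def generate_with_backoff(generate, max_new_tokens=256, floor=32):
    """Retry `generate`, halving max_new_tokens after each out-of-memory error.

    `generate` is any callable accepting a max_new_tokens keyword, e.g. a thin
    wrapper around a decode() call. With torch, catch
    torch.cuda.OutOfMemoryError instead of the plain MemoryError used here.
    """
    tokens = max_new_tokens
    while tokens >= floor:
        try:
            return generate(max_new_tokens=tokens)
        except MemoryError:
            tokens //= 2
    raise RuntimeError(f"still out of memory at max_new_tokens={floor}")

# Simulated generator that only succeeds at a small budget (for illustration).
attempts = []
def fake_generate(max_new_tokens):
    attempts.append(max_new_tokens)
    if max_new_tokens > 64:
        raise MemoryError
    return "audio"

result = generate_with_backoff(fake_generate)  # tries 256, 128, then 64
```

Halving trades output length for memory; pick a floor below which a shorter continuation is no longer useful.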

Load a pre-trained AudioLM model and generate a continuation from a prompt audio file.

import torch
import torchaudio
from audiolm import AudioLM
from audiolm.decode import decode

# Run on the GPU when available; CPU inference works but is extremely slow.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = AudioLM().to(device)

# Load a prompt audio file
waveform, sample_rate = torchaudio.load('prompt.wav')
# Keep the prompt on the same device as the model to avoid device-mismatch errors
waveform = waveform.to(device)

# Generate a continuation of the prompt
generated = decode(model, waveform, sample_rate, max_new_tokens=256)
# Move the result back to the CPU and save it as a (channels, frames) tensor
torchaudio.save('output.wav', generated[0].unsqueeze(0).cpu(), sample_rate)