AudioLM
Version 0.0.1.dev0 · verified Sat May 09
AudioLM is a PyTorch-based implementation of a language modeling approach to audio generation, capable of generating coherent audio continuations given a short prompt. Current version is 0.0.1.dev0, with irregular releases.
Install: pip install audiolm

Common errors
error: ModuleNotFoundError: No module named 'audiolm'
cause: Library not installed, or installed in the wrong environment.
fix: Run pip install audiolm in the correct environment.
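When the package seems installed but the import still fails, a stdlib-only check (no audiolm-specific calls) can confirm which interpreter is running and whether it can see the package:

```python
import sys
import importlib.util

# The interpreter actually executing your script; a bare `pip install` may
# target a different one (e.g. system Python vs. a virtualenv).
print("interpreter:", sys.executable)

# Check whether 'audiolm' is visible to this interpreter, without importing it.
spec = importlib.util.find_spec("audiolm")
print("audiolm importable:", spec is not None)
```

If this prints False, install into this exact interpreter with python -m pip install audiolm rather than a bare pip.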
error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu
cause: Input tensor is on the CPU while the model is on the GPU.
fix: Move input tensors to the model's device, e.g. waveform = waveform.to('cuda') after loading the model.
Warnings
breaking: The library is in early development (0.0.1.dev0). The API is unstable and may change without notice; do not use it in production.
fix: Pin to a specific version or commit, or use only for experimentation.
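One way to follow that advice is to pin at install time. The exact-version form below uses the release named in this document; the git form is only a placeholder pattern, so substitute the project's real repository URL and commit hash from its README:

```shell
# Pin the exact dev release so an upstream API change cannot silently break you
pip install "audiolm==0.0.1.dev0"

# Or pin to a specific commit (placeholder URL and hash -- take the real ones
# from the project's GitHub README):
# pip install "git+https://github.com/<org>/audiolm@<commit-sha>"
```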
gotcha: AudioLM requires significant GPU memory (16 GB+). CPU inference is extremely slow and may run out of memory.
fix: Use a GPU with at least 16 GB of VRAM; reduce max_new_tokens if you hit out-of-memory errors.
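The max_new_tokens advice can be automated with a small retry loop. This is an illustrative sketch, not part of the audiolm API: generate_fn is a hypothetical stand-in for whatever call performs generation, and the sketch assumes PyTorch's convention of raising RuntimeError with "out of memory" in the message on CUDA OOM:

```python
def generate_with_backoff(generate_fn, max_new_tokens=256, floor=32):
    """Call generate_fn, halving max_new_tokens on out-of-memory errors.

    generate_fn: any callable accepting a max_new_tokens keyword
    (hypothetical stand-in for the actual generation call).
    """
    tokens = max_new_tokens
    while True:
        try:
            return generate_fn(max_new_tokens=tokens)
        except RuntimeError as err:
            # Re-raise anything that is not an OOM, or give up at the floor.
            if "out of memory" not in str(err).lower() or tokens <= floor:
                raise
            tokens //= 2  # retry with a smaller generation budget
```

For the quickstart below, generate_fn could wrap decode(model, waveform, sample_rate, max_new_tokens=...).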
deprecated: The decode function signature may change in future versions; the current version uses (model, waveform, sample_rate, ...).
fix: Refer to the GitHub README for the most up-to-date usage.
Imports
- AudioLM: from audiolm import AudioLM
- train: wrong: from audiolm import train; correct: from audiolm.train import train
- decode: wrong: from audiolm import decode; correct: from audiolm.decode import decode
Quickstart
import torch
import torchaudio
from audiolm import AudioLM
from audiolm.decode import decode

# A GPU with 16 GB+ VRAM is strongly recommended; see Warnings above
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = AudioLM().to(device)

# Load a prompt audio file
waveform, sample_rate = torchaudio.load('prompt.wav')
# Keep the input on the same device as the model to avoid device-mismatch errors
waveform = waveform.to(device)

# Generate a continuation of the prompt
generated = decode(model, waveform, sample_rate, max_new_tokens=256)
torchaudio.save('output.wav', generated[0].unsqueeze(0), sample_rate)