AudioLM


AudioLM is a PyTorch-based implementation of a language modeling approach to audio generation, capable of generating coherent audio continuations given a short prompt. Current version is 0.0.1.dev0, with irregular releases.

pip install audiolm

error: ModuleNotFoundError: No module named 'audiolm'
cause: The library is not installed, or was installed into a different environment than the one running Python.
fix: Run 'pip install audiolm' (or 'python -m pip install audiolm') in the correct environment.
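A quick way to confirm which interpreter pip must target is to inspect sys.executable from the same Python that raises the import error; a minimal stdlib-only sketch:

```python
import importlib.util
import sys

# The environment that must receive the install is the one running this
# interpreter; `python -m pip install audiolm` guarantees the match, whereas
# a bare `pip` on PATH may belong to a different environment.
print("Interpreter:", sys.executable)

# Probe for the package without importing it.
spec = importlib.util.find_spec("audiolm")
if spec is None:
    print("audiolm is NOT importable from this environment")
else:
    print("audiolm found at", spec.origin)
```

If the printed interpreter is not the one you expected (e.g. a system Python instead of your virtualenv), activate the right environment before reinstalling.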
error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu
cause: The input tensor is on the CPU while the model is on the GPU.
fix: Move input tensors to the model's device after loading, e.g. waveform = waveform.to('cuda').
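The same mismatch can be reproduced and fixed with any torch module; a minimal sketch using a plain Linear layer as a stand-in for the AudioLM model:

```python
import torch

# Any nn.Module reproduces the failure; a Linear layer stands in for AudioLM here.
model = torch.nn.Linear(4, 4)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)

# Freshly loaded tensors (e.g. from torchaudio.load) start on the CPU...
waveform = torch.randn(1, 4)
# ...so move them onto the model's device before the forward pass.
waveform = waveform.to(device)

out = model(waveform)  # no device-mismatch RuntimeError
```

Forgetting the `waveform.to(device)` line while the model sits on `cuda:0` raises exactly the error above.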
breaking: The library is in early development (0.0.1.dev0). The API is unstable and may change without notice; do not use it in production.
fix: Pin to a specific version or commit, and use the library only for experimentation.
gotcha: AudioLM requires significant GPU memory (16GB+). CPU inference is extremely slow and may run out of memory.
fix: Use a GPU with at least 16GB of VRAM; reduce max_new_tokens if you hit out-of-memory errors.
deprecated: The 'decode' function signature may change in future versions; the current version uses (model, waveform, sample_rate, ...).
fix: Refer to the GitHub README for the most up-to-date usage.
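One way to apply the max_new_tokens advice automatically is a retry wrapper that halves the budget after each out-of-memory failure; the helper and the generate callable below are hypothetical illustrations, not part of the audiolm API:

```python
def generate_with_backoff(generate, max_new_tokens=256, floor=32):
    """Retry `generate`, halving max_new_tokens after each out-of-memory error.

    `generate` is any callable accepting a max_new_tokens keyword, e.g. a thin
    wrapper around a decode() call. With torch, catch
    torch.cuda.OutOfMemoryError instead of the plain MemoryError used here.
    """
    tokens = max_new_tokens
    while tokens >= floor:
        try:
            return generate(max_new_tokens=tokens)
        except MemoryError:
            tokens //= 2
    raise RuntimeError(f"still out of memory at max_new_tokens={floor}")

# Simulated generator that only succeeds at a small budget (for illustration).
attempts = []
def fake_generate(max_new_tokens):
    attempts.append(max_new_tokens)
    if max_new_tokens > 64:
        raise MemoryError
    return "audio"

result = generate_with_backoff(fake_generate)  # tries 256, 128, then 64
```

Halving trades output length for memory; pick a floor below which a shorter continuation is no longer useful.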

Load a pre-trained AudioLM model and generate a continuation from a prompt audio file.

import torch
import torchaudio
from audiolm import AudioLM
from audiolm.decode import decode

# Run on the GPU when available; CPU inference works but is extremely slow.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = AudioLM().to(device)

# Load a prompt audio file
waveform, sample_rate = torchaudio.load('prompt.wav')
# Keep the prompt on the same device as the model to avoid device-mismatch errors
waveform = waveform.to(device)

# Generate a continuation of the prompt
generated = decode(model, waveform, sample_rate, max_new_tokens=256)
# Move the result back to the CPU and save it as a (channels, frames) tensor
torchaudio.save('output.wav', generated[0].unsqueeze(0).cpu(), sample_rate)