audiocraft - Audio Generation
Audiocraft is a PyTorch research library from Meta AI (formerly Facebook AI Research) for audio generation, including the MusicGen and AudioGen models. It provides tools for both inference and training. The current version is 1.3.0. Development is active, with new releases roughly every 1-3 months, often coinciding with new model research.
Common errors
- torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate X GiB (GPU 0; Y GiB total capacity; Z GiB already allocated; W GiB free; P MiB reserved in total by PyTorch)
  cause: The requested generation needs more VRAM than your GPU has available.
  fix: Reduce the `duration` parameter in `model.set_generation_params()`, or use a smaller model (e.g., 'small'). Make sure no other processes are consuming GPU memory. If that is not enough, use a GPU with more VRAM.
- AttributeError: 'MusicGen' object has no attribute 'generate_unconditional'
  cause: You are calling an older API method (`generate_unconditional`) that has been deprecated/removed in favor of a unified `generate` method.
  fix: Replace `model.generate_unconditional()` with `model.generate(descriptions=None)`. The `generate` method now handles both conditional and unconditional generation.
- ImportError: cannot import name 'audio_write' from 'audiocraft.utils.audio' (...)
  cause: The import path of the `audio_write` utility function changed in a recent version.
  fix: Change `from audiocraft.utils.audio import audio_write` to `from audiocraft.data.audio import audio_write`.
- RuntimeError: No module named '_C' or ImportError: cannot import name '_C' from 'torch_audiomentations.utils'
  cause: This often indicates a mismatch between your PyTorch version and your CUDA toolkit, or a custom C++ extension that failed to compile or load.
  fix: Make sure your PyTorch installation is compatible with your CUDA version. It is usually best to install PyTorch with the command from the PyTorch website that matches your CUDA version. Reinstalling `audiocraft` or `torchaudio` after verifying PyTorch/CUDA can help. For `torch_audiomentations`, check that it is installed and compatible with your Python and PyTorch versions.
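For the out-of-memory case above, one way to apply the "reduce `duration`" fix automatically is to retry with progressively shorter durations. The following is a minimal sketch, not part of audiocraft: `is_oom_error` and `generate_with_backoff` are hypothetical helper names, and the OOM check relies on the assumption that the error is a `RuntimeError` whose message contains "out of memory", as in the message shown above.

```python
from typing import Callable, Iterable, Tuple, TypeVar

T = TypeVar("T")


def is_oom_error(exc: BaseException) -> bool:
    """Heuristic (assumption): OOM errors are RuntimeErrors mentioning 'out of memory'."""
    return isinstance(exc, RuntimeError) and "out of memory" in str(exc).lower()


def generate_with_backoff(
    generate_for_duration: Callable[[float], T],
    durations: Iterable[float],
) -> Tuple[float, T]:
    """Try each duration in order and return the first one that fits in memory.

    `generate_for_duration` is a caller-supplied callback. With audiocraft it
    could call model.set_generation_params(duration=d) and then model.generate(...).
    """
    last_error = None
    for duration in durations:
        try:
            return duration, generate_for_duration(duration)
        except RuntimeError as exc:
            if not is_oom_error(exc):
                raise  # unrelated failure: do not hide it
            last_error = exc
    raise RuntimeError("all candidate durations ran out of memory") from last_error
```

A call such as `generate_with_backoff(run, [30, 15, 8])` would return the first duration that succeeded together with its result. Releasing cached GPU memory between attempts (for example with `torch.cuda.empty_cache()`) may help, but that is left to the callback.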
Warnings
- gotcha: Generating audio is extremely GPU-memory intensive, especially for longer clips or larger models ('medium', 'large'). Running out of VRAM is the most common issue.
- breaking: The `model.generate_unconditional()` method has been removed in favor of the unified `model.generate()` method.
- gotcha: The first model load downloads large weight files (roughly 2-3 GB or more). The download can fail because of network issues or insufficient disk space.
- gotcha: On Linux, you may see `FileNotFoundError: libFLAC.so.8` or similar errors if audio processing dependencies are missing.
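Because the weight download can fail when disk space runs out, a quick pre-flight check can save a failed run. This is a minimal sketch using only the standard library. The helper names are hypothetical, and both the directory to check and the required size are assumptions that depend on where your setup caches model weights.

```python
import shutil


def free_gib(path: str) -> float:
    """Return the free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 1024**3


def has_room_for_download(path: str, required_gib: float) -> bool:
    """True if the filesystem containing `path` has at least `required_gib` GiB free."""
    return free_gib(path) >= required_gib
```

For example, `has_room_for_download(cache_dir, 3.0)` could be checked before loading a model, where `cache_dir` is whichever directory your environment uses for downloaded weights.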
Install
- pip install audiocraft
- pip install -e '.[dev]' --no-deps (editable install with the `dev` extra, run from a cloned repository)
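Several of the errors listed above depend on which release is installed, so it can help to confirm the version after installing. A minimal sketch using the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper name, and the distribution name `audiocraft` is taken from the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("audiocraft")
print(f"audiocraft: {found or 'not installed'}")
```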
Imports
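- MusicGen
  from audiocraft.models.musicgen import MusicGen  # full module path
  from audiocraft.models import MusicGen  # package-level import, as used in the Quickstart
- AudioGen
  from audiocraft.models.audiogen import AudioGen  # full module path
  from audiocraft.models import AudioGen  # package-level import
- audio_write
  from audiocraft.utils.audio import audio_write  # old path, raises ImportError (see Common errors)
  from audiocraft.data.audio import audio_write  # current path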
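If a script has to run against installations where an import path may differ, the candidate paths can be tried in order. The following is a minimal sketch with a hypothetical helper, `import_first`; it is a generic pattern built on `importlib`, not something audiocraft provides.

```python
import importlib
from typing import Any, Sequence


def import_first(attribute: str, module_paths: Sequence[str]) -> Any:
    """Return `attribute` from the first module in `module_paths` that provides it."""
    errors = []
    for module_path in module_paths:
        try:
            module = importlib.import_module(module_path)
            return getattr(module, attribute)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{module_path}: {exc}")
    raise ImportError(f"could not import {attribute!r}; tried: " + "; ".join(errors))


# Example (requires audiocraft to be installed), preferring the current path:
# audio_write = import_first(
#     "audio_write", ["audiocraft.data.audio", "audiocraft.utils.audio"]
# )
```

Listing the current path first keeps the common case fast and only falls back when that import fails.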
Quickstart
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write
import torch # For moving to CPU if needed
# Load a pretrained MusicGen model (the small checkpoint is generally recommended for quick tests).
# This downloads the model weights (~2GB for the small model). Ensure stable internet and enough disk space.
# Recent releases expect the full checkpoint id; the short name 'small' is a deprecated alias.
# To choose a device explicitly: model = MusicGen.get_pretrained('facebook/musicgen-small', device='cuda')
# GPU usage requires a compatible PyTorch/CUDA setup.
model = MusicGen.get_pretrained('facebook/musicgen-small')
model.set_generation_params(duration=8) # Generate 8 seconds of audio
# Define a description for the music
description = "a retro synthwave track with a driving beat"
print(f"Generating audio for: '{description}'...")
# The generate method takes a list of descriptions. For unconditional generation, pass descriptions=None.
samples = model.generate(descriptions=[description], progress=True)
# Save the generated audio to a WAV file
# `samples` is a torch.Tensor. It's good practice to move to CPU before saving if it's on GPU.
audio_write(
    'my_synthwave_track',       # Output stem; the '.wav' suffix is added automatically
    samples[0].cpu(),           # Take the first generated sample and move it to CPU
    model.sample_rate,
    strategy="loudness",
    loudness_compressor=True,   # Recommended for better audio quality
)
print("Audio saved as 'my_synthwave_track.wav'")
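Since `generate` takes a list of descriptions, several clips can be produced in one call and saved one by one. The sketch below assumes that `samples[i]` corresponds to `descriptions[i]`. `stem_for` and `save_batch` are hypothetical helper names, and `audio_write` is imported lazily so the file-naming logic can be used without audiocraft installed.

```python
import re
from typing import List


def stem_for(index: int, description: str, max_len: int = 40) -> str:
    """Build a filesystem-friendly file stem such as '01_lo_fi_beat'."""
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    return f"{index:02d}_{slug[:max_len].rstrip('_')}"


def save_batch(samples, descriptions: List[str], sample_rate: int) -> List[str]:
    """Write one audio file per generated sample and return the stems used."""
    from audiocraft.data.audio import audio_write  # lazy import; requires audiocraft

    stems = []
    for index, (wav, description) in enumerate(zip(samples, descriptions)):
        stem = stem_for(index, description)
        audio_write(stem, wav.cpu(), sample_rate,
                    strategy="loudness", loudness_compressor=True)
        stems.append(stem)
    return stems
```

With the Quickstart's `model`, this could be used as `save_batch(model.generate(descriptions), descriptions, model.sample_rate)`. Larger batches need more GPU memory, so the VRAM caveats above still apply.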