Stable Audio Tools

v0.0.19 · verified Fri May 01 · auth: none · Python

A Python library by Stability AI for training and inference with generative audio models, including Stable Audio and Dance Diffusion. The current version is 0.0.19; the library is under active development with frequent releases.

pip install stable-audio-tools
error ModuleNotFoundError: No module named 'stable_audio_tools'
cause Library not installed or installed in wrong environment.
fix
Run pip install stable-audio-tools in the correct Python environment.
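A quick stdlib-only check to confirm the interpreter running your script matches the one pip installed into (the package is only probed, never imported):

```python
import importlib.util
import sys

# The interpreter executing this script; compare against `pip -V` output
print("Running under:", sys.executable)

# find_spec returns None when the package is absent from this environment
spec = importlib.util.find_spec("stable_audio_tools")
if spec is None:
    print("stable_audio_tools is NOT installed in this environment")
else:
    print("Found at:", spec.origin)
```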
error KeyError: 'stable-audio-open-1.0'
cause Model name does not exist or is misspelled.
fix
Check available models with get_models() and use the exact key string.
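A defensive lookup pattern that turns the bare KeyError into a suggestion. The model names below are illustrative stand-ins, not the real registry; in practice populate `models` from `get_models()`:

```python
import difflib

def resolve_model(name, models):
    """Return the exact registry key, or raise with close-match suggestions."""
    if name in models:
        return name
    hints = difflib.get_close_matches(name, models.keys(), n=3)
    raise KeyError(f"Unknown model {name!r}; did you mean {hints}?")

# Illustrative stand-in for the dict returned by get_models()
models = {"stable-audio-open-1.0": None, "dance-diffusion-base": None}

print(resolve_model("stable-audio-open-1.0", models))  # exact key resolves
```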
error RuntimeError: Expected all tensors to be on the same device, but found at least two devices
cause Model and input tensors on different devices (CPU/GPU).
fix
Ensure model and all inputs are moved to the same device with .to(device).
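A minimal sketch of the fix with a stand-in PyTorch model and tensor (runs on CPU when CUDA is absent):

```python
import torch

model = torch.nn.Linear(4, 2)  # stand-in for the audio model
x = torch.randn(1, 4)          # stand-in for the input tensor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move BOTH the model parameters and every input to the same device
model = model.to(device)
x = x.to(device)

y = model(x)
print(y.device)  # same device as the model, so no cross-device RuntimeError
```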
gotcha The library is in early development (v0.0.19). APIs may change without notice. Always pin your dependency version.
fix Install with `pip install stable-audio-tools==0.0.19` and watch GitHub for updates.
deprecated The old import path `from stable_audio_tools.models import get_models` is deprecated in favor of `from stable_audio_tools import get_models`.
fix Use `from stable_audio_tools import get_models`.
gotcha Model names supplied to `get_pretrained_model_and_config` must exactly match the keys from `get_models()`; matching is case- and hyphen-sensitive.
fix Always inspect the list returned by `get_models()` to get exact names.

Quickstart: list available models and load a pretrained one. Full text-to-audio generation requires additional steps (text encoder, diffusion sampling loop).

import torch
from stable_audio_tools import get_models
from stable_audio_tools.interface import get_pretrained_model_and_config

# List available models
models = get_models()
print("Available models:", list(models.keys()))

# Load a pretrained model (replace with an exact key from the list above)
model_name = "stable-audio-open-1.0"  # example only; check get_models()
model, config = get_pretrained_model_and_config(model_name)

# Move the model to the best available device
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Note: full generation additionally requires a T5 text encoder and a diffusion sampling loop
print("Model loaded successfully. Refer to the official docs for full inference.")
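For reference, the conditioning payload that the full diffusion loop consumes typically looks like the following. The field names are taken from upstream examples and should be treated as assumptions; verify them against the version you have pinned:

```python
# One conditioning entry per audio clip to generate
conditioning = [{
    "prompt": "128 BPM tech house drum loop",  # text description of the audio
    "seconds_start": 0,                        # start offset for timing conditioning
    "seconds_total": 30,                       # total requested duration in seconds
}]

# This list would then be passed to the diffusion sampler, e.g.
# generate_diffusion_cond(model, conditioning=conditioning, ..., device=device)
print(conditioning[0]["prompt"])
```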