Stable Audio Tools

v0.0.19 · verified Fri May 01 · auth: none · Python

A Python library by Stability AI for training and inference with generative audio models, including Stable Audio and Dance Diffusion. The current version is 0.0.19; the library is under active development with frequent releases.

pip install stable-audio-tools
error ModuleNotFoundError: No module named 'stable_audio_tools'
cause Library not installed or installed in wrong environment.
fix
Run pip install stable-audio-tools in the correct Python environment.
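A quick stdlib-only check to confirm the interpreter running your script matches the one pip installed into (the package is only probed, never imported):

```python
import importlib.util
import sys

# The interpreter executing this script; compare against `pip -V` output
print("Running under:", sys.executable)

# find_spec returns None when the package is absent from this environment
spec = importlib.util.find_spec("stable_audio_tools")
if spec is None:
    print("stable_audio_tools is NOT installed in this environment")
else:
    print("Found at:", spec.origin)
```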
error KeyError: 'stable-audio-open-1.0'
cause Model name does not exist or is misspelled.
fix
Check available models with get_models() and use the exact key string.
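A defensive lookup pattern that turns the bare KeyError into a suggestion. The model names below are illustrative stand-ins, not the real registry; in practice populate `models` from `get_models()`:

```python
import difflib

def resolve_model(name, models):
    """Return the exact registry key, or raise with close-match suggestions."""
    if name in models:
        return name
    hints = difflib.get_close_matches(name, models.keys(), n=3)
    raise KeyError(f"Unknown model {name!r}; did you mean {hints}?")

# Illustrative stand-in for the dict returned by get_models()
models = {"stable-audio-open-1.0": None, "dance-diffusion-base": None}

print(resolve_model("stable-audio-open-1.0", models))  # exact key resolves
```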
error RuntimeError: Expected all tensors to be on the same device, but found at least two devices
cause Model and input tensors on different devices (CPU/GPU).
fix
Ensure model and all inputs are moved to the same device with .to(device).
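A minimal sketch of the fix with a stand-in PyTorch model and tensor (runs on CPU when CUDA is absent):

```python
import torch

model = torch.nn.Linear(4, 2)  # stand-in for the audio model
x = torch.randn(1, 4)          # stand-in for the input tensor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move BOTH the model parameters and every input to the same device
model = model.to(device)
x = x.to(device)

y = model(x)
print(y.device)  # same device as the model, so no cross-device RuntimeError
```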
gotcha The library is in early development (v0.0.19). APIs may change without notice. Always pin your dependency version.
fix Install with `pip install stable-audio-tools==0.0.19` and watch GitHub for updates.
deprecated The old import path `from stable_audio_tools.models import get_models` is deprecated in favor of `from stable_audio_tools import get_models`.
fix Use `from stable_audio_tools import get_models`.
gotcha Model names supplied to `get_pretrained_model_and_config` must exactly match the keys from `get_models()`; matching is case- and hyphen-sensitive.
fix Always inspect the list returned by `get_models()` to get exact names.

Quickstart: list available models and load a pretrained one. Full text-to-audio generation requires additional steps (text encoder, diffusion sampling loop).

import torch
from stable_audio_tools import get_models
from stable_audio_tools.interface import get_pretrained_model_and_config

# List available models
models = get_models()
print("Available models:", list(models.keys()))

# Load a pretrained model (replace with an exact key from the list above)
model_name = "stable-audio-open-1.0"  # example only; check get_models()
model, config = get_pretrained_model_and_config(model_name)

# Move the model to the best available device
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Note: full generation additionally requires a T5 text encoder and a diffusion sampling loop
print("Model loaded successfully. Refer to the official docs for full inference.")
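For reference, the conditioning payload that the full diffusion loop consumes typically looks like the following. The field names are taken from upstream examples and should be treated as assumptions; verify them against the version you have pinned:

```python
# One conditioning entry per audio clip to generate
conditioning = [{
    "prompt": "128 BPM tech house drum loop",  # text description of the audio
    "seconds_start": 0,                        # start offset for timing conditioning
    "seconds_total": 30,                       # total requested duration in seconds
}]

# This list would then be passed to the diffusion sampler, e.g.
# generate_diffusion_cond(model, conditioning=conditioning, ..., device=device)
print(conditioning[0]["prompt"])
```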