SNAC

1.2.1 verified Fri May 01 auth: no python

Multi-Scale Neural Audio Codec for audio compression, supporting 24 kHz, 32 kHz, and 44 kHz sampling rates. This is a PyTorch-based library for encoding audio into discrete codes (suitable for language modeling) and decoding back to waveform. Current version 1.2.1 has a stable API with `encode` and `decode` methods.

pip install snac
error ImportError: cannot import name 'SNAC' from 'snac'
cause SNAC was not installed correctly or an incompatible version is installed.
fix
Ensure you installed the correct package: pip install snac. Check that you are not shadowing the package with a local file named snac.py.
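A quick way to check for shadowing is to ask Python which file it would load for `snac`, without actually importing it. The sketch below uses only the standard library; the helper name `module_origin` is hypothetical and not part of SNAC.

```python
import importlib.util


def module_origin(name):
    """Return the file Python would load for a top-level module, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


origin = module_origin("snac")
if origin is None:
    print("snac is not importable from this environment")
else:
    # A path inside your project (e.g. ./snac.py) instead of site-packages
    # means a local file is shadowing the installed package.
    print("snac would be imported from:", origin)
```

If the printed path points into your working directory, rename the local file and remove any stale `__pycache__` entries before retrying the import.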
error RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0!
cause The model is on one device, but the input tensor is on another.
fix
Move the model and input to the same device: model = model.to('cuda'); audio = audio.to('cuda').
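Rather than hard-coding `'cuda'` in two places, the input can follow whatever device the model is already on. This is a minimal sketch: the helper `to_model_device` is hypothetical, and a small `Conv1d` stands in for the SNAC model so that the example runs without downloading weights.

```python
import torch


def to_model_device(model, audio):
    """Move `audio` to the device that holds the model's parameters."""
    device = next(model.parameters()).device
    return audio.to(device)


# Stand-in module; a loaded SNAC model is also a torch.nn.Module
# and can be passed to the helper in the same way.
model = torch.nn.Conv1d(1, 1, kernel_size=3, padding=1)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

audio = torch.randn(1, 1, 24000)        # tensors are created on the CPU by default
audio = to_model_device(model, audio)   # now on the same device as the model
print(audio.device, next(model.parameters()).device)
```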
error OSError: Can't load tokenizer for 'hubertsiuzdak/snac_24khz'
cause The model name is incorrect or the model is not publicly accessible.
fix
Use a valid Hugging Face model ID (e.g., 'hubertsiuzdak/snac_24khz', 'hubertsiuzdak/snac_32khz', 'hubertsiuzdak/snac_44khz') or ensure your network can access huggingface.co.
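gotcha The return type of `encode` is reported to differ between versions: a list of tensors (one per codebook level) before 1.2.0, and a single stacked tensor from 1.2.0 onward. The levels run at different temporal resolutions, so their lengths differ; check what your installed version actually returns before indexing the result or calling `.shape` on it.
fix Pin the version you have tested against, and branch on `isinstance(codes, (list, tuple))` wherever the code must handle both forms.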
deprecated Loading models from a local filepath was broken in 1.2.0 and fixed in 1.2.1. If you use `SNAC.from_pretrained('./local_model')`, ensure version >=1.2.1.
fix Upgrade to 1.2.1 or use a Hugging Face model ID.
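The version requirement can be checked at startup before attempting a local load. The sketch below relies only on the standard library; `parse_version` and `supports_local_paths` are hypothetical helpers, and the 1.2.1 threshold is taken from the note above rather than from an upstream changelog.

```python
from importlib import metadata


def parse_version(text):
    """Turn '1.2.1' into (1, 2, 1); stops at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def supports_local_paths(installed):
    """True if the version string is at least 1.2.1."""
    return parse_version(installed) >= (1, 2, 1)


try:
    installed = metadata.version("snac")
    print("snac", installed, "local paths OK:", supports_local_paths(installed))
except metadata.PackageNotFoundError:
    print("snac is not installed")
```

For pre-release or otherwise unusual version strings, a dedicated parser such as the one in the `packaging` library is a safer choice than this simple tuple comparison.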
gotcha The model expects audio resampled to the model's sample rate (24 kHz, 32 kHz, or 44 kHz). Failure to resample will produce garbled output.
fix Resample input audio to match the model's sample rate before encoding.
breaking Version 1.0.0 introduced a completely new architecture and model zoo. Any models from v0.x, if they exist, are incompatible with it.
fix Use only v1.x models and upgrade to latest version.

Load a pretrained model, encode audio to discrete codes, and decode back to audio.

import torch
from snac import SNAC

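# Load the pretrained 24 kHz model and put it in evaluation mode.
model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval()
audio = torch.randn(1, 1, 24000)  # (batch, channels, samples): 1 second at 24 kHz

with torch.inference_mode():
    codes = model.encode(audio)
    reconstructed = model.decode(codes)

# encode may return a list of tensors (one per codebook level), so print each shape
# instead of calling .shape on the result directly.
levels = codes if isinstance(codes, (list, tuple)) else [codes]
for i, level_codes in enumerate(levels):
    print(f"Codes[{i}] shape:", tuple(level_codes.shape))
print("Audio shape:", tuple(reconstructed.shape))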