Descript Audio Codec

1.0.0 · active · verified Wed Apr 15

Descript Audio Codec (DAC) is a high-fidelity, general-purpose neural audio codec, currently at version 1.0.0. It compresses audio into discrete codes at very low bitrates and supports 16 kHz, 24 kHz, and 44.1 kHz sampling rates. For 44.1 kHz audio it reaches roughly 90x compression at 8 kbps while preserving high fidelity. It is designed to be universal across audio domains, including speech, music, and environmental sounds, and can serve as a drop-in replacement for codecs such as EnCodec in audio language modeling applications. The library is listed as actively maintained.
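
The "roughly 90x" figure can be sanity-checked with simple arithmetic. The sketch below assumes the uncompressed baseline is 16-bit mono PCM, which the description does not state explicitly; the function name is illustrative only.

```python
# Back-of-the-envelope check of the compression ratio quoted above.
# Assumption (not stated in the description): the baseline is 16-bit mono PCM.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CODEC_BITRATE_BPS = 8_000  # 8 kbps, as quoted in the description


def compression_ratio(sample_rate_hz: int, bits_per_sample: int, codec_bitrate_bps: int) -> float:
    """Return the uncompressed PCM bitrate divided by the codec bitrate."""
    pcm_bitrate_bps = sample_rate_hz * bits_per_sample
    return pcm_bitrate_bps / codec_bitrate_bps


ratio = compression_ratio(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CODEC_BITRATE_BPS)
print(f"PCM bitrate: {SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 1000:.1f} kbps")
print(f"Compression ratio: {ratio:.1f}x")
```

Under that assumption the ratio comes out at about 88x, consistent with the approximate 90x figure.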

Warnings

Install

Imports

Quickstart

This quickstart loads a pretrained Descript Audio Codec model, generates a dummy audio signal with `audiotools`, and then encodes and decodes it. Model weights are downloaded and cached on the first run. The script prints the shapes of the original audio, the compressed discrete codes, and the reconstructed audio.

import dac
import torch
from audiotools import AudioSignal

# Download the pretrained weights (cached after the first call) and load the model.
# model_type can be "16khz", "24khz", or "44khz"; the default is "44khz".
device = "cuda" if torch.cuda.is_available() else "cpu"
model_path = dac.utils.download(model_type="44khz")
model = dac.DAC.load(model_path).to(device)
model.eval()  # inference mode

# Prepare dummy audio for encoding: 5 seconds of mono 44.1 kHz noise.
# The model takes single-channel input of shape (batch, 1, samples).
sample_rate = model.sample_rate  # 44100
duration = 5  # seconds
audio_tensor = 0.1 * torch.randn(1, sample_rate * duration)
audio_signal = AudioSignal(audio_tensor, sample_rate).to(device)

# Encode the audio to discrete codes, then decode it back to a waveform
with torch.no_grad():
    # Pad the input to a length the model accepts
    x = model.preprocess(audio_signal.audio_data, audio_signal.sample_rate)
    # z: quantized continuous representation (input to the decoder)
    # codes: discrete tokens, shape (batch, num_quantizers, sequence_length)
    # latents: projected latents before quantization
    z, codes, latents, _, _ = model.encode(x)
    # audio_out: reconstructed audio as a torch.Tensor
    audio_out = model.decode(z)

print(f"Original audio shape: {audio_signal.shape}")
print(f"Encoded codes shape (batch, num_quantizers, sequence_length): {codes.shape}")
print(f"Decoded audio shape: {audio_out.shape}")

# In a real application, you might save `codes` for storage/transmission
# or write the reconstructed audio to a file.
# Example (wraps the tensor in an AudioSignal first):
# AudioSignal(audio_out.cpu(), sample_rate).write("reconstructed_audio.wav")
# Example: torch.save(codes, "compressed_audio_codes.pt")
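
To help interpret the printed codes shape, the following sketch estimates it without running the model. It assumes the 44.1 kHz configuration uses a hop length of 512 samples, 9 codebooks, and 1024 entries per codebook; these values are assumptions made for illustration and should be checked against the loaded model. The helper names are hypothetical and not part of the `dac` API.

```python
import math

# Assumed configuration of the 44.1 kHz model (verify against your loaded model).
HOP_LENGTH = 512       # input samples per latent frame
N_CODEBOOKS = 9        # residual quantizers, i.e. tokens per frame
CODEBOOK_SIZE = 1024   # entries per codebook (10 bits per token)


def expected_codes_shape(num_samples: int, batch: int = 1) -> tuple:
    """Estimate (batch, num_quantizers, sequence_length) for a padded input."""
    frames = math.ceil(num_samples / HOP_LENGTH)
    return (batch, N_CODEBOOKS, frames)


def token_bitrate_bps(sample_rate_hz: int) -> float:
    """Estimate the bitrate implied by the token stream."""
    frames_per_second = sample_rate_hz / HOP_LENGTH
    bits_per_frame = N_CODEBOOKS * math.log2(CODEBOOK_SIZE)
    return frames_per_second * bits_per_frame


shape = expected_codes_shape(44_100 * 5)
bitrate = token_bitrate_bps(44_100)
print(f"Estimated codes shape for 5 s of audio: {shape}")
print(f"Estimated token bitrate: {bitrate / 1000:.2f} kbps")
```

Under these assumptions the token stream works out to a little under 8 kbps, which lines up with the bitrate quoted in the description.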
