High Fidelity Neural Audio Codec

0.1.1 · active · verified Wed Apr 15

EnCodec is a Python library from Meta AI (formerly Facebook AI Research) that provides a deep-learning-based neural audio codec. It ships two pre-trained models: a 24 kHz mono model that compresses to 1.5, 3, 6, 12, or 24 kbps, and a 48 kHz stereo model that compresses to 3, 6, 12, or 24 kbps. The codec uses a streaming encoder-decoder architecture with a quantized latent space and is trained with an adversarial loss for high-fidelity reconstruction. The current stable version is 0.1.1; development continues on GitHub, and the models are also available through Hugging Face Transformers.
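
To make the "quantized latent space" concrete, here is a minimal sketch of residual vector quantization, the scheme described in the EnCodec paper. It is a toy illustration only and not the library's implementation: it quantizes a single scalar instead of a latent vector, the codebooks are hand-picked rather than learned, and the function names `rvq_encode` and `rvq_decode` are hypothetical.

```python
from typing import List, Sequence


def nearest(codebook: Sequence[float], value: float) -> int:
    """Return the index of the codebook entry closest to `value`."""
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - value))


def rvq_encode(value: float, codebooks: Sequence[Sequence[float]]) -> List[int]:
    """Quantize `value` stage by stage; each stage encodes the previous residual."""
    residual = value
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        residual -= codebook[idx]  # what is left for the next stage to encode
    return indices


def rvq_decode(indices: Sequence[int], codebooks: Sequence[Sequence[float]]) -> float:
    """Reconstruct the value by summing the selected entry of each stage."""
    return sum(codebook[i] for codebook, i in zip(codebooks, indices))


# Hand-picked toy codebooks, from coarse to fine (assumption for illustration).
codebooks = [
    [-1.0, 0.0, 1.0],
    [-0.25, 0.0, 0.25],
    [-0.0625, 0.0, 0.0625],
]

value = 0.8
codes = rvq_encode(value, codebooks)

# Using fewer stages means fewer bits and a larger reconstruction error.
for n_stages in range(1, len(codebooks) + 1):
    approx = rvq_decode(codes[:n_stages], codebooks[:n_stages])
    print(f"{n_stages} codebook(s): {approx:.4f} (error {abs(value - approx):.4f})")
```

Dropping the later codebooks lowers the bitrate at the cost of accuracy, which is the idea behind offering several compression rates from one model.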

Quickstart

This quickstart demonstrates audio compression and decompression with a pre-trained EnCodec model through its Hugging Face Transformers integration; note that it imports `transformers`, not the `encodec` package itself. It loads a dummy audio sample, encodes it with the pre-trained model, and then decodes it. The example requires `datasets` and a `transformers` version that includes `EncodecModel`; installing `transformers` from source is only necessary if your installed release lacks it.

import torch
from datasets import load_dataset, Audio
from transformers import EncodecModel, AutoProcessor

# Load the pre-trained EnCodec model and processor (24 kHz mono checkpoint)
model = EncodecModel.from_pretrained("facebook/encodec_24khz")
processor = AutoProcessor.from_pretrained("facebook/encodec_24khz")

# NOTE: For a real application, you would load your own audio file.
# For this quickstart, use a small dummy dataset from Hugging Face.
librispeech_dummy = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")

# The dummy dataset is sampled at 16 kHz; resample it to the rate the model expects,
# otherwise the processor rejects the mismatched sampling rate.
librispeech_dummy = librispeech_dummy.cast_column("audio", Audio(sampling_rate=processor.sampling_rate))
sample_audio = librispeech_dummy[0]["audio"]["array"]

# Pre-process the audio
inputs = processor(
    raw_audio=sample_audio,
    sampling_rate=processor.sampling_rate,
    return_tensors="pt"
)

# Encode the audio. The 24 kHz model accepts a bandwidth of 1.5, 3.0, 6.0, 12.0, or 24.0 kbps.
# The lowest value (1.5 kbps) is used if none is given.
# Example: model.encode(inputs["input_values"], inputs["padding_mask"], bandwidth=3.0)
with torch.no_grad():
    encoder_outputs = model.encode(inputs["input_values"], inputs["padding_mask"])

    # Decode the audio from the discrete codes and their scales; [0] selects the audio tensor
    decoded_audio = model.decode(
        encoder_outputs.audio_codes, encoder_outputs.audio_scales, inputs["padding_mask"]
    )[0]

print(f"Original audio shape: {inputs['input_values'].shape}")
print(f"Decoded audio shape: {decoded_audio.shape}")
print("Audio encoded and decoded successfully!")
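
The bandwidth values mentioned in the quickstart map directly to how many codebooks the quantizer uses per frame. The sketch below works through that arithmetic for the 24 kHz model. The frame rate of 75 frames per second and the codebook size of 1024 entries are taken from the EnCodec paper, not from this page, and the helper name `codebooks_for_bandwidth` is hypothetical.

```python
import math

# Assumed figures for the 24 kHz model (from the EnCodec paper, not from this page).
FRAME_RATE_HZ = 75      # latent frames produced per second of audio
CODEBOOK_SIZE = 1024    # entries per codebook, i.e. 10 bits per code


def codebooks_for_bandwidth(kbps: float) -> int:
    """Return how many codebooks fit into a target bandwidth given in kbps."""
    bits_per_code = math.log2(CODEBOOK_SIZE)                 # 10 bits
    bits_per_second_per_codebook = FRAME_RATE_HZ * bits_per_code  # 750 bits/s
    return int(kbps * 1000 // bits_per_second_per_codebook)


for kbps in (1.5, 3.0, 6.0, 12.0, 24.0):
    print(f"{kbps:>4} kbps -> {codebooks_for_bandwidth(kbps)} codebooks per frame")
```

Under these assumptions, each additional codebook costs 750 bits per second, so doubling the bandwidth doubles the number of codes stored per frame.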
