{"id":6616,"library":"encodec","title":"High Fidelity Neural Audio Codec","description":"EnCodec is a Python library from Facebook AI that provides a state-of-the-art, deep-learning-based audio codec. It supports both mono 24 kHz and stereo 48 kHz audio at various compression rates. It uses a streaming encoder-decoder architecture with a quantized latent space, trained with an adversarial loss for high-fidelity reconstruction. The current stable version is 0.1.1, with development continuing on GitHub and integration into Hugging Face Transformers.","status":"active","version":"0.1.1","language":"en","source_language":"en","source_url":"https://github.com/facebookresearch/encodec","tags":["audio","codec","neural network","deep learning","compression","machine learning"],"install":[{"cmd":"pip install -U encodec","lang":"bash","label":"Install stable release"}],"dependencies":[{"reason":"Core numerical operations.","package":"numpy","optional":false},{"reason":"Underlying deep learning framework (PyTorch 1.11.0+ recommended).","package":"torch","optional":false},{"reason":"Audio I/O and processing utilities.","package":"torchaudio","optional":false},{"reason":"Flexible tensor operations.","package":"einops","optional":false},{"reason":"Recommended for loading and managing audio datasets (when using the Hugging Face Transformers integration).","package":"datasets","optional":true},{"reason":"Required for using EnCodec models via the Hugging Face Transformers API.","package":"transformers","optional":true}],"imports":[{"note":"For direct use of the core library components.","symbol":"EncodecModel","correct":"from encodec import EncodecModel"},{"note":"Recommended for using pre-trained models and easy integration with the Hugging Face ecosystem.","symbol":"EncodecModel (via transformers)","correct":"from transformers import EncodecModel, AutoProcessor"}],"quickstart":{"code":"from datasets import load_dataset\nfrom transformers import EncodecModel, AutoProcessor\n\n# NOTE: In a real application you would load your own audio file.\n# This quickstart uses a dummy dataset from Hugging Face.\nlibrispeech_dummy = load_dataset(\"hf-internal-testing/librispeech_asr_dummy\", \"clean\", split=\"validation\")\nsample_audio = librispeech_dummy[0][\"audio\"][\"array\"]\nsample_rate = librispeech_dummy[0][\"audio\"][\"sampling_rate\"]\n\n# Load the pre-trained EnCodec model and processor (24 kHz mono model)\nmodel = EncodecModel.from_pretrained(\"facebook/encodec_24khz\")\nprocessor = AutoProcessor.from_pretrained(\"facebook/encodec_24khz\")\n\n# Pre-process the audio\ninputs = processor(\n    raw_audio=sample_audio,\n    sampling_rate=sample_rate,\n    return_tensors=\"pt\"\n)\n\n# Encode the audio. You can specify a bandwidth (e.g., 1.5, 3.0, 6.0, 12.0, 24.0 kbps);\n# the default is 1.5 kbps. Example:\n# model.encode(inputs[\"input_values\"], inputs[\"padding_mask\"], bandwidth=3.0)\nencoder_outputs = model.encode(inputs[\"input_values\"], inputs[\"padding_mask\"])\n\n# Decode the codes back to a waveform\naudio_values = model.decode(\n    encoder_outputs.audio_codes,\n    encoder_outputs.audio_scales,\n    inputs[\"padding_mask\"]\n)[0]\n\nprint(f\"Original audio shape: {inputs['input_values'].shape}\")\nprint(f\"Decoded audio shape: {audio_values.shape}\")\nprint(\"Audio encoded and decoded successfully!\")","lang":"python","description":"This quickstart demonstrates audio compression and decompression with `encodec` via its Hugging Face Transformers integration. It loads a dummy audio sample, encodes it with a pre-trained EnCodec model, and decodes it back to a waveform. You need `datasets` and `transformers` installed (the latter from source if EnCodec is not yet in the stable release) for this example to work correctly."},"warnings":[{"fix":"Manually chunk long audio files into smaller segments before processing with `encodec`.","message":"The original `encodec` library does not handle very long audio files gracefully: it processes the entire file at once, which can lead to high memory consumption and out-of-memory (OOM) errors. The developers have stated that they do not currently support this use case.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Install `transformers` directly from GitHub: `pip install -U datasets git+https://github.com/huggingface/transformers.git@main`","message":"To use `encodec` via Hugging Face Transformers (as often recommended), the `transformers` library may need to be installed from its `main` GitHub branch rather than the PyPI stable release, because the EnCodec integration might be newer than the latest `transformers` release.","severity":"breaking","affected_versions":"All versions where EnCodec is a recent addition to `transformers`."},{"fix":"Be aware of the output format and chunking when using the 48 kHz model for encoding, especially when extracting discrete representations.","message":"The 48 kHz stereo EnCodec model processes audio in 1-second chunks with a 1% overlap and renormalizes the audio to unit scale. When extracting discrete representations, `model.encode(wav)` returns a list of `(codes, scale)` tuples, one per 1-second frame. This behavior differs from the 24 kHz model.","severity":"gotcha","affected_versions":"All versions implementing the 48 kHz model."},{"fix":"Upgrade PyTorch to version 1.11.0 or newer: `pip install -U torch torchaudio`","message":"Ensure a reasonably recent version of PyTorch (ideally 1.11.0 or newer) is installed. Older PyTorch versions (e.g., < 1.8) may have compatibility issues, such as a different default value for `torch.stft(return_complex)` used in `encodec`'s internal audio processing.","severity":"gotcha","affected_versions":"< 0.1.1 (potentially affecting older PyTorch versions)"}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z","problems":[]}