Demucs
Demucs (Deep Extractor for Music Sources) is a state-of-the-art open-source model for music source separation, developed by Meta AI Research. It separates a mixture into individual stems — vocals, drums, bass, and an "other" catch-all — directly in the waveform domain. The current version, 4.0.1, uses a Hybrid Transformer architecture (`htdemucs`). Demucs is actively maintained and offers high-quality separation for applications in music production and audio analysis.
Warnings
- breaking Demucs v4.0.1 and later require Python 3.8 or higher. Python 3.7 is no longer supported.
- breaking Demucs v4 introduces major architectural changes. Users upgrading from v3 may need to reinstall the library from scratch due to altered dependencies and internal structures.
- gotcha Users often encounter 'CUDA out of memory' errors when processing long audio files on GPUs, especially with high sample rates or default settings.
- gotcha Demucs models (especially v4) are primarily trained on 44.1 kHz or 48 kHz audio. Input audio with a significantly different sample rate (e.g., 96 kHz) may yield suboptimal separation quality.
- gotcha By default, Demucs automatically rescales each output stem to prevent clipping. This can alter the relative volume levels between the separated stems, which might not be desired for certain applications.
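The last gotcha is easy to see with a small pure-Python sketch. The function below is a hypothetical stand-in (not Demucs' actual implementation, which lives in its audio-saving path): it scales a stem down only when its peak would clip, which is exactly why per-stem rescaling changes the relative levels between stems.

```python
def rescale_to_prevent_clipping(stem, limit=0.99):
    # If the stem's peak exceeds the limit, scale the whole stem down
    # to fit; otherwise leave it untouched (simplified illustration).
    peak = max(abs(s) for s in stem)
    if peak > limit:
        return [s * limit / peak for s in stem]
    return list(stem)

# Two stems with a 2:1 level relationship before rescaling.
vocals = [1.5, -1.5, 0.75]
drums = [0.75, -0.75, 0.375]

vocals_out = rescale_to_prevent_clipping(vocals)
drums_out = rescale_to_prevent_clipping(drums)
# vocals gets scaled down (peak 1.5 -> 0.99) while drums is untouched,
# so the original 2:1 ratio between the two stems is lost.
```

If preserving relative stem levels matters for your application, check the clipping-related options of the version you are running instead of relying on the default behavior.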
Install
- pip install demucs
- pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 && pip install demucs
Imports
- Separator
import demucs.api
separator = demucs.api.Separator()
- pretrained, apply_model
from demucs import pretrained
from demucs.apply import apply_model
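The low-level `apply_model` path (like the CLI) processes long files in segments to bound GPU memory, which is also the standard workaround for the 'CUDA out of memory' gotcha above. The pure-Python sketch below illustrates only the chunking arithmetic; the segment and overlap values are illustrative assumptions, and the real logic (including how overlapping regions are blended) lives in `demucs.apply`.

```python
def segment_bounds(n_samples, samplerate, segment_s=10.0, overlap_s=0.25):
    # Yield (start, end) sample ranges covering n_samples, with a small
    # overlap between consecutive chunks (simplified illustration).
    seg = int(segment_s * samplerate)
    hop = seg - int(overlap_s * samplerate)
    bounds = []
    start = 0
    while start < n_samples:
        bounds.append((start, min(start + seg, n_samples)))
        start += hop
    return bounds

# A 25-second file at 44.1 kHz is covered by three overlapping 10 s chunks.
chunks = segment_bounds(25 * 44100, 44100)
```

Each chunk is separated independently and the overlapping regions are cross-faded on recombination, so memory use is bounded by the segment length rather than the file length.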
Quickstart
import demucs.api
import os

# Create a dummy audio file for demonstration.
# This requires soundfile and numpy; in a real scenario you would load an actual audio file.
dummy_audio_path = "test_audio.wav"
try:
    import numpy as np
    import soundfile as sf

    samplerate = 44100  # Hz
    duration = 5        # seconds
    frequency = 440     # Hz (A4 note)
    t = np.linspace(0., duration, int(samplerate * duration), endpoint=False)
    data = 0.5 * np.sin(2 * np.pi * frequency * t)
    sf.write(dummy_audio_path, data, samplerate)
    print(f"Created dummy audio file: {dummy_audio_path}")

    # Initialize the separator. Uses the default 'htdemucs' model.
    # You can specify other models: 'htdemucs_ft', 'mdx_extra', 'mdx_extra_q'
    separator = demucs.api.Separator()

    print(f"Separating audio file: {dummy_audio_path}")
    # separate_audio_file returns the original waveform plus a dictionary
    # mapping stem names ('vocals', 'drums', 'bass', 'other') to audio tensors.
    origin, separated_stems = separator.separate_audio_file(dummy_audio_path)

    # Save the separated stems
    output_dir = "separated_stems"
    os.makedirs(output_dir, exist_ok=True)
    for stem_name, stem_audio in separated_stems.items():
        output_path = os.path.join(
            output_dir,
            f"{os.path.basename(dummy_audio_path).replace('.wav', '')}_{stem_name}.wav",
        )
        demucs.api.save_audio(stem_audio, output_path, samplerate=separator.samplerate)
        print(f"Saved {stem_name} to {output_path}")
finally:
    # Clean up the dummy file; the 'separated_stems' directory is left for inspection.
    if os.path.exists(dummy_audio_path):
        os.remove(dummy_audio_path)
        print(f"Cleaned up dummy audio file: {dummy_audio_path}")
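The quickstart builds output filenames by string replacement of '.wav', which breaks for other extensions (.mp3, .flac). A small standalone sketch of a more robust alternative using pathlib — the helper name `stem_output_path` is an illustrative choice, not part of the Demucs API:

```python
from pathlib import Path

def stem_output_path(output_dir, input_file, stem_name):
    # Path.stem drops the extension regardless of what it is,
    # e.g. "song.mp3" -> "song", so this works for any input format.
    base = Path(input_file).stem
    return Path(output_dir) / f"{base}_{stem_name}.wav"

path = stem_output_path("separated_stems", "test_audio.wav", "vocals")
# e.g. separated_stems/test_audio_vocals.wav on POSIX systems
```

This also handles inputs whose names contain '.wav' elsewhere in the path, which the plain `.replace('.wav', '')` approach would mangle.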