Resemblyzer

0.1.4 · abandoned · verified Thu Apr 16

Resemblyzer (version 0.1.4) is a Python library for extracting speaker embeddings from audio, enabling voice verification and comparison using a pre-trained deep learning model. Its last release was in December 2020, and the project appears to be no longer actively maintained.

Quickstart

This quickstart initializes a VoiceEncoder, preprocesses an audio waveform (a generated sine-wave file stands in for a real recording), and extracts a speaker embedding. Because of the library's age, it targets PyTorch <= 1.4.0; installing a compatible version matters for successful execution, especially on CUDA-enabled systems.
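The version constraint above can be checked before running anything else. The following is a minimal sketch, not part of Resemblyzer: the helper names (`parse_release`, `within_ceiling`, `installed_torch_version`) are hypothetical, and the 1.4.0 ceiling is simply the figure stated on this page. It reads the installed version from package metadata, so it does not need to import torch itself.

```python
import re
from importlib import metadata

# Ceiling taken from the compatibility note on this page (an assumption,
# not something this snippet verifies).
TORCH_CEILING = (1, 4, 0)


def parse_release(version: str) -> tuple:
    """Return the leading numeric components, e.g. '1.4.0+cu100' -> (1, 4, 0)."""
    match = re.match(r"(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def within_ceiling(version: str, ceiling: tuple = TORCH_CEILING) -> bool:
    """True when the parsed release is less than or equal to the ceiling."""
    return parse_release(version) <= ceiling


def installed_torch_version():
    """Return the installed torch version string, or None if torch is missing."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


found = installed_torch_version()
if found is None:
    print("torch is not installed")
elif within_ceiling(found):
    print(f"torch {found} is within the stated ceiling")
else:
    print(f"torch {found} is newer than the stated ceiling; errors are possible")
```

Running a check like this first makes a version mismatch visible up front instead of as a runtime error partway through the quickstart script.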

import numpy as np
import soundfile as sf
import os
from resemblyzer import VoiceEncoder, preprocess_wav

# Note: Resemblyzer (v0.1.4) was developed with PyTorch <= 1.4.0.
# Installing a compatible PyTorch version (e.g., pip install torch==1.4.0)
# is crucial for avoiding runtime errors, especially on GPU.

# Create a dummy WAV file for demonstration if it doesn't exist
test_wav_path = "resemblyzer_test_audio.wav"
if not os.path.exists(test_wav_path):
    # Generate a dummy 5-second 16kHz sine wave
    duration = 5  # seconds
    sample_rate = 16000 # Hz
    frequency = 440 # Hz (A4 note)
    t = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)
    dummy_audio = 0.5 * np.sin(2 * np.pi * frequency * t)
    sf.write(test_wav_path, dummy_audio.astype(np.float32), sample_rate)
    print(f"Created dummy audio file: {test_wav_path}")

# Load the pre-trained encoder (the weights are bundled with the package)
print("Loading VoiceEncoder...")
encoder = VoiceEncoder()

# Load and preprocess the dummy audio
wav, sr = sf.read(test_wav_path)

# Resemblyzer works on 16 kHz audio. Passing the file's sample rate as
# source_sr lets preprocess_wav resample when the rate differs; the dummy
# file is already 16 kHz, so no resampling happens here.
clean_wav = preprocess_wav(wav, source_sr=sr)

# Encode the voice embedding
print("Encoding voice...")
embed = encoder.embed_utterance(clean_wav)

print(f"Generated embedding with shape: {embed.shape}")
print(f"First 5 elements of embedding: {embed[:5]}")

# Clean up dummy file
if os.path.exists(test_wav_path):
    os.remove(test_wav_path)
    print(f"Removed dummy audio file: {test_wav_path}")
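The voice verification and comparison mentioned in the introduction usually comes down to comparing two embeddings with cosine similarity. The sketch below illustrates that step only. It uses random stand-in vectors instead of real embeddings so that it runs without Resemblyzer installed; the 256-element size mirrors what `embed_utterance` is understood to return, and `same_speaker` with its 0.75 threshold is a hypothetical helper that would need tuning on real recordings.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("Cannot compare a zero vector")
    return float(np.dot(a, b) / denom)


def same_speaker(a: np.ndarray, b: np.ndarray, threshold: float = 0.75) -> bool:
    """Hypothetical decision rule: similarity at or above the threshold."""
    return cosine_similarity(a, b) >= threshold


# Stand-in vectors (assumption: real code would use encoder.embed_utterance output)
rng = np.random.default_rng(0)
speaker_a = rng.normal(size=256)
speaker_a_again = speaker_a + 0.1 * rng.normal(size=256)  # slightly perturbed copy
speaker_b = rng.normal(size=256)  # unrelated vector

print("a vs. perturbed a:", cosine_similarity(speaker_a, speaker_a_again))
print("a vs. b:", cosine_similarity(speaker_a, speaker_b))
print("same speaker (a, perturbed a)?", same_speaker(speaker_a, speaker_a_again))
print("same speaker (a, b)?", same_speaker(speaker_a, speaker_b))
```

With real audio, the stand-in vectors would be replaced by embeddings of two preprocessed recordings. A pure sine wave like the dummy file above is not speech, so its embedding is only useful for confirming that the pipeline runs.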
