pyannote.audio

4.0.4 · active · verified Thu Apr 09

pyannote.audio is a state-of-the-art open-source toolkit for speaker diarization, built on PyTorch. It provides pre-trained deep learning models and pipelines for tasks such as speaker diarization, voice activity detection, and speaker change detection. Currently at version 4.0.4, it integrates with the Hugging Face Hub for model distribution and offers robust audio processing capabilities. Releases are frequent for bug fixes and minor improvements, with major versions aligning with significant API or model architecture updates.

Quickstart

This quickstart demonstrates how to set up `pyannote.audio`, authenticate with the Hugging Face Hub, and run a speaker diarization pipeline on a dummy audio file. It highlights the critical step of providing an authentication token, which is required to download the gated `pyannote.audio` models hosted on the Hugging Face Hub.

import os
import torchaudio
import torch
import numpy as np
import tempfile
import shutil

# 1. Create a dummy audio file for demonstration
duration_seconds = 5
sample_rate = 16000
t = np.linspace(0, duration_seconds, int(sample_rate * duration_seconds), endpoint=False)
audio_data = 0.5 * np.sin(2 * np.pi * 440 * t).astype(np.float32)

temp_dir = tempfile.mkdtemp()
dummy_audio_path = os.path.join(temp_dir, "dummy_audio.wav")
torchaudio.save(dummy_audio_path, torch.from_numpy(audio_data).unsqueeze(0), sample_rate)

# 2. Authenticate with Hugging Face Hub
# Get your Hugging Face token from https://huggingface.co/settings/tokens
# and set it as an environment variable `HF_TOKEN` or replace the placeholder.
hf_token = os.environ.get("HF_TOKEN", "hf_YOUR_HUGGING_FACE_TOKEN_HERE")

if hf_token == "hf_YOUR_HUGGING_FACE_TOKEN_HERE":
    print("WARNING: Please obtain a Hugging Face token from https://huggingface.co/settings/tokens")
    print("and set the HF_TOKEN environment variable or replace the placeholder in the code.")
    print("Continuing with placeholder token; pipeline initialization might fail without proper authentication.")

# 3. Import and initialize the pyannote.audio pipeline
# Note: pipelines must be loaded via the `from_pretrained` class method,
# not by calling the Pipeline constructor directly.
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", use_auth_token=hf_token)

# 4. Prepare the audio input
demo_file = {"uri": "dummy_conversation", "audio": dummy_audio_path}

# 5. Run the speaker diarization
# (A pure sine tone contains no speech, so the pipeline may return
# few or no speaker turns on this dummy file; use a real recording
# to see meaningful output.)
di_result = pipeline(demo_file)

# 6. Print the diarization result
print("\nDiarization Result:")
for turn, _, speaker in di_result.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker={speaker}")

# 7. Clean up the dummy audio file
shutil.rmtree(temp_dir)
print(f"\nCleaned up temporary audio directory: {temp_dir}")
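Once you have diarization turns, a common post-processing step is to merge consecutive turns by the same speaker that are separated by only a short gap, and to total up per-speaker speaking time. The helpers below are an illustrative sketch, not part of the `pyannote.audio` API; they operate on plain `(start, end, speaker)` tuples like the values printed by the loop above, and the `max_gap` threshold is an assumed tunable, not a library default.

```python
from collections import defaultdict

def merge_turns(turns, max_gap=0.5):
    """Merge consecutive turns by the same speaker whose gap is at most max_gap seconds.

    `turns` is a list of (start, end, speaker) tuples sorted by start time.
    """
    merged = []
    for start, end, speaker in turns:
        if merged and merged[-1][2] == speaker and start - merged[-1][1] <= max_gap:
            prev_start, _, _ = merged[-1]
            merged[-1] = (prev_start, end, speaker)  # extend the previous turn
        else:
            merged.append((start, end, speaker))
    return merged

def speaking_time(turns):
    """Total speaking time per speaker, in seconds."""
    totals = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += end - start
    return dict(totals)

# Example with hand-written turns in the same shape as the quickstart output:
turns = [
    (0.0, 1.2, "SPEAKER_00"),
    (1.4, 2.0, "SPEAKER_00"),   # 0.2 s gap -> merged with the previous turn
    (2.5, 4.0, "SPEAKER_01"),
]
merged = merge_turns(turns)
print(merged)                   # [(0.0, 2.0, 'SPEAKER_00'), (2.5, 4.0, 'SPEAKER_01')]
print(speaking_time(merged))
```

In practice you would build `turns` from the pipeline result, e.g. `[(t.start, t.end, s) for t, _, s in di_result.itertracks(yield_label=True)]`, before feeding it to these helpers.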
