pyannote-pipeline
pyannote-pipeline is a component of the pyannote.audio open-source toolkit, providing tunable, state-of-the-art pipelines for speaker diarization. Built on the PyTorch machine learning framework, it covers tasks such as speaker segmentation, embedding, and clustering. The library is actively maintained; version 4.0.0 is the current release within the broader pyannote.audio ecosystem.
Warnings
- breaking `pyannote.audio` (which includes `pyannote-pipeline`) version 4.0.0 requires Python 3.10 or newer. Older Python versions are no longer supported.
- breaking The `use_auth_token` argument in `Pipeline.from_pretrained()` has been renamed to `token`.
- gotcha Accessing pretrained pipelines from Hugging Face requires accepting user conditions and providing a Hugging Face access token (e.g., via `token=os.environ.get('HUGGINGFACE_ACCESS_TOKEN')`). Failure to do so will result in authentication errors.
- gotcha The library relies on `ffmpeg` for audio decoding. `ffmpeg` is an external dependency and must be installed separately on your operating system (it is not installed via pip).
- breaking In `pyannote.audio` 4.0.0, multi-channel audio is no longer automatically downmixed to mono by default. If your workflow involves `pyannote.audio.core.io.Audio` and expects mono conversion, this behavior has changed.
- deprecated `onnxruntime` is no longer a direct dependency of `pyannote.audio`. If you are using models that rely on ONNX, you will need to install `onnxruntime` manually.
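To make the downmix change concrete, here is a minimal sketch of what "downmixing to mono" means: averaging samples across channels. This is an illustration only, not pyannote.audio's own code; with 4.0.0 you should decode or downmix explicitly if your pipeline expects mono input.

```python
# Illustrative sketch: downmix multi-channel audio to mono by averaging
# each sample position across channels. pyannote.audio < 4.0.0 did this
# automatically; this is NOT the library's implementation.

def downmix_to_mono(channels: list[list[float]]) -> list[float]:
    """Average per-sample across channels (all channels same length)."""
    n_channels = len(channels)
    return [sum(samples) / n_channels for samples in zip(*channels)]

left = [0.2, 0.4, -0.6]
right = [0.0, 0.2, 0.2]
print(downmix_to_mono([left, right]))  # averages each left/right pair
```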
Install
-
pip install pyannote.audio
Imports
- Pipeline
from pyannote.audio import Pipeline
- ProgressHook
from pyannote.audio.pipelines.utils.hook import ProgressHook
Quickstart
import os
from pyannote.audio import Pipeline
# Ensure you have a Hugging Face access token set as an environment variable
# and have accepted user conditions for 'pyannote/speaker-diarization-community-1'
# on hf.co/pyannote/speaker-diarization-community-1
hf_token = os.environ.get('HUGGINGFACE_ACCESS_TOKEN', '')
if not hf_token:
    print("Error: HUGGINGFACE_ACCESS_TOKEN environment variable not set.")
    print("Please create a token at hf.co/settings/tokens and set it.")
    exit()
# Instantiate a pretrained speaker diarization pipeline
try:
    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-community-1",
        token=hf_token,
    )
except Exception as e:
    print(f"Failed to load pipeline: {e}")
    print("Make sure your Hugging Face token is valid and you've accepted user conditions.")
    exit()
# Apply the pipeline to an audio file.
# Requires ffmpeg for decoding; replace 'audio.wav' with the path to a real audio file.
audio_file_path = "audio.wav"
print(f"Applying pipeline to {audio_file_path}...")
output = pipeline(audio_file_path)
print("Diarization results:")
for turn, speaker in output.speaker_diarization:
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker={speaker}")
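Once you have diarization output, a common follow-up is aggregating total speaking time per speaker. The sketch below assumes the output yields `(turn, speaker)` pairs with `turn.start` / `turn.end` in seconds, as in the loop above; `Segment` here is a stand-in for pyannote's own segment class, used so the example runs without an audio file.

```python
# Sketch: total speaking time per speaker from (turn, speaker) pairs.
# Segment is a stand-in for pyannote's segment class (illustration only).
from collections import namedtuple

Segment = namedtuple("Segment", ["start", "end"])

def speaking_time(turns):
    """Sum turn durations (end - start) per speaker label."""
    totals = {}
    for turn, speaker in turns:
        totals[speaker] = totals.get(speaker, 0.0) + (turn.end - turn.start)
    return totals

example = [
    (Segment(0.0, 2.5), "SPEAKER_00"),
    (Segment(2.5, 4.0), "SPEAKER_01"),
    (Segment(4.0, 6.0), "SPEAKER_00"),
]
print(speaking_time(example))  # {'SPEAKER_00': 4.5, 'SPEAKER_01': 1.5}
```

In real use you would build `turns` from the pipeline output instead of the hand-written `example` list.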