{"id":2209,"library":"pyannote-database","title":"pyannote.database","description":"pyannote.database is an open-source Python library that provides a common interface for reproducible experimental protocols across multimedia databases (audio, video, text). It is part of the broader pyannote ecosystem. Currently at version 6.1.1, the library is actively developed, with several releases per year and occasional major versions that introduce breaking changes.","status":"active","version":"6.1.1","language":"en","source_language":"en","source_url":"https://github.com/pyannote/pyannote-database","tags":["audio processing","multimedia","database","speech diarization","research","protocol"],"install":[{"cmd":"pip install pyannote-database","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Data manipulation for loaders","package":"pandas","optional":false},{"reason":"Provides core data structures for temporal segments and annotations","package":"pyannote-core","optional":false},{"reason":"For parsing YAML configuration files","package":"pyyaml","optional":false}],"imports":[{"note":"The primary interface for loading database configurations and accessing protocols since v5.0.0.","symbol":"registry","correct":"from pyannote.database import registry"},{"note":"A common preprocessor used to automatically locate media files associated with a URI within a protocol.","symbol":"FileFinder","correct":"from pyannote.database import FileFinder"},{"note":"Represents a multimedia resource with metadata, typically encountered when iterating over a protocol subset.","symbol":"ProtocolFile","correct":"from pyannote.database import ProtocolFile"},{"note":"The `get_protocol` function was removed in v5.0.0 and replaced by the `registry.get_protocol` method. Protocol names take the form `Database.Task.Protocol`.","wrong":"from pyannote.database import get_protocol; protocol = get_protocol('MyDatabase.Protocol.MyProtocol')","symbol":"get_protocol","correct":"protocol = 
registry.get_protocol('MyDatabase.Protocol.MyProtocol')"}],"quickstart":{"code":"from pathlib import Path\nfrom pyannote.database import registry, FileFinder\n\n# 1. Define your database protocol in a YAML file (e.g., database.yml)\n# This example creates dummy files for demonstration purposes.\n(Path(\"data\") / \"train.lst\").parent.mkdir(parents=True, exist_ok=True)\n(Path(\"data\") / \"train.lst\").write_text(\"dummy_file_1\\ndummy_file_2\\n\")\n\n# Protocols are declared as Database > Task > Protocol. The generic\n# 'Protocol' task is used here because this example only lists URIs.\nconfig_content = \"\"\"\nProtocols:\n  MyDatabase:\n    Protocol:\n      MyProtocol:\n        train:\n          uri: data/train.lst\n\"\"\"\nPath(\"database.yml\").write_text(config_content)\n\n# 2. Load the database configuration into the registry\n# In a real scenario, you'd provide the actual path to your database.yml\nregistry.load_database(\"database.yml\")\n\n# 3. Access a specific protocol by its full 'Database.Task.Protocol' name\n# Use FileFinder as a preprocessor to resolve file paths for 'audio' or similar keys.\npreprocessors = {'audio': FileFinder()}\nprotocol = registry.get_protocol(\"MyDatabase.Protocol.MyProtocol\", preprocessors=preprocessors)\n\n# 4. Iterate over a subset (e.g., 'train') of the protocol\nprint(\"Iterating over training files:\")\nfor current_file in protocol.train():\n    print(f\"  URI: {current_file['uri']}\")\n    # In a real application, current_file['audio'] would resolve to the media file path,\n    # and current_file['annotation'] would provide temporal annotations.\n    # For this dummy example, 'audio' will not resolve to a real file path\n    # unless actual dummy audio files are created.\n\n# 5. 
Clean up dummy files (not needed in a real application)\nPath(\"database.yml\").unlink()\n(Path(\"data\") / \"train.lst\").unlink()\nPath(\"data\").rmdir()","lang":"python","description":"This quickstart demonstrates how to define a simple database protocol using a YAML configuration file, load it into the `pyannote.database` registry, and then iterate over a protocol subset to access multimedia resources and their associated metadata. It highlights the use of `registry.load_database` and `FileFinder`."},"warnings":[{"fix":"Upgrade your Python environment to 3.10 or later.","message":"Version 6.0.0 and above require Python 3.10 or newer. Older Python versions are no longer supported.","severity":"breaking","affected_versions":">=6.0.0"},{"fix":"Replace calls like `get_protocol('MyDatabase.MyTask.MyProtocol')` with `registry.get_protocol('MyDatabase.MyTask.MyProtocol')`. The `registry` object must be imported from `pyannote.database`.","message":"The global functions `pyannote.database.get_database`, `get_protocol`, and `get_protocols` were removed in favor of methods on the `registry` object.","severity":"breaking","affected_versions":">=5.0.0"},{"fix":"Load your configuration files explicitly using `from pyannote.database import registry; registry.load_database('/path/to/your/database.yml')`.","message":"Automatic loading of `database.yml` and `~/.pyannote/database.yml` exists primarily for backward compatibility with the 4.x branch. Since v5.0.0, configuration files should be loaded explicitly with `registry.load_database()` rather than relying on automatic discovery.","severity":"breaking","affected_versions":">=5.0.0"},{"fix":"Pass a `preprocessors` dictionary when calling `registry.get_protocol()`, e.g., `preprocessors={'audio': FileFinder()}`. Make sure to import `FileFinder`.","message":"When iterating over protocols (e.g., `protocol.train()`), accessing keys like `current_file['audio']` requires a preprocessor like `FileFinder` to resolve the URI to an actual file path. 
Without it, the key is usually not defined, and accessing it typically raises a `KeyError`.","severity":"gotcha","affected_versions":"all"},{"fix":"Declare the `pyannote.database.loader` entry-point in your `setup.py` or `pyproject.toml`, then install the package into the active environment (for example with `pip install -e .`) so that the entry-point is registered.","message":"For custom data loaders to be automatically discovered and used by `pyannote.database`, they must be registered via the `pyannote.database.loader` entry-point in your Python package's `setup.py` or `pyproject.toml`.","severity":"gotcha","affected_versions":"all"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}