Lhotse
Lhotse is a Python library for data preparation in speech and audio processing. It provides a flexible, declarative API for representing audio collections as manifests (e.g., Recordings, Supervisions, Cuts), along with tools for data manipulation, augmentation, and feature extraction. The latest release at the time of writing is 1.32.2, and the project is actively developed with frequent releases.
Warnings
- breaking Lhotse 1.0 introduced `lhotse.utils.Duration` and `lhotse.utils.Timestamp` objects to represent time quantities, replacing direct floats. While they mostly behave like floats, direct float comparisons, arithmetic operations, or type hints in older code might break.
- gotcha Lhotse heavily relies on lazy evaluation for performance. Materializing large `CutSet` objects (e.g., `list(cuts.map(...))`, `list(cuts)`) or repeatedly calling `Cut.load_audio()` on many cuts without proper batching can lead to Out-Of-Memory (OOM) errors.
- gotcha Many common functionalities (e.g., Kaldi-style feature extraction, Torchaudio-based audio I/O, specific training integrations) require optional dependencies. Failing to install these will result in runtime `ImportError` or other errors when attempting to use the functionality.
- gotcha Features in Lhotse are often stored with `lilcom` compression by default for efficiency. `lilcom` is a lossy codec: `cut.load_features()` returns an already decompressed NumPy array, but its values may differ slightly from the features that were originally computed. Do not assume bit-exact features after a store/load round trip; use a tolerance when comparing, or choose an uncompressed storage writer if exact values matter.
Install
- pip install lhotse
- pip install lhotse[kaldi,torchaudio,train]
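Because optional dependencies only fail when the corresponding functionality is first used, it can help to check for them up front. The sketch below uses only the standard library. The helper `missing_modules` and the module list are hypothetical; adjust the list to whichever optional packages your pipeline actually relies on.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


# Hypothetical list of optional modules; this is an assumption, not Lhotse's
# official dependency list.
OPTIONAL_MODULES = ["torchaudio", "kaldifeat"]

missing = missing_modules(OPTIONAL_MODULES)
if missing:
    print(f"Optional modules not installed: {', '.join(missing)}")
else:
    print("All listed optional modules are available.")
```

Running a check like this at startup turns a late `ImportError` deep inside a data pipeline into an early, readable message.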
Imports
- CutSet
from lhotse import CutSet
- MonoCut
from lhotse import MonoCut
- Recording
from lhotse import Recording
- SupervisionSegment
from lhotse import SupervisionSegment
- LilcomChunkyWriter
from lhotse.features.io import LilcomChunkyWriter
Quickstart
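from lhotse import AudioSource, CutSet, MonoCut, Recording
# Describe the source audio. The path is a placeholder; nothing is read from disk here.
recording = Recording(
    id="example-rec-001",
    sources=[AudioSource(type="file", channels=[0], source="example.wav")],
    sampling_rate=16000,
    num_samples=int(5.0 * 16000),
    duration=5.0,
)
# Create a simple mono cut representing metadata for an audio segment.
# MonoCut takes a Recording object; recording_id, sampling_rate, and num_samples
# are derived from it rather than passed to the constructor.
cut = MonoCut(
    id="example-cut-001",
    start=0.0,
    duration=5.0,  # 5 seconds
    channel=0,
    supervisions=[],  # An empty list of supervisions
    features=None,  # Features can be attached later
    recording=recording,
)
# Create a CutSet from a list of cuts
cuts = CutSet.from_cuts([cut])
# Perform a simple operation, e.g., print its duration
print(f"CutSet created with {len(cuts)} cut(s).")
first_cut = cuts[0]
print(f"First cut ID: {first_cut.id}")
print(f"First cut duration: {first_cut.duration} seconds")
# In a real scenario, you'd save and load manifests
# (to_file/from_file pick the format from the file extension):
# cuts.to_file("my_cuts.jsonl.gz")
# loaded_cuts = CutSet.from_file("my_cuts.jsonl.gz")
# print(f"Loaded {len(loaded_cuts)} cuts from file.")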