Lhotse

1.32.2 · active · verified Mon Apr 13

Lhotse is a Python library for data preparation in speech and audio processing. It provides a flexible, declarative API for representing audio collections as manifests (e.g., Recordings, Supervisions, Cuts), along with tools for data manipulation, augmentation, and feature extraction. The current version is 1.32.2, and the project is actively developed with regular releases.

Quickstart

This quickstart demonstrates how to create a basic `MonoCut` representing an audio segment's metadata and encapsulate it within a `CutSet`. It shows how to access basic properties like ID and duration. Lhotse operates primarily on `CutSet` objects, which are collections of `Cut`s (e.g., `MonoCut`, `MixedCut`).

from lhotse import CutSet, MonoCut
from lhotse.audio import AudioSource, Recording

# Describe the source recording. "example.wav" is a placeholder path;
# the file is only opened when audio is actually loaded.
recording = Recording(
    id="example-rec-001",
    sources=[AudioSource(type="file", channels=[0], source="example.wav")],
    sampling_rate=16000,
    num_samples=int(5.0 * 16000),
    duration=5.0,
)

# Create a simple mono cut representing metadata for an audio segment.
# The recording ID, sampling rate, and sample count are derived from
# the attached Recording rather than passed to MonoCut directly.
cut = MonoCut(
    id="example-cut-001",
    start=0.0,
    duration=5.0,  # 5 seconds
    channel=0,
    supervisions=[],  # An empty list of supervisions
    features=None,  # Features can be attached later
    recording=recording,
)

# Create a CutSet from a list of cuts
cuts = CutSet([cut])

# Inspect the CutSet and its first cut
print(f"CutSet created with {len(cuts)} cut(s).")
first_cut = cuts[0]
print(f"First cut ID: {first_cut.id}")
print(f"First cut duration: {first_cut.duration} seconds")

# In a real scenario, you'd save and load manifests:
# cuts.to_file("my_cuts.jsonl.gz")
# loaded_cuts = CutSet.from_file("my_cuts.jsonl.gz")
# print(f"Loaded {len(loaded_cuts)} cuts from file.")
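The `.jsonl.gz` extension used for the manifest file above denotes gzip-compressed JSON Lines: one JSON object per line. The following is a minimal, standard-library-only sketch of that file layout, not Lhotse's own reader or writer. The helper names `write_jsonl_gz` and `read_jsonl_gz` are hypothetical, and the record keys are illustrative assumptions rather than Lhotse's exact serialization schema.

```python
import gzip
import json
import os
import tempfile

# Illustrative records only: the keys loosely mirror cut metadata and are
# an assumption, not Lhotse's exact serialization schema.
records = [
    {"id": "example-cut-001", "start": 0.0, "duration": 5.0, "channel": 0},
    {"id": "example-cut-002", "start": 5.0, "duration": 2.5, "channel": 0},
]


def write_jsonl_gz(path, items):
    """Write one JSON object per line into a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for item in items:
            f.write(json.dumps(item) + "\n")


def read_jsonl_gz(path):
    """Yield one decoded JSON object per non-empty line."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


# Round-trip the records through a temporary file.
path = os.path.join(tempfile.mkdtemp(), "my_cuts.jsonl.gz")
write_jsonl_gz(path, records)
loaded = list(read_jsonl_gz(path))
total_duration = sum(r["duration"] for r in loaded)
print(f"Loaded {len(loaded)} records, {total_duration} seconds in total.")
```

Because each line is an independent JSON object, a file like this can be read one record at a time without holding the whole manifest in memory, which is the usual reason to prefer JSON Lines for large collections.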
