{"id":5295,"library":"lhotse","title":"Lhotse","description":"Lhotse is a Python library for data preparation in speech and audio processing. It provides a flexible, declarative API for representing audio collections as manifests (e.g., Recordings, Supervisions, Cuts) and tools for data manipulation, augmentation, and feature extraction. The current version is 1.32.2, and the project is actively developed with regular releases.","status":"active","version":"1.32.2","language":"en","source_language":"en","source_url":"https://github.com/lhotse-speech/lhotse","tags":["audio","speech","data-preparation","manifests","speech-recognition","speech-synthesis"],"install":[{"cmd":"pip install lhotse","lang":"bash","label":"Basic installation"},{"cmd":"pip install lhotse[kaldi,torchaudio,train]","lang":"bash","label":"Full installation with common optional dependencies"}],"dependencies":[],"imports":[{"symbol":"CutSet","correct":"from lhotse import CutSet"},{"symbol":"MonoCut","correct":"from lhotse import MonoCut"},{"symbol":"Recording","correct":"from lhotse import Recording"},{"symbol":"SupervisionSegment","correct":"from lhotse import SupervisionSegment"},{"note":"Feature writers were moved to `lhotse.features.io` in Lhotse 1.0.","wrong":"from lhotse.writers import LilcomChunkyWriter","symbol":"LilcomChunkyWriter","correct":"from lhotse.features.io import LilcomChunkyWriter"}],"quickstart":{"code":"from lhotse import MonoCut, CutSet\n\n# Create a simple mono cut representing metadata for an audio segment.\n# Sampling rate and sample count come from an attached `Recording`,\n# not from constructor arguments.\ncut = MonoCut(\n    id=\"example-cut-001\",\n    start=0.0,\n    duration=5.0,  # 5 seconds\n    channel=0,\n    supervisions=[],  # An empty list of supervisions\n    features=None,    # Features can be attached later\n    recording=None,   # A Recording manifest can be attached later\n)\n\n# Create a CutSet from a list of cuts\ncuts = CutSet.from_cuts([cut])\n\n# Inspect the CutSet and its first cut\nprint(f\"CutSet created with {len(cuts)} 
cut(s).\")\nfirst_cut = cuts[0]\nprint(f\"First cut ID: {first_cut.id}\")\nprint(f\"First cut duration: {first_cut.duration} seconds\")\n\n# In a real scenario, you'd save and load manifests\n# (the format is inferred from the file extension):\n# cuts.to_file(\"my_cuts.jsonl.gz\")\n# loaded_cuts = CutSet.from_file(\"my_cuts.jsonl.gz\")\n# print(f\"Loaded {len(loaded_cuts)} cuts from file.\")","lang":"python","description":"This quickstart demonstrates how to create a basic `MonoCut` representing an audio segment's metadata and wrap it in a `CutSet`. It shows how to access basic properties like ID and duration. Lhotse operates primarily on `CutSet` objects, which are collections of `Cut`s (e.g., `MonoCut`, `MixedCut`)."},"warnings":[{"fix":"When comparing or accumulating time-related attributes (e.g., `cut.start`, `cut.duration`), allow for floating-point rounding: compare with a tolerance (e.g., `math.isclose`) and use helpers such as `lhotse.utils.add_durations` and `lhotse.utils.compute_num_samples`, which round with respect to the sampling rate.","message":"Lhotse represents time quantities (e.g., `cut.start`, `cut.duration`) as plain floats in seconds (`lhotse.utils.Seconds` is an alias for `float`). Exact float comparisons and repeated float arithmetic on these values can produce small rounding discrepancies, such as off-by-one-sample errors.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Use Lhotse's dataset classes and samplers (e.g., `lhotse.dataset.K2SpeechRecognitionDataset`) together with `torch.utils.data.DataLoader`, which handle batching and resource management efficiently. Avoid calling `list()` on large `CutSet`s after transformations unless absolutely necessary, and prefer iterator-based processing.","message":"Lhotse heavily relies on lazy evaluation for performance. 
Materializing large `CutSet` objects (e.g., `list(cuts.map(...))`, `list(cuts)`) or repeatedly calling `Cut.load_audio()` on many cuts without proper batching can lead to Out-Of-Memory (OOM) errors.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Install Lhotse with the necessary optional dependencies using the extras syntax, e.g., `pip install lhotse[kaldi,torchaudio,train]` for a common set, or `pip install lhotse[all]` for most extras. Refer to the official documentation for a complete list of extras.","message":"Many common functionalities (e.g., Kaldi-style feature extraction, Torchaudio-based audio I/O, specific training integrations) require optional dependencies. If these are not installed, using the functionality raises an `ImportError` or a similar error at runtime.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Call `cut.load_features()` (or use a `lhotse.dataset` class) and work with the returned NumPy array directly; no manual decompression step is needed. If bit-exact features are required, store them with a lossless writer such as `lhotse.features.io.NumpyHdf5Writer` or `lhotse.features.io.NumpyFilesWriter` instead of a lilcom-based writer.","message":"Features in Lhotse are often stored with `lilcom` compression by default for efficiency. `lilcom` is lossy: `Cut.load_features()` decompresses the data automatically, but the loaded values may differ slightly from the values that were originally computed.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-13T00:00:00.000Z","next_check":"2026-07-12T00:00:00.000Z"}