LitData

0.2.61 verified Mon Apr 27 auth: no python

LitData is a high-performance data processing library for AI workflows and part of the Lightning AI ecosystem. It provides optimized streaming datasets and data loaders for training deep learning models. Current version: 0.2.61. The project is under active development, with new releases most weeks.

pip install litdata
error FileNotFoundError: No such file or directory
cause Input directory does not contain properly formatted chunk files or the path is incorrect.
fix
Preprocess your data with `from litdata import optimize; optimize(...)` to create chunks. Make sure input_dir points to a directory that contains the .bin chunk files and the index.json file written by optimize.
error ModuleNotFoundError: No module named 'lightning'
cause The code imports through the older `lightning.data` path, but the `lightning` package is not installed. LitData is distributed as the standalone `litdata` package.
fix
Use `from litdata import StreamingDataset` instead of `from lightning.data import StreamingDataset`.
breaking v0.2.55 fixed writing compressed data to Lightning Storage directories. Earlier versions may fail when writing compressed output there. Upgrade to >=0.2.55 if you use compressed output.
fix pip install "litdata>=0.2.55"
deprecated The `LightningDataset` class may be deprecated in future versions in favor of `StreamingDataset`. Check release notes for migration.
fix Use StreamingDataset directly.
gotcha StreamingDataset expects a specific directory structure. If you pass a path that does not contain properly chunked files, it may raise FileNotFoundError or hang. Always preprocess data with the `optimize` function first.
fix Use `optimize` from litdata to convert raw data into chunked format before streaming.
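
The preprocessing step that several of the fixes above refer to can be sketched as follows. This is a minimal sketch rather than a definitive recipe: `make_sample`, `run_optimize`, the `optimized_data` output directory, and the sample fields are hypothetical names chosen for illustration, and the `chunk_bytes` and `num_workers` values are arbitrary. The litdata import is deferred into the function so that the sample builder can be exercised without the package installed.

```python
def make_sample(index):
    """Build one sample; optimize() calls this once per element of inputs."""
    # Hypothetical payload: replace with real loading logic (images, text, ...).
    return {"index": index, "square": index * index}


def run_optimize(output_dir="optimized_data"):
    """Write chunked, streamable data to output_dir (hypothetical default)."""
    # Imported here so this module can be loaded without litdata installed.
    from litdata import optimize

    optimize(
        fn=make_sample,             # applied to every element of inputs
        inputs=list(range(1000)),   # one item per call to fn
        output_dir=output_dir,      # where the chunks are written
        chunk_bytes="64MB",         # size limit for each chunk file
        num_workers=2,              # illustrative value
    )
```

Call `run_optimize()` from under an `if __name__ == "__main__":` guard, since optimize can start worker processes. Afterwards, pass the same directory as input_dir to StreamingDataset.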

Example of using StreamingDataset with a placeholder input directory. To use real data, replace input_dir with a URI or local path that points to data prepared with optimize.

from litdata import StreamingDataset, StreamingDataLoader

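# Subclass StreamingDataset to fix the input location and enable shuffling.
# "s3://my-bucket/data" is a placeholder, not a real bucket.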
class MyDataset(StreamingDataset):
    def __init__(self):
        super().__init__(input_dir="s3://my-bucket/data", shuffle=True)

dataset = MyDataset()
dataloader = StreamingDataLoader(dataset, batch_size=32)
# Print only the first batch, then stop.
for batch in dataloader:
    print(batch)
    break
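
When input_dir is a local path, a quick pre-flight check can catch the FileNotFoundError case before the dataset is constructed. The sketch below uses only the standard library and assumes that optimized output consists of an index.json file plus one or more .bin chunk files at the top level of the directory. `looks_optimized` is a hypothetical helper, not part of litdata, and it does not apply to remote URIs such as s3:// paths.

```python
import tempfile
from pathlib import Path


def looks_optimized(input_dir):
    """Return True if input_dir has an index.json and at least one .bin chunk."""
    root = Path(input_dir)
    has_index = (root / "index.json").is_file()
    has_chunks = any(root.glob("*.bin"))
    return has_index and has_chunks


# Demonstrate on throwaway directories: one empty, one with the assumed layout.
with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp) / "raw"
    raw_dir.mkdir()

    ready_dir = Path(tmp) / "optimized"
    ready_dir.mkdir()
    (ready_dir / "index.json").write_text("{}")
    (ready_dir / "chunk-0-0.bin").write_bytes(b"")

    raw_ok = looks_optimized(raw_dir)
    ready_ok = looks_optimized(ready_dir)

print(raw_ok, ready_ok)  # prints: False True
```

A check like this only inspects file names. It does not validate the chunk contents, so a True result is a hint rather than a guarantee that streaming will succeed.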