MosaicML Streaming

0.13.0 · active · verified Wed Apr 15

MosaicML Streaming (StreamingDataset) provides PyTorch-compatible datasets that can be efficiently streamed from cloud-based object stores (S3, GCS, Azure Blob Storage, Hugging Face Hub) or local filesystems. It enables training on large datasets without needing to download them entirely beforehand, improving data loading performance and reducing storage costs. The library is actively maintained with frequent updates, currently at version 0.13.0.

Warnings

Install
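
pip install mosaicml-streaming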

Imports
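
from streaming import MDSWriter, StreamingDataset
from torch.utils.data import DataLoader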

Quickstart

This quickstart demonstrates how to initialize `StreamingDataset` and integrate it with `torch.utils.data.DataLoader`. It first writes a tiny local MDS dataset with `MDSWriter` so the example runs immediately. For cloud usage, point the `remote` parameter at your cloud object storage path and make sure the necessary cloud provider credentials are configured in your environment.

import os

from streaming import MDSWriter, StreamingDataset
from torch.utils.data import DataLoader

# Define local paths for quickstart demonstration
# In a real scenario, 'remote' would point to your cloud MDS dataset
# (e.g., "s3://my-bucket/data" or "gs://my-bucket/data").
# Ensure cloud credentials are set in environment variables for cloud remotes.
local_remote_path = "quickstart_mds_data"
local_cache_path = "quickstart_mds_cache"

# --- Create a small MDS dataset for local testing if it doesn't exist ---
# In practice you'd point `remote` at an existing MDS dataset (often in cloud
# storage); here we write a tiny one with `streaming.MDSWriter` so the example
# runs end to end. MDSWriter produces the `index.json` and shard files in the
# exact format StreamingDataset expects.
if not os.path.exists(local_remote_path):
    print(f"Creating sample MDS data in '{local_remote_path}'...")
    columns = {'id': 'int', 'text': 'str'}  # column name -> MDS encoding
    with MDSWriter(out=local_remote_path, columns=columns) as writer:
        for i in range(4):
            writer.write({'id': i, 'text': f'sample {i}'})
    print("Sample MDS data created.")
else:
    print(f"Using existing sample MDS data in '{local_remote_path}'.")

os.makedirs(local_cache_path, exist_ok=True)
# --- End of sample MDS creation ---

# 1. Initialize the StreamingDataset
dataset = StreamingDataset(
    local=local_cache_path,  # Local cache directory for downloaded shards
    remote=local_remote_path, # Path to your MDS dataset (local or cloud)
    shuffle=True,
    batch_size=1, # Should match the DataLoader's per-device batch size
    # Other parameters like `predownload` can be tuned for performance
)

# 2. Create a PyTorch DataLoader
dataloader = DataLoader(
    dataset=dataset,
    batch_size=1, # DataLoader batch size
    num_workers=0, # Use 0 workers for simple local testing to avoid multiprocessing issues
)

# 3. Iterate over the data
print(f"Dataset has {len(dataset)} samples.")
for i, batch in enumerate(dataloader):
    # Each batch is a dict mapping column names to batched values,
    # e.g. {'id': tensor([0]), 'text': ['sample 0']} with the schema above.
    print(f"Batch {i}: {batch}")
    if i >= 1: # Stop after two batches for demonstration
        break

# Note: For production use, remember to configure cloud credentials
# (e.g., via environment variables like AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,
# or cloud provider CLI configs) if 'remote' points to cloud storage.
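
To produce a real dataset, `MDSWriter` can also write shards straight to cloud storage by passing a remote URI as `out` (or a `(local, remote)` tuple to keep a local copy). A minimal sketch, assuming a hypothetical bucket `s3://my-bucket/data` and AWS credentials already configured in the environment:

from streaming import MDSWriter

# Hypothetical destination; replace with your own bucket/prefix.
columns = {'id': 'int', 'text': 'str'}
with MDSWriter(out='s3://my-bucket/data', columns=columns, compression='zstd') as writer:
    for i in range(1000):
        writer.write({'id': i, 'text': f'sample {i}'})

A training job can then set `remote='s3://my-bucket/data'` in `StreamingDataset`, and shards are downloaded and cached in `local` on demand.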
