WebDataset
WebDataset is a high-performance Python I/O system for deep learning and data processing (version 1.0.2 at the time of writing). It implements PyTorch's IterableDataset interface, enabling efficient streaming access to datasets stored in POSIX tar archives. It supports sharding for large datasets and is compatible with PyTorch's DataLoader, making it suitable for scalable data pipelines over high-latency storage for images, audio, video, and other data types. The library is actively maintained, with frequent releases adding features and fixing bugs.
Warnings
- gotcha WebDataset implements PyTorch's `IterableDataset` and thus does not provide a `__len__` method by default. Code expecting `len(dataset)` will raise a `TypeError`. To provide a length, you must explicitly add `with_length(N)` to your pipeline. This also impacts deterministic epoch balancing in distributed training.
- breaking Uppercase decoder strings such as `decode('PIL')` or `decode('numpy')` were deprecated. Use the lowercase shorthand specs (`decode('pil')`, `decode('rgb')`, `decode('torchrgb')`) or pass explicit handler functions such as `wds.imagehandler('pil')`. This change improves clarity and flexibility.
- gotcha Using the `pipe:` protocol with untrusted or unescaped URLs can lead to shell injection vulnerabilities, as `webdataset` executes shell commands.
- gotcha WebDataset relies heavily on external command-line tools like `curl`, `gsutil`, `aws`, and `file` for core I/O and type detection. This can affect portability across different operating systems or environments where these tools are not available or behave differently, and complicates error handling.
- gotcha Achieving precisely balanced epochs and avoiding sample repetition in multi-worker or distributed training setups (especially with `resampled=True` plus shuffling) is complex. Older examples that pass a `repeat` argument may no longer match the current API. Without proper configuration, each worker can repeat its share of shards endlessly.
- gotcha Long delays before the first batch, or inconsistent batch completion times, can occur due to large batch sizes, large shuffle buffers requiring time to fill, or slow underlying disk/storage access. This is often a configuration issue rather than a `webdataset` bug.
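The `len()` gotcha above can be illustrated with a minimal plain-Python stand-in for an iterable-style dataset (no PyTorch required). The classes here are illustrative, not the WebDataset API; `with_length(N)` in WebDataset works analogously by attaching a declared length to the pipeline:

```python
class IterableOnly:
    """Stand-in for an IterableDataset: iteration works, len() does not."""
    def __init__(self, items):
        self.items = items

    def __iter__(self):
        return iter(self.items)


class WithLength(IterableOnly):
    """Wrapper that declares a length, analogous to .with_length(N)."""
    def __init__(self, items, n):
        super().__init__(items)
        self.n = n

    def __len__(self):
        return self.n


ds = IterableOnly(range(5))
try:
    len(ds)  # no __len__ defined
except TypeError:
    print("len() raises TypeError on an iterable-only dataset")

print(len(WithLength(range(5), 5)))  # prints 5
```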
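For the `pipe:` injection risk above, one standard-library mitigation is to shell-quote any externally supplied path with `shlex.quote` before interpolating it into the command. This is a sketch; the `gsutil cat` pipeline and the helper name are assumptions for illustration:

```python
import shlex


def make_pipe_url(bucket_path: str) -> str:
    """Build a pipe: URL with the untrusted path shell-quoted."""
    return "pipe:gsutil cat " + shlex.quote(bucket_path)


# A benign path passes through unchanged; a hostile "path" is quoted
# so it can no longer break out of the command:
print(make_pipe_url("gs://bucket/shard-000000.tar"))
print(make_pipe_url("gs://bucket/x.tar; rm -rf ~"))
```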
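The epoch-balancing gotcha comes down to simple arithmetic: when shards are split across workers, the shard counts rarely divide evenly, so a naive epoch ends at different times for different workers. A pure-Python simulation (shard counts made up; the round-robin split is a simplification of how worker splitting typically behaves):

```python
def split_shards(shards, num_workers):
    """Round-robin shard assignment across workers."""
    return [shards[w::num_workers] for w in range(num_workers)]


shards = [f"shard-{i:04d}.tar" for i in range(10)]
for w, share in enumerate(split_shards(shards, 4)):
    print(f"worker {w}: {len(share)} shards")
# Workers 0 and 1 get 3 shards; workers 2 and 3 get 2 -- an unbalanced
# epoch unless you resample (resampled=True) and cap iteration explicitly.
```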
Install
- `pip install webdataset`
- `pip install git+https://github.com/webdataset/webdataset.git`
Imports
- webdataset
import webdataset as wds
- WebDataset
dataset = wds.WebDataset(url)
Quickstart
import webdataset as wds
import torch
from itertools import islice

# Example URL to a public WebDataset shard. In practice this would be your dataset path(s).
# For local files:    url = "file:./my_dataset-{0000..0009}.tar"
# For cloud storage:  url = "pipe:gsutil cat gs://my-bucket/dataset-{0000..0009}.tar"
url = "http://storage.googleapis.com/nvdata-openimages/openimages-train-000000.tar"

# Preprocessing function applied after .to_tuple("jpg", "json"),
# so each sample arrives as an (image, metadata) tuple.
def preprocess(sample):
    image, metadata = sample
    # .decode("pil") has already turned the jpg bytes into a PIL.Image.
    # A real pipeline would apply torchvision transforms here; this
    # quickstart substitutes a fixed-size random tensor so that batching
    # works without torchvision installed.
    processed_image = torch.randn(3, 224, 224)  # C, H, W placeholder
    # Extract a label from the metadata, falling back to a placeholder.
    label = 0
    if isinstance(metadata, dict) and metadata.get("annotations"):
        try:
            label = metadata["annotations"][0]["category_id"]
        except (KeyError, IndexError, TypeError):
            pass
    return processed_image, label

# Create a WebDataset pipeline
dataset = (
    wds.WebDataset(url)       # load from URL
    .shuffle(100)             # shuffle samples within a buffer
    .decode("pil")            # decode images using PIL (requires Pillow installed)
    .to_tuple("jpg", "json")  # extract 'jpg' and 'json' components as a tuple
    .map(preprocess)          # apply custom preprocessing
    .batched(16)              # collate samples into batches
)

# Use with PyTorch DataLoader (optional, for parallel loading and iteration).
# If you don't use a DataLoader, you can iterate directly over 'dataset'.
# from torch.utils.data import DataLoader
# dataloader = DataLoader(dataset, num_workers=4, batch_size=None)  # batch_size=None because .batched() is used above

print(f"Accessing the first 2 batches from: {url}")
for i, (images, labels) in enumerate(islice(dataset, 2)):
    print(f"Batch {i+1}:")
    print(f"  Images shape: {images.shape}")
    print(f"  Labels: {labels}")
print("Quickstart complete.")
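The quickstart reads an existing shard; the format it expects is a plain POSIX tar in which files sharing a basename form one sample, with the extension naming the field (e.g. `000000.jpg` + `000000.json`). A minimal shard can be written with the standard library alone (WebDataset also provides `wds.TarWriter` and `wds.ShardWriter` for this); the filenames and payloads below are made up for illustration:

```python
import io
import json
import tarfile


def add_bytes(tar: tarfile.TarFile, name: str, payload: bytes) -> None:
    """Add an in-memory payload to the tar under the given name."""
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))


with tarfile.open("toy-shard-000000.tar", "w") as tar:
    for key in range(3):
        base = f"{key:06d}"
        # Fake image bytes; a real shard would store actual JPEG data.
        add_bytes(tar, base + ".jpg", b"\xff\xd8fake-jpeg-bytes")
        add_bytes(tar, base + ".json", json.dumps({"label": key}).encode())

# Files sharing a basename ("000000.jpg", "000000.json") become fields
# "jpg" and "json" of a single sample when the shard is read back.
with tarfile.open("toy-shard-000000.tar") as tar:
    print(tar.getnames())
```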