Petastorm
Petastorm is an open-source Python library that enables single-node or distributed training and evaluation of machine learning models directly from datasets stored in Apache Parquet format. It provides data access for popular frameworks such as TensorFlow, PyTorch, and PySpark. Releases follow a feature-driven cadence, often with release candidates preceding stable versions.
Common errors
- `ModuleNotFoundError: No module named 'petastorm.spark'`
  cause: The `petastorm.spark` module is only available if Petastorm was installed with the `spark` extra and `pyspark` is present.
  fix: Install Petastorm with the Spark extra (`pip install petastorm[spark]`) and ensure `pyspark` is installed in your environment.
- `FileNotFoundError: [Errno 2] No such file or directory: 'file:///path/to/my_dataset'`
  cause: The dataset URL or path passed to `make_reader` (or to the writing step) does not exist or is inaccessible; common causes are an incorrect path, a network-drive issue, or missing data.
  fix: Double-check that `dataset_url` points to an existing dataset directory (or a directory you intend to write to). For HDFS/S3, verify authentication and client setup.
- `pyarrow.lib.ArrowInvalid: Could not convert ...`
  cause: A data-type or schema mismatch when writing or reading: the data being processed does not conform to the `Unischema` or the expected Parquet types.
  fix: Verify that the `Unischema` definition matches the actual dtypes and shapes you are writing. When reading, ensure the schema Petastorm uses aligns with the schema of the Parquet files.
Warnings
- breaking The default `reader_pool_type` for `make_reader` changed from 'thread' to 'process' in Petastorm v0.13.0. This can cause issues if your data contains objects that are not picklable, or if you expect thread-based concurrency.
- deprecated The `PetastormDataset` class (e.g., from `petastorm.reader`) and direct instantiation of `Reader` were deprecated in favor of the `make_reader` factory function.
- gotcha Using Petastorm with TensorFlow or PyTorch requires installing the corresponding 'extras' (e.g., `pip install petastorm[tensorflow]`). Without these, you might miss framework-specific utilities or experience integration issues.
- gotcha When using `make_reader` with Spark, ensure `pyspark` is installed and the `spark` extra is included during Petastorm installation. Otherwise, Spark-specific modules will be missing.
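A process-based reader pool must pickle whatever crosses the worker boundary, so objects that work fine with `reader_pool_type='thread'` (e.g. lambda transforms) can fail with `'process'`. A stdlib-only way to check an object before switching pool types, sketched here (the `is_picklable` helper is illustrative, not part of Petastorm):

```python
import pickle

def is_picklable(obj):
    """Return True if obj can be serialized with pickle."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False

print(is_picklable({'id': 1, 'value': 2.0}))  # plain data pickles fine: True
print(is_picklable(lambda row: row))          # lambdas cannot be pickled: False
```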
Install
- pip install petastorm
- pip install petastorm[tensorflow]
- pip install petastorm[pytorch]
- pip install petastorm[spark]
Imports
- make_reader
from petastorm import make_reader
- materialize_dataset (writing datasets via Spark)
from petastorm.etl.dataset_metadata import materialize_dataset
- Unischema
from petastorm.unischema import Unischema, UnischemaField, dict_to_spark_row
- DataLoader (PyTorch)
from petastorm.pytorch import DataLoader
- SparkDatasetConverter
from petastorm.spark import SparkDatasetConverter, make_spark_converter
- Reader (direct import; deprecated, prefer make_reader)
from petastorm.reader import Reader
from petastorm import make_reader
Quickstart
import shutil
import numpy as np
from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType, DoubleType
from petastorm import make_reader
from petastorm.codecs import ScalarCodec, CompressedNdarrayCodec
from petastorm.etl.dataset_metadata import materialize_dataset
from petastorm.unischema import Unischema, UnischemaField, dict_to_spark_row

# 1. Define a schema for your data. ScalarCodec takes a Spark SQL type;
#    ndarray fields use a codec such as CompressedNdarrayCodec.
MySchema = Unischema('MySchema', [
    UnischemaField('id', np.int32, (), ScalarCodec(IntegerType()), False),
    UnischemaField('value', np.float64, (), ScalarCodec(DoubleType()), False),
    UnischemaField('image', np.uint8, (10, 10, 3), CompressedNdarrayCodec(), False),
])

# 2. Define a dataset path (a temporary local directory for this example)
dataset_url = 'file:///tmp/petastorm_example_data'

def row_generator(i):
    """Produce one row as a dict matching MySchema."""
    return {
        'id': i,
        'value': float(i * 10),
        'image': np.random.randint(0, 255, size=(10, 10, 3), dtype=np.uint8),
    }

# 3. Write dummy data. Petastorm datasets are written through Spark;
#    materialize_dataset adds the metadata Petastorm needs to the Parquet store.
spark = SparkSession.builder.master('local[2]').getOrCreate()
sc = spark.sparkContext
print(f"Writing dummy data to {dataset_url}...")
with materialize_dataset(spark, dataset_url, MySchema, 256):  # 256 MB row groups
    rows_rdd = sc.parallelize(range(10)) \
        .map(row_generator) \
        .map(lambda r: dict_to_spark_row(MySchema, r))
    spark.createDataFrame(rows_rdd, MySchema.as_spark_schema()) \
        .write.mode('overwrite').parquet(dataset_url)
print("Successfully wrote 10 rows.")

# 4. Read data using make_reader.
#    reader_pool_type='thread' is often suitable for local development;
#    'process' may be preferred in production depending on data access patterns.
print("\nReading data from the dataset:")
with make_reader(dataset_url, reader_pool_type='thread', num_epochs=1) as reader:
    for i, row in enumerate(reader):
        print(f"Row {i}: id={row.id}, value={row.value}, image_shape={row.image.shape}")
        if i >= 2:  # print only a few rows for brevity
            break
print("Finished reading example data.")

# Clean up the temporary dataset
shutil.rmtree('/tmp/petastorm_example_data')