Databricks Labs - PySpark Synthetic Data Generator
dbldatagen (Databricks Labs Data Generator) is an open-source Python library for generating synthetic data at scale in Apache Spark and Databricks environments. Users define a data schema with per-column constraints, value distributions, and inter-column relationships, and the library produces realistic datasets for testing, benchmarking, and machine learning model development. The library is currently at version 0.4.0.post1 and is under active development with regular releases.
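To make the three ideas in the description concrete (range constraints, value distributions, and inter-column relationships), here is a minimal plain-Python sketch. It deliberately does not use dbldatagen or Spark and does not reflect the library's API; the function name `generate_orders`, the column names, and the weights are all hypothetical and chosen only for illustration.

```python
import random


def generate_orders(rows, seed=42):
    """Generate synthetic order rows (illustrative only, not dbldatagen's API)."""
    rng = random.Random(seed)          # seeded so runs are reproducible
    regions = ["EMEA", "APAC", "AMER"]  # hypothetical categorical values
    weights = [5, 3, 2]                 # hypothetical relative frequencies

    result = []
    for order_id in range(1, rows + 1):
        # Constraint: quantity is an integer in the closed range [1, 10].
        quantity = rng.randint(1, 10)
        # Constraint: unit price is drawn uniformly between 5.00 and 50.00.
        unit_price = round(rng.uniform(5.0, 50.0), 2)
        # Distribution: region is sampled with the weights above.
        region = rng.choices(regions, weights=weights, k=1)[0]
        # Inter-column relationship: total is derived from two other columns.
        total = round(quantity * unit_price, 2)
        result.append(
            {
                "order_id": order_id,
                "quantity": quantity,
                "unit_price": unit_price,
                "region": region,
                "total": total,
            }
        )
    return result


orders = generate_orders(1000)
```

In dbldatagen the same kind of rules are declared per column on a generator specification and evaluated by Spark across partitions, which is what lets the approach scale beyond what a single-process loop like this one can handle.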
Resources
API endpoints
full doc /v1/registry/dbldatagen
install /v1/registry/dbldatagen/install
compatibility /v1/registry/dbldatagen/compatibility