data-designer
0.5.9 verified Fri May 01 auth: no python
A general-purpose Python framework for synthetic data generation. The current version is 0.5.9, and the project is under active development. It provides classes and utilities for designing and generating realistic synthetic datasets.
pip install data-designer
Common errors
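error ModuleNotFoundError: No module named 'data-designer'
cause The PyPI name (with a hyphen) was used as the module name, for example in a dynamic import. A literal 'import data-designer' statement is not valid Python and fails earlier with a SyntaxError.
fix Use 'import data_designer' or 'from data_designer import ...'.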
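A literal import statement with a hyphen and a dynamic import by the hyphenated name fail in different ways, and this can be checked with the standard library alone. The sketch below is illustrative only: `statement_outcome` and `dynamic_import_outcome` are hypothetical helper names, and it assumes no importable module on the current path is literally named `data-designer`.

```python
import importlib


def statement_outcome(source: str) -> str:
    """Compile (without executing) a statement and report what happened."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return "SyntaxError"
    return "compiles"


def dynamic_import_outcome(module_name: str) -> str:
    """Try a dynamic import by name and report what happened."""
    try:
        importlib.import_module(module_name)
    except ModuleNotFoundError:
        return "ModuleNotFoundError"
    return "imported"


# The hyphen is tokenized as a minus operator, so the statement never compiles.
hyphen_statement = statement_outcome("import data-designer")
# The underscore form is valid syntax (compiling does not need the package).
underscore_statement = statement_outcome("import data_designer")
# A dynamic import by the hyphenated name is what raises ModuleNotFoundError.
hyphen_dynamic = dynamic_import_outcome("data-designer")

print(hyphen_statement, underscore_statement, hyphen_dynamic)
```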
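error TypeError: generate() got an unexpected keyword argument 'num_rows'
cause The calling convention and the installed version do not match. Before 0.5.0, num_rows was passed to the constructor; from 0.5.0 onward it is passed to the generate method. This error suggests the installed generate() does not accept num_rows, which matches the pre-0.5.0 API.
fix On 0.5.x, pass num_rows to generator.generate(num_rows=N). On an older install, upgrade first or keep passing num_rows to the constructor.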
error AttributeError: 'DataSchema' object has no attribute 'add_column'
cause The method was renamed from 'add_column' to 'add_field' in version 0.5.0.
fix Use schema.add_field(name, dtype, constraints) instead of schema.add_column(...).
Warnings
breaking The API changed in 0.5.0: 'add_column' was renamed to 'add_field', and the 'num_rows' argument moved from the generator constructor to the 'generate' method.
fix Update code to use 'add_field' and pass 'num_rows' to 'generate(num_rows=N)'.
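When a codebase has to run against more than one install, it can help to check which side of the 0.5.0 change is present before choosing a calling convention. The following is a minimal sketch using only `importlib.metadata` from the standard library. The helper names (`version_tuple`, `uses_new_api`, `installed_version`) are hypothetical, and the parser deliberately ignores pre-release suffixes, so a tag such as `0.5.0rc1` is treated as 0.5.

```python
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Parse the leading numeric 'major.minor' part of a version string."""
    parts = []
    for piece in version.split(".")[:2]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts)


def uses_new_api(version: str) -> bool:
    """True when the version is 0.5 or later, per the breaking-change note."""
    return version_tuple(version) >= (0, 5)


def installed_version(dist_name: str = "data-designer"):
    """Return the installed distribution version, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


current = installed_version()
if current is None:
    print("data-designer is not installed")
elif uses_new_api(current):
    print(f"{current}: use add_field and generate(num_rows=N)")
else:
    print(f"{current}: pre-0.5.0 API (add_column, num_rows in the constructor)")
```

Note that the lookup uses the PyPI distribution name (with a hyphen), not the import name.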
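gotcha The package name on PyPI uses a hyphen (data-designer), but the Python module name uses an underscore (data_designer). The hyphenated name cannot be imported: a literal 'import data-designer' is a SyntaxError, and a dynamic import by that name raises ModuleNotFoundError.
fix Use 'import data_designer' (with an underscore) instead of 'import data-designer'.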
gotcha The 'DataSchema' object is mutable, but fields modified after the generator is instantiated do not affect that generator. Make all schema changes before creating the generator.
fix Define the schema completely before passing it to SyntheticDataGenerator.
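One way this kind of behaviour can arise is a generator that copies the schema's fields when it is constructed. The toy classes below are not data-designer code: `ToySchema` and `ToyGenerator` are hypothetical stand-ins, and whether the library actually snapshots the schema this way is an assumption. The sketch only shows why late changes would be invisible under that design.

```python
import copy


class ToySchema:
    """Stand-in for a mutable schema: just a list of field names."""

    def __init__(self):
        self.fields = []

    def add_field(self, name):
        self.fields.append(name)


class ToyGenerator:
    """Stand-in generator that snapshots the schema at construction time."""

    def __init__(self, schema):
        # Assumption: the generator keeps its own copy of the fields.
        self._fields = copy.deepcopy(schema.fields)

    def field_names(self):
        return list(self._fields)


schema = ToySchema()
schema.add_field("id")
generator = ToyGenerator(schema)

# Added too late: the generator already took its snapshot.
schema.add_field("name")

late = generator.field_names()              # fields present at construction
fresh = ToyGenerator(schema).field_names()  # a new generator sees both fields
print(late, fresh)
```

Under this model, the remedy is the one given above: finish the schema first, or build a new generator after changing it.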
Imports
- SyntheticDataGenerator
wrong from data-designer import SyntheticDataGenerator
correct from data_designer.generator import SyntheticDataGenerator
- DataSchema
wrong from datadesigner.schema import DataSchema
correct from data_designer.schema import DataSchema
Quickstart
from data_designer.generator import SyntheticDataGenerator
from data_designer.schema import DataSchema

# Define the schema completely before creating the generator.
schema = DataSchema()
schema.add_field(name='id', dtype='int', constraints={'unique': True})
schema.add_field(name='name', dtype='str', constraints={'min_length': 3, 'max_length': 50})

# Create the generator with a fixed seed.
generator = SyntheticDataGenerator(schema, seed=42)

# In 0.5.x, num_rows is passed to generate(), not to the constructor.
df = generator.generate(num_rows=100)
print(df.head())
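After generating data, it can be useful to confirm that the output respects the constraints declared in the schema. The sketch below is independent of data-designer: `check_rows` is a hypothetical helper that works on plain dictionaries, and the sample rows are hand-written stand-ins rather than generated output. It checks the two constraints used in the quickstart (unique `id`, `name` length between 3 and 50).

```python
def check_rows(rows, min_length=3, max_length=50):
    """Return a list of readable constraint violations (empty if none)."""
    problems = []
    ids = [row["id"] for row in rows]
    if len(ids) != len(set(ids)):
        problems.append("id values are not unique")
    for row in rows:
        if not min_length <= len(row["name"]) <= max_length:
            problems.append(f"name length out of range for id {row['id']}")
    return problems


# Hand-written sample rows standing in for generated output.
good_rows = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
bad_rows = [{"id": 1, "name": "Al"}, {"id": 1, "name": "Carol"}]

print(check_rows(good_rows))
print(check_rows(bad_rows))
```

If `df` is a pandas DataFrame (which `df.head()` suggests, though the page does not state it), `check_rows(df.to_dict("records"))` would apply the same checks to the generated rows.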