DeepEcho Synthetic Data Generator
DeepEcho is a Python library in the SDV ecosystem that learns from real-world datasets and generates sequential synthetic data using deep generative models. It is designed for data with a temporal or sequential component, such as time series or event logs. The current version is 0.8.1; releases and bug fixes are frequent and often track the broader SDV release cycle.
Common errors
- ModuleNotFoundError: No module named 'deepecho.synthesizers'
  cause: Attempting to import the `DeepEchoSynthesizer` class from `deepecho.synthesizers`; the class was removed in DeepEcho v0.7.0.
  fix: Change your import statement to `from deepecho.models import DeepEcho`.
- TypeError: DeepEcho.__init__() missing 1 required positional argument: 'metadata'
  cause: As of v0.7.0, `DeepEcho` requires a `metadata` object during initialization to understand the data structure, especially for sequential data.
  fix: Build a metadata object for your data with `sdv.metadata` (e.g., create a `SingleTableMetadata` and call its `detect_from_dataframe()` method on your dataframe) and pass it to the DeepEcho constructor: `model = DeepEcho(metadata=my_metadata)`.
- ValueError: The input data must contain the primary key column 'column_name' defined in the metadata.
  cause: The dataframe passed to `DeepEcho.fit()` is missing a column that the `metadata` object declares as a primary key, or the column name in the data does not match the name in the metadata.
  fix: Verify that your input dataframe contains every column specified as a key in the `metadata`, and that the column names in the dataframe match those in the metadata exactly.
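The column check described in the last fix can be sketched in plain Python before calling `fit()`. This is a minimal illustration, not part of DeepEcho or SDV: the helper name `missing_key_columns` and the column names are hypothetical. With pandas, `data.columns` can be passed as the first argument.

```python
def missing_key_columns(data_columns, key_columns):
    """Return the key columns (in order) that are absent from the data.

    The comparison is exact: names that differ only in case or
    surrounding whitespace are reported as missing.
    """
    present = set(data_columns)
    return [column for column in key_columns if column not in present]


# Hypothetical column names, for illustration only.
data_columns = ["store_id", "date", "total_sales"]
key_columns = ["store_id", "Date"]  # "Date" does not match "date"

missing = missing_key_columns(data_columns, key_columns)
if missing:
    print(f"Declared in the metadata but missing from the data: {missing}")
```

Running the check on the column names first tends to surface case or whitespace mismatches that are easy to miss when reading the `ValueError` message alone.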
Warnings
- breaking The main class `DeepEchoSynthesizer` was removed in v0.7.0. It was replaced by `DeepEcho` in `deepecho.models`.
- breaking DeepEcho models no longer handle data preprocessing internally as of v0.7.0. All preprocessing, data loading, and metadata generation must now be handled externally, primarily using the `sdv` library.
- gotcha When working with sequential data, the `metadata` object must correctly define the primary key, the parent key (if nested), and the sequence key for each table. Incorrect metadata can cause errors during fitting or produce nonsensical synthetic data.
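Because the breaking changes above hinge on v0.7.0, it can help to check which release is installed before deciding which API to expect. The sketch below uses only the standard library's `importlib.metadata`; the helper names `release_tuple` and `is_at_least_070` are hypothetical, and the version parsing is deliberately simple (it stops at the first non-numeric component, so a pre-release such as `0.7.0rc1` is treated as older than `0.7.0`).

```python
from importlib.metadata import PackageNotFoundError, version


def release_tuple(version_string):
    """Turn '0.8.1' into (0, 8, 1), stopping at the first non-numeric part."""
    parts = []
    for part in version_string.split("."):
        if not part.isdigit():
            break
        parts.append(int(part))
    return tuple(parts)


def is_at_least_070(version_string):
    """True when the version string denotes release 0.7.0 or later."""
    return release_tuple(version_string) >= (0, 7, 0)


try:
    installed = version("deepecho")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("deepecho is not installed")
elif is_at_least_070(installed):
    print(f"deepecho {installed}: the v0.7.0 changes apply")
else:
    print(f"deepecho {installed}: predates the v0.7.0 changes")
```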
Install
- pip install deepecho
Imports
- DeepEcho
  wrong (removed in v0.7.0): from deepecho.synthesizers import DeepEchoSynthesizer
  correct: from deepecho.models import DeepEcho
Quickstart
from deepecho.models import DeepEcho
from sdv.datasets.demo import get_sequential_demo

# Get demo data for sequential modeling from SDV
metadata, data = get_sequential_demo()

# Initialize and fit the DeepEcho model
# Metadata is crucial for DeepEcho to understand the sequential structure
model = DeepEcho(metadata=metadata)
model.fit(data)

# Generate 100 rows of synthetic sequential data
synthetic_data = model.sample(num_rows=100)
print(synthetic_data.head())