DECAF Synthetic Data
DECAF (DEbiasing CAusal Fairness) is a Python library providing tools for generating synthetic data and debiasing causal effects. It implements methods to create synthetic datasets that capture complex causal relationships while mitigating various forms of bias, enabling researchers and practitioners to evaluate and develop fair causal inference models. Currently at version 0.1.7, the library is under active development with a focus on research-driven advancements.
Common errors
-
ModuleNotFoundError: No module named 'decaf'
cause The package name on PyPI is `decaf-synthetic-data`, but the import name is `decaf`.
fix Install the correct package: `pip install decaf-synthetic-data`.
-
ValueError: operands could not be broadcast together with shapes (X,) (Y,)
cause The input arrays (X, A, Y) passed to the `DECAF` model have incompatible shapes, often because of incorrect reshaping or concatenation.
fix Verify that the input arrays have compatible dimensions. For example, `X` should typically be `(n_samples, n_features)`, while `A` and `Y` could be `(n_samples,)` or `(n_samples, 1)`.
-
AttributeError: 'DECAF' object has no attribute 'generate_synthetic_data'
cause The method name is not available on the `DECAF` instance, usually because of a typo or because the installed version exposes a different API.
fix Check the official documentation or the `decaf/__init__.py` source to confirm the method names in your installed version. The Quickstart on this page calls `model.generate_synthetic_data()` on a trained model; if that exact call raises this error, the installed release differs from the one documented here.
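The first and third errors above can be narrowed down with the standard library alone. The following is a minimal diagnostic sketch, not part of the library: the helper names `installed_version`, `is_importable`, and `public_callables` are hypothetical, and only the distribution name `decaf-synthetic-data` and the import name `decaf` come from this page.

```python
import importlib.util
from importlib import metadata

DIST_NAME = "decaf-synthetic-data"  # PyPI distribution name, as stated on this page
MODULE_NAME = "decaf"               # import name, as stated on this page


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def is_importable(module_name):
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(module_name) is not None


def public_callables(obj):
    """List the public callable attributes of an object (hypothetical helper)."""
    return sorted(
        name
        for name in dir(obj)
        if not name.startswith("_") and callable(getattr(obj, name, None))
    )


# Both calls are safe to run even when the package is not installed.
print("distribution version:", installed_version(DIST_NAME))
print("module importable:", is_importable(MODULE_NAME))
```

If the distribution version is `None` while the module is importable, a different package may be providing the `decaf` name. Once a model object exists, `public_callables(model)` shows which methods the installed version actually exposes.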
Warnings
- breaking As the library is in early development (version 0.1.x), expect potential API changes, breaking modifications, and new features in minor or patch releases.
- gotcha The `DECAF` model expects specific input formats (NumPy arrays) for features (X), treatment (A), and outcome (Y). Mismatched shapes or types can lead to errors during model initialization or training.
- gotcha Training `DECAF` models, especially on larger datasets or with many epochs, can be computationally intensive and require significant memory. Default parameters might not be optimized for all environments.
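The shape requirements mentioned above can be checked before the arrays reach the model. The sketch below is an illustration under stated assumptions: `check_inputs` is a hypothetical helper, not a library function, and flattening `A` and `Y` to one dimension is an arbitrary choice made here, since this page only says that both `(n_samples,)` and `(n_samples, 1)` may be accepted.

```python
import numpy as np


def check_inputs(X, A, Y):
    """Validate shapes of features X, treatment A, and outcome Y.

    Returns the arrays as NumPy arrays, with A and Y flattened to (n_samples,).
    Raises ValueError with a descriptive message on any mismatch.
    """
    X = np.asarray(X)
    if X.ndim != 2:
        raise ValueError(f"X must be 2-D (n_samples, n_features), got shape {X.shape}")
    n_samples = X.shape[0]

    checked = [X]
    for name, arr in (("A", A), ("Y", Y)):
        arr = np.asarray(arr)
        # Accept a single column and flatten it (assumption: 1-D is acceptable downstream).
        if arr.ndim == 2 and arr.shape[1] == 1:
            arr = arr.reshape(-1)
        if arr.ndim != 1:
            raise ValueError(f"{name} must be (n_samples,) or (n_samples, 1), got shape {arr.shape}")
        if arr.shape[0] != n_samples:
            raise ValueError(f"{name} has {arr.shape[0]} rows but X has {n_samples}")
        checked.append(arr)
    return tuple(checked)


# Usage with stand-in data: a column-vector treatment is flattened to 1-D.
X_demo = np.zeros((5, 3))
A_demo = np.zeros((5, 1))
Y_demo = np.zeros(5)
X_ok, A_ok, Y_ok = check_inputs(X_demo, A_demo, Y_demo)
print(X_ok.shape, A_ok.shape, Y_ok.shape)
```

Failing early with a message that names the offending array is usually easier to debug than a broadcasting error raised deep inside training.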
Install
-
pip install decaf-synthetic-data
Imports
- DECAF
from decaf import DECAF
- SyntheticData
from decaf.synthetic_data import SyntheticData
Quickstart
import numpy as np
from decaf import DECAF
from decaf.synthetic_data import SyntheticData
# 1. Generate initial synthetic data with a known structure
n = 1000 # Number of samples
p = 10 # Number of features
seed = 42
sd = SyntheticData(n=n, p=p, seed=seed)
data = sd.generate_data() # Returns a dictionary with 'x', 'a', 'y'
X_orig = data['x'] # Features
A_orig = data['a'] # Treatment
Y_orig = data['y'] # Outcome
print(f"Original X shape: {X_orig.shape}, A shape: {A_orig.shape}, Y shape: {Y_orig.shape}")
# 2. Initialize and train the DECAF model
# (using a small number of epochs for quick demonstration)
model = DECAF(X_orig, A_orig, Y_orig, epochs=10, verbose=False, seed=seed)
model.train()
# 3. Generate new synthetic data using the trained DECAF model
n_synthetic = 500
synthetic_X, synthetic_A = model.generate_synthetic_data(n_samples=n_synthetic)
print(f"Synthetic X shape: {synthetic_X.shape}, Synthetic A shape: {synthetic_A.shape}")
# Further steps would involve evaluating fairness or causal effects on this synthetic data
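As one possible follow-up step, the sketch below computes a per-feature standardized mean difference between the two treatment groups, a common covariate balance diagnostic. It is not part of the library and makes several assumptions: the treatment is binary (0/1), the helper name `standardized_mean_difference` is hypothetical, and the arrays are random stand-ins rather than real DECAF output.

```python
import numpy as np


def standardized_mean_difference(X, a):
    """Per-feature standardized mean difference between a == 1 and a == 0 rows."""
    X = np.asarray(X, dtype=float)
    a = np.asarray(a).reshape(-1)
    if X.shape[0] != a.shape[0]:
        raise ValueError("X and a must have the same number of rows")

    treated, control = X[a == 1], X[a == 0]
    if len(treated) < 2 or len(control) < 2:
        raise ValueError("each treatment group needs at least two rows")

    # Pooled standard deviation across the two groups (sample variances).
    pooled_sd = np.sqrt((treated.var(axis=0, ddof=1) + control.var(axis=0, ddof=1)) / 2.0)
    diff = treated.mean(axis=0) - control.mean(axis=0)

    # Constant features have zero spread; report 0 instead of dividing by zero.
    return np.divide(diff, pooled_sd, out=np.zeros_like(diff), where=pooled_sd > 0)


# Stand-in data (assumption): feature 0 is shifted by the treatment, the others are not.
rng = np.random.default_rng(0)
a_demo = rng.integers(0, 2, size=500)
X_demo = rng.normal(size=(500, 3))
X_demo[:, 0] += 1.0 * a_demo

smd = standardized_mean_difference(X_demo, a_demo)
print(np.round(smd, 2))
```

Values near zero indicate that a feature is distributed similarly across treatment groups, while large absolute values flag features that depend strongly on the treatment. In practice the same function could be applied to `synthetic_X` and `synthetic_A` from the quickstart, provided the treatment there is binary.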