DECAF Synthetic Data

0.1.7 · active · verified Thu Apr 16

DECAF (DEbiasing CAusal Fairness) is a Python library for generating synthetic data and debiasing causal effects. It creates synthetic datasets that capture complex causal relationships while mitigating bias, so that researchers and practitioners can develop and evaluate fair causal inference models. The current version is 0.1.7, and the library is under active, research-driven development.

Quickstart

This quickstart uses `SyntheticData` to generate a base dataset, initializes and trains a `DECAF` model on that data, and then generates new synthetic samples from the trained model. This workflow is typical for evaluating debiasing strategies.

from decaf import DECAF
from decaf.synthetic_data import SyntheticData

# 1. Generate initial synthetic data with a known structure
n = 1000  # Number of samples
p = 10    # Number of features
seed = 42
sd = SyntheticData(n=n, p=p, seed=seed)
data = sd.generate_data() # Returns a dictionary with 'x', 'a', 'y'

X_orig = data['x'] # Features
A_orig = data['a'] # Treatment
Y_orig = data['y'] # Outcome

print(f"Original X shape: {X_orig.shape}, A shape: {A_orig.shape}, Y shape: {Y_orig.shape}")

# 2. Initialize and train the DECAF model
# (using a small number of epochs for quick demonstration)
model = DECAF(X_orig, A_orig, Y_orig, epochs=10, verbose=False, seed=seed)
model.train()

# 3. Generate new synthetic data using the trained DECAF model
n_synthetic = 500
synthetic_X, synthetic_A = model.generate_synthetic_data(n_samples=n_synthetic)

print(f"Synthetic X shape: {synthetic_X.shape}, Synthetic A shape: {synthetic_A.shape}")
# Further steps would involve evaluating fairness or causal effects on this synthetic data
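
The closing comment of the quickstart mentions evaluating the synthetic data as a follow-up step. As a minimal sketch of what a first sanity check could look like, the snippet below compares per-feature means and the treatment rate between a reference sample and a synthetic sample. It uses only NumPy. The helper names `standardized_mean_difference` and `treatment_rate_gap` are hypothetical and not part of DECAF, the treatment is assumed to be coded 0/1, and the random arrays merely stand in for `X_orig`, `synthetic_X`, `A_orig`, and `synthetic_A`.

```python
import numpy as np


def standardized_mean_difference(x_ref, x_new):
    """Per-feature absolute mean difference, scaled by the pooled standard deviation."""
    x_ref = np.asarray(x_ref, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    pooled_std = np.sqrt((x_ref.var(axis=0) + x_new.var(axis=0)) / 2.0)
    # Guard against constant features to avoid division by zero.
    pooled_std = np.where(pooled_std == 0.0, 1.0, pooled_std)
    return np.abs(x_ref.mean(axis=0) - x_new.mean(axis=0)) / pooled_std


def treatment_rate_gap(a_ref, a_new):
    """Absolute difference in the share of treated units (assumes 0/1 coding)."""
    a_ref = np.asarray(a_ref, dtype=float).ravel()
    a_new = np.asarray(a_new, dtype=float).ravel()
    return float(abs(a_ref.mean() - a_new.mean()))


# Stand-in arrays with the same shapes as in the quickstart (assumption:
# in practice these would be X_orig, synthetic_X, A_orig, synthetic_A).
rng = np.random.default_rng(42)
x_ref = rng.normal(size=(1000, 10))
x_new = rng.normal(size=(500, 10))
a_ref = rng.integers(0, 2, size=1000)
a_new = rng.integers(0, 2, size=500)

smd = standardized_mean_difference(x_ref, x_new)
gap = treatment_rate_gap(a_ref, a_new)
print(f"Largest standardized mean difference: {smd.max():.3f}")
print(f"Treatment rate gap: {gap:.3f}")
```

Small values on both measures only indicate that the marginal distributions are similar. They say nothing about whether causal structure or fairness properties are preserved, which would need a dedicated evaluation.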
