CTGAN

0.12.1 · active · verified Thu Apr 16

CTGAN is a Python library implementing a Conditional Generative Adversarial Network (GAN) specifically designed for synthesizing tabular data. It learns from real datasets to generate high-fidelity synthetic data, addressing challenges like mixed data types and imbalanced categorical columns. The library is actively maintained, with version 0.12.1 released in February 2026, and is part of the broader SDV (Synthetic Data Vault) ecosystem.

Common errors

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to load a demo dataset, initialize a `CTGAN` model, fit it to the real data by specifying discrete columns, and then generate a sample of synthetic data. It prints the head of both the original and synthetic datasets for a quick comparison.

import pandas as pd
from ctgan import CTGAN, load_demo

# Load demo data (Adult Census Dataset) or replace with your own DataFrame
real_data = load_demo()

# Identify discrete columns: every non-numeric column, including the 'income' label
discrete_columns = [
    'workclass', 'education', 'marital-status', 'occupation',
    'relationship', 'race', 'sex', 'native-country', 'income'
]

# Initialize and train the CTGAN model
# Set verbose=True to see training progress
ctgan = CTGAN(epochs=10, verbose=True)
ctgan.fit(real_data, discrete_columns)

# Generate synthetic data (the row count is the first positional argument, `n`)
synthetic_data = ctgan.sample(1000)

print("Original data head:")
print(real_data.head())
print("\nSynthetic data head:")
print(synthetic_data.head())
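
Printing the two heads gives only a rough impression. As a slightly more quantitative check, the sketch below compares how often each category appears in a real column and in its synthetic counterpart. The helpers `category_shares` and `max_share_gap` are hypothetical names introduced for illustration and are not part of the `ctgan` package; the two short lists are toy stand-ins for columns such as `real_data['sex']` and `synthetic_data['sex']`.

```python
from collections import Counter


def category_shares(values):
    """Return a dict mapping each category to its share of all values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def max_share_gap(real_values, synthetic_values):
    """Largest absolute difference in category share between two columns."""
    real = category_shares(real_values)
    synthetic = category_shares(synthetic_values)
    # Include categories that appear in only one of the two columns.
    categories = set(real) | set(synthetic)
    return max(abs(real.get(c, 0.0) - synthetic.get(c, 0.0)) for c in categories)


# Toy stand-ins for one discrete column from each dataset (illustrative values only)
real_sex = ['Male', 'Male', 'Female', 'Male']
synthetic_sex = ['Male', 'Female', 'Female', 'Male']

gap = max_share_gap(real_sex, synthetic_sex)
print(f"Largest category share gap: {gap:.2f}")
```

Because both helpers accept any iterable, a pandas column can be passed directly, for example `max_share_gap(real_data['sex'], synthetic_data['sex'])`. A gap near zero suggests the category balance was preserved for that column; it says nothing about relationships between columns.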
