Copulas

0.14.1 · active · verified Thu Apr 16

Copulas is a Python library for modeling multivariate distributions and sampling from them using copula functions. Given tabular numerical data, it learns the dependence structure between columns and generates new synthetic data with similar statistical properties. It provides a range of univariate distributions as well as Archimedean, Gaussian, and Vine copulas. It is part of The Synthetic Data Vault Project by DataCebo and is actively maintained.
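
To make the idea of "learning the dependence structure" concrete, the following is a minimal standard-library sketch of a two-column Gaussian copula. It is not the library's implementation: the function names (`to_normal_scores`, `empirical_quantile`, `fit_and_sample`, `pearson`) are hypothetical, and it uses empirical ranks for the marginals, whereas Copulas fits univariate distributions to each column. It only illustrates the general technique under those assumptions.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def to_normal_scores(values):
    """Map each value to a standard-normal score through its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        # rank / (n + 1) stays strictly inside (0, 1), so inv_cdf is defined
        scores[idx] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def empirical_quantile(sorted_values, u):
    """Inverse empirical CDF: return the order statistic at probability u."""
    idx = min(int(u * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]


def fit_and_sample(xs, ys, n_samples, seed=0):
    """Estimate the copula correlation, then draw synthetic (x, y) pairs."""
    rng = random.Random(seed)
    # "Fit": correlation of the normal scores captures the dependence
    rho = pearson(to_normal_scores(xs), to_normal_scores(ys))
    scale = math.sqrt(max(0.0, 1.0 - rho * rho))
    sorted_x, sorted_y = sorted(xs), sorted(ys)
    samples = []
    for _ in range(n_samples):
        # "Sample": correlated normals -> uniforms -> original marginals
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + scale * rng.gauss(0, 1)
        x = empirical_quantile(sorted_x, STD_NORMAL.cdf(z1))
        y = empirical_quantile(sorted_y, STD_NORMAL.cdf(z2))
        samples.append((x, y))
    return rho, samples


# Toy data: y is skewed (log-normal) and positively dependent on x
demo_rng = random.Random(42)
xs = [demo_rng.gauss(0, 1) for _ in range(500)]
ys = [math.exp(x + 0.5 * demo_rng.gauss(0, 1)) for x in xs]

rho, synthetic = fit_and_sample(xs, ys, n_samples=500)
print(f"estimated normal-score correlation: {rho:.2f}")
```

Because the sketch resamples from the observed values, every synthetic value already appears in the input; a fitted parametric marginal, as used by the library, can produce values that were never observed.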

Common errors

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to load a sample dataset, fit a Gaussian Multivariate Copula model to it, and then generate new synthetic data that statistically resembles the original. It also includes an optional visualization step to compare the real and synthetic data.

import pandas as pd
from copulas.datasets import sample_trivariate_xyz
from copulas.multivariate import GaussianMultivariate
import warnings

# Suppress FutureWarnings from certain dependencies for cleaner output
warnings.filterwarnings('ignore', category=FutureWarning)

# 1. Load a demo dataset (or your own pandas DataFrame)
real_data = sample_trivariate_xyz()
print("Original Data Head:\n", real_data.head())

# 2. Initialize and fit a multivariate copula model
copula = GaussianMultivariate()
copula.fit(real_data)
print("\nCopula model fitted successfully.")

# 3. Generate new synthetic data points
synthetic_data = copula.sample(len(real_data))
print("\nSynthetic Data Head:\n", synthetic_data.head())

# Optional: to visualize, uncomment the following lines and run in a graphical environment.
# compare_3d is called without figsize here: recent releases of copulas.visualization
# are Plotly-based and are not expected to accept that argument.
# from copulas.visualization import compare_3d
# fig = compare_3d(real_data, synthetic_data)
# fig.show()
