SDMetrics

0.28.0 · active · verified Thu Apr 16

SDMetrics is an open-source Python library developed by DataCebo (part of the Synthetic Data Vault project) for evaluating the quality and efficacy of synthetic datasets. It provides a variety of metrics to compare synthetic data against real data across aspects like quality, privacy, and utility, and includes tools for generating comprehensive visual reports. The library is model-agnostic, allowing evaluation of synthetic data generated by any model. The current version is 0.28.0, with active and frequent releases.
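To give a feel for what a column-level comparison between real and synthetic data measures, here is a dependency-free sketch of a total variation complement for a categorical column: one minus half the summed absolute difference between category frequencies. The function name `tv_complement` and the sample columns are hypothetical and exist only for illustration. This is not SDMetrics' own code; the library ships its own implementation of this idea as `sdmetrics.single_column.TVComplement`.

```python
from collections import Counter


def tv_complement(real, synthetic):
    """Return 1 minus the total variation distance between category frequencies.

    Hypothetical helper for illustration only; not part of SDMetrics.
    A score of 1.0 means identical frequencies, 0.0 means no overlap.
    """
    real_freq = Counter(real)
    synth_freq = Counter(synthetic)
    # Consider every category that appears in either column.
    categories = set(real_freq) | set(synth_freq)
    distance = 0.5 * sum(
        abs(real_freq[c] / len(real) - synth_freq[c] / len(synthetic))
        for c in categories
    )
    return 1.0 - distance


# Made-up sample columns (assumption, not taken from the demo data)
real_column = ['A', 'A', 'B', 'C']
synthetic_column = ['A', 'B', 'B', 'C']

score = tv_complement(real_column, synthetic_column)
print(f"TV complement: {score:.2f}")
```

Scores of this kind are bounded between 0 and 1, which is why they can be averaged into a single report-level score.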

Quickstart

This quickstart demonstrates how to load demo data, generate a single-table QualityReport, retrieve the overall score, and optionally visualize results. SDMetrics can also work with your own pandas DataFrames and metadata dictionaries.

import pandas as pd  # only needed if you build your own DataFrames below
from sdmetrics import load_demo
from sdmetrics.reports.single_table import QualityReport

# Load demo data (real, synthetic, and metadata)
real_data, synthetic_data, metadata = load_demo(modality='single_table')

# Or create your own dataframes and metadata
# real_data = pd.DataFrame({'column1': [1, 2, 3], 'column2': ['A', 'B', 'C']})
# synthetic_data = pd.DataFrame({'column1': [1, 2, 2], 'column2': ['A', 'C', 'B']})
# metadata = {'columns': {'column1': {'sdtype': 'numerical'}, 'column2': {'sdtype': 'categorical'}}, 'primary_key': None}

# Create a QualityReport
report = QualityReport()

# Generate the report
report.generate(real_data, synthetic_data, metadata)

# Print the overall quality score (get_score() returns a fraction between 0 and 1)
print(f"Overall Quality Score: {report.get_score():.2%}")

# Get a visualization for a specific property (e.g., 'Column Shapes')
# fig = report.get_visualization(property_name='Column Shapes')
# fig.show()

# Save the report
# report.save(filepath='demo_data_quality_report.pkl')
# To load later: loaded_report = QualityReport.load(filepath='demo_data_quality_report.pkl')
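
When working with your own data, the metadata dictionary shown in the commented-out lines of the quickstart can also be assembled programmatically. The sketch below guesses an `sdtype` for each column from plain Python values and produces a dictionary of the same shape. The helpers `infer_sdtype` and `build_metadata` are hypothetical, not SDMetrics functions, and the inference rules are a simplifying assumption; review the result by hand before using it.

```python
from datetime import date, datetime


def infer_sdtype(values):
    """Guess an sdtype string from a column's values (hypothetical helper)."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 'categorical'
    # bool is a subclass of int in Python, so check it first.
    if all(isinstance(v, bool) for v in non_null):
        return 'boolean'
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null):
        return 'numerical'
    if all(isinstance(v, (date, datetime)) for v in non_null):
        return 'datetime'
    # Fall back to categorical for strings and mixed values.
    return 'categorical'


def build_metadata(columns, primary_key=None):
    """Build a single-table metadata dict shaped like the quickstart example."""
    return {
        'columns': {
            name: {'sdtype': infer_sdtype(values)}
            for name, values in columns.items()
        },
        'primary_key': primary_key,
    }


# Same toy columns as the commented-out quickstart example
columns = {'column1': [1, 2, 3], 'column2': ['A', 'B', 'C']}
metadata = build_metadata(columns)
print(metadata)
```

The resulting dictionary can then be passed to `report.generate` together with DataFrames built from the same columns, for example `pd.DataFrame(columns)`.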
