{"library":"sdv","title":"SDV: Synthetic Data Vault","description":"SDV (Synthetic Data Vault) is a Python library for generating synthetic data for single tables, multi-table relational datasets, and sequential data. It provides a range of synthesizers and tools for creating synthetic data that aims to preserve the statistical properties of the original data while protecting its privacy. As of version 1.36.0 it is actively developed, with regular releases that add features and improve existing models.","language":"python","status":"active","last_verified":"Fri Apr 17","install":{"commands":["pip install sdv"],"cli":null},"imports":["from sdv.single_table import GaussianCopulaSynthesizer","from sdv.datasets.demo import download_demo"],"auth":{"required":false,"env_vars":[]},"quickstart":{"code":"from sdv.datasets.demo import download_demo\nfrom sdv.single_table import GaussianCopulaSynthesizer\n\n# 1. Download a demo dataset; returns the data (a pandas DataFrame) and its metadata\nreal_data, metadata = download_demo(\n    modality='single_table',\n    dataset_name='fake_hotel_guests'\n)\n\n# 2. Initialize a synthesizer, passing the metadata\nsynthesizer = GaussianCopulaSynthesizer(metadata)\n\n# 3. Fit the synthesizer to the real data\nsynthesizer.fit(real_data)\n\n# 4. Sample as many synthetic rows as there are real rows\nsynthetic_data = synthesizer.sample(num_rows=len(real_data))\n\nprint(\"Original data head:\")\nprint(real_data.head())\nprint(\"\\nSynthetic data head:\")\nprint(synthetic_data.head())","lang":"python","description":"This quickstart downloads a demo dataset with `download_demo`, initializes a `GaussianCopulaSynthesizer` with the dataset's metadata, fits the synthesizer to the real data, and samples synthetic rows. This is a common workflow for single-table synthetic data generation.","tag":null,"tag_description":null,"last_tested":null,"results":[]},"compatibility":null}