{"id":10220,"library":"sdv","title":"SDV: Synthetic Data Vault","description":"SDV (Synthetic Data Vault) is a Python library for generating synthetic data for single tables, multi-table relational datasets, and sequential data. It provides a range of models and tools for creating synthetic data that preserves the statistical properties and privacy of the original data. As of version 1.36.0, it is actively developed, with regular releases that add features and improve existing models.","status":"active","version":"1.36.0","language":"en","source_language":"en","source_url":"https://github.com/sdv-dev/SDV","tags":["synthetic data","data generation","privacy","machine learning","tabular data","data science"],"install":[{"cmd":"pip install sdv","lang":"bash","label":"Install SDV"}],"dependencies":[],"imports":[{"note":"Synthesizer import paths changed in v1.0.0; older paths such as 'sdv.models' and 'sdv.tabular' no longer exist. Use 'sdv.single_table', 'sdv.multi_table', or 'sdv.sequential' depending on your data type.","wrong":"from sdv.models import GaussianCopulaSynthesizer","symbol":"GaussianCopulaSynthesizer","correct":"from sdv.single_table import GaussianCopulaSynthesizer"},{"note":"The 'sdv.lite' module was deprecated and removed. Presets are now found directly under the respective data type modules.","wrong":"from sdv.lite import SingleTablePreset","symbol":"SingleTablePreset","correct":"from sdv.single_table.preset import SingleTablePreset"},{"note":"Demo datasets are fetched with 'download_demo', which returns a (data, metadata) tuple.","wrong":"from sdv.datasets.demo import load_dataset","symbol":"download_demo","correct":"from sdv.datasets.demo import download_demo"}],"quickstart":{"code":"from sdv.datasets.demo import download_demo\nfrom sdv.single_table import GaussianCopulaSynthesizer\n\n# 1. Download a demo dataset (returns the data as a DataFrame, plus its metadata)\nreal_data, metadata = download_demo(\n    modality='single_table',\n    dataset_name='fake_hotel_guests'\n)\n\n# 2. Initialize a synthesizer, passing the metadata\nsynthesizer = GaussianCopulaSynthesizer(metadata)\n\n# 3. 
Fit the synthesizer to the real data\nsynthesizer.fit(real_data)\n\n# 4. Sample as many synthetic rows as there are real rows\nsynthetic_data = synthesizer.sample(num_rows=len(real_data))\n\nprint(\"Original data head:\")\nprint(real_data.head())\nprint(\"\\nSynthetic data head:\")\nprint(synthetic_data.head())","lang":"python","description":"This quickstart downloads a demo dataset with `download_demo`, initializes a `GaussianCopulaSynthesizer` with the dataset's metadata, fits the synthesizer to the real data, and samples synthetic data. This is the standard workflow for single-table synthetic data generation."},"warnings":[{"fix":"Update your imports. For single-table synthesizers, use `from sdv.single_table import ...`. For multi-table, use `from sdv.multi_table import ...`, and for sequential, use `from sdv.sequential import ...`.","message":"Synthesizer import paths changed in SDV v1.0.0. The `sdv.models` and `sdv.tabular` modules no longer exist.","severity":"breaking","affected_versions":"<1.0.0"},{"fix":"Create a `Metadata` object (`SingleTableMetadata` or `MultiTableMetadata` in older 1.x releases) and pass it to the synthesizer. Define primary keys, relationships, column types, and sensitive columns explicitly for best results.","message":"SDV can infer metadata, but the inferred metadata is not always correct. Defining it explicitly often yields higher-quality synthetic data, especially for complex schemas or columns with specific roles (e.g., primary keys, relationships, sensitive columns).","severity":"gotcha","affected_versions":"All versions"},{"fix":"Consider downsampling your data for initial experimentation. Ensure your environment has sufficient RAM. 
For production-scale needs, explore SDV's performance optimization features or consider distributed processing frameworks if applicable.","message":"Generating synthetic data for very large datasets (millions of rows) or complex multi-table schemas can be memory-intensive and time-consuming.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-17T00:00:00.000Z","next_check":"2026-07-16T00:00:00.000Z","problems":[{"fix":"Change your import statement from `from sdv.models import SynthesizerName` to `from sdv.single_table import SynthesizerName` (or `multi_table`/`sequential` as appropriate).","cause":"Attempting to import a synthesizer from an old module path that no longer exists as of SDV v1.0.0.","error":"ModuleNotFoundError: No module named 'sdv.models'"},{"fix":"Preprocess your data to convert unsupported columns into one of the supported types. Explicitly define column types in your `sdv.metadata` object to guide the synthesizer.","cause":"SDV synthesizers cannot process some kinds of data directly (e.g., complex objects, nested lists, mixed types), or metadata inference assigned the column an incorrect type.","error":"ValueError: The column '...' contains unsupported data types. Supported data types are numeric, boolean, datetime, and categorical."},{"fix":"Ensure your training data has enough rows (typically at least several dozen to a few hundred, depending on complexity) to give the model sufficient statistical information. SDV is not designed for extremely small datasets.","cause":"The dataset passed to `synthesizer.fit()` has too few rows for the selected synthesizer to learn the underlying patterns and statistical distributions.","error":"NotEnoughDataError: Not enough data for synthesizer to learn from. Expected at least X rows but got Y rows."}]}