{"id":9299,"library":"sdmetrics","title":"SDMetrics","description":"SDMetrics is an open-source Python library developed by DataCebo (part of the Synthetic Data Vault project) for evaluating the quality and efficacy of synthetic datasets. It provides a variety of metrics to compare synthetic data against real data across aspects like quality, privacy, and utility, and includes tools for generating comprehensive visual reports. The library is model-agnostic, allowing evaluation of synthetic data generated by any model. The current version is 0.28.0, with active and frequent releases.","status":"active","version":"0.28.0","language":"en","source_language":"en","source_url":"https://github.com/sdv-dev/SDMetrics","tags":["synthetic data","data quality","metrics","evaluation","privacy","data science"],"install":[{"cmd":"pip install sdmetrics","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Requires Python versions >=3.9, <3.15.","package":"python","optional":false}],"imports":[{"symbol":"load_demo","correct":"from sdmetrics import load_demo"},{"symbol":"QualityReport","correct":"from sdmetrics.reports.single_table import QualityReport"},{"symbol":"CategoryCoverage","correct":"from sdmetrics.single_column import CategoryCoverage"}],"quickstart":{"code":"import pandas as pd\nfrom sdmetrics import load_demo\nfrom sdmetrics.reports.single_table import QualityReport\n\n# Load demo data (real, synthetic, and metadata)\nreal_data, synthetic_data, metadata = load_demo(modality='single_table')\n\n# Or create your own dataframes and metadata\n# real_data = pd.DataFrame({'column1': [1, 2, 3], 'column2': ['A', 'B', 'C']})\n# synthetic_data = pd.DataFrame({'column1': [1, 2, 2], 'column2': ['A', 'C', 'B']})\n# metadata = {'columns': {'column1': {'sdtype': 'numerical'}, 'column2': {'sdtype': 'categorical'}}, 'primary_key': None}\n\n# Create a QualityReport\nreport = QualityReport()\n\n# Generate the report\nreport.generate(real_data, synthetic_data, 
metadata)\n\n# Print the overall quality score (get_score() returns a fraction between 0 and 1)\nprint(f\"Overall Quality Score: {report.get_score():.2%}\")\n\n# Get a visualization for a specific property (e.g., 'Column Shapes')\n# fig = report.get_visualization(property_name='Column Shapes')\n# fig.show()\n\n# Save the report\n# report.save(filepath='demo_data_quality_report.pkl')\n# To load later: loaded_report = QualityReport.load(filepath='demo_data_quality_report.pkl')","lang":"python","description":"This quickstart demonstrates how to load demo data, generate a single-table QualityReport, retrieve the overall score, and optionally visualize results. SDMetrics can also work with your own pandas DataFrames and metadata dictionaries."},"warnings":[{"fix":"Upgrade your Python environment to 3.9 or a newer supported version (<3.15).","message":"SDMetrics dropped support for Python 3.8 starting from version 0.24.0. Ensure your environment uses Python 3.9 or newer.","severity":"breaking","affected_versions":">=0.24.0"},{"fix":"Ensure your Pandas version is <3.0 when using SDMetrics. Check SDMetrics release notes for future Pandas 3.x compatibility updates.","message":"SDMetrics pinned Pandas below version 3.0 in v0.26.0 to ensure compatibility. Direct usage with Pandas 3.x might lead to unexpected behavior or errors.","severity":"breaking","affected_versions":">=0.26.0"},{"fix":"Consider setting a `real_correlation_threshold` when computing `CorrelationSimilarity` to filter out column pairs without strong correlations in the real data. Values of 0.4 or higher are recommended.","message":"When using `CorrelationSimilarity` on noisy data with no clear trends, the metric might return a high score, indicating that the synthetic data successfully captures the non-existent 'trend'. This can be misleading if you expect to measure actual correlation preservation.","severity":"gotcha","affected_versions":"All"},{"fix":"Carefully review report breakdowns and individual metric scores. 
If `NaN`s appear unexpectedly, investigate the input data for those specific columns or metric configurations for potential causes of failure.","message":"When generating reports, if some metric computations fail, SDMetrics might report them as `NaN` (Not a Number) scores rather than explicit errors, potentially hiding underlying issues with data or metric configuration.","severity":"gotcha","affected_versions":"All"},{"fix":"Convert `SDV` metadata objects to a dictionary using the `.to_dict()` method before passing them to SDMetrics reports. Example: `report.generate(real_data, synthetic_data, sdv_metadata_object.to_dict())`.","message":"Passing an `SDV` metadata object directly to `sdmetrics.reports` (e.g., `QualityReport.generate`) will raise a `TypeError`. SDMetrics expects a plain dictionary for metadata.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Convert the SDV metadata object to a dictionary using its `.to_dict()` method: `report.generate(real_data, synthetic_data, sdv_metadata_object.to_dict())`.","cause":"An `sdv.metadata.SingleTableMetadata` (or similar SDV metadata object) was passed directly to an SDMetrics report method that expects a standard Python dictionary for metadata.","error":"TypeError: Expected a dictionary but received a <class 'sdv.metadata.SingleTableMetadata'> instead."},{"fix":"Pre-process your real and synthetic dataframes to handle missing values (e.g., imputation or removal) and outliers before passing them to SDMetrics. 
Check the metadata to ensure correct `sdtype` for columns.","cause":"This error often occurs when numerical data contains missing values (NaNs) or extreme values that a metric or underlying scikit-learn model cannot handle without prior processing.","error":"ValueError: Inputs contain NaN, infinity or a value too large for dtype('float64')."},{"fix":"Verify that the column names in your dataframes exactly match those referenced in your SDMetrics calls and the `metadata` dictionary.","cause":"The specified 'column_name' in a metric computation (e.g., `CategoryCoverage.compute`) or a report configuration does not exist in the provided real or synthetic dataframes.","error":"KeyError: 'column_name not found'"},{"fix":"Review the documentation for the specific metric being used to understand its data requirements. Ensure column `sdtype` in the metadata accurately reflects the data types and that there's enough data for computation.","cause":"This generic error can occur if the data does not meet the specific requirements of a metric (e.g., attempting a numerical correlation metric on categorical data, or insufficient data points).","error":"IncomputableMetricError: The metric cannot be computed with the given data."}]}