{"id":8972,"library":"empirical-calibration","title":"Empirical Calibration","description":"Empirical Calibration (EC) is a Python library (version 0.12) designed for correcting bias in data samples using generic weighting methods. It formulates the calibration problem as a convex optimization, solved efficiently in a dual form, and aims to reduce data biases in various statistical fields, such as survey sampling and causal studies with observational data. The library is actively maintained, with the latest release in May 2024 and ongoing development on GitHub.","status":"active","version":"0.12","language":"en","source_language":"en","source_url":"https://github.com/google/empirical_calibration","tags":["calibration","weighting","bias correction","survey sampling","causal inference","convex optimization","statistics","machine learning"],"install":[{"cmd":"pip install empirical-calibration","lang":"bash","label":"Install from PyPI"},{"cmd":"pip install -q git+https://github.com/google/empirical_calibration","lang":"bash","label":"Install from GitHub (latest develop)"}],"dependencies":[{"reason":"Fundamental numerical operations and array handling.","package":"numpy"},{"reason":"Data manipulation, especially for covariate dataframes.","package":"pandas"},{"reason":"Optimization routines (e.g., `scipy.optimize`) are used for the convex optimization problem.","package":"scipy"},{"reason":"Used for preprocessing tasks, such as `preprocessing.StandardScaler` for covariates.","package":"scikit-learn"},{"reason":"Used internally for formula-based design matrix creation, though not directly exposed in the main API calls like `calibrate`.","package":"patsy"}],"imports":[{"note":"The recommended and standard alias for the library.","symbol":"empirical_calibration","correct":"import empirical_calibration as ec"}],"quickstart":{"code":"import numpy as np\nimport pandas as pd\nimport empirical_calibration as ec\n\n# Create dummy covariate dataframes for demonstration\n# In a 
real scenario, these would come from your biased sample and target population\ncovariates_sample = pd.DataFrame({\n    'sex': np.random.choice([0, 1], size=100),\n    'age': np.random.randint(18, 65, size=100)\n})\ntarget_covariates = pd.DataFrame({\n    'sex': np.random.choice([0, 1], size=1000),\n    'age': np.random.randint(18, 65, size=1000)\n})\n\n# Apply empirical calibration to compute weights\n# Using ENTROPY objective as a common choice\ntry:\n    weights, _ = ec.maybe_exact_calibrate(\n        covariates=covariates_sample,\n        target_covariates=target_covariates,\n        objective=ec.Objective.ENTROPY\n    )\n    print(f\"Successfully computed weights. First 5 weights: {weights[:5]}\")\n    print(f\"Sum of weights: {np.sum(weights):.2f}\")\nexcept ec.ConvergenceError as e:\n    print(f\"Calibration did not converge: {e}\")\nexcept Exception as e:\n    print(f\"An unexpected error occurred: {e}\")","lang":"python","description":"This quickstart demonstrates how to use `empirical_calibration` to compute sample weights. It simulates two sets of covariates: `covariates_sample` representing your biased data and `target_covariates` representing the desired distribution (e.g., from a population). The `maybe_exact_calibrate` function then calculates weights for the sample data such that its weighted covariate distribution matches the target distribution as closely as possible, using the specified optimization objective (here, `ENTROPY`)."},"warnings":[{"fix":"Ensure both inputs have identical structure (column names, order, and data types) for the variables intended for calibration. Consider explicit type conversions or column reordering if loading from different sources.","message":"The `covariates` and `target_covariates` inputs should typically be pandas DataFrames or numpy arrays with consistent columns and order. 
Mismatched column names or different data types can lead to unexpected behavior or errors during internal preprocessing and optimization.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Consider adjusting the `objective` (e.g., `QUADRATIC` might be more robust for some problems), `epsilon` (tolerance for matching marginals), or `max_iter` parameters in `calibrate` or `maybe_exact_calibrate`. Preprocessing the covariates (e.g., binning continuous variables, merging rare categories) can also improve convergence.","message":"The calibration optimization may fail to converge, especially with highly disparate covariate distributions, sparse data, or certain objective choices. This results in a `ConvergenceError`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Verify that any documentation or example found online refers to the Python `empirical-calibration` library (hosted at `google/empirical_calibration` on GitHub) before applying its API calls or concepts.","message":"`empirical-calibration` is a distinct Python library. There is also an R package named 'EmpiricalCalibration' (from OHDSI) that addresses similar statistical concepts but has a different API and implementation. Do not confuse the two when searching for documentation or examples.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Verify that `covariates` and `target_covariates` have the same number of columns (features) and that their internal representations (e.g., after `patsy.dmatrix` or `sklearn.preprocessing`) are compatible. 
Inspect the `.shape` attribute of the arrays passed to the calibration function.","cause":"This error typically occurs when `covariates` and `target_covariates` have incompatible shapes, most often a different number of feature columns or an incorrectly reshaped array, so element-wise operations during calibration fail.","error":"ValueError: operands could not be broadcast together with shapes (X,) (Y,)"},{"fix":"Increase `max_iter` (e.g., `max_iter=1000`) or relax `epsilon` (e.g., `epsilon=1e-3`) in `maybe_exact_calibrate`, after confirming with `help(ec.maybe_exact_calibrate)` that your installed version accepts these parameters. Also consider simplifying the covariates by grouping categories or binning continuous features, and check whether the target distribution is realistically achievable from the sample.","cause":"The iterative optimizer did not satisfy its convergence criteria within the allowed number of iterations, so the target covariate distribution could not be matched exactly or within tolerance.","error":"empirical_calibration.core.ConvergenceError: Maximum number of iterations reached."},{"fix":"Check that the column names in `covariates_sample` and `target_covariates` are identical and spelled correctly. Print `df.columns` for both DataFrames to confirm they align before passing them to the calibration function.","cause":"The library tried to access a column that does not exist in `covariates` or `target_covariates`, typically because column names in the pandas DataFrames are inconsistent or misspelled.","error":"KeyError: 'some_column_name'"}]}