Causalmodels
Causalmodels is a Python library for defining, analyzing, and inferring causal relationships from data, drawing inspiration from Judea Pearl's do-calculus. It provides tools for building Bayesian causal models, performing matching, and conducting regression-based causal inference. The current version is 0.4.0, with an irregular release cadence.
Common errors
-
KeyError: "['NonExistentColumn']" or similar error indicating a missing column.
cause The column name provided for `treatment`, `outcome`, or `control_variables` does not exist in the input DataFrame.
fix Double-check the spelling of column names against `df.columns` to ensure they exactly match the DataFrame columns, respecting case sensitivity.
-
TypeError: unsupported operand type(s) for +: 'str' and 'int' or similar numerical calculation error.
cause One or more columns used in numerical calculations (treatment, outcome, or control variables) contain non-numeric data (e.g., strings or objects) that the underlying statistical models cannot process.
fix Convert all relevant columns to numeric types (e.g., `float`, `int`) using `df['column'] = pd.to_numeric(df['column'], errors='coerce')` before passing the DataFrame to `causalmodels` methods. With `errors='coerce'`, values that cannot be parsed become `NaN`, so check for and handle missing values afterwards.
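Both fixes above can be checked before any model is built. The following is a minimal sketch that uses only pandas; the helper names `check_columns` and `coerce_numeric` and the sample DataFrame are hypothetical and are not part of `causalmodels`.

```python
import pandas as pd


def check_columns(df, treatment, outcome, control_variables):
    """Return the requested column names that are missing from df (case-sensitive)."""
    requested = [treatment, outcome, *control_variables]
    return [name for name in requested if name not in df.columns]


def coerce_numeric(df, columns):
    """Return a copy of df with the given columns converted to a numeric dtype.

    Values that cannot be parsed become NaN (errors='coerce').
    """
    out = df.copy()
    for name in columns:
        out[name] = pd.to_numeric(out[name], errors="coerce")
    return out


# Hypothetical sample data: X holds strings, one of which is not a number.
raw = pd.DataFrame({"X": ["1", "2", "oops"], "Y": [1.0, 2.5, 3.0], "Z": [0, 1, 0]})

# 'y' differs from 'Y' only in case, so it is reported as missing.
missing = check_columns(raw, "X", "y", ["Z"])

clean = coerce_numeric(raw, ["X", "Y", "Z"])
n_unparsed = int(clean["X"].isna().sum())  # entries that turned into NaN

print("missing columns:", missing)
print("unparseable values in X:", n_unparsed)
```

Counting the `NaN` values introduced by coercion makes it visible how many rows would otherwise be dropped or mishandled later.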
Warnings
- gotcha Causal inference methods, in `causalmodels` and in general, rely on strong assumptions (e.g., no unmeasured confounders, correct specification of the causal graph). Violating these assumptions can bias the estimates.
- gotcha Input data to `causalmodels` methods must be clean and appropriately preprocessed. Missing values, incorrect data types, or inconsistent column names can lead to errors or silently biased results during estimation.
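The first warning can be made concrete with a small simulation. The sketch below does not use `causalmodels`; it fits ordinary least squares with NumPy on hypothetical simulated data in which a confounder `z` drives both the treatment `x` and the outcome `y`, and compares the slope on `x` with and without `z` as a control. The helper `ols_slope` is an illustrative name, not a library function.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

z = rng.normal(size=n)                      # confounder
x = 0.5 * z + rng.normal(size=n)            # treatment depends on z
y = 2.0 * x + 1.0 * z + rng.normal(size=n)  # true effect of x on y is 2.0


def ols_slope(outcome, *regressors):
    """Coefficient on the first regressor from an OLS fit with an intercept."""
    design = np.column_stack([np.ones_like(outcome), *regressors])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef[1]


naive = ols_slope(y, x)        # z omitted: estimate absorbs part of z's effect
adjusted = ols_slope(y, x, z)  # z included as a control

print(f"naive slope (z omitted):   {naive:.3f}")
print(f"adjusted slope (z included): {adjusted:.3f}")
```

In this setup the omitted-variable bias is cov(x, z) / var(x) = 0.5 / 1.25 = 0.4 in expectation, so the naive slope should land near 2.4 while the adjusted slope should land near the true value of 2.0. A confounder that was never measured cannot be added as a control, which is why the assumption matters.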
Install
-
pip install causalmodels
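After installing, the installed version can be confirmed from Python with the standard library. This is a minimal sketch; it assumes the distribution name matches the `pip` name above, and `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


found = installed_version("causalmodels")
if found is None:
    print("causalmodels is not installed in this environment")
else:
    print(f"causalmodels version: {found}")
```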
Imports
- BayesianModel
from causalmodels.bayesian_model import BayesianModel
- Matching
from causalmodels.matching import Matching
- Regression
from causalmodels.regression import Regression
- CausalDAG
from causalmodels.causal_dag import CausalDAG
- CausalInference
from causalmodels.inference import CausalInference
Quickstart
import pandas as pd
import numpy as np
from causalmodels.regression import Regression
# Simulate some data with a known causal effect
np.random.seed(42)
n_samples = 1000
# Confounder Z affects both Treatment X and Outcome Y
Z = np.random.normal(0, 1, n_samples)
# Treatment X is affected by Z
X = 0.5 * Z + np.random.normal(0, 1, n_samples)
# Outcome Y is affected by X and Z
Y = 2.0 * X + 1.0 * Z + np.random.normal(0, 1, n_samples)
data = pd.DataFrame({'Z': Z, 'X': X, 'Y': Y})
# Initialize the Regression model
# X: treatment variable, Y: outcome variable, control_variables: confounders
model = Regression(data, treatment='X', outcome='Y', control_variables=['Z'])
# Estimate the Average Treatment Effect (ATE); the simulated true effect of X on Y is 2.0
ate_estimate = model.estimate_ate()
print(f"Observed data with N={n_samples} samples.")
print(f"Estimated Average Treatment Effect (ATE) of X on Y, controlling for Z: {ate_estimate:.4f}")