EconML
EconML is a Python library for estimating Conditional Average Treatment Effects (CATEs) from observational or experimental data. It provides a suite of advanced machine learning methods, including Double Machine Learning (DML) and Causal Forests, to infer causal relationships and individual-level treatment effects. The current version is 0.16.0, and the library is under active development, with feature and bugfix releases every few months.
Warnings
- breaking The `DynamicDML` estimator was moved from `econml.dml` to `econml.panel`. Direct imports from the old path will raise an `ImportError`.
- breaking The default `alpha` value for confidence intervals in methods like `effect_interval` changed from `None` (deferring to estimator-specific defaults) to `0.05` (a 95% confidence interval).
- gotcha EconML v0.16.0 requires `shap` versions `>=0.40.0` and `<0.44.0`. Using `shap` version 0.44.0 or higher will lead to installation conflicts or runtime errors when calling `shap_values`.
- gotcha The `deepiv` module, containing the `DeepIV` estimator, requires `tensorflow` and `keras` to be installed separately. These libraries can have strict Python and other dependency version requirements that may cause conflicts.
- gotcha Distributed training features (e.g., for scaling `OrthoLearner`s) rely on the `ray` library, which is an optional dependency and not installed by default.
Install
-
pip install econml
-
pip install econml[ray]     # For distributed training with Ray
pip install econml[deepiv]  # For DeepIV estimator with TensorFlow/Keras
pip install econml[all]     # For all optional dependencies
Imports
- CausalForestDML
from econml.dml import CausalForestDML
- LinearDML
from econml.dml import LinearDML
- DynamicDML
from econml.panel import DynamicDML
Quickstart
import numpy as np
import pandas as pd
from econml.dml import CausalForestDML
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
# Simulate data
np.random.seed(42)
n_samples = 1000
W = np.random.normal(0, 1, size=(n_samples, 3)) # Confounders
X = np.random.normal(0, 1, size=(n_samples, 2)) # Features for heterogeneity
T = (W[:, 0] + W[:, 1] + np.random.normal(0, 1, n_samples) > 0).astype(float) # Treatment
Y = W[:, 0] + W[:, 2] + T * (X[:, 0] + np.random.normal(0, 0.1, n_samples)) + np.random.normal(0, 1, n_samples) # Outcome
# Initialize and fit the CausalForestDML model
est = CausalForestDML(
model_y=RandomForestRegressor(min_samples_leaf=5, n_estimators=100, random_state=42),
model_t=RandomForestClassifier(min_samples_leaf=5, n_estimators=100, random_state=42),
discrete_treatment=True,  # required when model_t is a classifier
cv=5,
random_state=42
)
est.fit(Y, T, X=X, W=W)
# Estimate CATE for new data (or original X)
X_test = np.array([[0.5, 0.5], [-0.5, -0.5]])
cate_estimates = est.effect(X_test)
print(f"CATE estimates for X_test: {cate_estimates}")
# Estimates should be close to the true CATEs X_test[:, 0] = [0.5, -0.5] (exact values vary by version and platform)