DoWhy
DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions, following a four-step framework: Model, Identify, Estimate, and Refute. It aims to bridge econometric and machine learning approaches to causality. The current version is 0.14, and it sees regular releases, typically every few months, incorporating new features, estimators, and compatibility updates.
Common errors
- ModuleNotFoundError: No module named 'graphviz' OR pydot.InvocationException: Program terminated with status: 1. stderr: 'dot: command not found'
  cause: The Graphviz executables or the `pydot` Python package is missing. DoWhy relies on Graphviz to visualize causal graphs.
  fix: Install the `pydot` Python package (`pip install pydot`) and make sure the Graphviz executables are installed and on your system's PATH. On Debian or Ubuntu, run `sudo apt-get install graphviz`; on macOS, `brew install graphviz`; on Windows, use the installer from graphviz.org and make sure its `bin` directory is on PATH.
- KeyError: '[column_name]' OR ValueError: Column [column_name] not found in the data.
  cause: A name used in `treatment`, `outcome`, or the causal graph string does not match any column of the input DataFrame.
  fix: Compare `data.columns` with the names passed to `CausalModel` and used in the graph string. They must match exactly, including case and surrounding whitespace.
- ValueError: Method `[method_name]` is not a valid estimation method
  cause: The `method_name` passed to `estimate_effect` is misspelled, is not available in the installed DoWhy version, or does not match an estimand that `identify_effect` found (for example, an `iv.*` method when the graph contains no instrument).
  fix: Check the spelling against the estimator list in the DoWhy documentation for your installed version. Method names have the form `<strategy>.<estimator>`, such as `backdoor.linear_regression`, `backdoor.propensity_score_matching`, or `iv.instrumental_variable`. Printing the identified estimand shows which strategies (backdoor, frontdoor, iv) are available for your graph.
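Before building a `CausalModel`, it can help to check that every name used in `treatment`, `outcome`, and the graph string is actually a column of the DataFrame. `missing_columns` below is a hypothetical helper, not part of DoWhy, and a minimal sketch: it reads node names only from simple `a -> b` edges with a regular expression, so it does not understand quoted names, node attributes, or subgraphs.

```python
import re


def _as_list(names):
    """Accept either a single column name or a list of names."""
    return [names] if isinstance(names, str) else list(names)


def missing_columns(columns, graph: str, treatment, outcome) -> set:
    """Return names used in the model spec that are absent from `columns`.

    Hypothetical helper (not part of DoWhy). Node names are read from simple
    `a -> b` edges only; quoted names, attributes and subgraphs are ignored.
    """
    names = set(_as_list(treatment)) | set(_as_list(outcome))
    # The lookahead lets chained edges like `a -> b -> c` yield both pairs.
    for src, dst in re.findall(r"(\w+)\s*->\s*(?=(\w+))", graph):
        names.update((src, dst))
    return names - set(columns)


# In practice pass data.columns; a plain list keeps the example self-contained.
columns = ["treatment", "confounder", "Outcome"]
graph = "digraph { confounder -> treatment; confounder -> outcome; treatment -> outcome; }"
print(missing_columns(columns, graph, ["treatment"], ["outcome"]))  # {'outcome'}
```

Here the check flags `outcome` because the DataFrame column is spelled `Outcome`, which is exactly the kind of mismatch that produces the KeyError above.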
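For typos in `method_name`, a small lookup with the standard library's `difflib` can suggest the intended spelling. `KNOWN_METHODS` is a hand-written, non-exhaustive list of estimator names that appear in DoWhy's documentation (EconML-backed names such as `backdoor.econml.dml.DML` are left out); treat it as an assumption and compare it with the documentation for your installed version.

```python
import difflib

# Assumption: a hand-maintained, non-exhaustive list of DoWhy method names.
# Verify it against the documentation for the DoWhy version you use.
KNOWN_METHODS = [
    "backdoor.linear_regression",
    "backdoor.generalized_linear_model",
    "backdoor.distance_matching",
    "backdoor.propensity_score_matching",
    "backdoor.propensity_score_stratification",
    "backdoor.propensity_score_weighting",
    "frontdoor.two_stage_regression",
    "iv.instrumental_variable",
    "iv.regression_discontinuity",
]


def suggest_method(name: str, known=KNOWN_METHODS) -> list:
    """Return [] if `name` is in `known`, else up to three close spellings."""
    if name in known:
        return []
    # get_close_matches ranks candidates by similarity, best first.
    return difflib.get_close_matches(name, known, n=3)


print(suggest_method("backdoor.linear_regresion"))  # first suggestion: 'backdoor.linear_regression'
```

A suggestion only fixes spelling; the strategy prefix still has to match an estimand that `identify_effect` found for your graph.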
Warnings
- gotcha Defining an accurate causal graph is critical. Incorrectly specified graphs, especially ones that omit unobserved confounders or that lead you to condition on a collider (a variable caused by both treatment and outcome), yield invalid causal effect estimates. This is the biggest footgun in causal inference.
- breaking DoWhy currently supports Python versions up to 3.13. Installing or running it on Python 3.14 or newer will likely fail with dependency conflicts or an `ImportError` until DoWhy and its dependencies declare support for that version.
- gotcha While `pip install dowhy` installs core functionality, advanced features like plotting causal graphs or using certain estimators (e.g., from `econml`) require extra dependencies that are not installed by default.
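To see why conditioning on a collider is dangerous, here is a small NumPy simulation (an illustration, not DoWhy code). `t` causes `y` with a true effect of 2, and both cause a third variable `c`. Regressing `y` on `t` alone recovers about 2, while adding the collider `c` as a regressor pulls the coefficient to about 0.5 in this setup, even though the causal effect itself is unchanged. The variable names, coefficients, and sample size are arbitrary choices for the demo.

```python
import numpy as np


def ols_coef(y, *covariates):
    """OLS coefficients [intercept, b1, b2, ...] via least squares."""
    X = np.column_stack([np.ones_like(y), *covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 10_000
t = rng.normal(size=n)            # treatment
y = 2.0 * t + rng.normal(size=n)  # outcome; true effect of t on y is 2
c = t + y + rng.normal(size=n)    # collider: caused by both t and y

naive = ols_coef(y, t)[1]             # about 2: no confounding, so this is correct
with_collider = ols_coef(y, t, c)[1]  # about 0.5 here: biased by adjusting for c
print(f"unadjusted: {naive:.2f}, adjusted for collider: {with_collider:.2f}")
```

The 0.5 follows from the population regression of `y` on `t` and `c` for these particular coefficients; other choices give other biased values, but adjusting for a collider is biased in general.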
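The Graphviz error under Common errors and the version and extras warnings above can be checked before installing or running DoWhy. The script below is a sketch that uses only the standard library; `MAX_PYTHON` and `OPTIONAL_MODULES` are assumptions taken from this page, not values read from DoWhy's package metadata, so adjust them for the release you install.

```python
import importlib.util
import shutil
import sys

# Assumptions: the Python upper bound mirrors the warning above, and the
# module names are typical optional dependencies. Both vary by DoWhy
# release, so check its package metadata.
MAX_PYTHON = (3, 13)
OPTIONAL_MODULES = ["pydot", "matplotlib", "econml"]


def check_environment(version=None, modules=OPTIONAL_MODULES) -> dict:
    """Collect simple facts about whether this environment suits DoWhy extras."""
    major_minor = tuple(version) if version is not None else tuple(sys.version_info[:2])
    return {
        "python_ok": major_minor <= MAX_PYTHON,
        # Graphviz's `dot` program must be on PATH for rendering graphs.
        "dot_on_path": shutil.which("dot") is not None,
        # find_spec returns None when a top-level module cannot be found.
        "missing_modules": [m for m in modules if importlib.util.find_spec(m) is None],
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

The check is informational only: a `python_ok` of `False` means the interpreter is newer than the assumed bound, not that DoWhy is guaranteed to fail.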
Install
- pip install dowhy
- pip install "dowhy[all]"
Imports
- CausalModel
from dowhy import CausalModel
- gcm
from dowhy import gcm
Quickstart
import dowhy
from dowhy import CausalModel
import pandas as pd
import numpy as np
# 1. Generate sample data in which the confounder affects both treatment and outcome
np.random.seed(1)
n_samples = 100
confounder = np.random.normal(0, 1, n_samples)
# Units with a higher confounder value are more likely to be treated
treatment = (confounder + np.random.normal(0, 1, n_samples) > 0).astype(int)
outcome = 2 * treatment + 3 * confounder + np.random.normal(0, 1, n_samples)
data = pd.DataFrame({'treatment': treatment, 'confounder': confounder, 'outcome': outcome})
# 2. Model the causal problem
# The causal graph is given as a DOT-format string
model = CausalModel(data=data,
                    graph="digraph { confounder -> treatment; confounder -> outcome; treatment -> outcome; }",
                    treatment=['treatment'],
                    outcome=['outcome'])
# 3. Identify a causal effect
# The default estimand type is the average treatment effect (ATE)
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
# 4. Estimate the causal effect using a statistical method
causal_estimate = model.estimate_effect(identified_estimand,
                                        method_name="backdoor.linear_regression",
                                        control_value=0,
                                        treatment_value=1)
# The simulated true effect is 2, so the estimate should be close to 2
print(f"Causal Estimate: {causal_estimate.value}")
# 5. Refute the obtained estimate
# Using a refutation method to check robustness
refutation = model.refute_estimate(identified_estimand, causal_estimate,
                                   method_name="random_common_cause")
print(f"Refutation (random common cause): {refutation.refutation_result}")
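The backdoor adjustment the Quickstart asks for can be sketched without DoWhy. Under the Quickstart's graph, `confounder` is the backdoor adjustment set, so in this linear setting the estimate amounts to the coefficient on `treatment` from regressing `outcome` on `treatment` and `confounder`. The NumPy sketch below illustrates that idea under those assumptions and is not DoWhy's implementation: it simulates a larger sample (10,000 rows, with the confounder raising the chance of treatment), compares the adjusted estimate with a naive difference in means, and mimics the idea behind the `random_common_cause` refuter by adding an independent noise column and checking that the estimate barely moves. DoWhy's refuter does this with its own machinery and reports its own summary.

```python
import numpy as np


def effect_by_regression(outcome, treatment, *adjustment):
    """Coefficient on `treatment` from OLS of outcome on [1, treatment, *adjustment]."""
    X = np.column_stack([np.ones_like(outcome), treatment, *adjustment])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef[1]


rng = np.random.default_rng(1)
n = 10_000
confounder = rng.normal(size=n)
# Higher confounder values make treatment more likely (confounder -> treatment).
treatment = (confounder + rng.normal(size=n) > 0).astype(float)
# True treatment effect is 2; the confounder also drives the outcome.
outcome = 2 * treatment + 3 * confounder + rng.normal(size=n)

# Naive comparison ignores the backdoor path treatment <- confounder -> outcome.
naive = outcome[treatment == 1].mean() - outcome[treatment == 0].mean()
# Backdoor adjustment: include the confounder as a regressor.
adjusted = effect_by_regression(outcome, treatment, confounder)

# Random-common-cause style check: add a covariate independent of everything
# else; a sound estimate should barely move.
random_cause = rng.normal(size=n)
with_random_cause = effect_by_regression(outcome, treatment, confounder, random_cause)

print(f"naive: {naive:.2f}  adjusted: {adjusted:.2f}  with random cause: {with_random_cause:.2f}")
```

With a true effect of 2, the adjusted estimate lands near 2, while the naive difference is inflated by the confounder (to roughly 5.4 for this data-generating process).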