rdrobust Python Library
The `rdrobust` Python library (current version 1.3.0) implements local polynomial Regression Discontinuity (RD) point estimators with robust bias-corrected confidence intervals and inference procedures. It is actively maintained, and its releases generally track new features and improvements in the companion R and Stata implementations.
Common errors
- TypeError: 'list' object has no attribute 'shape'
  cause: The input variables `y` or `x` were passed as plain Python lists, which lack the `shape` attribute that the underlying numerical operations expect.
  fix: Convert the lists to NumPy arrays (`np.array(my_list)`) or Pandas Series (`pd.Series(my_list)`) before passing them to `rd.rdrobust()`.
- ValueError: The cutoff c is outside the range of x.
  cause: The cutoff `c` does not lie between the minimum and maximum values of the running variable `x`.
  fix: Verify that `min(x) < c < max(x)`, so that observations exist on both sides of the cutoff. Move `c` inside the observed range of `x`, or check that the data were pre-processed as intended.
- AttributeError: 'rdrobust_output' object has no attribute 'p_value' (or similar for 'ci', 'se')
  cause: The code accesses an attribute that does not exist under that name, has been renamed, or holds several values (for example, one per inference method) and must be indexed.
  fix: Print the result object (`print(r)`) for a formatted overview of the results. To reach individual values, list what your installed version exposes with `dir(r)` or `vars(r)`, or consult the `rdrobust` documentation, rather than guessing names. Results are typically stored as small tables with one entry per inference method (p-values under a short name such as `r.pv`, for instance), so an index is needed to extract a single number.
- ModuleNotFoundError: No module named 'rdrobust'
  cause: The `rdrobust` package is not installed in the current Python environment, or the intended environment is not activated.
  fix: Run `pip install rdrobust` in a terminal. If you use virtual environments, activate the correct one before running your script.
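The first two errors above can be caught before the estimator is ever called. The following is a minimal sketch, assuming a hypothetical helper named `prepare_rd_inputs` (not part of `rdrobust`); dropping rows with missing values is a choice made for this example, not a description of what the library does internally.

```python
import numpy as np

def prepare_rd_inputs(y, x, c):
    """Hypothetical helper: coerce inputs to 1-D float arrays and validate the cutoff."""
    y_arr = np.asarray(y, dtype=float).ravel()
    x_arr = np.asarray(x, dtype=float).ravel()
    if y_arr.shape != x_arr.shape:
        raise ValueError(f"y and x differ in length: {y_arr.size} vs {x_arr.size}")
    # Drop rows where either variable is missing (an assumption of this sketch)
    keep = ~(np.isnan(y_arr) | np.isnan(x_arr))
    y_arr, x_arr = y_arr[keep], x_arr[keep]
    # The cutoff must have observations on both sides
    if x_arr.size == 0 or not (x_arr.min() < c < x_arr.max()):
        raise ValueError(f"cutoff c={c} is not strictly inside the observed range of x")
    return y_arr, x_arr

# Plain Python lists go in, NumPy arrays come out
y_list = [1.0, 2.5, 3.1, 7.9, 8.4]
x_list = [-0.8, -0.3, -0.1, 0.2, 0.6]
y_arr, x_arr = prepare_rd_inputs(y_list, x_list, c=0.0)
print(type(x_arr).__name__, x_arr.shape)
```

The returned arrays can then be passed to `rd.rdrobust()` in place of the original lists.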
Warnings
- gotcha Input data for `y` and `x` must be numeric arrays (NumPy arrays or Pandas Series). Passing standard Python lists directly will result in a TypeError.
- gotcha The specified cutoff `c` must fall within the range (min to max) of the running variable `x`. If `c` is outside this range, the function will raise a ValueError.
- gotcha `rdrobust` selects bandwidths automatically by default, but you should still understand what they control: `h` is the bandwidth for the point estimate and `b` is the bandwidth for the bias correction. The default (e.g., MSE-optimal) bandwidths may not suit every research question, so a sensitivity analysis over alternative bandwidths is recommended.
- gotcha `rdrobust` estimates the effect at a single, pre-specified cutoff. Fuzzy RD designs are handled through the `fuzzy` argument, which takes the treatment-received variable. For data with multiple cutoffs, adapt the analysis, for example by calling `rdrobust` once per sharp cutoff or by using a dedicated multi-cutoff tool such as the `rdmulti` package.
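To make the bandwidth warning concrete, here is a rough sketch of a sensitivity check. It uses a hand-rolled local linear estimator with a triangular kernel, written only for illustration: it is not `rdrobust`'s bias-corrected procedure, and the function name `local_linear_jump` and the bandwidth grid are assumptions of this example.

```python
import numpy as np

def local_linear_jump(y, x, c, h):
    """Illustrative only: difference of kernel-weighted local linear intercepts at c."""
    def side_intercept(mask):
        dist = x[mask] - c
        w = 1.0 - np.abs(dist) / h          # triangular kernel weights, > 0 inside the window
        X = np.column_stack([np.ones(dist.size), dist])
        sw = np.sqrt(w)                      # weighted least squares via sqrt-weight scaling
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y[mask] * sw, rcond=None)
        return beta[0]                       # fitted value at the cutoff
    inside = np.abs(x - c) < h
    return side_intercept(inside & (x >= c)) - side_intercept(inside & (x < c))

# Simulated data with a true jump of 4 at x = 0
rng = np.random.default_rng(123)
n = 2000
x = rng.uniform(-1, 1, n)
y = 3 + 2 * x + 4 * (x >= 0) + rng.normal(0, 1, n)

# Re-estimate the jump over a grid of bandwidths
estimates = {h: local_linear_jump(y, x, c=0.0, h=h) for h in (0.2, 0.4, 0.8)}
for h, est in estimates.items():
    print(f"h={h:.1f}  estimated jump={est:.3f}")
```

If the estimates move substantially as `h` changes, the conclusion depends on the bandwidth choice. With `rdrobust` itself, the analogous check would repeat the call while passing different values to `h`.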
Install
- pip install rdrobust
Imports
- rdrobust
import rdrobust as rd
Quickstart
import numpy as np
import pandas as pd
import rdrobust as rd
# Simulate data for a regression discontinuity design
np.random.seed(123)
n = 500
# Running variable 'x' from -1 to 1
x = np.random.uniform(-1, 1, n)
# Outcome 'y' with a jump at x=0 (the cutoff)
y = 3 + 2 * x + 4 * (x >= 0) + np.random.normal(0, 1, n)
# Convert to pandas Series, which is a common and robust input format
y_series = pd.Series(y)
x_series = pd.Series(x)
# Apply rdrobust with the cutoff c=0
# The output 'r' is an rdrobust.rdrobust_output object
r = rd.rdrobust(y_series, x_series, c=0)
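# Print the formatted results table (printing the object itself avoids
# relying on a method name that may not exist in your installed version)
print("\nRD Robust Results:")
print(r)
# Individual components are stored as attributes of `r`; names can differ
# between versions, so list them first:
# print([name for name in dir(r) if not name.startswith("_")])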
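When it is unclear which components a result object exposes, a small inspection helper avoids guessing attribute names. This is a sketch that works on any Python object; the helper `public_attributes` is hypothetical, and the stand-in object below uses made-up attribute names that say nothing about what `rdrobust` actually returns.

```python
from types import SimpleNamespace

def public_attributes(obj):
    """Return the sorted names of non-callable public attributes on obj."""
    return sorted(
        name for name in dir(obj)
        if not name.startswith("_") and not callable(getattr(obj, name))
    )

# Stand-in for a result object; attribute names here are invented for illustration
fake_result = SimpleNamespace(alpha=1.0, beta=[0.1, 0.2])
print(public_attributes(fake_result))
```

Calling `public_attributes(r)` on the quickstart result would list the data attributes available in your installed version.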