pyhdfe
pyhdfe is a Python library for absorbing high-dimensional fixed effects. Its default method of alternating projections follows the algorithm developed by Gaure (2013). It is primarily used in econometrics and statistics to estimate linear models with several high-dimensional fixed effects, and it relies on sparse data structures to keep this tractable. The current version is 0.2.0; releases are intermittent.
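The algorithm from Gaure (2013) can be sketched as the method of alternating projections: demean a variable within the levels of each fixed effect in turn, and repeat until a full sweep leaves it unchanged. The pure-Python sketch below illustrates only that idea. `demean_once`, `map_residualize`, the tolerance, and the sweep limit are hypothetical names and settings, not pyhdfe's API or implementation.

```python
from collections import defaultdict


def demean_once(values, ids):
    """Subtract the within-group mean of `values` for one fixed-effect dimension."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, ids):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, ids)]


def map_residualize(values, id_columns, tol=1e-10, max_iter=10_000):
    """Method of alternating projections (hypothetical helper): demean within
    each fixed-effect dimension in turn, and stop once a full sweep changes no
    value by more than `tol`. Returns the residuals and the number of sweeps."""
    current = list(values)
    for sweep in range(1, max_iter + 1):
        previous = current
        for ids in id_columns:
            current = demean_once(current, ids)
        if max(abs(a - b) for a, b in zip(current, previous)) < tol:
            return current, sweep
    raise RuntimeError(f"no convergence after {max_iter} sweeps")


# Two fixed-effect dimensions (say, worker and firm) over seven observations
workers = [0, 0, 1, 1, 2, 2, 2]
firms = [0, 1, 1, 2, 2, 0, 0]
y = [1.0, 2.0, 3.0, 5.0, 4.0, 7.0, 6.0]
resid, sweeps = map_residualize(y, [workers, firms])
print(f"converged after {sweeps} sweeps")
```

At convergence the residuals average to zero within every worker and every firm. On weakly connected data (for example, long chains of workers linked only through single job moves), the sweep count climbs, which is the slow convergence mentioned in the warnings.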
Warnings
- gotcha For extremely large datasets or a very high number of levels in fixed effects, memory consumption can still be significant, despite optimizations for sparse data. Monitor memory usage carefully.
- gotcha The default method of alternating projections (a block Gauss-Seidel iteration) can converge slowly, or fail to converge within its iteration limit, when the fixed effects are weakly connected or highly collinear. This is a common challenge for iterative solvers.
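- gotcha `pyhdfe.create` takes the fixed-effect IDs as its `ids` argument: a 2-D array with one row per observation and one column per fixed-effect dimension. If the IDs start out as a list of separate 1-D arrays or pandas Series, stack them first (e.g., with `np.column_stack`); passing them in the wrong layout will lead to errors or to the wrong fixed effects being absorbed.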
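The memory warning above can be made concrete with back-of-the-envelope arithmetic. Assume, for illustration only, that the fixed effects are encoded as an indicator (dummy) matrix with float64 values and int32 indices, stored either densely or in CSR sparse format; whether pyhdfe materializes such a matrix depends on the chosen method. `dummy_matrix_bytes` is a hypothetical helper.

```python
def dummy_matrix_bytes(n_obs, levels_per_dimension, value_bytes=8, index_bytes=4):
    """Rough storage cost of a fixed-effect indicator matrix, dense vs. CSR.
    Assumes float64 values (8 bytes) and int32 indices (4 bytes)."""
    total_levels = sum(levels_per_dimension)
    n_dims = len(levels_per_dimension)
    dense = n_obs * total_levels * value_bytes
    nnz = n_obs * n_dims  # exactly one nonzero per observation per dimension
    # CSR stores values + column indices per nonzero, plus one row pointer per row (+1)
    sparse = nnz * (value_bytes + index_bytes) + (n_obs + 1) * index_bytes
    return dense, sparse


dense, sparse = dummy_matrix_bytes(1_000_000, [100_000, 100_000, 100_000])
print(f"dense:  {dense / 1e9:,.1f} GB")   # dense:  2,400.0 GB
print(f"sparse: {sparse / 1e6:,.1f} MB")  # sparse: 40.0 MB
```

Sparse storage removes the dependence on the number of levels, but it still grows linearly with observations times fixed-effect dimensions, which is why very large datasets remain memory-hungry.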
Install
pip install pyhdfe
Imports
- pyhdfe
import pyhdfe
Quickstart
import numpy as np
import pandas as pd
import pyhdfe
# Create some dummy data
np.random.seed(42)
n_obs = 1000
n_fixed_effects = 3
X = pd.DataFrame(np.random.rand(n_obs, 5), columns=[f'x{i}' for i in range(5)])
y = pd.Series(np.random.rand(n_obs))
fixed_effects_data = []
for i in range(n_fixed_effects):
    n_levels = np.random.randint(50, 200)  # Varying number of levels
    fixed_effects_data.append(np.random.randint(0, n_levels, n_obs))
# pyhdfe expects a single 2-D ID array: one row per observation, one column per fixed effect
ids = np.column_stack(fixed_effects_data)
# Build the absorption algorithm once; it can then residualize any number of columns
algorithm = pyhdfe.create(ids)
# Residualize y and every column of X in one call (y is column 0)
residualized = algorithm.residualize(np.column_stack([y, X]))
y_transformed = residualized[:, 0]
X_transformed = residualized[:, 1:]
# Singleton observations are dropped by default, so the transformed row count may be smaller
print(f"Original X shape: {X.shape}")
print(f"Transformed X shape: {X_transformed.shape}")
print(f"Original y shape: {y.shape}")
print(f"Transformed y shape: {y_transformed.shape}")
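The residualized columns are usually passed to an ordinary least-squares fit. By the Frisch-Waugh-Lovell theorem, the slope from regressing residualized y on residualized X equals the slope from the full regression with fixed-effect dummies. The pure-Python sketch below checks this for one regressor and one fixed effect, where residualizing reduces to within-group demeaning; the helpers `demean` and `ols_slope` and the data are hypothetical.

```python
from collections import defaultdict


def demean(values, ids):
    """Within-group demeaning: the exact residualization for a single fixed effect."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, ids):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, ids)]


def ols_slope(x, y):
    """No-intercept OLS slope; residualized data already have mean zero."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


groups = [0, 0, 0, 1, 1, 2, 2, 2]
x = [1.0, 2.0, 4.0, 0.5, 1.5, 3.0, 5.0, 6.0]
group_effect = {0: 10.0, 1: -3.0, 2: 7.5}
# y = 2 * x + group effect, with no noise, so the true slope is exactly 2
y = [2.0 * xi + group_effect[g] for xi, g in zip(x, groups)]

beta = ols_slope(demean(x, groups), demean(y, groups))
print(f"slope: {beta:.6f}")  # prints slope: 2.000000
```

The group effects vanish under demeaning, so the slope is recovered without estimating a dummy for each group. With several fixed effects, the demeaning step is replaced by a full multi-way residualization.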