pyhdfe

0.2.0 · active · verified Wed Apr 15

pyhdfe is a Python library for absorbing high-dimensional fixed effects, implementing the algorithm developed by Gaure (2013). It is used mainly in econometrics and statistics to estimate models with several high-dimensional fixed effects, and it is optimized for sparse data structures. The current version is 0.2.0; releases are intermittent.

Warnings

Install

Imports

Quickstart

This quickstart uses `pyhdfe.create` and the `residualize` method of the algorithm it returns to absorb multiple high-dimensional fixed effects from a feature matrix `X` and a target vector `y`. It generates synthetic data with several fixed effect ID columns, builds the absorption algorithm from those IDs, and returns the transformed (residualized) `X` and `y`.

import numpy as np
import pyhdfe

# Create some dummy data
np.random.seed(42)
n_obs = 1000
n_fixed_effects = 3

X = np.random.rand(n_obs, 5)
y = np.random.rand(n_obs, 1)

# Build one integer ID column per fixed effect, each with a different number of levels
fixed_effects_data = []
for i in range(n_fixed_effects):
    n_levels = np.random.randint(50, 200)  # Varying number of levels
    fixed_effects_data.append(np.random.randint(0, n_levels, n_obs))
ids = np.column_stack(fixed_effects_data)

# Create the absorption algorithm from the fixed effect IDs.
# drop_singletons=False keeps every observation, so output shapes match the inputs.
algorithm = pyhdfe.create(ids, drop_singletons=False)

# Residualize y and X in one call, then split the columns apart again
residualized = algorithm.residualize(np.column_stack([y, X]))
y_transformed = residualized[:, [0]]
X_transformed = residualized[:, 1:]

print(f"Original X shape: {X.shape}")
print(f"Transformed X shape: {X_transformed.shape}")
print(f"Original y shape: {y.shape}")
print(f"Transformed y shape: {y_transformed.shape}")
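
For intuition about what absorbing fixed effects computes, the following is a minimal NumPy-only sketch of alternating projections: each variable is repeatedly demeaned within the groups of every fixed effect until it stops changing. It is an illustration under simplifying assumptions (integer IDs starting at zero, a single 1D variable) and is not pyhdfe's implementation; the helper names `demean_by_group` and `absorb` are hypothetical and do not belong to the library.

```python
import numpy as np


def demean_by_group(values, group_ids):
    """Subtract each group's mean from a 1D float array."""
    sums = np.bincount(group_ids, weights=values)
    counts = np.bincount(group_ids)
    # Guard against unused ID levels, whose count is zero
    means = sums / np.maximum(counts, 1)
    return values - means[group_ids]


def absorb(values, ids, tol=1e-10, max_iter=10_000):
    """Residualize a 1D array on the fixed effects stored in the columns of ids."""
    current = np.asarray(values, dtype=float).copy()
    for _ in range(max_iter):
        previous = current.copy()
        # One sweep: demean within each fixed effect in turn
        for column in ids.T:
            current = demean_by_group(current, column)
        # Stop once a full sweep no longer changes the variable
        if np.max(np.abs(current - previous)) < tol:
            break
    return current


rng = np.random.default_rng(0)
n_obs = 500
ids = np.column_stack([
    rng.integers(0, 20, n_obs),  # first fixed effect, 20 levels
    rng.integers(0, 15, n_obs),  # second fixed effect, 15 levels
])
y = rng.random(n_obs)
y_resid = absorb(y, ids)
```

After convergence, the residualized variable has approximately zero mean within every group of every fixed effect, which is the property a regression on the transformed data relies on. Dedicated libraries use faster and more robust variants of this idea, so the sketch is only meant to make the transformation concrete.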
