powershap

Version 0.1.0.1 | verified Fri May 01 | auth: no | language: Python

Powerful feature selection using the statistical significance of SHAP values. Current version: 0.1.0.1. The project is under active development, with major releases every few months.
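A minimal sketch of the underlying idea, assuming the approach described for powershap: a random feature is added to the data, the model is retrained over several iterations, and a feature is kept only if its SHAP impact is significantly larger than the random feature's. This is not powershap's code. `impacts`, the key `random_uniform_feature`, and both helper functions are hypothetical, the numbers are made up, and the p-value is a simple empirical fraction.

```python
from statistics import mean

def empirical_p_value(feature_impacts, random_impacts):
    """Fraction of iterations in which the feature's impact does not exceed
    the random feature's mean impact (small = consistently beats the baseline)."""
    baseline = mean(random_impacts)
    hits = sum(1 for value in feature_impacts if value <= baseline)
    return hits / len(feature_impacts)

def select_features(impacts, random_key="random_uniform_feature", alpha=0.01):
    """Keep features whose empirical p-value is below alpha (hypothetical helper)."""
    random_impacts = impacts[random_key]
    return [
        name
        for name, values in impacts.items()
        if name != random_key and empirical_p_value(values, random_impacts) < alpha
    ]

# Mean |SHAP| per iteration (10 iterations); illustrative numbers only
impacts = {
    "informative": [0.30, 0.28, 0.35, 0.31, 0.29, 0.33, 0.32, 0.30, 0.34, 0.31],
    "noise": [0.02, 0.05, 0.01, 0.04, 0.03, 0.06, 0.02, 0.01, 0.05, 0.03],
    "random_uniform_feature": [0.03, 0.04, 0.02, 0.05, 0.03, 0.04, 0.03, 0.02, 0.04, 0.03],
}
print(select_features(impacts))  # ['informative']
```

With only 10 iterations the empirical p-value moves in steps of 0.1, so at alpha=0.01 a feature must beat the baseline in every iteration; more iterations give a finer-grained test.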

pip install powershap
error ModuleNotFoundError: No module named 'powershap'
cause The package is not installed or installed in a different environment.
fix
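Run 'pip install powershap' in the Python environment you actually use; 'python -m pip install powershap' guarantees the package is installed for that specific interpreter.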
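A quick way to see which interpreter is running and whether it can find the package, using only the standard library. `module_available` is a hypothetical helper name; `importlib.util.find_spec` returns None when a top-level module cannot be located.

```python
import importlib.util
import sys

def module_available(name):
    """True if the top-level module `name` is importable by this interpreter."""
    return importlib.util.find_spec(name) is not None

print("Running under:", sys.executable)
if module_available("powershap"):
    print("powershap is importable in this environment")
else:
    # Installing via this exact interpreter avoids targeting a different environment
    print(f"powershap not found; try: {sys.executable} -m pip install powershap")
```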
error ImportError: cannot import name 'PowerShap' from 'powershap'
cause The import path is wrong or the installed version is very old (pre-0.0.2).
fix
Ensure you have version >=0.0.2 and use 'from powershap import PowerShap'.
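To check the installed version against the 0.0.2 minimum, a sketch using `importlib.metadata` (Python 3.8+). The helpers are hypothetical, and the comparison is deliberately naive: it keeps only the leading numeric parts and ignores pre-release ordering, so use `packaging.version` if you need strict PEP 440 semantics.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist_name):
    """Installed version string of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

def release_tuple(v):
    """Leading numeric release parts, e.g. '0.1.0.1' -> (0, 1, 0, 1), '0.0.2rc1' -> (0, 0, 2)."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):  # a suffix such as 'rc1' ends the release part
            break
    return tuple(parts)

def meets_minimum(v, minimum=(0, 0, 2)):
    """Naive check that ignores pre-release ordering."""
    return release_tuple(v) >= minimum

v = installed_version("powershap")
if v is None:
    print("powershap is not installed in this environment")
elif not meets_minimum(v):
    print(f"powershap {v} predates 0.0.2; upgrade with: pip install --upgrade powershap")
else:
    print(f"powershap {v} found; use: from powershap import PowerShap")
```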
error ValueError: The model must be a fitted estimator or a classifier/regressor.
cause The object passed as 'model' is not a scikit-learn-compatible classifier or regressor.
fix
Pass an unfitted estimator (e.g., RandomForestClassifier()) to PowerShap; it will be fitted internally.
gotcha The 'model' parameter can be a scikit-learn estimator or a pipeline. When using a pipeline, ensure the final step is an estimator.
fix Use an estimator with .fit() and .predict() or .predict_proba() methods.
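The rule in the gotcha above (the model, or a pipeline's final step, needs .fit() plus .predict() or .predict_proba()) can be checked by duck typing before handing the model to PowerShap. This is a hypothetical pre-flight check, not how powershap validates its input; it relies only on scikit-learn's `Pipeline` storing `(name, step)` pairs in `.steps`, and the toy classes exist just to exercise it.

```python
def final_estimator(model):
    """Return the last step of a Pipeline-like object, otherwise the model itself."""
    steps = getattr(model, "steps", None)  # scikit-learn Pipeline keeps (name, step) pairs here
    if steps:
        return steps[-1][1]
    return model

def looks_like_estimator(model):
    """Duck-typed check: the (final) estimator has fit() and predict() or predict_proba()."""
    est = final_estimator(model)

    def has_method(name):
        return callable(getattr(est, name, None))

    return has_method("fit") and (has_method("predict") or has_method("predict_proba"))

# Hypothetical stand-ins used only to exercise the checks
class ToyClassifier:
    def fit(self, X, y):
        return self

    def predict_proba(self, X):
        return [[0.5, 0.5] for _ in X]

class ToyScaler:
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X

class ToyPipeline:
    def __init__(self, steps):
        self.steps = steps

print(looks_like_estimator(ToyClassifier()))  # True
print(looks_like_estimator(ToyPipeline([("scale", ToyScaler()), ("clf", ToyClassifier())])))  # True
print(looks_like_estimator(ToyPipeline([("scale", ToyScaler())])))  # False
```

A pipeline whose last step is a transformer (or the string "passthrough") fails the check, which matches the requirement that the final step be an estimator.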
gotcha PowerShap can be memory-intensive for high-dimensional data because it computes SHAP values for all features.
fix Reduce the number of features or use a smaller sample size via the 'sample' parameter.
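A back-of-the-envelope estimate helps decide how far to reduce the data. The sketch assumes one dense float64 SHAP matrix of rows x features x model outputs (multi-class explainers may return one matrix per class); how many such matrices powershap keeps in memory across iterations is not assumed here. All helper names are hypothetical, and the row subsampling is a generic alternative to any library parameter.

```python
import random

def shap_values_bytes(n_rows, n_features, n_outputs=1, bytes_per_value=8):
    """Size of one dense SHAP matrix: rows x features x outputs x bytes per value."""
    return n_rows * n_features * n_outputs * bytes_per_value

def max_rows_for_budget(budget_bytes, n_features, n_outputs=1, bytes_per_value=8):
    """Largest row count whose SHAP matrix fits in budget_bytes."""
    return budget_bytes // (n_features * n_outputs * bytes_per_value)

def subsample_indices(n_rows, k, seed=0):
    """Pick k distinct row indices reproducibly (sorted for stable slicing)."""
    return sorted(random.Random(seed).sample(range(n_rows), min(k, n_rows)))

# 100,000 rows x 1,000 features -> 800,000,000 bytes (about 0.8 GB) per matrix
print(shap_values_bytes(100_000, 1_000))
# Keep each matrix under about 200 MB
rows = max_rows_for_budget(200_000_000, 1_000)
print(rows)  # 25000
idx = subsample_indices(100_000, rows, seed=42)
```

The indices can then select matching rows of X and y (for a DataFrame, `X.iloc[idx]`) before calling fit.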
deprecated The 'power_analysis' parameter previously accepted 'power' or 'min_power'; 'auto' is now recommended because it estimates the required sample size automatically.
fix Use power_analysis='auto' to avoid manual tuning.
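For context, the "sample size" in a power analysis is the number of observations (for an iterative method like this one, plausibly the number of iterations) needed to detect an effect of standardized size d at significance level alpha with power 1 - beta. A rough normal-approximation sketch for a one-sided test follows; `required_iterations` is a hypothetical helper, the defaults are illustrative, and powershap's own computation may differ (for example, by using a t-distribution).

```python
from math import ceil
from statistics import NormalDist

def required_iterations(effect_size, alpha=0.01, power=0.99):
    """Normal-approximation sample size for a one-sided test:
    n = ((z_{1-alpha} + z_{power}) / d) ** 2, rounded up."""
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist().inv_cdf  # standard normal quantile function
    return ceil(((z(1 - alpha) + z(power)) / effect_size) ** 2)

for d in (0.5, 1.0, 2.0):
    print(d, required_iterations(d))
```

Halving the effect size roughly quadruples the required sample size, which is why automatic estimation is preferable to a fixed, hand-tuned value.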

Quickstart: fit a PowerShap selector on a classification dataset and get selected features.

from powershap import PowerShap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
import pandas as pd

X, y = make_classification(n_samples=100, n_features=20, random_state=42)
X = pd.DataFrame(X, columns=[f'feature_{i}' for i in range(X.shape[1])])

selector = PowerShap(
    model=RandomForestClassifier(),
    power_analysis='auto',       # estimate the required sample size automatically
    cv=5,
    random_state=42
)
selector.fit(X, y)
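X_selected = selector.transform(X)  # reduced feature matrix (may be a NumPy array, not a DataFrame)
# get_support() returns a boolean mask over the input columns (scikit-learn SelectorMixin API)
print('Selected features:', X.columns[selector.get_support()].tolist())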