sklearndf

2.4.2 verified Mon Apr 27 auth: no python

Data frame support and feature traceability for scikit-learn. Version 2.4.2 supports Python >=3.9, <4 and integrates with scikit-learn 1.6+. Release cadence is irregular, with recent minor versions every few months.

pip install sklearndf
error ImportError: cannot import name 'EstimatorWrapperDF' from 'sklearndf'
cause EstimatorWrapperDF is not exported from the top-level sklearndf module.
fix Import it from the wrapper module: from sklearndf.wrapper import EstimatorWrapperDF
error AttributeError: 'EstimatorWrapperDF' object has no attribute 'feature_names_in_'
cause The estimator has not been fitted, or the installed scikit-learn predates 1.0, the release that introduced feature_names_in_.
fix Fit the estimator before reading feature_names_in_, and make sure scikit-learn is >=1.0.
error TypeError: All intermediate steps should be transformers or implement fit_transform. 'ClassifierDF' doesn't
cause A classifier (ClassifierDF) is used as an intermediate step of a PipelineDF, where only transformers are allowed.
fix Place the classifier as the final step of the pipeline.
error ValueError: could not convert string to float: '...'
cause The DataFrame contains non-numeric columns; the wrapped estimator expects numeric input.
fix Encode categorical features with OneHotEncoder (OneHotEncoderDF in sklearndf.transformation) or a similar transformer before passing the data to the estimator.
breaking sklearndf 2.4.0 introduced scikit-learn 1.7 support. Older scikit-learn versions may break with 2.4.0+.
fix Upgrade scikit-learn to >=1.6 (or 1.7 for full support).
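One way to check the installed scikit-learn version before upgrading, sketched with the standard library only. The 1.6 threshold is taken from the fix above; the helper names `major_minor` and `sklearn_is_recent_enough` are hypothetical and not part of sklearndf.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_SKLEARN = (1, 6)  # minimum version named in the fix above


def major_minor(version_string: str) -> tuple:
    """Return (major, minor) from a version string such as '1.6.1'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def sklearn_is_recent_enough() -> bool:
    """Return True if scikit-learn is installed and at least MIN_SKLEARN."""
    try:
        installed = version("scikit-learn")
    except PackageNotFoundError:
        return False
    return major_minor(installed) >= MIN_SKLEARN


if not sklearn_is_recent_enough():
    print("scikit-learn is missing or older than 1.6; consider upgrading")
```

Comparing integer tuples rather than raw strings keeps versions such as 1.10 ordered correctly after 1.6.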
gotcha Wrapper classes only accept scikit-learn estimators that support DataFrame output. Not all sklearn estimators do; check sklearn documentation.
fix Use only estimators that are known to return DataFrames (e.g., from sklearn.ensemble, sklearn.linear_model).
deprecated The pipeline module's `FitDataFrame` and `TransformDataFrame` are deprecated in favor of `EstimatorWrapperDF` and `TransformerWrapperDF`.
fix Use `EstimatorWrapperDF` for estimators and `TransformerWrapperDF` for transformers; both are in `sklearndf.wrapper`.
gotcha When using with sklearn pipelines, use `sklearndf.pipeline.PipelineDF` instead of `sklearn.pipeline.Pipeline` to preserve DataFrame features.
fix Replace `from sklearn.pipeline import Pipeline` with `from sklearndf.pipeline import PipelineDF`.
conda install -c conda-forge -c bcg_gamma sklearndf

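Creates a DecisionTreeClassifierDF (the sklearndf counterpart of scikit-learn's DecisionTreeClassifier), fits it on the iris DataFrame, and prints the input feature names and the predictions for the first five rows.

from sklearn.datasets import load_iris
from sklearndf.classification import DecisionTreeClassifierDF

# Load iris as a pandas DataFrame (features) and Series (target)
data = load_iris(as_frame=True)
X, y = data.data, data.target

# sklearndf ships ready-made DF classes; ClassifierDF itself is an abstract base class
clf = DecisionTreeClassifierDF()
clf.fit(X, y)
print(clf.feature_names_in_)    # column names seen during fit
print(clf.predict(X.iloc[:5]))  # predictions for the first five rows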