sklearndf
2.4.2 verified Mon Apr 27 auth: no python
Data frame support and feature traceability for scikit-learn. Version 2.4.2 supports Python >=3.9, <4 and integrates with scikit-learn 1.6+. Release cadence is irregular, with recent minor versions every few months.
pip install sklearndf
Common errors
error ImportError: cannot import name 'EstimatorWrapperDF' from 'sklearndf'
cause EstimatorWrapperDF is not exported from the top-level sklearndf package; the wrapper base classes live in the sklearndf.wrapper module.
fix Use: from sklearndf.wrapper import EstimatorWrapperDF
error AttributeError: 'EstimatorWrapperDF' object has no attribute 'feature_names_in_'
cause The estimator has not been fitted yet, or the installed scikit-learn is older than 1.0 and does not record feature_names_in_.
fix Call fit before reading feature_names_in_, and make sure scikit-learn is >=1.0.
error TypeError: All intermediate steps should be transformers or implement fit_transform. 'ClassifierDF' doesn't
cause A classifier (ClassifierDF) is used as an intermediate step of a PipelineDF, where only transformers are allowed.
fix Place the classifier as the final step of the pipeline.
error ValueError: could not convert string to float: '...'
cause The DataFrame contains non-numeric columns, but the wrapped scikit-learn estimator expects numeric input.
fix Encode categorical features with OneHotEncoder or a similar transformer before passing the data to the estimator.
Warnings
breaking sklearndf 2.4.0 introduced scikit-learn 1.7 support. Older scikit-learn versions may break with 2.4.0+.
fix Upgrade scikit-learn to >=1.6 (or 1.7 for full support).
gotcha sklearndf provides data frame support through dedicated DF wrapper classes, one per supported scikit-learn estimator. Not every scikit-learn estimator has one; check the sklearndf documentation.
fix Use the DF classes from the matching sklearndf module (e.g., sklearndf.classification, sklearndf.regression, sklearndf.transformation) rather than passing native estimators directly.
deprecated The pipeline module's `FitDataFrame` and `TransformDataFrame` are deprecated in favor of `EstimatorWrapperDF` and `TransformerWrapperDF`.
fix Use `sklearndf.wrapper.EstimatorWrapperDF` for estimators and `sklearndf.wrapper.TransformerWrapperDF` for transformers.
gotcha When building pipelines, use `sklearndf.pipeline.PipelineDF` instead of `sklearn.pipeline.Pipeline` to preserve DataFrame feature names.
fix Replace `from sklearn.pipeline import Pipeline` with `from sklearndf.pipeline import PipelineDF`.
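The scikit-learn version requirement from the first warning can be checked before importing sklearndf. The sketch below uses only the standard library; the helper names major_minor and meets_minimum are hypothetical, and the (1, 6) threshold is taken from the warning text above.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def major_minor(version_string: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '1.6.1' or '1.7rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(package: str, minimum: Tuple[int, int]) -> Optional[bool]:
    """True or False if the package is installed; None if it is not installed."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return None
    return major_minor(installed) >= minimum


# Threshold from the warning above: scikit-learn >= 1.6.
status = meets_minimum("scikit-learn", (1, 6))
print("scikit-learn >= 1.6:", status)
```

A result of None means scikit-learn is not installed in the current environment.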
Install
conda install -c conda-forge -c bcg_gamma sklearndf
Imports
- EstimatorWrapperDF
wrong from sklearndf import EstimatorWrapperDF
correct from sklearndf.wrapper import EstimatorWrapperDF
- ClassifierDF
wrong from sklearn import ...
correct from sklearndf import ClassifierDF (abstract base class; concrete classifiers such as DecisionTreeClassifierDF are imported from sklearndf.classification)
Quickstart
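from sklearn.datasets import load_iris
from sklearndf.classification import DecisionTreeClassifierDF

# Load the iris data as a pandas DataFrame (features) and Series (target).
data = load_iris(as_frame=True)
X, y = data.data, data.target

# ClassifierDF is an abstract base class; instantiate a concrete DF classifier.
clf = DecisionTreeClassifierDF()
clf.fit(X, y)

# Feature names seen during fit, taken from the DataFrame columns.
print(clf.feature_names_in_)
# Predictions for the first five rows.
print(clf.predict(X.head()))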