scikit-learn
Standard Python machine learning library. Consistent fit/predict/transform API across estimators. Current version is 1.8.0 (Dec 2025). Requires Python >=3.11. Import name is sklearn (not scikit_learn). The sklearn PyPI package is deprecated — install with pip install scikit-learn.
Warnings
- breaking pip install sklearn now raises an error (since Dec 1, 2023). The sklearn PyPI package was deprecated and blocked. LLMs still generate pip install sklearn in requirements.txt and setup.py.
- breaking Pickled models are not guaranteed compatible across scikit-learn versions. Loading a model pickled with 0.21 into 1.x can fail on load (for example with a ValueError) or silently produce wrong results.
- breaking Python 3.10 support dropped in 1.8. scikit-learn 1.8 requires Python >=3.11.
- breaking n_features_ attribute removed from most estimators (deprecated in 1.0, removed in 1.2). Replaced by n_features_in_, which predates 1.0. Code accessing clf.n_features_ raises AttributeError.
- gotcha Data leakage via manual transform: calling scaler.fit_transform(X_train) then scaler.transform(X_test) is correct, but calling scaler.fit_transform(X) before the split leaks test statistics into training. LLMs frequently generate this pattern.
- gotcha random_state must be set on every step that uses randomness for reproducible results: train_test_split (or any other splitter) AND each estimator that accepts random_state. Setting it on only one still leaves run-to-run variation.
- gotcha pandas 3.0 string dtype compatibility: scikit-learn may fail or warn when receiving string-dtype columns (new default in pandas 3.0). Encode categoricals explicitly before fitting.
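The data-leakage gotcha above can be sketched as follows. This is a minimal illustration rather than a prescribed workflow: the iris dataset and the names leaky_scaler and safe_scaler are assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Leaky: the scaler sees every row, including rows that later become test data.
leaky_scaler = StandardScaler().fit(X)

# Safe: split first, then fit the scaler on the training rows only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
safe_scaler = StandardScaler()
X_train_scaled = safe_scaler.fit_transform(X_train)  # statistics from train only
X_test_scaled = safe_scaler.transform(X_test)        # reuse them, never refit

# Compare the learned means; any difference is test information
# that the leaky scaler absorbed.
print(leaky_scaler.mean_ - safe_scaler.mean_)
```

Wrapping the scaler and the estimator in a Pipeline gives the same guarantee without the manual bookkeeping, because fit only ever touches the rows passed to it.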
Install
- Correct: pip install scikit-learn
- Wrong: pip install sklearn (deprecated PyPI package, raises an error)
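To confirm what actually got installed, the distribution name and the import name can be checked separately. The snippet below is a small sketch using only the standard library plus sklearn itself; the variable name dist_version is an assumption for illustration.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

import sklearn  # import name, not the PyPI distribution name

# Installed distributions are looked up by PyPI name ("scikit-learn").
try:
    dist_version = version("scikit-learn")
except PackageNotFoundError:
    dist_version = None  # e.g. a vendored or otherwise unusual install

print("Python:", sys.version_info[:3])
print("sklearn.__version__:", sklearn.__version__)
print("scikit-learn distribution:", dist_version)
```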
Imports
- sklearn
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
- Pipeline
from sklearn.pipeline import Pipeline
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression())
])
pipe.fit(X_train, y_train)
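A fitted pipeline also shows where n_features_in_ lives, which ties back to the n_features_ warning above. The following is a sketch under assumptions: it fits on the full iris dataset purely to inspect attributes (not to evaluate anything), and max_iter=1000 is an arbitrary choice for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X, y)  # fit on all rows only to inspect fitted attributes

# Individual steps are reachable by the names given to the Pipeline.
clf = pipe.named_steps["clf"]

# n_features_in_ is set during fit, on the step and on the pipeline itself.
print(clf.n_features_in_, pipe.n_features_in_)
```

Code that must run against several scikit-learn versions could read the attribute with getattr(est, "n_features_in_", None) instead of touching n_features_ directly.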
Quickstart
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score
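# Load the data and hold out 20% of the rows for evaluation
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Scaling lives inside the pipeline, so it is fit on the training rows only
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression(random_state=42))
])
pipe.fit(X_train, y_train)
# Accuracy on the held-out test rows
print(accuracy_score(y_test, pipe.predict(X_test)))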
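Because pickled models are not guaranteed to load correctly across versions (see Warnings), one possible approach is to store the scikit-learn version next to the model and check it on load. This is a sketch of a home-grown convention, not a scikit-learn feature: the dictionary layout, the key names, and the file name model.joblib are all assumptions.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
import sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(random_state=42)),
]).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.joblib"  # hypothetical file name
    # Our own convention: bundle the training-time version with the model.
    joblib.dump({"sklearn_version": sklearn.__version__, "model": pipe}, path)
    bundle = joblib.load(path)  # only load files from a trusted source

# Refuse to use the model if the runtime version differs from the saved one.
if bundle["sklearn_version"] != sklearn.__version__:
    raise RuntimeError(
        f"Model saved with scikit-learn {bundle['sklearn_version']}, "
        f"running {sklearn.__version__}; retrain or pin the version."
    )

restored = bundle["model"]
print(np.array_equal(restored.predict(X), pipe.predict(X)))
```

Pinning the exact scikit-learn version in requirements.txt alongside the saved model serves the same purpose at the environment level.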