Scikit-learn (sklearn)
1.8.0 verified Tue May 12 auth: no python install: draft quickstart: draft
Scikit-learn is a free and open-source machine learning library for Python, built on NumPy and SciPy. It provides a wide range of efficient tools for predictive data analysis, including algorithms for classification, regression, clustering, dimensionality reduction, and model selection. Known for its consistent API and comprehensive documentation, it is actively maintained with a regular release cadence. The latest stable version is 1.8.0.
pip install -U scikit-learn
Common errors
error ModuleNotFoundError: No module named 'sklearn' ↓
cause The scikit-learn library is not installed in your Python environment, or your environment is not correctly activated.
fix Run `pip install scikit-learn` in your terminal, with the environment your code runs in activated, to install the library.
error ValueError: Expected 2D array, got 1D array instead: ↓
cause Scikit-learn estimators expect input data to be a 2D array (samples, features), even for a single sample or a single feature, but a 1D array was provided.
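The shape requirement can be illustrated with a minimal sketch. The toy data and the variable names (`x`, `X`, `single_sample`) are assumptions chosen for illustration, not part of any scikit-learn API.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: one feature observed for four samples.
x = np.array([1.0, 2.0, 3.0, 4.0])   # shape (4,), a 1D array
y = np.array([2.0, 4.0, 6.0, 8.0])

model = LinearRegression()

error_message = None
try:
    model.fit(x, y)                   # 1D input is rejected
except ValueError as exc:
    error_message = str(exc)

# One column per feature: (n_samples, n_features) == (4, 1)
X = x.reshape(-1, 1)
model.fit(X, y)

# A single sample must also be 2D: (1, n_features) == (1, 1)
single_sample = np.array([5.0]).reshape(1, -1)
prediction = model.predict(single_sample)
```

Here `reshape(-1, 1)` turns a single feature into a column, while `reshape(1, -1)` turns one sample into a row.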
fix Reshape your 1D array into a 2D array using `array.reshape(-1, 1)` for a single feature or `array.reshape(1, -1)` for a single sample.
error NotFittedError: This XXXX instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator. ↓
cause You are attempting to use a method like `predict`, `transform`, or `score` on a scikit-learn estimator before it has been trained by calling its `fit` method.
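A short sketch of how this error arises and how fitting resolves it, using `StandardScaler` as an arbitrary example estimator. The toy data is an assumption for illustration.

```python
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import StandardScaler
from sklearn.utils.validation import check_is_fitted

X = [[0.0], [1.0], [2.0]]            # hypothetical toy data
scaler = StandardScaler()

not_fitted_message = None
try:
    scaler.transform(X)               # transform before fit
except NotFittedError as exc:
    not_fitted_message = str(exc)

scaler.fit(X)                         # learn mean and scale from the data
check_is_fitted(scaler)               # passes silently once fitted
scaled = scaler.transform(X)
```

`check_is_fitted` can be useful in your own helper functions when you want to fail early with the same exception type.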
fix Train the estimator first by calling `model.fit(X_train, y_train)` before attempting to make predictions or transformations.
error AttributeError: type object 'LinearRegression' has no attribute 'fit' ↓
cause You are attempting to call the `fit` method directly on the scikit-learn model class (e.g., `LinearRegression`) instead of on an instantiated object of that class.
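The difference between the class and an instance can be sketched as follows. The exact exception type and message raised by a class-level call may vary between releases, so the sketch catches both `AttributeError` and `TypeError`; the toy data is an assumption for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0]])  # hypothetical toy data
y = np.array([1.0, 2.0, 3.0])

class_call_error = None
try:
    # Called on the class: there is no estimator instance to act as `self`.
    LinearRegression.fit(X, y)
except (AttributeError, TypeError) as exc:
    class_call_error = type(exc).__name__

# Called on an instance: this is the intended usage.
model = LinearRegression()
model.fit(X, y)
slope = model.coef_[0]
```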
fix First, create an instance of the model (e.g., `model = LinearRegression()`), then call `model.fit(X, y)` on the instance.
Warnings
breaking Do NOT install 'sklearn' from PyPI. The 'sklearn' PyPI package is a deprecated placeholder and will lead to errors or install an outdated/dummy package. Always install the library using 'pip install scikit-learn'. ↓
fix Use `pip install scikit-learn` for installation. If you have `sklearn` installed, uninstall it with `pip uninstall sklearn` and then `pip install scikit-learn`. If a dependency requires `sklearn`, report it to their issue tracker or set `SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True` as a last resort.
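To see which of the two distributions is present in the current environment, a small check along these lines may help. It relies only on the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# The real library is distributed as "scikit-learn" but imported as "sklearn".
real_package = installed_version("scikit-learn")
# The deprecated placeholder is distributed under the name "sklearn".
placeholder = installed_version("sklearn")

print(f"scikit-learn: {real_package}, sklearn placeholder: {placeholder}")
```

If `placeholder` is not `None`, the deprecated package is installed and can be removed as described above.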
breaking Passing estimator and method parameters positionally has been deprecated since version 0.23 and raises a TypeError in scikit-learn 1.0 and later for most parameters. ↓
fix Always use keyword arguments when instantiating estimators or calling methods with multiple parameters. For example, use `RandomForestClassifier(n_estimators=100, criterion="entropy")` instead of `RandomForestClassifier(100, "entropy")`; the leading `n_estimators` may still be accepted positionally, but later parameters such as `criterion` are keyword-only.
deprecated The `get_feature_names` method on transformers was deprecated in version 1.0 and removed in version 1.2. ↓
fix Use `get_feature_names_out` instead to retrieve the names of output features from a transformer.
gotcha Passing `numpy.matrix` as input to scikit-learn estimators was deprecated in version 1.0 and raises a `TypeError` from version 1.2 onward. ↓
fix Convert `numpy.matrix` inputs to `numpy.ndarray` (e.g., using `.A` attribute or `np.asarray()`) before passing them to Scikit-learn estimators.
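The conversion can be sketched as below. The matrix contents are arbitrary, and the warning filter is only there because NumPy itself discourages creating `numpy.matrix` objects.

```python
import warnings

import numpy as np
from sklearn.preprocessing import StandardScaler

with warnings.catch_warnings():
    # NumPy may emit a PendingDeprecationWarning when a matrix is created.
    warnings.simplefilter("ignore", PendingDeprecationWarning)
    legacy = np.matrix([[1.0, 2.0], [3.0, 4.0]])  # hypothetical legacy input

X = np.asarray(legacy)               # plain ndarray with the same values
scaled = StandardScaler().fit_transform(X)
```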
gotcha Scikit-learn 1.0+ stores feature names in `feature_names_in_` when fitted on pandas DataFrames. Passing inconsistent feature names to a subsequent `transform` (or other non-fit method) raised a `FutureWarning` in versions 1.0 and 1.1 and raises a `ValueError` from version 1.2 onward. ↓
fix Ensure that the feature names (column names of pandas DataFrames), including their order, are consistent between `fit` and subsequent operations (`transform`, `predict`). If feature names are not important, convert DataFrames to NumPy arrays (e.g., `df.to_numpy()`) for both `fit` and later calls.
Install
conda install -c conda-forge scikit-learn
Install compatibility draft last tested: 2026-05-12
python os / libc variant status wheel install import disk
3.10 alpine (musl) -U - - - -
3.10 alpine (musl) scikit-learn - - - -
3.10 alpine (musl) sklearn - - - -
3.10 slim (glibc) -U - - 2.08s 270M
3.10 slim (glibc) scikit-learn - - 1.90s 270M
3.10 slim (glibc) sklearn - - - -
3.11 alpine (musl) -U - - - -
3.11 alpine (musl) scikit-learn - - - -
3.11 alpine (musl) sklearn - - - -
3.11 slim (glibc) -U - - 3.75s 287M
3.11 slim (glibc) scikit-learn - - 3.63s 287M
3.11 slim (glibc) sklearn - - - -
3.12 alpine (musl) -U - - - -
3.12 alpine (musl) scikit-learn - - - -
3.12 alpine (musl) sklearn - - - -
3.12 slim (glibc) -U - - 4.23s 271M
3.12 slim (glibc) scikit-learn - - 4.39s 271M
3.12 slim (glibc) sklearn - - - -
3.13 alpine (musl) -U - - - -
3.13 alpine (musl) scikit-learn - - - -
3.13 alpine (musl) sklearn - - - -
3.13 slim (glibc) -U - - 3.89s 269M
3.13 slim (glibc) scikit-learn - - 4.13s 269M
3.13 slim (glibc) sklearn - - - -
3.9 alpine (musl) -U - - - -
3.9 alpine (musl) scikit-learn - - - -
3.9 alpine (musl) sklearn - - - -
3.9 slim (glibc) -U - - 2.11s 284M
3.9 slim (glibc) scikit-learn - - 1.91s 284M
3.9 slim (glibc) sklearn - - - -
Imports
- sklearn
wrong: from sklearn import ClassName (after pip install sklearn)
correct: import sklearn
- RandomForestClassifier
from sklearn.ensemble import RandomForestClassifier
- train_test_split
from sklearn.model_selection import train_test_split
Quickstart draft last tested: 2026-04-23
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# 1. Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=4, random_state=42)
# 2. Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# 3. Instantiate a classifier (estimator)
# Pass parameters by keyword; most are keyword-only since scikit-learn 1.0
clf = RandomForestClassifier(n_estimators=100, random_state=42)
# 4. Fit the classifier to the training data
clf.fit(X_train, y_train)
# 5. Make predictions on the test data
y_pred = clf.predict(X_test)
# 6. Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")
# Example of using a preprocessor (e.g., StandardScaler in a pipeline context)
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
pipe = make_pipeline(StandardScaler(), LogisticRegression(random_state=42))
pipe.fit(X_train, y_train)
pipeline_accuracy = accuracy_score(y_test, pipe.predict(X_test))
print(f"Pipeline Accuracy: {pipeline_accuracy:.2f}")