Scikit-learn (sklearn)

1.8.0 verified Tue May 12 auth: no python install: draft quickstart: draft

Scikit-learn is a free and open-source machine learning library for Python, built on NumPy and SciPy. It provides efficient tools for predictive data analysis, including algorithms for classification, regression, clustering, dimensionality reduction, and model selection. Its estimators share a consistent `fit`/`predict`/`transform` API, the documentation is comprehensive, and the project is actively maintained, with roughly two feature releases per year. The latest stable version is 1.8.0.

pip install -U scikit-learn

Common errors

error ModuleNotFoundError: No module named 'sklearn'

cause The scikit-learn library is not installed in the Python environment you are running, or the environment where it is installed (virtualenv or conda) is not activated.

fix

Run python -m pip install scikit-learn with the same interpreter that runs your code, so the library is installed into that environment.
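To confirm which interpreter is in use and whether it can see the module, a minimal check along these lines may help. It relies only on the standard library; the helper name `sklearn_is_importable` is an illustrative assumption, not part of scikit-learn.

```python
import importlib.util
import sys


def sklearn_is_importable() -> bool:
    """Return True if the running interpreter can locate the `sklearn` module."""
    return importlib.util.find_spec("sklearn") is not None


# The interpreter path shows which environment this `python` belongs to.
print("Interpreter:", sys.executable)
print("sklearn importable:", sklearn_is_importable())
```

If the interpreter path is not the environment you expected, activate the right one before installing.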

error ValueError: Expected 2D array, got 1D array instead:

cause Scikit-learn estimators expect input data as a 2D array of shape (n_samples, n_features), even for a single sample or a single feature, but a 1D array was provided.

fix

Reshape your 1D array into a 2D array using array.reshape(-1, 1) if the data has a single feature, or array.reshape(1, -1) if it contains a single sample.
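The two reshapes give different shapes, which the following sketch illustrates on a small made-up array (the variable names are illustrative):

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])  # 1D array, shape (4,)

# Four samples with one feature each: shape (4, 1)
as_single_feature = values.reshape(-1, 1)

# One sample with four features: shape (1, 4)
as_single_sample = values.reshape(1, -1)

print(values.shape, as_single_feature.shape, as_single_sample.shape)
```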

error NotFittedError: This XXXX instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

cause You are attempting to use a method like `predict`, `transform`, or `score` on a scikit-learn estimator before it has been trained by calling its `fit` method.

fix

Train the estimator first by calling model.fit(X_train, y_train) before attempting to make predictions or transformations.
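As a small illustration, the sketch below triggers the error on an unfitted `LogisticRegression`, then fits it and predicts. The toy data is made up for the example; `check_is_fitted` is scikit-learn's own validation helper.

```python
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.utils.validation import check_is_fitted

# Toy data, purely illustrative
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0, 0, 1, 1]

model = LogisticRegression()

# Predicting before fit raises NotFittedError
try:
    model.predict([[1.5]])
    raised_before_fit = False
except NotFittedError:
    raised_before_fit = True

# After fit, check_is_fitted passes silently and predict works
model.fit(X_train, y_train)
check_is_fitted(model)
prediction = model.predict([[1.5]])
print(raised_before_fit, prediction)
```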

error AttributeError: type object 'LinearRegression' has no attribute 'fit'

cause You are attempting to call the `fit` method directly on the scikit-learn model class (e.g., `LinearRegression`) instead of on an instantiated object of that class. The exact exception type and message can vary with the scikit-learn version; the underlying mistake is the same.

fix

First, create an instance of the model (e.g., model = LinearRegression()), then call model.fit(X, y) on the instance.
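The difference between calling `fit` on the class and on an instance can be sketched as follows. The data is made up, and the example catches a generic `Exception` because the exception raised by the class-level call may differ between versions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])

# Wrong: calling fit on the class, so X is taken as `self`
try:
    LinearRegression.fit(X, y)
    class_call_failed = False
except Exception as exc:
    class_call_failed = True
    print("Class-level call failed with:", type(exc).__name__)

# Right: instantiate first, then fit the instance
model = LinearRegression()
model.fit(X, y)
print(model.coef_)
```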

Warnings

breaking Do NOT install 'sklearn' from PyPI. The 'sklearn' PyPI package is a deprecated placeholder and will lead to errors or install an outdated/dummy package. Always install the library using 'pip install scikit-learn'.

fix Use `pip install scikit-learn` for installation. If you have `sklearn` installed, uninstall it with `pip uninstall sklearn` and then `pip install scikit-learn`. If a dependency requires `sklearn`, report it to their issue tracker or set `SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True` as a last resort.
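To see which of the two distributions is present in an environment, a standard-library check such as the one below can be used. The helper `installed_version` is a hypothetical name introduced for this sketch.

```python
from importlib import metadata


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


real = installed_version("scikit-learn")
placeholder = installed_version("sklearn")

print("scikit-learn:", real)
print("sklearn (placeholder):", placeholder)
```

A non-`None` value for `sklearn` suggests the placeholder package is installed and should be removed.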

breaking Positional arguments for estimator instantiation and method calls are deprecated since version 0.23 and raise a TypeError in Scikit-learn 1.0 and later for most parameters.

fix Always use keyword arguments when instantiating estimators or calling methods with multiple parameters. For example, use `RandomForestClassifier(n_estimators=100, criterion="entropy")` instead of `RandomForestClassifier(100, "entropy")`. `n_estimators` alone is still accepted positionally, but `criterion` and the parameters after it are keyword-only.

deprecated The `get_feature_names` method on transformers was deprecated in version 1.0 and removed in version 1.2.

fix Use `get_feature_names_out` instead to retrieve the names of output features from a transformer.

gotcha Passing `numpy.matrix` as input to Scikit-learn estimators was deprecated in version 1.0 and raises a `TypeError` since version 1.2.

fix Convert `numpy.matrix` inputs to `numpy.ndarray` (e.g., using `.A` attribute or `np.asarray()`) before passing them to Scikit-learn estimators.
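A minimal sketch of the conversion, using `np.asarray` and a `StandardScaler` as a stand-in for any estimator:

```python
import warnings

import numpy as np
from sklearn.preprocessing import StandardScaler

# np.matrix itself is discouraged by NumPy; silence its pending-deprecation notice
with warnings.catch_warnings():
    warnings.simplefilter("ignore", PendingDeprecationWarning)
    legacy = np.matrix([[1.0, 2.0], [3.0, 4.0]])

# Convert to a plain ndarray before handing the data to scikit-learn
as_array = np.asarray(legacy)
scaled = StandardScaler().fit_transform(as_array)

print(type(as_array).__name__, scaled.shape)
```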

gotcha Scikit-learn 1.0+ stores feature names in `feature_names_in_` when fitted on pandas DataFrames. Inconsistent feature names during subsequent `transform` (or other non-fit methods) raised a `FutureWarning` in versions 1.0 and 1.1 and raise a `ValueError` since version 1.2.

fix Ensure that the feature names (column names of pandas DataFrames) are consistent between `fit` and subsequent operations (`transform`, `predict`). If feature names are not important, consider converting DataFrames to NumPy arrays (e.g., `df.to_numpy()`) before passing them to estimators, and do so for `fit` as well as for later calls.

Install

conda install -c conda-forge scikit-learn

Install compatibility draft last tested: 2026-05-12

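| python | os / libc | variant | status | wheel | install | import | disk |
|---|---|---|---|---|---|---|---|
| 3.10 | alpine (musl) | -U | | - | - | - | - |
| 3.10 | alpine (musl) | scikit-learn | | - | - | - | - |
| 3.10 | alpine (musl) | sklearn | | - | - | - | - |
| 3.10 | slim (glibc) | -U | | - | - | 2.08s | 270M |
| 3.10 | slim (glibc) | scikit-learn | | - | - | 1.90s | 270M |
| 3.10 | slim (glibc) | sklearn | | - | - | - | - |
| 3.11 | alpine (musl) | -U | | - | - | - | - |
| 3.11 | alpine (musl) | scikit-learn | | - | - | - | - |
| 3.11 | alpine (musl) | sklearn | | - | - | - | - |
| 3.11 | slim (glibc) | -U | | - | - | 3.75s | 287M |
| 3.11 | slim (glibc) | scikit-learn | | - | - | 3.63s | 287M |
| 3.11 | slim (glibc) | sklearn | | - | - | - | - |
| 3.12 | alpine (musl) | -U | | - | - | - | - |
| 3.12 | alpine (musl) | scikit-learn | | - | - | - | - |
| 3.12 | alpine (musl) | sklearn | | - | - | - | - |
| 3.12 | slim (glibc) | -U | | - | - | 4.23s | 271M |
| 3.12 | slim (glibc) | scikit-learn | | - | - | 4.39s | 271M |
| 3.12 | slim (glibc) | sklearn | | - | - | - | - |
| 3.13 | alpine (musl) | -U | | - | - | - | - |
| 3.13 | alpine (musl) | scikit-learn | | - | - | - | - |
| 3.13 | alpine (musl) | sklearn | | - | - | - | - |
| 3.13 | slim (glibc) | -U | | - | - | 3.89s | 269M |
| 3.13 | slim (glibc) | scikit-learn | | - | - | 4.13s | 269M |
| 3.13 | slim (glibc) | sklearn | | - | - | - | - |
| 3.9 | alpine (musl) | -U | | - | - | - | - |
| 3.9 | alpine (musl) | scikit-learn | | - | - | - | - |
| 3.9 | alpine (musl) | sklearn | | - | - | - | - |
| 3.9 | slim (glibc) | -U | | - | - | 2.11s | 284M |
| 3.9 | slim (glibc) | scikit-learn | | - | - | 1.91s | 284M |
| 3.9 | slim (glibc) | sklearn | | - | - | - | - |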

Imports

sklearn
wrong
```
# Wrong: run after `pip install sklearn` (the deprecated placeholder package)
from sklearn import ClassName
```
correct
```
import sklearn
```
'import sklearn' is the correct module name, but the PyPI package to install is 'scikit-learn'. The 'sklearn' package on PyPI is a deprecated placeholder (version 0.0.x), not the actual scikit-learn library, and installing it raises warnings or errors. Always use 'pip install scikit-learn'.
RandomForestClassifier
```
from sklearn.ensemble import RandomForestClassifier
```
Imports for specific estimators or utilities are typically from submodules like `sklearn.ensemble`, `sklearn.linear_model`, `sklearn.preprocessing`, etc.
train_test_split
```
from sklearn.model_selection import train_test_split
```
Model selection tools are found in `sklearn.model_selection`.

Quickstart draft last tested: 2026-04-23

This quickstart demonstrates a typical Scikit-learn workflow: generating data, splitting it into training and testing sets, training a `RandomForestClassifier` with keyword arguments, making predictions, and evaluating the model. It also shows a simple `Pipeline` combining a preprocessor and a classifier.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# 1. Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=4, random_state=42)

# 2. Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 3. Instantiate a classifier (estimator)
# Use keyword arguments: most parameters are keyword-only, and passing them positionally raises a TypeError (sklearn >= 1.0)
clf = RandomForestClassifier(n_estimators=100, random_state=42)

# 4. Fit the classifier to the training data
clf.fit(X_train, y_train)

# 5. Make predictions on the test data
y_pred = clf.predict(X_test)

# 6. Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")

# Example of using a preprocessor (e.g., StandardScaler in a pipeline context)
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression

pipe = make_pipeline(StandardScaler(), LogisticRegression(random_state=42))
pipe.fit(X_train, y_train)
pipeline_accuracy = accuracy_score(y_test, pipe.predict(X_test))
print(f"Pipeline Accuracy: {pipeline_accuracy:.2f}")