imbalanced-learn

0.14.1 · active · verified Sun Apr 12

imbalanced-learn is a Python library of resampling techniques for imbalanced datasets in machine learning, where one class significantly outnumbers another. It offers over-sampling (e.g., SMOTE, ADASYN), under-sampling (e.g., NearMiss, EditedNearestNeighbours), and combined methods, along with ensemble methods tailored for imbalanced data. It is compatible with scikit-learn and is part of the scikit-learn-contrib project. The current stable version is 0.14.1, with regular maintenance releases that keep it compatible with new scikit-learn and Python versions.

Warnings

Install

Imports

Quickstart

This quickstart uses SMOTE (Synthetic Minority Over-sampling Technique) to balance an imbalanced dataset generated with scikit-learn. It first creates a dataset with a 90:10 class distribution and then applies SMOTE so that both classes end up with the same number of samples.

from sklearn.datasets import make_classification
from collections import Counter
from imblearn.over_sampling import SMOTE

# Generate an imbalanced dataset
X, y = make_classification(
    n_samples=1000, n_features=2, n_informative=2, n_redundant=0,
    n_repeated=0, n_classes=2, n_clusters_per_class=1,
    weights=[0.9, 0.1], flip_y=0, random_state=42
)

print(f"Original class counts: {Counter(y)}")

# Apply SMOTE to oversample the minority class until both classes are the same size
smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X, y)

print(f"Resampled class counts: {Counter(y_resampled)}")
