imbalanced-learn
imbalanced-learn is a Python library of resampling techniques for imbalanced datasets in machine learning, where one class significantly outnumbers another. It offers over-sampling methods (e.g., SMOTE, ADASYN), under-sampling methods (e.g., NearMiss, EditedNearestNeighbours), and combined approaches, along with ensemble methods tailored for imbalanced data. It is compatible with scikit-learn and is part of the scikit-learn-contrib projects. The current stable version is 0.14.1, with regular maintenance releases that track new scikit-learn and Python versions.
Warnings
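- breaking The `ratio` and `return_indices` parameters have been removed from all samplers. Use the `sampling_strategy` parameter in place of `ratio`, and read the fitted sampler's `sample_indices_` attribute in place of `return_indices=True`. Code written for older versions that passes either parameter will fail.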
- breaking The `imblearn.ensemble.BalanceCascade` and `imblearn.ensemble.EasyEnsemble` classes were removed after two deprecation cycles. Code relying on these will fail.
- deprecated The `n_jobs` parameter in `imblearn.under_sampling.ClusterCentroids` is deprecated because `sklearn.cluster.KMeans`, which it relies on, deprecated its own `n_jobs` parameter.
- gotcha The `imblearn.pipeline.Pipeline`'s internal `check_is_fitted` mechanism for checking if the pipeline is fitted will change from a warning to an error in version 0.15. This change aligns with scikit-learn's behavior.
- gotcha imbalanced-learn has strong compatibility requirements with specific `scikit-learn` versions. Installing incompatible versions can lead to `FutureWarning`s or runtime errors.
Install
- pip install imbalanced-learn
- conda install -c conda-forge imbalanced-learn
Imports
- imblearn
import imblearn
- SMOTE
from imblearn.over_sampling import SMOTE
- RandomOverSampler
from imblearn.over_sampling import RandomOverSampler
- RandomUnderSampler
from imblearn.under_sampling import RandomUnderSampler
- Pipeline
from imblearn.pipeline import Pipeline
Quickstart
from sklearn.datasets import make_classification
from collections import Counter
from imblearn.over_sampling import SMOTE
# Generate an imbalanced dataset
X, y = make_classification(
    n_samples=1000, n_features=2, n_informative=2, n_redundant=0,
    n_repeated=0, n_classes=2, n_clusters_per_class=1,
    weights=[0.9, 0.1], flip_y=0, random_state=42
)
print(f"Original dataset shape: {Counter(y)}")
# Apply SMOTE to oversample the minority class
smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X, y)
print(f"Resampled dataset shape: {Counter(y_resampled)}")