Imbalanced-learn
A Python library for handling imbalanced datasets in machine learning. The current version is 0.14.1, and new releases arrive roughly every six months.
Warnings
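- breaking: imbalanced-learn 0.13.0 has compatibility issues with scikit-learn 1.6.0 and sklearn-compat 0.1.3.
- gotcha: ImportError: cannot import name 'MultiOutputMixin' from 'sklearn.base'. This typically points to a mismatch between the installed scikit-learn and imbalanced-learn versions.
- gotcha: ImportError: cannot import name 'validate_data' from 'sklearn.utils.validation'. This is typically a version mismatch as well.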
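When one of the warnings above shows up, a first step is to see which versions are actually installed. The following is a minimal sketch using only the standard library; the helper name installed_versions is hypothetical, and the package names are the PyPI distribution names assumed from the warnings above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("imbalanced-learn", "scikit-learn", "sklearn-compat")):
    """Return {distribution name: installed version, or None if absent}.

    Hypothetical helper for illustration; it is not part of imbalanced-learn.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


versions = installed_versions()
for name, ver in versions.items():
    print(f"{name}: {ver or 'not installed'}")
```

Comparing the printed versions against the combinations named in the warnings can help decide whether upgrading or pinning a package is worth trying.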
Install
pip install imbalanced-learn
Imports
- RandomOverSampler
from imblearn.over_sampling import RandomOverSampler
Quickstart
import numpy as np
from imblearn.over_sampling import RandomOverSampler
# Sample data
X = np.array([[1, 2], [1, 3], [2, 3], [3, 4], [5, 6], [7, 8], [8, 9], [8, 10], [9, 10], [10, 11]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
# Initialize RandomOverSampler
ros = RandomOverSampler(random_state=42)
# Fit and resample
X_res, y_res = ros.fit_resample(X, y)
# np.bincount gives the number of samples per class, not the array shape
print(f'Resampled class counts: {np.bincount(y_res)}')