Imbalanced-learn
A Python library for handling imbalanced datasets in machine learning. The current version is 0.14.1, and new releases appear roughly every six months.
Common errors
- ModuleNotFoundError: No module named 'imblearn'
  Cause: The imbalanced-learn package is not installed in the Python environment in use, or the intended environment is not activated. Note that the package is installed as `imbalanced-learn` but imported as `imblearn`.
  Fix: Install the package with pip (`pip install imbalanced-learn`) or with conda (`conda install -c conda-forge imbalanced-learn`).
- AttributeError: 'SMOTE' object has no attribute 'fit_sample'
  Cause: `fit_sample` was renamed to `fit_resample` in version 0.4, which kept `fit_sample` only as a deprecated alias; the alias was removed in version 0.8.
  Fix: Replace `fit_sample()` with `fit_resample()` on every resampling object, e.g., `X_resampled, y_resampled = SMOTE().fit_resample(X, y)`.
- AttributeError: module 'imblearn' has no attribute 'ensemble'
  Cause: `import imblearn` alone does not automatically import submodules such as `ensemble`, `over_sampling`, and `under_sampling` into the top-level `imblearn` namespace.
  Fix: Import the submodule or class explicitly, e.g., `from imblearn import ensemble` or `from imblearn.ensemble import EasyEnsembleClassifier`.
Warnings
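- Gotcha: imbalanced-learn 0.13.0 has known compatibility issues with scikit-learn 1.6.0 and sklearn-compat 0.1.3.
- Gotcha: ImportError: cannot import name 'MultiOutputMixin' from 'sklearn.base' (usually a version mismatch between imbalanced-learn and the installed scikit-learn).
- Gotcha: ImportError: cannot import name 'validate_data' from 'sklearn.utils.validation' (usually a version mismatch between imbalanced-learn and the installed scikit-learn).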
- Breaking: scikit-learn, a core dependency of imbalanced-learn, must be built from source when pip finds no prebuilt wheel for the platform, as can happen on minimal base images such as Alpine Linux. Building requires a C/C++ compiler, which these images typically do not include, so installation fails with 'Unknown compiler(s)' errors during metadata preparation.
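The import errors above usually come down to which versions are installed side by side. A small sketch, using only the standard library's `importlib.metadata`, that reports the installed versions of imbalanced-learn, scikit-learn, and sklearn-compat so they can be compared against the combinations noted above. The helper name `installed_versions` is hypothetical, and the script does not itself decide which combinations are compatible.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as published on PyPI (not the import names).
PACKAGES = ("imbalanced-learn", "scikit-learn", "sklearn-compat")


def installed_versions(packages=PACKAGES):
    """Hypothetical helper: map each distribution to its version, or None."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Not installed in the current environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```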
Install
pip install imbalanced-learn
Imports
- RandomOverSampler
from imblearn.over_sampling import RandomOverSampler
Quickstart
import numpy as np
from imblearn.over_sampling import RandomOverSampler
# Toy imbalanced data: 4 samples of class 0 (minority), 6 of class 1 (majority)
X = np.array([[1, 2], [1, 3], [2, 3], [3, 4], [5, 6], [7, 8], [8, 9], [8, 10], [9, 10], [10, 11]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
# Initialize RandomOverSampler
ros = RandomOverSampler(random_state=42)
# Duplicate randomly chosen minority-class rows until both classes have 6 samples
X_res, y_res = ros.fit_resample(X, y)
print(f'Original class counts: {np.bincount(y)}')  # [4 6]
print(f'Resampled class counts: {np.bincount(y_res)}')  # [6 6]