{"id":4573,"library":"imblearn","title":"imbalanced-learn","description":"imbalanced-learn is a Python library that provides a comprehensive suite of resampling techniques to address imbalanced datasets in machine learning, where one class significantly outnumbers another. It offers methods for over-sampling (e.g., SMOTE, ADASYN), under-sampling (e.g., NearMiss, EditedNearestNeighbours), and combined approaches, along with ensemble methods tailored for imbalanced data. It is compatible with scikit-learn and is part of the scikit-learn-contrib projects. The current stable version is 0.14.1, with regular maintenance releases to ensure compatibility with scikit-learn and Python versions.","status":"active","version":"0.14.1","language":"en","source_language":"en","source_url":"https://github.com/scikit-learn-contrib/imbalanced-learn","tags":["machine-learning","imbalanced-data","resampling","oversampling","undersampling","smote","scikit-learn","data-preprocessing"],"install":[{"cmd":"pip install imbalanced-learn","lang":"bash","label":"PyPI"},{"cmd":"conda install -c conda-forge imbalanced-learn","lang":"bash","label":"Conda"}],"dependencies":[{"reason":"Runtime environment","package":"Python","minimum_version":"3.10"},{"reason":"Numerical operations","package":"NumPy","minimum_version":"1.25.2"},{"reason":"Scientific computing","package":"SciPy","minimum_version":"1.11.4"},{"reason":"Core machine learning algorithms and API compatibility","package":"Scikit-learn","minimum_version":"1.4.2"},{"reason":"For DataFrame input/output handling","package":"Pandas","optional":true,"minimum_version":"2.0.3"},{"reason":"For Keras/TensorFlow integration","package":"Tensorflow","optional":true,"minimum_version":"2.16.1"},{"reason":"For Keras/TensorFlow integration","package":"Keras","optional":true,"minimum_version":"3.3.3"}],"imports":[{"symbol":"imblearn","correct":"import imblearn"},{"symbol":"SMOTE","correct":"from imblearn.over_sampling import 
SMOTE"},{"symbol":"RandomOverSampler","correct":"from imblearn.over_sampling import RandomOverSampler"},{"symbol":"RandomUnderSampler","correct":"from imblearn.under_sampling import RandomUnderSampler"},{"symbol":"Pipeline","correct":"from imblearn.pipeline import Pipeline"},{"note":"The module is imported as 'imblearn', but the package to install is 'imbalanced-learn'. A common mistake is running 'pip install imblearn', which installs an old, empty PyPI package.","correct":"import imblearn","symbol":"imblearn"}],"quickstart":{"code":"from sklearn.datasets import make_classification\nfrom collections import Counter\nfrom imblearn.over_sampling import SMOTE\n\n# Generate an imbalanced dataset\nX, y = make_classification(\n    n_samples=1000, n_features=2, n_informative=2, n_redundant=0,\n    n_repeated=0, n_classes=2, n_clusters_per_class=1,\n    weights=[0.9, 0.1], flip_y=0, random_state=42\n)\n\nprint(f\"Original dataset shape: {Counter(y)}\")\n\n# Apply SMOTE to oversample the minority class\nsmote = SMOTE(random_state=42)\nX_resampled, y_resampled = smote.fit_resample(X, y)\n\nprint(f\"Resampled dataset shape: {Counter(y_resampled)}\")","lang":"python","description":"This quickstart demonstrates how to use SMOTE (Synthetic Minority Over-sampling Technique) to balance an imbalanced dataset generated with scikit-learn. It first creates a dataset with a 90:10 class distribution and then applies SMOTE to equalize the number of samples in both classes."},"warnings":[{"fix":"Replace `ratio` with `sampling_strategy` (e.g., `sampling_strategy='auto'`, `'minority'`, `'not majority'`, or a dictionary specifying target counts). Where code relied on `return_indices`, read the fitted sampler's `sample_indices_` attribute instead, on samplers that expose it.","message":"The `ratio` and `return_indices` parameters have been removed from all samplers. Use the `sampling_strategy` parameter in place of `ratio`. 
This can break code written for older versions.","severity":"breaking","affected_versions":"< 0.6.0"},{"fix":"Migrate to the classifiers that remain in `imblearn.ensemble`, such as `EasyEnsembleClassifier` or `BalancedBaggingClassifier`.","message":"The `imblearn.ensemble.BalanceCascade` and `imblearn.ensemble.EasyEnsemble` classes were removed after two deprecation cycles. Code relying on these will fail.","severity":"breaking","affected_versions":">= 0.7.0"},{"fix":"Avoid using the `n_jobs` parameter in `ClusterCentroids` and rely on scikit-learn's global joblib configuration or alternative parallelization strategies if needed.","message":"The `n_jobs` parameter in `imblearn.under_sampling.ClusterCentroids` is deprecated because the same parameter was deprecated in `sklearn.cluster.KMeans`, which it relies on.","severity":"deprecated","affected_versions":">= 0.9.0"},{"fix":"Call `.fit()` on the pipeline before calling methods such as `transform` or `predict`.","message":"The fitted-state check (`check_is_fitted`) in `imblearn.pipeline.Pipeline` will change from a warning to an error in version 0.15. This change aligns with scikit-learn's behavior.","severity":"gotcha","affected_versions":"0.14.x"},{"fix":"Check the official documentation for the `scikit-learn` versions supported by your `imbalanced-learn` version. Upgrade both packages together if issues arise.","message":"Each imbalanced-learn release is compatible only with specific `scikit-learn` versions. Installing incompatible versions can lead to `FutureWarning`s or runtime errors.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-12T00:00:00.000Z","next_check":"2026-07-11T00:00:00.000Z"}