Iterative Stratification

0.1.9 verified Fri May 01 auth: no python

Provides scikit-learn-compatible cross-validators that stratify multilabel data. The current version is 0.1.9; releases are sporadic.

pip install iterative-stratification
error ModuleNotFoundError: No module named 'iterative_stratification'
cause Incorrect import name; the actual module is 'iterstrat'.
fix
Change imports to 'from iterstrat.ml_stratifiers import MultilabelStratifiedKFold'
error ValueError: The number of classes has to be greater than one
cause y was passed as a 1-D array of single labels; iterative stratification requires a 2-D multilabel binary indicator matrix.
fix
Convert labels to binary indicator matrix using sklearn.preprocessing.MultiLabelBinarizer.
error DeprecationWarning: Passing 'n_splits' without 'n_splits' as a keyword argument is deprecated
cause Using an older version (<0.1.7) with scikit-learn 1.0+.
fix
Upgrade iterative-stratification to 0.1.7 or later (pip install --upgrade iterative-stratification).
deprecated scikit-learn 1.0 introduced additional parameter warnings. Versions 0.1.7 and later handle them; older versions emit deprecation warnings.
fix Upgrade iterative-stratification to 0.1.7 or higher.
gotcha The package is imported as 'iterstrat', not 'iterative-stratification' or 'iterative_stratification'. Many users mistakenly use the PyPI name.
fix Use 'from iterstrat.ml_stratifiers import ...'
gotcha The cross-validator expects y to be a binary indicator matrix (a 2-D array of 0/1 values), not an array of label indices. Use MultiLabelBinarizer from sklearn.preprocessing to convert.
fix Ensure y is shape (n_samples, n_labels) with values 0 or 1.
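
The conversion described in the gotcha above can be sketched as follows. The label names and sample data are hypothetical and exist only for illustration; the only library call used is MultiLabelBinarizer from scikit-learn.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical per-sample label sets (illustrative data only).
# The last sample has no labels at all, which yields an all-zero row.
labels = [("cat", "dog"), ("dog",), ("bird", "cat"), ()]

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(labels)  # 2-D array of 0/1, one column per label

print(mlb.classes_)  # column order: labels sorted alphabetically
print(y.shape)       # (n_samples, n_labels)
print(y)
```

The resulting y has one row per sample and one column per distinct label, which is the shape the cross-validator expects.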

Simple usage of MultilabelStratifiedKFold for multilabel classification.

import numpy as np
from iterstrat.ml_stratifiers import MultilabelStratifiedKFold

X = np.random.rand(20, 5)
y = np.random.randint(0, 2, (20, 3))  # multilabel binary indicator matrix

kf = MultilabelStratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_index, test_index in kf.split(X, y):
    print("Train:", train_index, "Test:", test_index)