Iterative Stratification
Version 0.1.9, verified Fri May 01, auth: no, python
Provides scikit-learn compatible cross-validators that stratify multilabel data. The current version is 0.1.9; releases are sporadic.
pip install iterative-stratification
Common errors
error ModuleNotFoundError: No module named 'iterative_stratification'
cause Incorrect import name; the package installs as the module 'iterstrat'.
fix Change the import to 'from iterstrat.ml_stratifiers import MultilabelStratifiedKFold'.
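A quick way to confirm which name is importable is to ask the import system directly. The sketch below uses only the standard library; `module_available` is a hypothetical helper name introduced for illustration, not part of iterstrat.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


# Compare the real module name with the commonly mistaken one.
for candidate in ("iterstrat", "iterative_stratification"):
    status = "found" if module_available(candidate) else "not found"
    print(f"{candidate}: {status}")
```

If 'iterstrat' is reported as not found, the package is probably not installed in the active environment.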
error ValueError: The number of classes has to be greater than one
cause y was passed as a 1-D array or as single-label data; iterative stratification requires a multilabel binary indicator matrix.
fix Convert the labels to a binary indicator matrix with sklearn.preprocessing.MultiLabelBinarizer.
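The conversion can be sketched as follows, assuming scikit-learn is installed. The label names are made up for illustration.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Each sample carries a variable-length collection of labels (illustrative data).
labels = [["cat", "dog"], ["dog"], ["bird"], ["cat", "bird"]]

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(labels)  # shape (n_samples, n_labels), values 0 or 1

print(mlb.classes_)  # column order of y (sorted label names)
print(y)
```

The resulting `y` can be passed to the `split` method of the cross-validators in place of the raw label lists.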
error DeprecationWarning: Passing 'n_splits' without 'n_splits' as a keyword argument is deprecated
cause Using an older version (<0.1.7) with scikit-learn 1.0+.
fix Upgrade iterative-stratification to 0.1.7 or later, for example with 'pip install --upgrade iterative-stratification'.
Warnings
deprecated scikit-learn 1.0 introduced extra parameter warnings. Versions 0.1.7+ handle these; older versions emit deprecation warnings.
fix Upgrade iterative-stratification to 0.1.7 or higher.
gotcha The package is imported as 'iterstrat', not 'iterative-stratification' or 'iterative_stratification'. Many users mistakenly use the PyPI name.
fix Use 'from iterstrat.ml_stratifiers import ...'
gotcha The cross-validator expects y to be a binary indicator matrix (a 2-D array of 0/1 values), not an array of label indices. Use MultiLabelBinarizer from sklearn.preprocessing to convert.
fix Ensure y has shape (n_samples, n_labels) with values 0 or 1.
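A small guard can catch a badly shaped y before it reaches the splitter. This is a minimal sketch using only NumPy; `check_indicator_matrix` is a hypothetical helper, not a function provided by iterstrat or scikit-learn.

```python
import numpy as np


def check_indicator_matrix(y):
    """Return y as an array, raising ValueError unless it is a 2-D 0/1 matrix."""
    y = np.asarray(y)
    if y.ndim != 2:
        raise ValueError(f"expected shape (n_samples, n_labels), got {y.shape}")
    if not np.isin(y, (0, 1)).all():
        raise ValueError("expected only 0/1 values in y")
    return y


y = check_indicator_matrix([[1, 0, 1], [0, 1, 0]])
print(y.shape)
```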
Imports
- MultilabelStratifiedKFold
wrong: from iterative_stratification import ...
correct: from iterstrat.ml_stratifiers import MultilabelStratifiedKFold
- MultilabelStratifiedShuffleSplit
from iterstrat.ml_stratifiers import MultilabelStratifiedShuffleSplit
Quickstart
import numpy as np
from iterstrat.ml_stratifiers import MultilabelStratifiedKFold

X = np.random.rand(20, 5)  # 20 samples, 5 features
y = np.random.randint(0, 2, (20, 3))  # multilabel binary indicator matrix

kf = MultilabelStratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_index, test_index in kf.split(X, y):
    print("Train:", train_index, "Test:", test_index)