scikit-datasets

0.2.5 verified Fri May 01 auth: no python

scikit-datasets provides ready-to-use datasets compatible with scikit-learn, wrapping common benchmarks such as MNIST and CIFAR. The current version is 0.2.5; releases are irregular.

pip install scikit-datasets
error ModuleNotFoundError: No module named 'scikit_datasets'
cause The import package was renamed to 'skdata' in version 0.2.5, but older tutorials use 'scikit_datasets'.
fix
Use 'from skdata import load_...' or downgrade to version 0.1.x (not recommended).
error AttributeError: module 'skdata' has no attribute 'datasets'
cause The code accesses loaders through a 'datasets' submodule that does not exist; the loader functions are defined directly on the skdata module.
fix
Use 'from skdata import load_mnist' directly, not 'skdata.datasets.load_mnist'.
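
Both errors above come from code assuming a module name or attribute path that no longer exists. A minimal sketch of a defensive import, assuming only the two import names mentioned in these entries ('skdata' and 'scikit_datasets'). The helper import_first_available is hypothetical and not part of the library:

```python
import importlib


def import_first_available(names):
    """Return the first module in `names` that imports cleanly, else None.

    Hypothetical helper for illustration; not part of scikit-datasets.
    """
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:  # also covers ModuleNotFoundError
            continue
    return None


# Prefer the current import name, then fall back to the pre-rename one.
skdata = import_first_available(["skdata", "scikit_datasets"])

if skdata is None:
    print("scikit-datasets is not installed; run: pip install scikit-datasets")
else:
    # Loaders are described as living directly on the module, so list them
    # rather than guessing at a submodule such as `skdata.datasets`.
    loaders = sorted(n for n in dir(skdata) if n.startswith("load_"))
    print(loaders)
```

Listing the load_ names on the imported module shows which loaders the installed version exposes, without relying on a submodule path.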
breaking In version 0.2.5, the package name changed from 'scikit_datasets' to 'skdata' for imports. Code using 'from scikit_datasets import ...' will break.
fix Change import to 'from skdata import ...'.
deprecated The 'as_frames' parameter for returning pandas DataFrames is deprecated in favor of 'as_frame'. Passing 'as_frame=True' will become the default behavior in future versions.
fix Use 'as_frame=True' instead of 'as_frames=True'.
gotcha Dataset downloads can be large (e.g., MNIST ~15MB, CIFAR-10 ~170MB) and are cached in ~/skdata_data/. Not all datasets are bundled; some require an internet connection on first load.
fix Ensure internet access for the first load, or persist the ~/skdata_data/ directory so that later runs can load from the cache.
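
Because the first load may trigger a download, it can help to inspect the cache before running in an offline environment. A minimal sketch, assuming the cache location ~/skdata_data/ given in the gotcha above. The helper cache_summary is hypothetical, and since the layout inside the directory is not documented here, the code only counts files and bytes:

```python
from pathlib import Path


def cache_summary(cache_dir):
    """Return (file_count, total_bytes) for everything under cache_dir.

    Hypothetical helper; returns (0, 0) if the directory does not exist.
    """
    root = Path(cache_dir).expanduser()
    if not root.is_dir():
        return 0, 0
    files = [p for p in root.rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


# Cache path taken from the note above; adjust if your setup differs.
count, size = cache_summary("~/skdata_data")
if count == 0:
    print("Cache is empty: the first load will need internet access.")
else:
    print(f"{count} cached file(s), {size / 1e6:.1f} MB")
```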

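Loads MNIST, fits an SVM classifier, and prints its accuracy on the training data.

from skdata import load_mnist
from sklearn.svm import SVC

# The first call may download the data (see the caching note above).
data = load_mnist()
X, y = data.data, data.target

# Fit a support vector classifier with default hyperparameters.
clf = SVC().fit(X, y)
# This is accuracy on the training data itself, not a held-out estimate.
print(clf.score(X, y))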