OpenML Python API

0.15.1 verified Fri May 01 auth: no python

OpenML Python API for downloading, uploading, and managing datasets, tasks, runs, and flows on OpenML.org. Current version 0.15.1, requires Python >=3.8. Released under BSD license. Active development with periodic releases.

pip install openml
error openml.exceptions.OpenMLServerError: The request has not been authorized (401)
cause Missing or invalid OpenML API key.
fix
Set API key via environment variable OPENML_API_KEY or openml.config.apikey = '...'
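A minimal sketch of this fix, assuming the key is stored in the OPENML_API_KEY environment variable named above. The helper `apply_api_key` is hypothetical and not part of openml; it only assigns the `apikey` attribute on whatever config object it is given.

```python
import os


def apply_api_key(config, env=None, var_name="OPENML_API_KEY"):
    """Copy an API key from the environment onto `config.apikey`.

    Hypothetical helper, not part of openml. Returns True if a key
    was found and assigned, False if the variable is unset or empty.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        return False  # leave any existing configuration untouched
    config.apikey = key
    return True
```

With openml installed, the intended call would be `apply_api_key(openml.config)` right after `import openml`.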
error AttributeError: 'int' object has no attribute 'get' (when calling get_dataset with wrong ID)
cause The dataset ID does not exist on the server, or the value passed is not an integer.
fix
Ensure the dataset ID is a valid integer. Use list_datasets() to check which IDs exist.
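One way to catch a bad ID before it reaches the library is to validate it up front. The sketch below is an illustration only: `as_dataset_id` is a hypothetical helper, not an openml function, and it checks the type and sign of the value, not whether the ID exists on the server.

```python
def as_dataset_id(value):
    """Return `value` as a positive int, or raise a descriptive error.

    Hypothetical helper, not part of openml.
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool):
        raise TypeError(f"dataset ID must be an integer, got {value!r}")
    if isinstance(value, int):
        dataset_id = value
    elif isinstance(value, str) and value.strip().isdecimal():
        # Accept numeric strings such as "61" (e.g. from a CLI argument).
        dataset_id = int(value.strip())
    else:
        raise TypeError(f"dataset ID must be an integer, got {value!r}")
    if dataset_id <= 0:
        raise ValueError(f"dataset ID must be positive, got {dataset_id}")
    return dataset_id
```

The validated value can then be passed on, for example `openml.datasets.get_dataset(as_dataset_id(user_input))`.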
error ModuleNotFoundError: No module named 'openml'
cause OpenML not installed or installed in different environment.
fix
Run pip install openml and ensure correct Python environment is active.
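To check which environment is actually active, a short standard-library script can report the running interpreter and whether the package is visible to it. This is a sketch; `describe_environment` is a name chosen here for illustration.

```python
import importlib.util
import sys


def describe_environment(package="openml"):
    """Report the running interpreter and whether `package` is importable."""
    spec = importlib.util.find_spec(package)
    return {
        "interpreter": sys.executable,
        "installed": spec is not None,
        "location": spec.origin if spec is not None else None,
    }


info = describe_environment()
print(info)
if not info["installed"]:
    # Install into the interpreter that is running this script,
    # not whichever `pip` happens to be first on PATH.
    print(f"Try: {sys.executable} -m pip install openml")
```

Running pip as `python -m pip` ties the install to a specific interpreter, which avoids the mismatch described in the cause above.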
breaking Since version 0.14, `get_dataset` returns an `OpenMLDataset` object rather than a tuple. Its `get_data()` method returns four values: X, y, categorical_indicator, and attribute_names.
fix Update code to unpack all four values: `X, y, categorical_indicator, attribute_names = dataset.get_data(target=...)`.
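The third and fourth return values are meant to be read together: `categorical_indicator` holds one boolean per column, in the same order as `attribute_names`. The sketch below pairs them up to find the categorical columns. `split_columns` is a hypothetical helper, and the sample values are made up for illustration rather than taken from a real dataset.

```python
def split_columns(categorical_indicator, attribute_names):
    """Split column names into (categorical, other) using the boolean flags."""
    if len(categorical_indicator) != len(attribute_names):
        raise ValueError("indicator and names must have the same length")
    pairs = list(zip(attribute_names, categorical_indicator))
    categorical = [name for name, is_cat in pairs if is_cat]
    other = [name for name, is_cat in pairs if not is_cat]
    return categorical, other


# Made-up values standing in for the last two results of get_data().
attribute_names = ["length", "width", "color"]
categorical_indicator = [False, False, True]

categorical, other = split_columns(categorical_indicator, attribute_names)
print(categorical)  # ['color']
print(other)        # ['length', 'width']
```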
deprecated Calling `openml.datasets.list_datasets()` without `output_format` returns a dict and emits a FutureWarning: dict output is deprecated in favor of pandas DataFrames. The function itself remains available.
fix Pass `output_format='dataframe'` to `list_datasets()`, and use parameters such as `size=...` to limit the listing.
gotcha An API key is configured either through the OPENML_API_KEY env var or a ~/.openml/config file. Without it, calls that need authentication (for example uploads) raise OpenMLServerError (401).
fix Set the env var, add the key to the config file, or assign it in code: `openml.config.apikey = 'your_apikey'`. Get the key from your account on openml.org.

Set up OpenML, list datasets, then download and inspect the Iris dataset.

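import os
import openml

# Use the key from the environment if one is set; otherwise openml
# falls back to whatever its config file provides.
api_key = os.environ.get("OPENML_API_KEY")
if api_key:
    openml.config.apikey = api_key

# List datasets as a pandas DataFrame (one row per dataset)
datasets = openml.datasets.list_datasets(output_format="dataframe")
print(f"Number of datasets: {len(datasets)}")

# Download a dataset (ID 61 is iris)
dataset = openml.datasets.get_dataset(61)
# get_data returns features, target, per-column categorical flags, and column names
X, y, categorical_indicator, attribute_names = dataset.get_data(target=dataset.default_target_attribute)
print(X.head())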