UCI ML Repository Python Wrapper
The `ucimlrepo` library provides a simple interface for programmatically fetching datasets from the UC Irvine Machine Learning Repository and loading them directly into Python scripts and notebooks. For each dataset it retrieves the data, metadata, and variable information, returning the data tables as pandas DataFrames. As of version 0.0.7 the library is actively maintained; its API is relatively stable but still evolving. It is aimed at machine learning and data science workflows.
Common errors
- AttributeError: 'UCI_ML_Repo' object has no attribute 'data' (or 'features', 'targets')
  cause: Accessing `features` or `targets` directly on the object returned by `fetch_ucirepo` instead of through its nested `data` attribute. If `data` itself is reported missing, the object was most likely not produced by `fetch_ucirepo`.
  fix: The data tables are nested under `data`. Access features as `dataset_obj.data.features` and targets as `dataset_obj.data.targets`.
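The correct access path can be wrapped in a small helper so that a missing `data` attribute fails with a clear message. This is a minimal sketch that assumes only the nested layout described above (`data.features`, `data.targets`). The helper name `features_and_targets` is hypothetical, and the `SimpleNamespace` object merely stands in for a real `fetch_ucirepo` result so the example runs offline.

```python
from types import SimpleNamespace
from typing import Any, Tuple


def features_and_targets(dataset: Any) -> Tuple[Any, Any]:
    """Return (features, targets) from a fetch_ucirepo-style result.

    Hypothetical helper: it reads the nested `data` attribute instead of
    looking for `features`/`targets` on the top-level object.
    """
    data = getattr(dataset, "data", None)
    if data is None:
        raise AttributeError(
            "fetched dataset has no 'data' attribute; was it returned by fetch_ucirepo?"
        )
    return data.features, data.targets


# Offline stand-in shaped like a fetched dataset (not a real ucimlrepo object).
fake = SimpleNamespace(
    data=SimpleNamespace(features=[[5.1, 3.5]], targets=[["Iris-setosa"]])
)
X, y = features_and_targets(fake)
print(X, y)

# With the real library (requires network access):
# from ucimlrepo import fetch_ucirepo
# X, y = features_and_targets(fetch_ucirepo(id=53))
```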
- ModuleNotFoundError: No module named 'pandas'
  cause: `ucimlrepo` depends on `pandas` to return dataset features and targets as DataFrames, but `pandas` is not installed in the current Python environment.
  fix: Install the `pandas` library: `pip install pandas`.
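Before fetching anything, you can check whether the required modules are importable in the active environment. The sketch below uses the standard-library `importlib.util.find_spec`, which returns `None` for a top-level module that cannot be found; the helper name `missing_modules` is hypothetical.

```python
import importlib.util
from typing import List


def missing_modules(*names: str) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules("pandas", "ucimlrepo")
if missing:
    # For pandas and ucimlrepo the pip package name matches the module name;
    # that is not true for every package.
    print("Install missing packages with: pip install " + " ".join(missing))
else:
    print("pandas and ucimlrepo are both importable")
```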
- ValueError: No dataset found with id X
  cause: The provided dataset ID (X) does not correspond to an existing dataset in the UCI ML Repository, for example because of a typo in the ID.
  fix: Verify the dataset ID on the official UCI ML Repository website, or, if your version of the library provides it, call `ucimlrepo.list_available_datasets()` to find the correct ID.
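When IDs come from user input or a config file, a thin wrapper can validate them and turn a failed lookup into a `None` result instead of an unhandled exception. This is a sketch under stated assumptions: the wrapper name `fetch_or_none` is hypothetical, the fetch function is passed in (for example `ucimlrepo.fetch_ucirepo`) so it can be tested offline, and it catches `Exception` broadly because the exact exception class raised for an unknown ID is assumed to vary between versions.

```python
from typing import Any, Callable, Optional


def fetch_or_none(fetch: Callable[..., Any], dataset_id: int) -> Optional[Any]:
    """Fetch a dataset by ID, returning None if the lookup fails.

    Hypothetical wrapper; `fetch` is expected to accept an `id` keyword,
    as fetch_ucirepo does in the examples on this page.
    """
    # Assumption: valid repository IDs are positive integers.
    if isinstance(dataset_id, bool) or not isinstance(dataset_id, int) or dataset_id <= 0:
        raise ValueError(f"dataset id must be a positive integer, got {dataset_id!r}")
    try:
        return fetch(id=dataset_id)
    except Exception as exc:  # broad on purpose: exception type assumed version-dependent
        print(f"Could not fetch dataset id={dataset_id}: {exc}")
        return None


# Usage with the real library (requires network access):
# from ucimlrepo import fetch_ucirepo
# iris = fetch_or_none(fetch_ucirepo, 53)
```

Validation happens outside the `try` block, so a malformed ID still raises immediately rather than being silently swallowed.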
Warnings
- breaking: The library is in early development (version 0.0.x). Although the maintainers aim for stability, the API may change between releases without a major version increment, particularly the structure of the returned dataset object's attributes (e.g., `dataset.data`, `dataset.metadata`, `dataset.variables`).
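Given the 0.0.x status, one defensive option is to pin the release you tested against (for example `pip install ucimlrepo==0.0.7`) and check the installed version at runtime. The sketch below uses the standard-library `importlib.metadata` (Python 3.8+); the helper names and the `EXPECTED_UCIMLREPO_VERSION` constant are hypothetical.

```python
import warnings
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Assumption: your code was written and tested against this release.
EXPECTED_UCIMLREPO_VERSION = "0.0.7"


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pinned_version(package: str, expected: str) -> bool:
    """Warn and return False when the installed version differs from `expected`."""
    found = installed_version(package)
    if found != expected:
        warnings.warn(f"{package}: expected version {expected}, found {found}")
        return False
    return True


if __name__ == "__main__":
    check_pinned_version("ucimlrepo", EXPECTED_UCIMLREPO_VERSION)
```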
- gotcha: Fetched dataset features (`.data.features`) and targets (`.data.targets`) are returned as `pandas.DataFrame` objects; for a dataset with no designated target variable, `.data.targets` may be `None`. Be prepared to work with pandas, or convert the data explicitly to NumPy arrays or other formats.
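The explicit conversion to NumPy can be sketched as follows. The helper name `to_numpy_xy` is hypothetical; it uses `np.asarray`, which accepts pandas DataFrames as well as plain nested lists (`DataFrame.to_numpy()` is an equivalent pandas-side option). A single-column target is flattened to 1-D, the shape many estimators expect, and a `None` target is passed through unchanged.

```python
from typing import Any, Optional, Tuple

import numpy as np


def to_numpy_xy(
    features: Any, targets: Optional[Any]
) -> Tuple[np.ndarray, Optional[np.ndarray]]:
    """Convert feature/target tables (e.g. pandas DataFrames) to NumPy arrays."""
    X = np.asarray(features)
    if targets is None:
        # Some datasets have no target column; keep that visible to the caller.
        return X, None
    y = np.asarray(targets)
    if y.ndim == 2 and y.shape[1] == 1:
        # Flatten an (n, 1) target column to shape (n,).
        y = y.ravel()
    return X, y


# With a fetched dataset (requires network access):
# X, y = to_numpy_xy(iris_dataset.data.features, iris_dataset.data.targets)
```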
- gotcha: Datasets are most reliably accessed by their unique integer `id` (e.g., `fetch_ucirepo(id=53)`). `fetch_ucirepo` also accepts a `name` keyword argument, but an ID identifies a dataset unambiguously. An incorrect ID results in an error.
Install
pip install ucimlrepo
Imports
- fetch_ucirepo
from ucimlrepo import fetch_ucirepo
Quickstart
from ucimlrepo import fetch_ucirepo
# Fetch a dataset by its ID (e.g., Iris dataset, ID 53)
iris_dataset = fetch_ucirepo(id=53)
# Access features (X) and targets (y) as pandas DataFrames
X = iris_dataset.data.features
y = iris_dataset.data.targets
print("Features (X) head:\n", X.head())
print("Targets (y) head:\n", y.head())
# Access metadata and variable information
print("\nMetadata:\n", iris_dataset.metadata)
print("\nVariable Info:\n", iris_dataset.variables)
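The variable information is also useful programmatically, for example to list which columns are targets. This sketch assumes the variables table is a DataFrame with `name` and `role` columns and role values such as "Feature" and "Target"; print `iris_dataset.variables.columns` to confirm the layout in your installed version. The helper name `names_with_role` is hypothetical.

```python
from typing import List

import pandas as pd


def names_with_role(variables: pd.DataFrame, role: str) -> List[str]:
    """Return the variable names whose `role` column equals `role` (e.g. "Target")."""
    return variables.loc[variables["role"] == role, "name"].tolist()


# With the Quickstart objects (requires network access):
# print(names_with_role(iris_dataset.variables, "Target"))
```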