AutoGluon
AutoGluon is an open-source AutoML library developed by AWS AI, designed for fast and accurate machine learning with minimal code. It automates model selection, hyperparameter tuning, and ensemble creation across several data types, including tabular, time series, and multimodal data. The current version is 1.5.0. Releases are frequent and often bring significant performance improvements, new models, and expanded functionality.
Warnings
- breaking Models trained with an older version of AutoGluon are not guaranteed to be compatible with newer versions. Users must re-train models after upgrading AutoGluon to prevent issues.
- breaking Python 3.8 support was dropped in AutoGluon v1.2.0. Python 3.13 support is experimental in v1.5.0, with potential limitations on Windows.
- breaking Several `TabularPredictor` methods were deprecated in v1.0.0 (e.g., `persist_models`, `get_model_names`), began raising errors in v1.2.0, and were completely removed in v1.3.0.
- deprecated The default behavior of `TabularPredictor.delete_models()` will change from `dry_run=True` to `dry_run=False` in a future release. A `FutureWarning` is logged since v1.3.0.
- gotcha AutoGluon (especially `TabularPredictor`) can be memory-intensive, particularly with large datasets or complex ensembles. Out-of-memory errors are common.
- gotcha On macOS, LightGBM and XGBoost (used by AutoGluon) can experience segmentation faults or instability if `libomp` is installed via `brew install libomp`.
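Several of the warnings above can be turned into small defensive checks. The following is a minimal sketch, not part of AutoGluon: the helper names (python_status, needs_retrain, prune_models) are hypothetical, and the "any version difference means re-train" rule is a conservative assumption rather than documented library behavior. Only `delete_models` with its `models_to_keep` and `dry_run` arguments is an AutoGluon call.

```python
import sys


def python_status(version_info=None):
    """Classify a Python version using only the two facts stated in the warnings."""
    major, minor = (version_info or sys.version_info)[:2]
    if (major, minor) <= (3, 8):
        return "unsupported"   # Python 3.8 support was dropped in v1.2.0
    if (major, minor) == (3, 13):
        return "experimental"  # Python 3.13 is experimental as of v1.5.0
    return "unverified"        # not covered by the warnings; check the release notes


def needs_retrain(trained_version, installed_version):
    """Assumed conservative rule: re-train whenever the two versions differ."""
    return trained_version.strip() != installed_version.strip()


def prune_models(predictor, really_delete=False):
    """Keep only the best model, always passing dry_run explicitly.

    Passing dry_run on every call means the result does not depend on
    whichever default the installed AutoGluon version happens to use.
    """
    return predictor.delete_models(models_to_keep="best", dry_run=not really_delete)
```

With a fitted predictor, `prune_models(predictor)` only reports what would be deleted, while `prune_models(predictor, really_delete=True)` performs the deletion. How the version a model was trained with is obtained is left open here, so `needs_retrain` takes plain version strings.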
Install
- pip install autogluon
- pip install autogluon[all]
- pip install autogluon.tabular[tabarena]
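After installing, it can be useful to confirm which AutoGluon components are actually present in the environment. The sketch below uses only the standard library. The helper name installed_versions is hypothetical, and the distribution names are assumed to match the submodules this entry imports from (autogluon.tabular, autogluon.timeseries, autogluon.multimodal).

```python
from importlib import metadata

# Assumed distribution names, mirroring the submodules imported in this entry.
DISTRIBUTIONS = ("autogluon.tabular", "autogluon.timeseries", "autogluon.multimodal")


def installed_versions(names=DISTRIBUTIONS):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

A partial install such as the tabular-only option would be expected to show versions for some entries and "not installed" for the rest.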
Imports
- TabularPredictor, TabularDataset
from autogluon.tabular import TabularPredictor, TabularDataset
- TimeSeriesPredictor, TimeSeriesDataFrame
from autogluon.timeseries import TimeSeriesPredictor, TimeSeriesDataFrame
- MultiModalPredictor
from autogluon.multimodal import MultiModalPredictor
Quickstart
from autogluon.tabular import TabularPredictor, TabularDataset
# Load example data (using a public dataset URL)
data_url = 'https://raw.githubusercontent.com/mli/ag-docs/main/knot_theory/'
train_data = TabularDataset(f'{data_url}train.csv')
test_data = TabularDataset(f'{data_url}test.csv')
label = 'signature'
predictor = TabularPredictor(label=label, path='AutogluonModels').fit(train_data, presets='high_quality')
predictions = predictor.predict(test_data)
print("Top 5 predictions:")
print(predictions.head())
print(f"Predictor leaderboard:\n{predictor.leaderboard(test_data)}")
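The quickstart saves the trained predictor under the path 'AutogluonModels', so a later session can reload it instead of re-training. The following is a minimal sketch under stated assumptions: reload_and_evaluate and its predictor_cls parameter are hypothetical names introduced for illustration, while `TabularPredictor.load` and `predictor.evaluate` are the AutoGluon calls being wrapped. The saved model must have been trained with the same AutoGluon version that loads it.

```python
def reload_and_evaluate(path, test_data, predictor_cls=None):
    """Load a saved predictor from `path` and score it on `test_data`.

    `predictor_cls` is a hypothetical hook for substituting another class
    that offers the same load/evaluate interface, for example in tests.
    """
    if predictor_cls is None:
        # Imported lazily so this function can be defined without AutoGluon installed.
        from autogluon.tabular import TabularPredictor
        predictor_cls = TabularPredictor
    predictor = predictor_cls.load(path)
    # test_data must contain the label column so that metrics can be computed.
    return predictor.evaluate(test_data)
```

In the quickstart's setting this would be called as `reload_and_evaluate('AutogluonModels', test_data)`, returning the evaluation metrics for the reloaded predictor.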