AutoGluon Tabular
AutoGluon Tabular is a fast, accurate AutoML library for tabular data that lets users train and deploy high-accuracy machine learning models in a few lines of code. Developed by AWS AI, it automates feature engineering, hyperparameter tuning, and stack ensembling, and integrates deep learning models. The project ships major updates every few months, with patch releases in between.
Warnings
- breaking Models trained with an older version of AutoGluon are NOT compatible with newer versions. Users must re-train models after upgrading the library.
- breaking Several `TabularPredictor` methods were renamed in v1.3.0. For example, `persist_models` became `persist`, and `get_model_names` became `model_names`.
- gotcha Python version support changes between releases: AutoGluon v1.2.0 dropped support for Python 3.8 and added support for 3.12. The current version (1.5.0) supports Python 3.10-3.13; on Windows, support for 3.13 is experimental.
- gotcha If you use the 'TABPFNV2' model, switching to 'REALTABPFN-V2' is strongly recommended because of breaking changes in the underlying TabPFN library. 'REALTABPFN-V2.5' also exists, but it has a non-commercial license and requires HuggingFace authentication.
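The Python version range and the method renames above can be guarded against in calling code. The following is a minimal sketch, not part of AutoGluon: `python_supported` and `call_renamed` are hypothetical helper names, and the version bounds are copied from the warning above rather than read from the library.

```python
import sys

# Bounds taken from the warning above (AutoGluon 1.5.0: Python 3.10-3.13).
# They are hard-coded assumptions, not values queried from AutoGluon.
MIN_PY = (3, 10)
MAX_PY = (3, 13)


def python_supported(version_info=None):
    """Return True if the (major, minor) version is inside the assumed range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_PY <= major_minor <= MAX_PY


def call_renamed(obj, new_name, old_name, *args, **kwargs):
    """Call obj.<new_name> if present, otherwise fall back to obj.<old_name>.

    Hypothetical compatibility shim for code that must run against
    releases from before and after a method rename.
    """
    method = getattr(obj, new_name, None)
    if method is None:
        method = getattr(obj, old_name)
    return method(*args, **kwargs)
```

For a trained predictor, `call_renamed(predictor, "model_names", "get_model_names")` would use whichever of the two names the installed release provides. Pinning the AutoGluon version is usually simpler; the shim is only worth it when one code base has to run against several releases.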
Install
- pip install autogluon.tabular[all]
- pip install autogluon.tabular
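Because models have to be re-trained after an upgrade, it can help to confirm which version is actually installed before loading anything. A minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, and the distribution name `autogluon.tabular` is assumed to match the pip package name above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="autogluon.tabular"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None
```

Calling `installed_version()` returns a version string when the package is present and `None` otherwise, so the result can be logged next to a saved model directory and compared again before that model is loaded.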
Imports
- TabularPredictor
from autogluon.tabular import TabularPredictor
- TabularDataset
from autogluon.tabular import TabularDataset
Quickstart
import pandas as pd
from autogluon.tabular import TabularPredictor, TabularDataset
# Create dummy dataframes if not using S3 URLs for demonstration
train_data = pd.DataFrame({
'feature_1': [1, 2, 3, 4, 5],
'feature_2': ['A', 'B', 'A', 'C', 'B'],
'target_column': [0, 1, 0, 1, 0]
})
test_data = pd.DataFrame({
'feature_1': [6, 7],
'feature_2': ['C', 'A']
})
# Or load directly from AutoGluon's S3 bucket (uncomment for real usage)
# data_root = 'https://autogluon.s3.amazonaws.com/datasets/Inc/'
# train_data = TabularDataset(data_root + 'train.csv')
# test_data = TabularDataset(data_root + 'test.csv')
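# Initialize and train the predictor
# 'best_quality' is the full preset name; the five-row frame above only illustrates the API and is too small for meaningful training
predictor = TabularPredictor(label='target_column', path='./AutogluonModels').fit(train_data, presets='best_quality')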
# Make predictions
predictions = predictor.predict(test_data)
print("Predictions:\n", predictions)
# Evaluate the model (requires a label column in test_data, not present in dummy test_data)
# Assuming test_data_with_labels exists:
# test_data_with_labels = pd.DataFrame({
# 'feature_1': [6, 7],
# 'feature_2': ['C', 'A'],
# 'target_column': [1, 0]
# })
# leaderboard = predictor.leaderboard(test_data_with_labels)
# print("Leaderboard:\n", leaderboard)
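Evaluation needs labeled rows that were not used for training. One way to obtain them is to hold out part of a labeled dataset before calling `fit`. The sketch below is an assumption-laden illustration rather than an AutoGluon feature: `holdout_indices` is a hypothetical helper that only produces row positions, using the standard library.

```python
import random


def holdout_indices(n_rows, test_fraction=0.2, seed=0):
    """Split range(n_rows) into (train, test) lists of row positions."""
    if n_rows < 2:
        raise ValueError("need at least two rows to split")
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    indices = list(range(n_rows))
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(indices)
    # Hold out at least one row, and leave at least one row for training.
    n_test = min(max(1, round(n_rows * test_fraction)), n_rows - 1)
    return sorted(indices[n_test:]), sorted(indices[:n_test])
```

With a labeled pandas DataFrame `df`, `train_idx, test_idx = holdout_indices(len(df))` followed by `df.iloc[train_idx]` and `df.iloc[test_idx]` yields a training frame and a labeled frame that can be passed to `predictor.leaderboard`.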