AutoGluon Tabular

1.5.0 · active · verified Sun Apr 12

AutoGluon Tabular is a fast and accurate AutoML library for tabular data that lets users train and deploy high-accuracy machine learning models in a few lines of code. Developed by AWS AI, it automates stack ensembling, feature engineering, and hyperparameter tuning, and integrates deep learning models. The project is actively maintained, with major updates every few months and patch releases in between.

Warnings

Install

Imports

Quickstart

This quickstart trains a `TabularPredictor` on a dataset and then makes predictions. It uses a small toy DataFrame so the code is self-contained; the commented-out lines show how to load a realistic dataset from AutoGluon's public S3 bucket instead. The toy data only illustrates the API, since a handful of rows is too few for meaningful training. The `presets='best'` option requests AutoGluon's highest-accuracy preset, which typically trades longer training time and a larger model footprint for predictive quality.

import pandas as pd
from autogluon.tabular import TabularPredictor, TabularDataset

# Toy DataFrames so the example needs no network access
# (illustrative only; real training needs far more rows)
train_data = pd.DataFrame({
    'feature_1': [1, 2, 3, 4, 5],
    'feature_2': ['A', 'B', 'A', 'C', 'B'],
    'target_column': [0, 1, 0, 1, 0]
})
test_data = pd.DataFrame({
    'feature_1': [6, 7],
    'feature_2': ['C', 'A']
})

# Or load directly from AutoGluon's S3 bucket (uncomment for real usage)
# data_root = 'https://autogluon.s3.amazonaws.com/datasets/Inc/'
# train_data = TabularDataset(data_root + 'train.csv')
# test_data = TabularDataset(data_root + 'test.csv')

# Initialize the predictor, then train it (fit returns the predictor itself)
predictor = TabularPredictor(label='target_column', path='./AutogluonModels')
predictor.fit(train_data, presets='best')

# Make predictions
predictions = predictor.predict(test_data)
print("Predictions:\n", predictions)

# Evaluate the models. This needs a label column, which the toy test_data
# above does not have, so build a labeled copy first:
# test_data_with_labels = pd.DataFrame({
#     'feature_1': [6, 7],
#     'feature_2': ['C', 'A'],
#     'target_column': [1, 0]
# })
# leaderboard = predictor.leaderboard(test_data_with_labels, silent=True)
# print("Leaderboard:\n", leaderboard)
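
The commented-out evaluation step above needs labeled rows that the model did not see during training. One way to get them is to hold out part of a single labeled DataFrame, as in the minimal sketch below. The helper names `split_holdout` and `train_and_score` are hypothetical and not part of AutoGluon, and the 20% hold-out fraction is an arbitrary choice. The sketch uses `TabularPredictor.evaluate` and `TabularPredictor.leaderboard`, and assumes the DataFrame is large enough for training.

```python
import pandas as pd


def split_holdout(df: pd.DataFrame, frac: float = 0.2, seed: int = 0):
    """Return (train, holdout), with `frac` of the rows held out at random."""
    holdout = df.sample(frac=frac, random_state=seed)
    train = df.drop(holdout.index)  # everything that was not sampled
    return train, holdout


def train_and_score(df: pd.DataFrame, label: str, path: str = "./AutogluonModels"):
    """Fit on the training split, then score on the labeled hold-out split."""
    # Imported lazily so split_holdout stays usable without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    train, holdout = split_holdout(df)
    predictor = TabularPredictor(label=label, path=path).fit(train)
    scores = predictor.evaluate(holdout)    # dict mapping metric name to value
    board = predictor.leaderboard(holdout)  # per-model scores as a DataFrame
    return scores, board
```

Calling `train_and_score(train_data, label='target_column')` on a realistically sized dataset would return the overall metrics and the per-model leaderboard. Fixing `seed` keeps the split reproducible between runs.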
