AutoGluon
AutoGluon is an open-source AutoML library developed by AWS AI that automates machine learning for tabular, image, text, and time series data. It lets users train and deploy high-accuracy machine learning and deep learning models with minimal code; a basic train-and-predict workflow is often described as taking three lines. The library is actively maintained with frequent releases, and the current version is 1.5.0.
Warnings
- breaking Models trained with older versions of AutoGluon are not compatible with newer versions. Users must retrain models after upgrading AutoGluon to a new major or minor release.
- deprecated Several `TabularPredictor` methods were deprecated in v1.0.0 and subsequently removed in v1.3.0. Using these old method names will result in errors.
- breaking AutoGluon regularly updates its supported Python versions. For instance, Python 3.8 support was dropped in v1.2.0, and newer versions (e.g., v1.5.0) require Python 3.10-3.13. Running AutoGluon on an unsupported Python version will lead to installation or runtime failures.
- gotcha The `autogluon-common` PyPI package provides helper functionality and is not intended for standalone use to access the full AutoML features. Users seeking the complete AutoGluon experience (e.g., `TabularPredictor`, `TimeSeriesPredictor`) should install the `autogluon` meta-package.
- deprecated The `autogluon.eda` module, which provided exploratory data analysis functionality, has been deprecated.
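The Python-version and retraining warnings above can be turned into a small pre-flight check. The following is a minimal sketch using only the standard library. The function names are hypothetical and not part of AutoGluon, the 3.10-3.13 range is the one stated above for v1.5.0, and the version comparison assumes plain numeric strings such as `1.5.0`.

```python
import sys

# Inclusive (major, minor) range quoted in the warning above for AutoGluon v1.5.0.
SUPPORTED_PYTHON = ((3, 10), (3, 13))


def python_is_supported(version_info=None, supported=SUPPORTED_PYTHON):
    """Return True if the interpreter's (major, minor) lies within the supported range."""
    if version_info is None:
        version_info = sys.version_info
    current = (version_info[0], version_info[1])
    low, high = supported
    return low <= current <= high


def needs_retraining(trained_with, installed):
    """Return True if the major or minor component differs between two version strings.

    Hypothetical helper: it encodes the rule from the warning above (retrain after a
    new major or minor release) and treats patch-level differences as compatible.
    """
    def major_minor(version_string):
        parts = version_string.split(".")
        return int(parts[0]), int(parts[1])

    return major_minor(trained_with) != major_minor(installed)


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("Retrain needed:", needs_retraining("1.4.0", "1.5.0"))
```

Recording the AutoGluon version next to each saved model directory makes the second check possible after an upgrade; how that version is stored is left to the project.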
Install
- pip install autogluon
- pip install autogluon-common
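Because `autogluon-common` alone does not provide the predictors, it can help to confirm which distribution is actually present in the environment. A minimal sketch using the standard library's `importlib.metadata`; the helper names and messages are illustrative assumptions, not part of AutoGluon.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def describe_install():
    """Summarize whether the meta-package or only the helper package is present."""
    meta = installed_version("autogluon")
    common = installed_version("autogluon-common")
    if meta is not None:
        return f"autogluon meta-package {meta} is installed."
    if common is not None:
        return (
            f"autogluon meta-package not found; autogluon-common {common} is present. "
            "Install 'autogluon' for the full set of predictors."
        )
    return "No AutoGluon distribution found."


if __name__ == "__main__":
    print(describe_install())
```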
Imports
- TabularPredictor
from autogluon.tabular import TabularPredictor
- TabularDataset
from autogluon.tabular import TabularDataset
- TimeSeriesPredictor
from autogluon.timeseries import TimeSeriesPredictor
- MultiModalPredictor
from autogluon.multimodal import MultiModalPredictor
Quickstart
import os
import shutil

import pandas as pd
from autogluon.tabular import TabularPredictor, TabularDataset

# Create a tiny placeholder training set (real use needs far more rows)
train_data = pd.DataFrame({
    'feature1': [1, 2, 3, 4, 5],
    'feature2': ['A', 'B', 'A', 'C', 'B'],
    'target': [0, 1, 0, 1, 0],
})
# Save to CSV so it can be loaded with TabularDataset
train_data.to_csv('train.csv', index=False)

# Create placeholder test data (same features, no label column)
test_data = pd.DataFrame({
    'feature1': [6, 7],
    'feature2': ['C', 'A'],
})
test_data.to_csv('test.csv', index=False)

# Load the training data using AutoGluon's TabularDataset
train_dataset = TabularDataset('train.csv')

# Name of the column to predict
label = 'target'

# Initialize and train the TabularPredictor; models are saved under `path`
predictor = TabularPredictor(label=label, path='AutoGluonModels').fit(
    train_dataset, presets='medium_quality'
)

# Make predictions on new data
test_dataset = TabularDataset('test.csv')
predictions = predictor.predict(test_dataset)
print("Predictions:\n", predictions)

# Clean up generated files (optional)
shutil.rmtree('AutoGluonModels', ignore_errors=True)
os.remove('train.csv')
os.remove('test.csv')
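The five-row frame in the quickstart is only a placeholder, and a dataset that small may not train reliably. A somewhat larger synthetic file with the same columns can be generated with the standard library alone. This is a minimal sketch: the function name, default row count, and labeling rule are illustrative assumptions, not part of AutoGluon.

```python
import csv
import random


def write_toy_csv(path, n_rows=200, seed=0, include_target=True):
    """Write a synthetic CSV with the quickstart's columns and return its path."""
    rng = random.Random(seed)  # seeded so the file is reproducible
    header = ["feature1", "feature2"]
    if include_target:
        header.append("target")

    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for _ in range(n_rows):
            feature1 = rng.randint(1, 100)
            feature2 = rng.choice(["A", "B", "C"])
            row = [feature1, feature2]
            if include_target:
                # Arbitrary rule so the label depends on both features.
                row.append(int(feature1 > 50 or feature2 == "C"))
            writer.writerow(row)
    return path


if __name__ == "__main__":
    write_toy_csv("train.csv", n_rows=200, seed=0)
    write_toy_csv("test.csv", n_rows=20, seed=1, include_target=False)
```

The resulting `train.csv` and `test.csv` can be passed to `TabularDataset` in place of the files written in the quickstart.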