AutoGluon Multimodal
AutoGluon Multimodal provides a user-friendly interface for state-of-the-art multimodal deep learning, letting users train and deploy models on tabular, text, image, and even audio data with minimal code. It is part of the broader AutoGluon ecosystem, currently at version 1.5.0, which ships several releases per year.
Warnings
- breaking: Models trained with one AutoGluon version are generally NOT compatible with other versions for loading and inference. Always load a model with the same AutoGluon version that trained it, or with a patch release of the same major.minor.
- gotcha: AutoGluon Multimodal relies on deep learning models and can be resource-intensive (CPU, GPU, RAM), especially for large datasets or complex multimodal tasks. Training times can be significant.
- breaking: Python version compatibility has changed across releases. AutoGluon 1.2.0 dropped support for Python 3.8 and added support for Python 3.12. The current version (1.5.0) supports Python >=3.10, <3.14.
- gotcha: Installing `autogluon.multimodal` can be complex because it pulls in many deep learning dependencies (e.g., PyTorch, Transformers, Timm). This can cause conflicts with other installed packages and slow installation.
Install
pip install autogluon.multimodal
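Because the install pulls in several heavy dependencies, it can help to confirm that they resolve before starting a long training run. This is a rough sketch, not an official check: the helper names `is_importable` and `check_dependencies` are hypothetical, and the module list simply mirrors the packages named in the warnings above. It looks modules up without importing them fully, so it stays fast.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the module can be located on the current Python path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises when a parent package of a dotted name is missing.
        return False


def check_dependencies(
    modules=("autogluon.multimodal", "torch", "transformers", "timm"),
):
    """Map each module name to whether it was found."""
    return {name: is_importable(name) for name in modules}


for name, found in check_dependencies().items():
    print(f"{name}: {'found' if found else 'missing'}")
```

A module reported as found can still fail at import time (for example, because of a binary mismatch), so this only catches missing packages.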
Imports
- MultiModalPredictor
from autogluon.multimodal import MultiModalPredictor
Quickstart
import pandas as pd
from autogluon.multimodal import MultiModalPredictor
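# Prepare sample data (text classification example).
# This toy set only illustrates the expected format; real training needs many more rows per class.
# For image/video/audio, you'd provide file paths.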
train_data = pd.DataFrame({
    'text_feature': [
        'This is a great product and I love it.',
        'Terrible service, very disappointed.',
        'It works as expected, nothing special.',
        'Absolutely fantastic, highly recommend!'
    ],
    'label': ['positive', 'negative', 'neutral', 'positive']
})
# Initialize and train the MultiModalPredictor
predictor = MultiModalPredictor(label='label', problem_type='classification')
predictor.fit(train_data, presets='best_quality')
# Make predictions on new data
test_data = pd.DataFrame({
    'text_feature': [
        'This is amazing!',
        'Not happy with this at all.'
    ]
})
predictions = predictor.predict(test_data)
print(f"Predictions: {predictions.tolist()}")
# To save the predictor:
# predictor.save('./my_multimodal_predictor')
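The save step above pairs with `MultiModalPredictor.load` when the model is needed again. Below is a minimal sketch of that round trip. The helper `save_and_reload` and its `loader` parameter are hypothetical, added only so the function can be defined and exercised without AutoGluon installed; by default it falls back to `MultiModalPredictor.load`.

```python
def save_and_reload(predictor, path, loader=None):
    """Save a fitted predictor to `path`, then load a fresh copy from disk.

    `loader` is a hypothetical hook for this sketch; when omitted,
    MultiModalPredictor.load is used.
    """
    if loader is None:
        # Imported lazily so this helper can be defined without AutoGluon.
        from autogluon.multimodal import MultiModalPredictor
        loader = MultiModalPredictor.load
    predictor.save(path)
    return loader(path)
```

With the quickstart objects in scope, `reloaded = save_and_reload(predictor, './my_multimodal_predictor')` followed by `reloaded.predict(test_data)` should reproduce the earlier predictions, provided the same AutoGluon version is used for both steps.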