AutoGluon Multimodal

1.5.0 · active · verified Sun Apr 12

AutoGluon Multimodal provides a user-friendly interface for state-of-the-art multimodal deep learning, allowing users to train and deploy models on tabular, text, image, and even audio data with minimal code. It is part of the broader AutoGluon ecosystem, currently at version 1.5.0, and maintains a rapid release cadence with several major and minor updates throughout the year.

Warnings

Install
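The multimodal module ships as its own pip package. A typical install (assuming a recent Python and pip; the full AutoGluon suite can instead be installed with `pip install autogluon`) is:

```shell
# Install only the multimodal module of AutoGluon
pip install autogluon.multimodal
```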

Imports
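The quickstart below relies on a single entry point, plus pandas for data handling:

```python
# pandas for assembling training/inference DataFrames
import pandas as pd
# Main entry point for multimodal training and inference
from autogluon.multimodal import MultiModalPredictor
```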

Quickstart

This quickstart demonstrates how to use `MultiModalPredictor` for a text classification task. The `fit` method automatically handles feature engineering and model selection. For image or other modalities, include file paths in your DataFrame columns. The `presets` argument allows trading off training time for model quality.

import pandas as pd
from autogluon.multimodal import MultiModalPredictor

# Prepare sample data (text classification example)
# For image/video/audio, you'd provide file paths.
train_data = pd.DataFrame({
    'text_feature': [
        'This is a great product and I love it.',
        'Terrible service, very disappointed.',
        'It works as expected, nothing special.',
        'Absolutely fantastic, highly recommend!'
    ],
    'label': ['positive', 'negative', 'neutral', 'positive']
})

# Initialize and train the predictor; with three label values this is a
# multiclass problem (problem_type can also be omitted and inferred).
predictor = MultiModalPredictor(label='label', problem_type='multiclass')
# 'best_quality' is the slowest, most accurate preset; time_limit (seconds)
# caps training. Use presets='medium_quality' for a faster run.
predictor.fit(train_data, presets='best_quality', time_limit=120)

# Make predictions on new data
test_data = pd.DataFrame({
    'text_feature': [
        'This is amazing!',
        'Not happy with this at all.'
    ]
})
predictions = predictor.predict(test_data)

print(f"Predictions: {predictions.tolist()}")
# To save the predictor:
# predictor.save('./my_multimodal_predictor')
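As noted in the quickstart, non-text modalities are supplied as file-path columns in the same DataFrame. A sketch of a mixed text-plus-image training frame (the column names, labels, and image paths here are hypothetical placeholders, not real files):

```python
import pandas as pd

# Each row pairs free text with a path to an image file on disk.
# MultiModalPredictor detects the modality of each column automatically;
# the image paths below are illustrative placeholders.
mixed_train = pd.DataFrame({
    'text_feature': ['A red sneaker with white laces.',
                     'A leather boot, ankle height.'],
    'image_path': ['data/images/sneaker_001.jpg',
                   'data/images/boot_014.jpg'],
    'label': ['sneaker', 'boot'],
})

# Training would then proceed exactly as in the text-only example:
# predictor = MultiModalPredictor(label='label')
# predictor.fit(mixed_train)
```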
