AutoGluon Features
The `autogluon-features` sub-package provides the core feature engineering components of the AutoGluon AutoML library. It detects data types, transforms categorical, datetime, and text features, and manages feature metadata. It is usually consumed internally by AutoGluon's higher-level predictors, but it can also be used directly for advanced customization. It is currently at version 1.5.0 and is released together with the main AutoGluon library.
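The data-type detection mentioned above can be pictured with a small column classifier. The sketch below is a hypothetical, standard-library-only illustration: `infer_column_kind` and its rules (ISO-date parsing, "contains a space" meaning free text) are invented for this example and are not AutoGluon's actual inference logic, which operates on pandas DataFrames.

```python
from datetime import date
from typing import Sequence


def infer_column_kind(values: Sequence[object]) -> str:
    """Classify a column into a coarse kind using simple, illustrative rules.

    Toy heuristic for illustration only; not AutoGluon's internal rule set.
    """
    non_missing = [v for v in values if v is not None]
    if not non_missing:
        return "empty"
    if all(isinstance(v, bool) for v in non_missing):
        return "bool"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in non_missing):
        return "int"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_missing):
        return "float"
    if all(isinstance(v, str) for v in non_missing):
        # Strings that all parse as ISO dates are treated as datetime-like.
        try:
            for v in non_missing:
                date.fromisoformat(v)
            return "datetime"
        except ValueError:
            pass
        # Assumed rule: multi-word strings are free text, the rest are categories.
        if any(" " in v.strip() for v in non_missing):
            return "text"
        return "category"
    return "object"
```

Applied to the four quickstart columns, this toy classifier returns `int`, `category`, `datetime`, and `text` respectively.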
Warnings
- breaking Loading a model trained with a different AutoGluon version (including its `autogluon-features` components) is not supported and can lead to crashes, incorrect predictions, or unexpected behavior due to changes in internal APIs and serialization formats.
- gotcha Directly using `autogluon-features` components (like `FeatureGenerator` subclasses) is primarily for advanced customization. Most users should leverage feature engineering through AutoGluon's high-level APIs like `TabularPredictor`, which manage feature generation automatically.
- breaking Python version compatibility has changed. Support for Python 3.8 was dropped in AutoGluon v1.2.0, and support for Python 3.12 was added. Current versions require Python >=3.10 and <3.14.
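Both breaking-change warnings lend themselves to a fail-fast check at startup. The sketch below assumes you record the AutoGluon version string yourself when saving a model; `is_supported_python` and `check_same_autogluon_version` are hypothetical helpers, not part of AutoGluon, and the version bounds are copied from the warning above.

```python
import sys
from typing import Optional, Tuple

# Supported range as stated in the warning: Python >= 3.10 and < 3.14.
MIN_PYTHON: Tuple[int, int] = (3, 10)
MAX_PYTHON_EXCLUSIVE: Tuple[int, int] = (3, 14)


def is_supported_python(version_info: Optional[Tuple[int, ...]] = None) -> bool:
    """Return True if (major, minor) lies inside the supported range.

    Defaults to the running interpreter when no version is given.
    """
    if version_info is None:
        version_info = tuple(sys.version_info)
    major_minor = tuple(version_info[:2])
    return MIN_PYTHON <= major_minor < MAX_PYTHON_EXCLUSIVE


def check_same_autogluon_version(trained_with: str, installed: Optional[str]) -> None:
    """Raise if a saved model's AutoGluon version differs from the installed one.

    Hypothetical helper: `trained_with` is assumed to be a version string
    you recorded yourself when the model was trained.
    """
    if installed is None:
        raise RuntimeError("AutoGluon is not installed in this environment")
    if installed != trained_with:
        raise RuntimeError(
            f"Model trained with AutoGluon {trained_with} but {installed} is installed; "
            "retrain the model or install the matching version."
        )
```

Comparing exact version strings is deliberately strict, since the warning gives no compatible range across versions.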
Install
- pip install autogluon-features
- pip install autogluon
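After installing, one way to confirm which distributions are present is to query their metadata with the standard library. `installed_version` is a hypothetical helper, and the distribution names passed to it are assumed to match the pip package names above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names assumed to match the pip packages listed above.
    for name in ("autogluon-features", "autogluon"):
        print(name, installed_version(name) or "not installed")
```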
Imports
- PipelineFeatureGenerator
from autogluon.features.generators import PipelineFeatureGenerator
- CategoryFeatureGenerator
from autogluon.features.generators import CategoryFeatureGenerator
- FeatureMetadata
from autogluon.common.features.feature_metadata import FeatureMetadata
Quickstart
import pandas as pd
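from autogluon.features.generators import (
    CategoryFeatureGenerator,
    DatetimeFeatureGenerator,
    IdentityFeatureGenerator,
    PipelineFeatureGenerator,
)
# Create a sample DataFrame
data = {
    'numeric_col': [1, 2, 3, 4, 5],
    'categorical_col': ['A', 'B', 'A', 'C', 'B'],
    'datetime_col': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'],
    'text_col': ['hello world', 'foo bar', 'hello again', 'world peace', 'bar foo'],
}
df = pd.DataFrame(data)
# Initialize a pipeline of feature generators.
# `generators` is a list of stages: generators in the same inner list run side by side
# on the stage's input, and columns that no generator in the stage handles are dropped.
# IdentityFeatureGenerator passes numeric (int/float) columns through unchanged
# CategoryFeatureGenerator converts object/category dtypes
# DatetimeFeatureGenerator extracts year, month, day, etc. from datetime columns
pipeline_generator = PipelineFeatureGenerator(
    generators=[[
        IdentityFeatureGenerator(infer_features_in_args=dict(valid_raw_types=['int', 'float'])),
        CategoryFeatureGenerator(),
        DatetimeFeatureGenerator(),
    ]]
)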
# Fit and transform the DataFrame
df_transformed = pipeline_generator.fit_transform(X=df)
print("Original DataFrame:")
print(df)
print("\nTransformed DataFrame features (first 5 rows):")
print(df_transformed.head())
print("\nFeature metadata after transformation (output features):")
# Maps each output feature name to its raw dtype
print(pipeline_generator.feature_metadata_out.type_map_raw)
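As a rough mental model of the transformed output, the standard-library sketch below approximates the categorical and datetime transformations on the quickstart's sample columns. The helpers `encode_categories` and `expand_dates` are hypothetical; AutoGluon itself works on pandas DataFrames, and the exact output columns, names, and encodings it produces may differ.

```python
from datetime import date
from typing import Dict, List, Tuple


def encode_categories(values: List[str]) -> Tuple[List[str], List[int]]:
    """Map each distinct category to an integer code, in sorted category order."""
    categories = sorted(set(values))
    index = {cat: code for code, cat in enumerate(categories)}
    return categories, [index[v] for v in values]


def expand_dates(values: List[str]) -> Dict[str, List[int]]:
    """Split ISO date strings into year/month/day/day-of-week columns.

    Day of week uses Monday=0 ... Sunday=6, as returned by date.weekday().
    """
    parsed = [date.fromisoformat(v) for v in values]
    return {
        "year": [d.year for d in parsed],
        "month": [d.month for d in parsed],
        "day": [d.day for d in parsed],
        "dayofweek": [d.weekday() for d in parsed],
    }
```

Applied to the quickstart's `categorical_col`, `encode_categories` gives categories `['A', 'B', 'C']` and codes `[0, 1, 0, 2, 1]`; `expand_dates` maps the five `datetime_col` values to day-of-week values `[6, 0, 1, 2, 3]`, since 2023-01-01 was a Sunday.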