AutoGluon Features

1.5.0 · active · verified Sun Apr 12

The `autogluon-features` sub-package provides core machine learning feature engineering capabilities for the AutoGluon AutoML library. It handles tasks like detecting data types, transforming categorical, datetime, and text features, and managing feature metadata. This package is usually consumed internally by AutoGluon's higher-level predictors, but can be used directly for advanced customization. It is currently at version 1.5.0 and releases in conjunction with the main AutoGluon library.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to instantiate and use a `PipelineFeatureGenerator` from `autogluon-features` to apply common transformations like categorical encoding and datetime feature extraction to a pandas DataFrame.

import pandas as pd
from autogluon.features.generators import (
    CategoryFeatureGenerator,
    DatetimeFeatureGenerator,
    PipelineFeatureGenerator,
)

# Create a sample DataFrame
data = {
    'numeric_col': [1, 2, 3, 4, 5],
    'categorical_col': ['A', 'B', 'A', 'C', 'B'],
    'datetime_col': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'],
    'text_col': ['hello world', 'foo bar', 'hello again', 'world peace', 'bar foo']
}
df = pd.DataFrame(data)

# Initialize a pipeline of feature generators.
# `generators` is a list of stages: generators in the same inner list are fit
# on the same input, while separate stages run one after another. Both
# generators share one stage here so each sees the original columns.
# CategoryFeatureGenerator encodes object/category columns.
# DatetimeFeatureGenerator extracts year, month, day, etc. from datetime columns.
# Columns that neither generator accepts (e.g. numeric_col) are not passed through.
pipeline_generator = PipelineFeatureGenerator(
    generators=[[CategoryFeatureGenerator(), DatetimeFeatureGenerator()]]
)

# Fit and transform the DataFrame
df_transformed = pipeline_generator.fit_transform(X=df)

print("Original DataFrame:")
print(df)
print("\nTransformed DataFrame features (first 5 rows):")
print(df_transformed.head())
print("\nFeature metadata after transformation (output features):")
print(pipeline_generator.feature_metadata_out)
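
To make the two transformations named in this quickstart more concrete, here is a rough plain-pandas approximation of categorical encoding and datetime feature extraction. It is only a sketch of the idea, not AutoGluon's implementation: the output column names and the integer codes are choices made for this example, and the real generators may name, order, or encode features differently.

```python
import pandas as pd

df = pd.DataFrame({
    "categorical_col": ["A", "B", "A", "C", "B"],
    "datetime_col": ["2023-01-01", "2023-01-02", "2023-01-03",
                     "2023-01-04", "2023-01-05"],
})

# Categorical encoding: map each distinct label to an integer code.
# pandas sorts the categories, so A -> 0, B -> 1, C -> 2.
encoded = df["categorical_col"].astype("category").cat.codes

# Datetime extraction: parse the strings, then pull out calendar parts.
parsed = pd.to_datetime(df["datetime_col"])

sketch = pd.DataFrame({
    "categorical_col": encoded,
    "datetime_col.year": parsed.dt.year,
    "datetime_col.month": parsed.dt.month,
    "datetime_col.day": parsed.dt.day,
    "datetime_col.dayofweek": parsed.dt.dayofweek,  # Monday = 0, Sunday = 6
})
print(sketch)
```

The pipeline version has the advantage that the fitted state (for example, the category mapping) is stored on the generator, so the same encoding can be reapplied to new data with `transform`.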
