Featuretools
Featuretools is an open-source Python library for automated feature engineering. It transforms temporal and relational datasets into feature matrices suitable for machine learning. The library, currently at version 1.31.0, is maintained by Alteryx and released frequently.
Common errors
- AttributeError: 'Series' object has no attribute 'ww'
  cause: Accessing Woodwork attributes (e.g., `logical_type` or `semantic_tags`) directly on a plain pandas Series, or using pre-1.0.0 syntax after migrating to Featuretools 1.0+.
  fix: Select the column through the `.ww` accessor of the Woodwork-initialized DataFrame, then read the property from the `.ww` accessor of the returned Series. Example: `entityset['dataframe_name'].ww['column_name'].ww.logical_type`.
- AttributeError: Can't get 'time' column from cutoff_time. The column must be labeled either as the target entity's time index variable name or as 'time'.
  cause: In `featuretools.dfs` or `featuretools.calculate_feature_matrix`, the time column of the `cutoff_time` DataFrame is not named in a way Featuretools recognizes.
  fix: Name the time column in your `cutoff_time` DataFrame either 'time' or the time index name of your target DataFrame in the EntitySet. This was a breaking change in v0.16.0.
- ImportError: cannot import name 'flatten_list' from 'featuretools.utils'
  cause: The utility function `flatten_list` was moved within the Featuretools package.
  fix: This was a bug in certain 1.x releases, resolved in v1.31.0 by relocating the function. Users who hit it on an older 1.x version should upgrade to the latest release (e.g., `1.31.0`).
- UserWarning: Index is not unique on dataframe
  cause: When creating an EntitySet or adding a DataFrame, the specified index column contains duplicate values.
  fix: Make sure the column designated as the index of a DataFrame in the EntitySet contains only unique values. Duplicate index values can lead to unexpected behavior in feature calculation, so review and de-duplicate your data before adding it.
Warnings
- breaking: As of Featuretools v1.31.0, EntitySets can no longer be created directly from Dask or PySpark DataFrames. Convert Dask/PySpark DataFrames to pandas DataFrames first.
- breaking: The `featuretools` command-line interface (CLI) has been completely removed in version 1.31.0.
- breaking: Featuretools v1.0.0 replaced its legacy custom typing system with Woodwork. The `Entity` and `Variable` classes were removed, and both `EntitySet` creation and primitive definitions changed. Columns now carry type information as Woodwork `LogicalType` and `semantic_tags`.
- gotcha: Dask is now an optional dependency. If you call `calculate_feature_matrix` with `n_jobs` set to anything other than 1 (to enable parallel processing), you must install Dask explicitly.
Install
- pip install featuretools
- conda install -c conda-forge featuretools
Imports
- featuretools
import featuretools as ft
- EntitySet
from featuretools.entityset import EntitySet
from featuretools import EntitySet
Quickstart
import featuretools as ft
import pandas as pd
# Load mock customer data into an EntitySet
es = ft.demo.load_mock_customer(return_entityset=True)
# Define target dataframe for feature engineering
target_dataframe_name = "customers"
# Run Deep Feature Synthesis (DFS)
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name=target_dataframe_name,
    agg_primitives=["count", "sum", "mean"],
    trans_primitives=["day", "month", "weekday"],
)
print(feature_matrix.head())