tsfresh

0.21.1 · active · verified Thu Apr 16

tsfresh extracts relevant characteristics from time series data, enabling automated feature engineering for machine learning tasks. It supports a wide range of feature calculators, parallel processing, and integrated feature selection. The current version is 0.21.1. New versions typically appear every few months and usually contain bug fixes and dependency updates, with occasional breaking changes.

Common errors

Warnings

Install

Imports

Quickstart

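This quickstart extracts features from a small pandas DataFrame using `tsfresh`. It builds three short dummy time series in long format, chooses the minimal feature calculation settings, and runs the extraction. Note that `n_jobs=0` disables multiprocessing, so the example runs in a single process; pass a positive integer to parallelize. The `impute_function` replaces NaN and infinite values in the resulting feature matrix, which matters for downstream steps that cannot handle them.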

import pandas as pd
from tsfresh import extract_features
from tsfresh.utilities.dataframe_functions import impute
from tsfresh.feature_extraction import MinimalFCParameters

# Create a sample time series DataFrame
# 'id' identifies different time series
# 'time' is the time index within each series (can be datetime or int)
# 'value' is the measurement
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'time': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'value': [10, 12, 11, 5, 6, 7, 8, 8, 9]
})

# Define feature extraction settings (e.g., Minimal for speed)
settings = MinimalFCParameters()

# Extract features
# impute_function=impute replaces NaN and +/-inf in the extracted feature matrix
features = extract_features(df,
                            column_id='id',
                            column_sort='time',
                            impute_function=impute,
                            default_fc_parameters=settings,
                            n_jobs=0)  # 0 disables multiprocessing; use a positive integer to parallelize

print("Extracted Features:")
print(features.head())
