Utilsforecast: Forecasting Utilities for Time Series
Utilsforecast provides essential helper functions for time series forecasting workflows, forming a core part of the Nixtlaverse ecosystem alongside libraries like StatsForecast, MLForecast, and NeuralForecast. It offers utilities for data preprocessing, feature engineering, evaluation, and plotting. The library is actively maintained with frequent releases, typically addressing bug fixes, performance enhancements, and new metric/feature additions.
Warnings
- breaking: Python 3.8 support was dropped in version 0.2.12. Users on Python 3.8 must upgrade to Python 3.9 or newer to use versions 0.2.12 and above.
- gotcha: Scaled metric computations (e.g., RMSSE) were corrected in version 0.2.15. Earlier versions may have computed these metrics against an incorrect denominator, which can produce misleading evaluation results.
- gotcha: `fill_gaps` was fixed in version 0.2.12 to validate the `freq` parameter against the input data. In earlier versions, an incorrect `freq` value might have caused silent issues or errors.
- deprecated: Version 0.2.10 addressed pandas frequency alias deprecations within `generate_series`. While not immediately breaking, earlier versions rely on deprecated pandas aliases, which might lead to warnings or errors with newer pandas versions.
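The version thresholds in these warnings can be checked programmatically. The following is a minimal sketch using only the standard library. The names `CHANGES`, `parse_version`, and `changes_after` are hypothetical helpers, not part of utilsforecast, and the parser assumes plain `X.Y.Z` version strings.

```python
from importlib.metadata import PackageNotFoundError, version

# Thresholds copied from the warnings above; this table is illustrative only.
CHANGES = {
    (0, 2, 10): "generate_series stops using deprecated pandas frequency aliases",
    (0, 2, 12): "fill_gaps validates freq against the data; Python 3.8 support dropped",
    (0, 2, 15): "scaled metrics (e.g. RMSSE) use the corrected denominator",
}


def parse_version(text):
    """Turn 'X.Y.Z' into a tuple of three ints, ignoring non-numeric suffixes."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def changes_after(installed):
    """Return notes for changes released in versions newer than `installed`."""
    current = parse_version(installed)
    return [note for released_in, note in sorted(CHANGES.items()) if current < released_in]


try:
    installed = version("utilsforecast")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("utilsforecast is not installed")
else:
    for note in changes_after(installed):
        print(f"{installed} predates: {note}")
```

If nothing is printed, the installed version already includes every change listed above.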
Install
- pip install utilsforecast
- conda install -c conda-forge utilsforecast
Imports
- fill_gaps
from utilsforecast.preprocessing import fill_gaps
- evaluate
from utilsforecast.evaluation import evaluate
- mape
from utilsforecast.losses import mape
- generate_series
from utilsforecast.data import generate_series
- plot_series
from utilsforecast.plotting import plot_series
Quickstart
from functools import partial

import numpy as np
from utilsforecast.data import generate_series
from utilsforecast.evaluation import evaluate
from utilsforecast.losses import mape, mase
from utilsforecast.preprocessing import fill_gaps

# Seeded generator so the example is reproducible
rng = np.random.default_rng(42)

# 1. Generate synthetic daily data, then drop 10% of the rows to create gaps
series = generate_series(n_series=3, max_length=50, equal_ends=True, seed=42)
mis_idx = rng.choice(series.index, size=int(len(series) * 0.1), replace=False)
series_with_gaps = series.drop(mis_idx)

# 2. Restore the missing timestamps; fill_gaps adds the rows but leaves 'y' as NaN
filled_series = fill_gaps(series_with_gaps, freq='D')
# Interpolate the target within each series so that no NaN reaches the metrics
filled_series['y'] = filled_series.groupby('unique_id', observed=True)['y'].transform(
    lambda s: s.interpolate(limit_direction='both')
)
print("Series head with gaps:")
print(series_with_gaps.head())
print("\nFilled series head:")
print(filled_series.head())

# 3. Create dummy predictions by adding multiplicative noise to the target 'y'
filled_series['model1'] = filled_series['y'] * rng.uniform(0.9, 1.1, len(filled_series))
filled_series['model2'] = filled_series['y'] * rng.uniform(0.8, 1.2, len(filled_series))

# Split into train and validation: the last `horizon` rows of each series are held out
horizon = 7
valid_df = filled_series.groupby('unique_id', observed=True).tail(horizon).copy()
train_df = filled_series.drop(valid_df.index).copy()

# 4. Evaluate the dummy models
# MASE requires a training set to compute the scaling factor
dummy_mase = partial(mase, seasonality=1)
metrics_df = evaluate(
    valid_df,
    metrics=[mape, dummy_mase],
    models=['model1', 'model2'],  # name the prediction columns explicitly
    train_df=train_df,
)
print("\nEvaluation Results:")
print(metrics_df.head())
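To illustrate why MASE needs the training data: the metric divides the forecast's mean absolute error by the in-sample mean absolute error of a seasonal naive forecast, which can only be computed from the training series. Below is a minimal pure-Python sketch of that textbook definition for a single series. `mase_sketch` is a hypothetical helper written for illustration, not the utilsforecast implementation.

```python
def mase_sketch(y_true, y_pred, y_train, seasonality=1):
    """Textbook MASE for one series: forecast MAE / in-sample seasonal-naive MAE."""
    if len(y_train) <= seasonality:
        raise ValueError("training series must be longer than the seasonality")
    # Numerator: mean absolute error of the forecast on the validation window
    forecast_mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    # Denominator: mean absolute error of predicting y[t] with y[t - seasonality]
    naive_errors = [
        abs(y_train[i] - y_train[i - seasonality])
        for i in range(seasonality, len(y_train))
    ]
    scale = sum(naive_errors) / len(naive_errors)
    return forecast_mae / scale


y_train = [10.0, 12.0, 11.0, 13.0, 12.0]
y_true = [14.0, 13.0]
y_pred = [13.0, 15.0]
print(mase_sketch(y_true, y_pred, y_train, seasonality=1))  # 1.0
```

A value below 1 means the forecast's average error is smaller than that of the naive baseline on the training data; a value above 1 means it is larger.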