MLForecast
MLForecast is a framework for scalable, machine-learning-based time series forecasting. It lets users apply machine learning models (such as those from scikit-learn, LightGBM, and XGBoost) to time series data, handles feature engineering (lags, rolling statistics, date features), and offers distributed training. The library is actively maintained, with frequent releases, currently at version 1.0.31.
Warnings
- breaking Version 1.0.0 removed `window_ops` and `numba` as direct dependencies, potentially requiring code changes if these were used explicitly or if custom `window_ops` implementations were relied upon.
- breaking In version 0.15.0, the behavior of the `fit` method's `dropna` parameter changed: rows with null targets are now dropped even when `dropna=False` is passed, which was not the case in previous versions.
- gotcha Input dataframes *must* be in a 'long' format with specific column names: `unique_id` (series identifier), `ds` (datestamp/timestamp), and `y` (target value). Deviating from this format will cause errors unless `id_col`, `time_col`, `target_col` are explicitly passed to `MLForecast` methods.
- gotcha Prediction intervals are not supported when using transfer learning with `MLForecast`. Attempting to combine these functionalities will result in a `ValueError`.
- gotcha When using distributed Dask DataFrames, if you have more partitions than Dask workers, it's recommended to set `num_threads=1` in `MLForecast` to prevent nested parallelism and potential performance issues or deadlocks.
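As an illustration of the long-format requirement above, the following is a minimal sketch of reshaping a wide table (one column per series) into `unique_id`, `ds`, `y` columns. It uses only pandas; the `wide` frame and its column names (`date`, `store_a`, `store_b`) are hypothetical and not part of MLForecast.

```python
import pandas as pd

# Hypothetical wide table: one row per date, one column per series.
wide = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=3, freq="D"),
    "store_a": [10.0, 12.0, 11.0],
    "store_b": [5.0, 6.0, 7.0],
})

# Melt to long format and rename to the column names MLForecast expects by default.
long_df = (
    wide.melt(id_vars="date", var_name="unique_id", value_name="y")
    .rename(columns={"date": "ds"})
    .sort_values(["unique_id", "ds"])
    .reset_index(drop=True)
)[["unique_id", "ds", "y"]]

print(long_df)
```

If renaming is not desirable, the warning above notes that `id_col`, `time_col`, and `target_col` can be passed instead.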
Install
- pip install mlforecast
- pip install "mlforecast[polars]"
- pip install "mlforecast[dask]"
- pip install "mlforecast[ray]"
- pip install "mlforecast[spark]"
Imports
- MLForecast
from mlforecast import MLForecast
- LGBMRegressor
import lightgbm as lgb
models = [lgb.LGBMRegressor()]
- ExpandingMean
from mlforecast.lag_transforms import ExpandingMean
- Differences
from mlforecast.target_transforms import Differences
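To give a feel for what `Differences` and `ExpandingMean` compute, here is a rough pandas-only sketch of the same arithmetic on a toy frame. It is an approximation for intuition, not the library's implementation: the toy data and the output column names (`y_diff1`, `expanding_mean_lag1`) are assumptions, and in MLForecast the target transform is understood to be applied to the target before the lag features are built.

```python
import pandas as pd

# Toy long-format data with two series (values are made up).
df = pd.DataFrame({
    "unique_id": ["a"] * 4 + ["b"] * 4,
    "ds": list(pd.date_range("2024-01-01", periods=4, freq="D")) * 2,
    "y": [1.0, 3.0, 6.0, 10.0, 2.0, 2.0, 5.0, 5.0],
})
grouped = df.groupby("unique_id")["y"]

# Roughly what Differences([1]) does: subtract the previous value within each series.
df["y_diff1"] = grouped.diff(1)

# Roughly what ExpandingMean on lag 1 does: mean of all values up to the previous step.
df["expanding_mean_lag1"] = grouped.transform(
    lambda s: s.shift(1).expanding().mean()
)

print(df)
```

The first row of each series is NaN in both columns because no earlier observation exists to difference against or average over.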
Quickstart
import pandas as pd
from sklearn.linear_model import LinearRegression
from mlforecast import MLForecast
from mlforecast.lag_transforms import RollingMean
from mlforecast.utils import generate_daily_series
# 1. Generate sample time series data
# Data must be in long format with 'unique_id', 'ds', 'y'
df = generate_daily_series(
    n_series=5, max_length=100, n_static_features=0, with_trend=True
)
df['ds'] = pd.to_datetime(df['ds'])
# 2. Define models and features
models = [LinearRegression()]
lags = [7]
lag_transforms = {
    1: [RollingMean(window_size=7)]
}
date_features = ['dayofweek', 'month']
# 3. Instantiate MLForecast
# freq='D' for daily data; use 'W', 'M', etc. or integer for integer timestamps
forecast_model = MLForecast(
    models=models,
    freq='D',
    lags=lags,
    lag_transforms=lag_transforms,
    date_features=date_features,
)
# 4. Fit the model
forecast_model.fit(df)
# 5. Make predictions for the next 7 days
h = 7
predictions = forecast_model.predict(h=h)
print(predictions.head())
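The features configured in the quickstart (`lags=[7]`, a 7-step rolling mean on lag 1, and the `dayofweek` and `month` date features) can be approximated by hand in pandas, which may help when checking what the model is trained on. This is a sketch under assumptions: the toy series is made up, the feature column names are illustrative, and the library's own computation may differ in details.

```python
import numpy as np
import pandas as pd

# Toy single-series frame: y = 0, 1, ..., 9 on consecutive days (made-up data).
df = pd.DataFrame({
    "unique_id": ["a"] * 10,
    "ds": pd.date_range("2024-01-01", periods=10, freq="D"),
    "y": np.arange(10, dtype=float),
})
grouped = df.groupby("unique_id")["y"]

# lags=[7]: the target value from 7 steps earlier in the same series.
df["lag7"] = grouped.shift(7)

# {1: [RollingMean(window_size=7)]}: mean of the 7 values ending at the previous step.
df["rolling_mean_lag1_window_size7"] = grouped.transform(
    lambda s: s.shift(1).rolling(7).mean()
)

# date_features=['dayofweek', 'month']: attributes read from the timestamp.
df["dayofweek"] = df["ds"].dt.dayofweek
df["month"] = df["ds"].dt.month

print(df.tail(3))
```

The first seven rows have no complete lag or window, so those features are NaN there; rows like these are the ones affected by the `dropna` behavior of `fit` described in the warnings.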