{"id":4632,"library":"mlforecast","title":"MLForecast","description":"MLForecast is a framework for scalable machine learning based time series forecasting. It enables users to apply various machine learning models (like scikit-learn, LightGBM, XGBoost) to time series data, handling complex feature engineering (lags, rolling statistics, date features) and offering distributed training capabilities. The library is actively maintained, with frequent releases, currently at version 1.0.31.","status":"active","version":"1.0.31","language":"en","source_language":"en","source_url":"https://github.com/Nixtla/mlforecast","tags":["time series","forecasting","machine learning","scalable","distributed","feature engineering"],"install":[{"cmd":"pip install mlforecast","lang":"bash","label":"Base installation"},{"cmd":"pip install \"mlforecast[polars]\"","lang":"bash","label":"With Polars DataFrame support"},{"cmd":"pip install \"mlforecast[dask]\"","lang":"bash","label":"With Dask for distributed computing"},{"cmd":"pip install \"mlforecast[ray]\"","lang":"bash","label":"With Ray for distributed computing"},{"cmd":"pip install \"mlforecast[spark]\"","lang":"bash","label":"With Spark for distributed computing"}],"dependencies":[{"reason":"Primary DataFrame format for local operations and examples.","package":"pandas","optional":false},{"reason":"Common base for many machine learning models used with MLForecast.","package":"scikit-learn","optional":false},{"reason":"Alternative high-performance DataFrame backend.","package":"polars","optional":true},{"reason":"Distributed computing backend.","package":"dask","optional":true},{"reason":"Distributed computing backend.","package":"ray","optional":true},{"reason":"Distributed computing backend.","package":"pyspark","optional":true},{"reason":"Required for saving artifacts to remote storages (e.g., S3, GCS).","package":"fsspec","optional":true},{"reason":"Specific fsspec implementation for S3 storage, included in `aws` 
extra.","package":"s3fs","optional":true}],"imports":[{"symbol":"MLForecast","correct":"from mlforecast import MLForecast"},{"note":"MLForecast works with any scikit-learn compatible regressor, e.g., LightGBM, XGBoost.","symbol":"LGBMRegressor","correct":"import lightgbm as lgb\nmodels = [lgb.LGBMRegressor()]"},{"symbol":"ExpandingMean","correct":"from mlforecast.lag_transforms import ExpandingMean"},{"symbol":"Differences","correct":"from mlforecast.target_transforms import Differences"}],"quickstart":{"code":"import pandas as pd\nfrom sklearn.linear_model import LinearRegression\nfrom mlforecast import MLForecast\nfrom mlforecast.lag_transforms import RollingMean\nfrom mlforecast.utils import generate_daily_series\n\n# 1. Generate sample time series data\n# Data must be in long format with 'unique_id', 'ds', 'y'\ndf = generate_daily_series(\n    n_series=5, max_length=100, n_static_features=0, with_trend=True\n)\ndf['ds'] = pd.to_datetime(df['ds'])\n\n# 2. Define models and features\nmodels = [LinearRegression()]\nlags = [7]\nlag_transforms = {\n    1: [RollingMean(window_size=7)]\n}\ndate_features = ['dayofweek', 'month']\n\n# 3. Instantiate MLForecast\n# freq='D' for daily data; use 'W', 'M', etc. or integer for integer timestamps\nforecast_model = MLForecast(\n    models=models,\n    freq='D',\n    lags=lags,\n    lag_transforms=lag_transforms,\n    date_features=date_features,\n)\n\n# 4. Fit the model\nforecast_model.fit(df)\n\n# 5. Make predictions for the next 7 days\nh = 7\npredictions = forecast_model.predict(h=h)\n\nprint(predictions.head())","lang":"python","description":"This quickstart demonstrates how to set up `mlforecast` for a simple time series prediction task. 
It generates sample daily series data, defines a Linear Regression model with a 7-day lag feature and a 7-day rolling mean computed on the lag-1 target, adds date-based features (day of week and month), fits the model, and generates forecasts for the next 7 days."},"warnings":[{"fix":"Review your code for direct usage of `window_ops` or `numba` from `mlforecast` and refactor. `mlforecast` now handles efficient feature engineering internally.","message":"Version 1.0.0 removed `window_ops` and `numba` as direct dependencies, potentially requiring code changes if these were used explicitly or if custom `window_ops` implementations were relied upon.","severity":"breaking","affected_versions":">=1.0.0"},{"fix":"Ensure your input data (`df`) passed to `fit` and `preprocess` does not contain `NaN` values in the target column (`y`), especially if you were relying on previous `dropna=False` behavior, as transformations can propagate `NaN`s.","message":"In version 0.15.0, the behavior of the `fit` method's `dropna` parameter changed: when `dropna=False` is passed, rows with null targets are now dropped, which was not the case in previous versions.","severity":"breaking","affected_versions":">=0.15.0"},{"fix":"Rename your DataFrame columns to `unique_id`, `ds`, `y`, or explicitly pass the `id_col`, `time_col`, and `target_col` arguments to `MLForecast.fit()` (or to `preprocess()` / `cross_validation()`); `predict()` reuses the column names given at fit time.","message":"Input dataframes *must* be in a 'long' format with specific column names: `unique_id` (series identifier), `ds` (datestamp/timestamp), and `y` (target value). Deviating from this format will cause errors unless `id_col`, `time_col`, `target_col` are explicitly passed to the `MLForecast` methods that ingest data.","severity":"gotcha","affected_versions":"All versions"},{"fix":"If prediction intervals are required, avoid using transfer learning techniques with `MLForecast`. 
Consider alternative methods for uncertainty quantification in transfer learning scenarios or separate the tasks.","message":"Prediction intervals are not supported when using transfer learning with `MLForecast`. Attempting to combine these functionalities will result in a `ValueError`.","severity":"gotcha","affected_versions":"All versions with transfer learning capability"},{"fix":"Explicitly set `num_threads=1` in the `MLForecast` or `DistributedMLForecast` constructor when working with Dask DataFrames where `df.npartitions > client.n_workers`.","message":"When using distributed Dask DataFrames, if you have more partitions than Dask workers, it's recommended to set `num_threads=1` in `MLForecast` to prevent nested parallelism and potential performance issues or deadlocks.","severity":"gotcha","affected_versions":"All versions with Dask distributed support"}],"env_vars":null,"last_verified":"2026-04-12T00:00:00.000Z","next_check":"2026-07-11T00:00:00.000Z"}