River

Version 0.24.2 · verified Mon Apr 27 · auth: no · language: python

River is a Python library for online machine learning, stream processing, and incremental learning. It provides a comprehensive set of estimators, transformers, and metrics that process data one sample at a time, with built-in drift detection and model evaluation. Current version 0.24.2, released irregularly (several releases per year).
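To make "one sample at a time" concrete, here is a minimal pure-Python sketch of a streaming standardizer. It is not River's code. It keeps a running mean and variance per feature using Welford's update, so each dict sample is learned from and transformed without storing past data. The class `RunningStandardizer` is hypothetical; it only borrows River's `learn_one`/`transform_one` naming.

```python
import math


class RunningStandardizer:
    """Hypothetical streaming scaler (not River's API) using per-feature Welford updates."""

    def __init__(self):
        self.n = {}     # number of values seen per feature
        self.mean = {}  # running mean per feature
        self.m2 = {}    # running sum of squared deviations per feature

    def learn_one(self, x: dict) -> None:
        for k, v in x.items():
            n = self.n.get(k, 0) + 1
            mean = self.mean.get(k, 0.0)
            delta = v - mean
            mean += delta / n
            # Welford: use the deviation from both the old and the new mean
            self.m2[k] = self.m2.get(k, 0.0) + delta * (v - mean)
            self.n[k], self.mean[k] = n, mean

    def transform_one(self, x: dict) -> dict:
        out = {}
        for k, v in x.items():
            n = self.n.get(k, 0)
            var = self.m2.get(k, 0.0) / n if n else 0.0  # population variance
            std = math.sqrt(var)
            # Features never seen, or with zero variance, map to 0.0
            out[k] = (v - self.mean.get(k, 0.0)) / std if std > 0 else 0.0
        return out


scaler = RunningStandardizer()
for x in [{"a": 1.0}, {"a": 2.0}, {"a": 3.0}]:
    scaler.learn_one(x)
print(scaler.transform_one({"a": 3.0}))
```

Because only three numbers per feature are kept, memory stays constant no matter how long the stream runs.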

pip install river
error ModuleNotFoundError: No module named 'river.metrics'
cause A submodule is imported before the parent package has finished loading; River's lazy loading of submodules can trigger this.
fix Use `from river import metrics` instead of `import river.metrics`.
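A generic Python cause of this same message is a local file named `river.py` shadowing the installed package: a plain module cannot contain submodules, so `river.metrics` is never found. The hypothetical helper `describe_import` below uses only the standard library to report what `import river` would load, without importing it.

```python
import importlib.util


def describe_import(name: str) -> str:
    """Hypothetical helper: report what `import name` would load, without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not found on sys.path"
    if spec.submodule_search_locations is None:
        # A plain module (e.g. a stray river.py in the working directory) has no
        # submodules, so `import river.metrics` would raise ModuleNotFoundError.
        return f"{name}: plain module at {spec.origin}"
    return f"{name}: package at {spec.origin}"


print(describe_import("river"))
```

If the output points at a file in your project rather than your environment's `site-packages`, rename that file.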
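error AttributeError: 'NoneType' object has no attribute 'predict_one'
cause The model variable holds `None`. Typically the result of `learn_one` was assigned back to it (`learn_one` updates the model in place and returns `None`), the model was not instantiated correctly, or a pipeline step is `None`.
fix Call `model.learn_one(x, y)` without reassigning its result, and make sure every pipeline step is an estimator instance.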
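A common way to end up with `None` is `model = model.learn_one(x, y)`: in recent River releases `learn_one` updates the estimator in place and returns `None`. The sketch below uses a hypothetical toy class, `RunningMean`, in place of a real River model so it runs without River installed; the anti-pattern and the fix are the same.

```python
class RunningMean:
    """Toy stand-in for a River regressor (hypothetical, not part of River).

    Like recent River estimators, learn_one updates state in place and returns None.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def learn_one(self, x: dict, y: float) -> None:
        self.n += 1
        self.mean += (y - self.mean) / self.n  # incremental mean of the targets

    def predict_one(self, x: dict) -> float:
        return self.mean


data = [({"a": 1}, 2.0), ({"a": 2}, 4.0)]

# Anti-pattern: rebinding the name to learn_one's return value (None).
broken = RunningMean()
error_message = ""
try:
    for x, y in data:
        broken = broken.learn_one(x, y)  # `broken` is None after the first sample
except AttributeError as exc:
    error_message = str(exc)  # raised on the second sample

# Correct: call learn_one for its side effect and keep the original reference.
model = RunningMean()
for x, y in data:
    model.learn_one(x, y)
prediction = model.predict_one({"a": 3})  # mean of the targets 2.0 and 4.0
```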
error ValueError: could not broadcast input array from shape (X,) into shape (Y,)
cause Samples in the stream have mismatched dimensions or data types.
fix Pass each sample as a dict with consistent keys and numeric values.
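To find the offending sample before it reaches a model, you can validate each one up front. The helper `check_sample` below is hypothetical and may be stricter than some River models require (requiring identical keys on every sample), but it turns a vague shape error into a message naming the sample and feature.

```python
from numbers import Number


def check_sample(x, expected_keys: set) -> None:
    """Hypothetical pre-flight check: raise early on non-dict, mis-keyed, or non-numeric samples."""
    if not isinstance(x, dict):
        raise TypeError(f"expected a dict of features, got {type(x).__name__}")
    if set(x) != expected_keys:
        raise ValueError(f"unexpected key set: {sorted(set(x) ^ expected_keys)}")
    for key, value in x.items():
        if not isinstance(value, Number):
            raise TypeError(f"feature {key!r} is {type(value).__name__}, not numeric")


samples = [{"a": 1, "b": 2.5}, {"a": 4, "b": "5"}]
keys = set(samples[0])  # treat the first sample's keys as the schema
problems = []
for i, x in enumerate(samples):
    try:
        check_sample(x, keys)
    except (TypeError, ValueError) as exc:
        problems.append((i, str(exc)))
print(problems)
```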
error RuntimeError: This model has not been trained yet
cause Called `predict_one` before `learn_one` on a model that requires training first.
fix Call `model.learn_one(x, y)` at least once before predicting.
breaking River 0.20.0 dropped Python 3.10 support; Python >=3.11 is required.
fix Upgrade Python to 3.11 or later.
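A small guard can fail fast on an unsupported interpreter before installing or importing River. This is a sketch: `MIN_PYTHON` simply encodes the 3.11 minimum stated in the note above, and `python_is_supported` is a hypothetical helper.

```python
import sys

MIN_PYTHON = (3, 11)  # minimum stated in the breaking-change note for River 0.20.0+


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True if (major, minor) of the given version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    print(
        f"Python {sys.version_info.major}.{sys.version_info.minor} is too old; "
        f"upgrade to {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ before installing River."
    )
```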
breaking River 0.21.0 removed or reorganized the `optim` module, moving optimizer configuration into individual estimators.
fix Configure optimization through the estimator's own parameters, or use `from river import optim` if your installed version still provides it (check the changelog).
deprecated `stream.iter_pandas` is deprecated in 0.24.0; use `stream.iter_array` or `stream.iter_dict` instead.
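fix Replace `stream.iter_pandas(X, y)` with `zip(X.to_dict('records'), y)`, which yields the same `(dict, target)` pairs, or convert to NumPy first: `stream.iter_array(X.to_numpy(), y.to_numpy(), feature_names=list(X.columns))`.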
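If you would rather not depend on any River stream helper, plain pandas produces the `(features_dict, target)` pairs that a `learn_one`/`predict_one` loop consumes. A minimal sketch, assuming pandas is installed; the toy data and the column name `target` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: two feature columns and one target column.
df = pd.DataFrame({"a": [1, 4, 7], "b": [2, 5, 8], "target": [3.0, 9.0, 15.0]})
X = df.drop(columns="target")
y = df["target"]

# One (features_dict, target) pair per row, in row order.
pairs = list(zip(X.to_dict("records"), y))
for x_i, y_i in pairs:
    print(x_i, y_i)
```

`zip` is lazy, so drop the `list(...)` for large frames to avoid materializing every pair at once.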
gotcha Many estimators, notably classifiers, return `None` from `predict_one` until `learn_one` has been called at least once. Forgetting this leads to missing predictions.
fix Always check `y_pred is not None` before using it, or use a warm-up sample.
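The guard-then-learn pattern can be sketched without River installed. `MajorityClass` below is a hypothetical toy model that, like many River classifiers, returns `None` until it has learned at least one label. The loop predicts first, scores only non-`None` predictions, and then learns from the sample (test-then-train).

```python
from collections import Counter


class MajorityClass:
    """Toy classifier (hypothetical, not River): predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def learn_one(self, x: dict, y) -> None:
        self.counts[y] += 1

    def predict_one(self, x: dict):
        if not self.counts:
            return None  # nothing learned yet
        return self.counts.most_common(1)[0][0]


data = [({"f": 0}, "spam"), ({"f": 1}, "ham"), ({"f": 2}, "spam"), ({"f": 3}, "spam")]
model = MajorityClass()
correct = total = 0
for x, y in data:
    y_pred = model.predict_one(x)
    if y_pred is not None:  # skip scoring until the model has seen a label
        total += 1
        correct += y_pred == y
    model.learn_one(x, y)
print(f"accuracy on {total} scored samples: {correct / total:.2f}")
```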

Basic online learning with a pipeline and metric.

from river import linear_model, metrics, preprocessing

# Simulate a stream: any iterable of (features_dict, target) pairs works
X_y = [
    ({'a': 1, 'b': 2}, 3.0),
    ({'a': 4, 'b': 5}, 9.0),
    ({'a': 7, 'b': 8}, 15.0),
]

# Scale features online, then fit a linear regression on the scaled values
model = preprocessing.StandardScaler() | linear_model.LinearRegression()
metric = metrics.MAE()

for x, y in X_y:  # iterate the pairs directly; no stream helper is needed
    y_pred = model.predict_one(x)  # predict before learning (progressive validation)
    if y_pred is not None:
        metric.update(y, y_pred)
    model.learn_one(x, y)  # updates the model in place

print(f'MAE: {metric.get():.4f}')
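`metrics.MAE` keeps a running mean absolute error that changes with each `update(y_true, y_pred)` call, so `get()` can be read at any point in the stream. The hypothetical class `OnlineMAE` below is a minimal pure-Python sketch of that idea, mirroring the `update`/`get` naming used above; it is not River's implementation.

```python
class OnlineMAE:
    """Hypothetical incremental mean absolute error (not River's class)."""

    def __init__(self):
        self.n = 0          # number of (y_true, y_pred) pairs seen
        self.total = 0.0    # sum of absolute errors so far

    def update(self, y_true: float, y_pred: float) -> None:
        self.n += 1
        self.total += abs(y_true - y_pred)

    def get(self) -> float:
        return self.total / self.n if self.n else 0.0


mae = OnlineMAE()
for y_true, y_pred in [(3.0, 0.0), (9.0, 7.0), (15.0, 16.0)]:
    mae.update(y_true, y_pred)
print(f"MAE: {mae.get():.4f}")
```

Storing only a count and a sum keeps the metric O(1) in memory, which is what makes it usable on unbounded streams.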