{"id":9365,"library":"torch-runstats","title":"Running Statistics for PyTorch","description":"torch-runstats provides efficient running/online statistics (mean, standard deviation, variance, count) for PyTorch tensors. It's designed for scenarios where data arrives sequentially or cannot be stored in its entirety. The current version is 0.2.0, and its release cadence is slow, suggesting a mature and stable library for its specific functionality.","status":"active","version":"0.2.0","language":"en","source_language":"en","source_url":"https://github.com/mir-group/pytorch_runstats","tags":["pytorch","statistics","online learning","running stats","mean","std","variance"],"install":[{"cmd":"pip install torch-runstats","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Core dependency for tensor operations and GPU acceleration.","package":"torch","optional":false}],"imports":[{"symbol":"RunningMeanStd","correct":"from torch_runstats import RunningMeanStd"},{"symbol":"RunningStats","correct":"from torch_runstats import RunningStats"}],"quickstart":{"code":"import torch\nfrom torch_runstats import RunningMeanStd, RunningStats\n\n# Example with RunningMeanStd\n# Initialize for a feature vector of size 3\nrms = RunningMeanStd(shape=(3,))\n\n# Simulate incoming data\nx1 = torch.randn(10, 3)\nx2 = torch.randn(5, 3)\n\nrms.update(x1)\nrms.update(x2)\n\nprint(f\"Running Mean: {rms.mean}\")\nprint(f\"Running Std Dev: {rms.std}\")\n\n# Example with RunningStats (more general, includes variance and count)\nrs = RunningStats(shape=(2,))\ny1 = torch.tensor([[1.0, 2.0], [3.0, 4.0]])\ny2 = torch.tensor([[5.0, 6.0], [float('nan'), 8.0]]) # Demonstrating NaN masking\n\nrs.update(y1)\nrs.update(y2)\n\nprint(f\"Running Stats Mean: {rs.mean}\")\nprint(f\"Running Stats Std Dev: {rs.std}\")\nprint(f\"Running Stats Count: {rs.count}\") # NaN in y2 is ignored by default\n","lang":"python","description":"This quickstart demonstrates how to initialize and use 
`RunningMeanStd` and `RunningStats` to track statistics for streaming data. It shows the `shape` parameter for multi-dimensional data and, through the `NaN` entry in `y2`, the `mask_nan=True` behavior (the default in v0.2.0) of `RunningStats`."},"warnings":[{"fix":"Ensure `shape` matches the trailing dimension(s) of your input tensors (e.g., for an `N x F` tensor, use `shape=(F,)`). If you want statistics per feature across multiple dimensions, adjust `shape` accordingly.","message":"The `shape` parameter passed when initializing `RunningMeanStd` or `RunningStats` defines the shape of the *feature vector* for which statistics are computed, not the batch dimension. An incorrect `shape` leads to dimension mismatch errors during `update`.","severity":"gotcha","affected_versions":">=0.1.0"},{"fix":"Be aware of this default if you need `NaN` values to propagate or want to handle them explicitly. `RunningMeanStd` has no `mask_nan` parameter, so `NaN`s propagate through it.","message":"As of version 0.2.0, `RunningStats` defaults to `mask_nan=True`, so `NaN` values in the input tensor are ignored when computing statistics and the count. This may change behavior for users upgrading from v0.1.0 or expecting `NaN` to propagate.","severity":"gotcha","affected_versions":">=0.2.0"},{"fix":"Check `instance.count` before relying on `instance.std` if small sample sizes are possible. Handle `NaN` or `0` in downstream logic if your application requires a valid `std` at all times.","message":"Standard deviation (`std`) and variance can be zero or `NaN` if `RunningStats` or `RunningMeanStd` has not accumulated at least two distinct data points. 
Accessing `std` before then yields `NaN` or `0`.","severity":"gotcha","affected_versions":">=0.1.0"},{"fix":"If your project uses `torch_scatter` directly or indirectly, list it explicitly in your project's dependencies when upgrading `torch-runstats` to v0.2.0+.","message":"The dependency on `torch_scatter` was removed in version 0.2.0. This mainly affects the internal implementation and reduces install size, but projects that relied on `torch_scatter` being installed as a side effect of installing `torch-runstats` may find it missing.","severity":"breaking","affected_versions":">=0.2.0"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Initialize the `RunningStats` or `RunningMeanStd` object with `shape` matching the trailing dimension(s) of your input data. For a tensor `x` of shape `(batch, ..., feature_dim)`, use `RunningStats(shape=(feature_dim,))`.","cause":"The `shape` parameter provided during `RunningStats` or `RunningMeanStd` initialization does not match the trailing dimensions of the input tensor being passed to `update()`.","error":"RuntimeError: The size of tensor a (10) must match the size of tensor b (3) at non-singleton dimension 1"},{"fix":"Ensure that your `RunningStats` or `RunningMeanStd` instance has received at least two valid data points via `update()` calls before accessing `.std`. If `count < 2`, `std` is undefined or zero.","cause":"This warning occurs when the standard deviation (`.std`) is read before `update()` has accumulated at least two data points.","error":"UserWarning: std is NaN due to insufficient data."},{"fix":"If you need `torch_scatter` for other parts of your code, install it explicitly (`pip install torch-scatter`). 
`torch-runstats` itself no longer depends on it as of v0.2.0.","cause":"Your code imports `torch_scatter`, which was previously installed only as a dependency of `torch-runstats` (before v0.2.0) and is no longer pulled in automatically.","error":"ModuleNotFoundError: No module named 'torch_scatter'"}]}