High-Performance Time Series Downsampling
tsdownsample is a fast Python library for time series downsampling whose core is implemented in Rust. It uses SIMD instructions and optional multithreading (via Rayon) to provide memory-efficient, flexible algorithms for visualizing and analyzing large time series datasets. The library is actively maintained; the current version is 0.1.4.1.
Warnings
- gotcha The `x` (index) data must be monotonically non-decreasing (i.e., sorted; repeated values are allowed) and must not contain NaN values. If `x` is not provided, the data is assumed to be equally sampled without gaps.
- gotcha When `x` contains gaps (i.e., non-equidistant sampling), the number of returned indices may be smaller than the requested `n_out`, because no data point can be selected from an empty bin.
- gotcha To leverage multi-threading for performance, the `parallel=True` argument must be explicitly passed to the `downsample` method. The maximum number of threads can be configured via the `TSDOWNSAMPLE_MAX_THREADS` environment variable.
- gotcha A precision error in the `sequential_add_mul` update logic was fixed in version 0.1.4.1. Although this is a bug fix, users who relied on the previous (incorrect) numerical behavior may see different outputs after upgrading.
- gotcha The `downsample` method's signature is `downsample([x], y, n_out=..., **kwargs)`. `x` (optional) and `y` are positional arguments, while `n_out` is a mandatory keyword argument.
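The first two warnings can be checked before calling the library. The sketch below uses only NumPy; `check_x` and `count_empty_bins` are hypothetical helper names, not part of tsdownsample. The equal-width binning is a simplifying assumption for illustration and may not match how the library forms its bins internally.

```python
import numpy as np

def check_x(x: np.ndarray) -> None:
    """Raise ValueError if x is not usable as a downsampling index (hypothetical helper)."""
    x = np.asarray(x)
    # NaN can only occur in floating-point arrays
    if np.issubdtype(x.dtype, np.floating) and np.isnan(x).any():
        raise ValueError("x contains NaN values")
    # Non-decreasing: no element may be smaller than its predecessor
    if x.size > 1 and np.any(x[1:] < x[:-1]):
        raise ValueError("x is not sorted in non-decreasing order")

def count_empty_bins(x: np.ndarray, n_bins: int) -> int:
    """Count equal-width bins over [x[0], x[-1]] that contain no sample (assumed binning)."""
    counts, _ = np.histogram(x, bins=n_bins, range=(x[0], x[-1]))
    return int(np.sum(counts == 0))

# Two dense segments separated by a large gap in x
x_gappy = np.concatenate([np.arange(0, 100), np.arange(900, 1000)]).astype(np.float64)
check_x(x_gappy)  # passes: sorted and NaN-free
empty = count_empty_bins(x_gappy, n_bins=10)
print(f"Empty bins: {empty} of 10")
```

A nonzero count of empty bins is a hint that the downsampled result for such data may contain fewer than `n_out` indices.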
Install
pip install tsdownsample
Imports
- MinMaxLTTBDownsampler
from tsdownsample import MinMaxLTTBDownsampler
- LTTBDownsampler
from tsdownsample import LTTBDownsampler
- MinMaxDownsampler
from tsdownsample import MinMaxDownsampler
Quickstart
import numpy as np
from tsdownsample import MinMaxLTTBDownsampler
# Create a time series with x and y values
x = np.arange(10_000_000, dtype=np.float64)
y = np.random.randn(10_000_000).astype(np.float64)
# Initialize the downsampler
downsampler = MinMaxLTTBDownsampler()
# Downsample the time series to 1000 points
# The 'parallel=True' argument enables multi-threading for performance.
# The 'n_out' argument is mandatory.
selected_indices = downsampler.downsample(x, y, n_out=1000, parallel=True)
# Retrieve the downsampled data points
x_downsampled = x[selected_indices]
y_downsampled = y[selected_indices]
print(f"Original data points: {len(x)}")
print(f"Downsampled data points: {len(x_downsampled)}")
print(f"First 5 downsampled x: {x_downsampled[:5]}")
print(f"First 5 downsampled y: {y_downsampled[:5]}")
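To cap the number of threads used when `parallel=True`, the `TSDOWNSAMPLE_MAX_THREADS` environment variable can be set from Python. This is a minimal sketch: `set_max_threads` is a hypothetical helper, and setting the variable before importing tsdownsample is a cautious assumption about when the library reads it.

```python
import os

def set_max_threads(n_threads: int) -> None:
    """Set TSDOWNSAMPLE_MAX_THREADS for the current process (hypothetical helper)."""
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    # Environment variables are strings, so convert the integer
    os.environ["TSDOWNSAMPLE_MAX_THREADS"] = str(n_threads)

# Set the limit first, then import tsdownsample and call
# downsample(..., parallel=True) as in the quickstart.
set_max_threads(4)
```

The same effect can be achieved by exporting the variable in the shell before starting the Python process.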