stumpy: Time Series Matrix Profile Library
STUMPY is a powerful and scalable Python library that efficiently computes the matrix profile, a data structure that records, for every subsequence of a time series, the distance to its nearest-neighbor subsequence. The matrix profile supports a variety of time series data mining tasks such as pattern/motif discovery, anomaly detection, semantic segmentation, and more. It is currently at version 1.14.1 and maintains an active development cycle with regular releases.
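To make the idea of a matrix profile concrete, here is a minimal brute-force sketch in plain NumPy. It is not how STUMPY computes the result (STUMPY uses much faster algorithms); `naive_matrix_profile` and `znorm` are hypothetical helpers written for illustration, and the exclusion zone of `ceil(m / 4)` is an assumption chosen to skip trivial self-matches.

```python
import numpy as np


def znorm(x):
    """Z-normalize a 1-D window (assumes the window is not constant)."""
    return (x - x.mean()) / x.std()


def naive_matrix_profile(ts, m):
    """Brute-force self-join, O(n^2 * m). For illustration only."""
    n_sub = len(ts) - m + 1
    excl = int(np.ceil(m / 4))  # assumed exclusion zone around each window
    windows = np.array([znorm(ts[i:i + m]) for i in range(n_sub)])
    profile = np.full(n_sub, np.inf)
    index = np.full(n_sub, -1)
    for i in range(n_sub):
        # Distance from window i to every other z-normalized window
        dist = np.linalg.norm(windows - windows[i], axis=1)
        lo, hi = max(0, i - excl), min(n_sub, i + excl + 1)
        dist[lo:hi] = np.inf  # ignore the window itself and close overlaps
        index[i] = np.argmin(dist)
        profile[i] = dist[index[i]]
    return profile, index


rng = np.random.default_rng(0)
ts = rng.random(300)
pattern = np.sin(np.linspace(0, 4 * np.pi, 30))
ts[40:70] = pattern    # plant the same shape twice
ts[200:230] = pattern

profile, index = naive_matrix_profile(ts, m=30)
best = int(np.argmin(profile))
print("motif at", best, "nearest neighbor at", int(index[best]))
```

The lowest value in `profile` marks the most strongly repeated shape, which here is the planted pattern. The highest value would mark the subsequence least like anything else in the series.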
Warnings
- breaking Python Version Requirement: STUMPY requires Python 3.10 or newer. Users on older Python versions (e.g., 3.9 or earlier) will encounter installation or runtime errors.
- gotcha Window Size (m) Selection: Choosing an appropriate window size (`m`) for `stumpy.stump` is critical. An ill-chosen `m` can lead to uninformative or misleading matrix profiles. The optimal `m` is highly dependent on the nature of your time series and the patterns you are looking for.
- gotcha Understanding Matrix Profile Output: The `stumpy.stump` function returns a NumPy array with one row per subsequence and four columns: `[distance, index, left_index, right_index]`. Misinterpreting these columns, especially 'index' (the overall nearest neighbor) versus 'left_index'/'right_index' (the nearest neighbors located before and after the subsequence, respectively), is a common footgun.
- gotcha GPU Acceleration Requires NVIDIA and CUDA: While STUMPY supports GPU acceleration (e.g., with `stumpy.gpu_stump`), this functionality relies on Numba's CUDA JIT compiler. This means you need an NVIDIA GPU and the appropriate CUDA Toolkit installed on your system; simply installing STUMPY will not automatically provide GPU capabilities.
- deprecated NumPy Version Adherence to NEP 29: STUMPY follows NEP 29 (NumPy Enhancement Proposal 29) when deciding which Python and NumPy versions to support. Older NumPy versions are therefore dropped on the NEP 29 schedule as a matter of upstream dependency policy, not because of a specific breaking change in STUMPY itself.
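The column layout described in the warnings above can be sketched as follows. `split_profile` is a hypothetical helper (not part of STUMPY), the `toy` array holds made-up values rather than real STUMPY output, and the sketch assumes that `-1` marks "no neighbor in that direction".

```python
import numpy as np


def split_profile(mp):
    """Split a four-column matrix profile array into typed 1-D arrays."""
    mp = np.asarray(mp)
    return {
        "distance": mp[:, 0].astype(np.float64),
        "index": mp[:, 1].astype(np.int64),
        "left_index": mp[:, 2].astype(np.int64),
        "right_index": mp[:, 3].astype(np.int64),
    }


# Toy values, made up for illustration (not real STUMPY output).
toy = np.array(
    [
        [0.5, 3, -1, 3],
        [2.0, 4, -1, 4],
        [1.5, 0, 0, 4],
        [0.5, 0, 0, -1],
        [2.0, 1, 1, -1],
    ],
    dtype=object,
)

cols = split_profile(toy)
motif = int(np.argmin(cols["distance"]))    # row with the closest match
discord = int(np.argmax(cols["distance"]))  # row with the most distant match
has_left = cols["left_index"] >= 0          # False where no earlier neighbor exists
print(motif, int(cols["index"][motif]), discord, has_left.tolist())
```

Casting each column to a concrete dtype up front keeps distances and indices from being mixed up in later arithmetic or indexing.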
Install
- pip install stumpy
- conda install -c conda-forge stumpy
Imports
- stumpy
import stumpy
- numpy
import numpy as np
Quickstart
import stumpy
import numpy as np
# Generate a random time series
your_time_series = np.random.rand(1000)
# Define a window size (m) for subsequences
window_size = 50
# Compute the matrix profile
matrix_profile = stumpy.stump(your_time_series, m=window_size)
print(f"Matrix Profile shape: {matrix_profile.shape}")
# The matrix_profile array contains 4 columns:
# 0: Nearest neighbor distance (matrix profile value)
# 1: Nearest neighbor index
# 2: Left nearest neighbor index
# 3: Right nearest neighbor index