tslearn: Time-Series Machine Learning Toolkit
tslearn is a Python package providing a comprehensive machine learning toolkit specifically designed for the analysis of time-series data. It offers various algorithms for clustering, classification, and regression on time series, building upon the `scikit-learn`, `numpy`, and `scipy` libraries. The current version is 0.8.1, and the library is under active development and maintenance.
Common errors
- ValueError: Expected a 3D array (n_ts, sz, d) or a list of 2D arrays (sz, d), got 2D array (sz, d)
  cause: Passing a single 2D NumPy array of shape `(length, features)` directly to a `tslearn` estimator that expects a dataset of multiple time series (a 3D array or a list of 2D arrays). This usually happens when a single time series is treated as a dataset.
  fix: Wrap the single 2D time series in a list (`[my_single_time_series]`) or, preferably, use `tslearn.utils.to_time_series_dataset`, which formats both single and multiple time series correctly: `X_formatted = to_time_series_dataset([my_single_time_series])` or `X_formatted = to_time_series_dataset(list_of_time_series)`.
- ModuleNotFoundError: No module named 'keras'
  cause: Importing or using functionality from `tslearn.shapelets` without Keras (specifically Keras 3 or newer) installed. Keras is an optional dependency of this module.
  fix: Install Keras 3 or newer: `pip install keras` (or use `pip install tslearn[shapelets]`). If you intend to use a specific backend, install it too (e.g., `pip install tensorflow`).
- Cannot import name '...' from 'tslearn' (most likely 'tslearn.something')
  cause: Incorrect import path for a function or class. `tslearn` organizes its functionality into submodules (e.g., `clustering`, `utils`, `metrics`).
  fix: Check the `tslearn` documentation or API reference for the submodule that holds the function or class. For example, `TimeSeriesKMeans` lives in `tslearn.clustering`, not directly under `tslearn`. Correct: `from tslearn.clustering import TimeSeriesKMeans`.
Warnings
- breaking Support for Python versions 3.8 and 3.9 was dropped starting from `tslearn` version 0.7.0. Current versions (e.g., 0.8.1) require Python 3.10 or newer.
- gotcha `tslearn` expects time series datasets to be formatted as a 3D NumPy array of shape `(n_ts, max_sz, d)`, where `n_ts` is the number of time series, `max_sz` is the maximum length of the time series in the dataset, and `d` is the dimensionality of each time point. Variable-length time series are handled by padding shorter series with `NaN` values.
- gotcha The `tslearn.shapelets` module has an additional optional dependency: Keras 3 or newer. The Keras backend (TensorFlow, PyTorch, or JAX) can be selected via the `KERAS_BACKEND` environment variable.
Install
- pip install tslearn
Imports
- TimeSeriesKMeans
  from tslearn.clustering import TimeSeriesKMeans
- to_time_series_dataset
  from tslearn.utils import to_time_series_dataset
- SoftDTW
  from tslearn.metrics import SoftDTW
- TimeSeriesScalerMeanVariance
  from tslearn.preprocessing import TimeSeriesScalerMeanVariance
Quickstart
import numpy as np
from tslearn.clustering import TimeSeriesKMeans
from tslearn.utils import to_time_series_dataset

# Generate some sample time series data
np.random.seed(0)
n_ts = 10     # Number of time series
max_sz = 100  # Maximum length of time series
d = 1         # Dimensionality of each time point (univariate)

# Create a list of 2D numpy arrays for variable-length time series
my_time_series = []
for i in range(n_ts):
    length = np.random.randint(50, max_sz + 1)
    series = np.random.rand(length, d)
    my_time_series.append(series)

# Convert to tslearn's expected 3D dataset format
X = to_time_series_dataset(my_time_series)

# Initialize and fit a TimeSeriesKMeans model
# Using dtw (Dynamic Time Warping) as the metric
km = TimeSeriesKMeans(n_clusters=2, metric="dtw", max_iter=10, random_state=0)
cluster_labels = km.fit_predict(X)

print(f"Input shape: {X.shape}")
print(f"Cluster labels: {cluster_labels}")