Anomaly Detection Toolkit (ADTK)

0.6.2 verified Wed Apr 15 auth: no python

ADTK (Anomaly Detection Toolkit) is a Python package for unsupervised and rule-based time series anomaly detection. It provides a modular API with a collection of common detectors, transformers, and aggregators that can be combined into custom anomaly detection pipelines. Currently at version 0.6.2, ADTK aims to help users build effective models even when little labeled historical anomaly data is available.

pip install adtk
error ModuleNotFoundError: No module named 'adtk'
cause This error occurs when the 'adtk' package is not installed in the Python environment.
fix
Install the 'adtk' package using pip: 'pip install adtk'.
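This error often appears when the package was installed into a different environment than the one running the script. A minimal sketch, using only the standard library, that reports which interpreter is active and whether it can see `adtk` (the variable name `adtk_available` is illustrative):

```python
import importlib.util
import sys

# Check whether the running interpreter can locate the adtk package.
adtk_available = importlib.util.find_spec("adtk") is not None

print(f"Interpreter: {sys.executable}")
print(f"adtk importable: {adtk_available}")

if not adtk_available:
    # Installing through the same interpreter avoids mixing up environments.
    print(f"Try: {sys.executable} -m pip install adtk")
```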
error ImportError: cannot import name 'validate_series' from 'adtk'
cause This error occurs because `validate_series` is not exposed at the top level of the 'adtk' package; it lives in the `adtk.data` module.
fix
Import the function from its module: `from adtk.data import validate_series`. For other names, check the official documentation for the module that exports them.
error TypeError: 'NoneType' object is not subscriptable
cause This error occurs when attempting to subscript a 'NoneType' object, often due to a function returning None instead of an expected iterable.
fix
Check the function's return value to ensure it is not None before attempting to subscript it.
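One way this can arise in data-preparation code (an illustrative assumption, not something specific to ADTK) is keeping the return value of an in-place pandas call, which is `None`. A short sketch:

```python
import pandas as pd

s = pd.Series(
    [2.0, 1.0],
    index=pd.to_datetime(["2023-01-02", "2023-01-01"]),
)

# In-place methods modify the object and return None.
result = s.sort_index(inplace=True)
print(result is None)

# Subscripting `result` here would raise:
# TypeError: 'NoneType' object is not subscriptable

# Use the non-in-place form when a return value is needed.
s_sorted = s.sort_index()
first_value = s_sorted.iloc[0]
print(first_value)
```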
error ValueError: Time index is not monotonically increasing
cause This error occurs when the time index of the time series data is not in a strictly increasing order.
fix
Sort the series by its index, for example with `s = s.sort_index()`, so that the time index is monotonically increasing before processing.
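A minimal sketch of the sorting step using pandas alone; the sample timestamps and values are made up for illustration:

```python
import pandas as pd

# A small series whose timestamps are out of order.
index = pd.to_datetime(
    ["2023-01-01 02:00", "2023-01-01 00:00", "2023-01-01 01:00"]
)
s = pd.Series([2.0, 0.0, 1.0], index=index)
print(s.index.is_monotonic_increasing)  # False

# sort_index() returns a new series ordered by timestamp.
s_sorted = s.sort_index()
print(s_sorted.index.is_monotonic_increasing)  # True
```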
error AttributeError: module 'adtk' has no attribute 'Detector'
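cause This error occurs when trying to access a class or attribute on the top-level 'adtk' package that does not exist there. Detector classes are exposed through the `adtk.detector` module.
fix
Import the concrete detector you need from its module, for example `from adtk.detector import SeasonalAD`, and consult the official documentation for the available class names.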
breaking The API for `adtk.visualization.plot` was redesigned in v0.6. The `adtk.data.resample` module was removed, and the output type of `adtk.data.split_train_test` changed. Several model parameters (e.g., `window` in `LevelShiftAD`, `model` in `MinClusterDetector`) became required instead of optional. Additionally, all second-order sub-modules were made private, requiring imports directly from top-level modules (e.g., `from adtk.detector import SomeDetector`).
fix Review the official documentation for `adtk.visualization.plot` and update calls accordingly. Replace usage of `adtk.data.resample` with `pandas.DataFrame.resample`. Adjust calls to `adtk.data.split_train_test` for the new output type. Ensure all required parameters are provided to detectors/transformers. Update import statements to directly import from `adtk.detector`, `adtk.transformer`, `adtk.aggregator`, etc.
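As a rough sketch of the pandas replacement for the removed resampling helper, the following regularizes an irregular series onto a fixed grid. The 60-minute interval and the mean aggregation are assumptions chosen for illustration and are not claimed to match the behavior of the old `adtk.data.resample`:

```python
import pandas as pd

# Irregularly spaced observations (made-up sample data).
index = pd.to_datetime(
    [
        "2023-01-01 00:05",
        "2023-01-01 00:40",
        "2023-01-01 01:10",
        "2023-01-01 02:30",
    ]
)
s = pd.Series([1.0, 3.0, 5.0, 7.0], index=index)

# Bucket the observations into 60-minute bins and average each bin.
hourly = s.resample("60min").mean()
print(hourly)
```

Other aggregations (`sum`, `max`, `last`) or an interpolation step may suit a given series better than the mean.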
breaking In v0.5, the `steps` parameter of `adtk.pipe.Pipenet` changed from accepting a list to requiring a dictionary. The STL decomposition transformer was removed and replaced with the `ClassicSeasonalDecomposition` transformer, which also affects the options of the `SeasonalAD` detector.
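fix For `Pipenet`, change the `steps` parameter from a list of steps to a dictionary keyed by step name, where each value describes that step (its transformer, detector, or aggregator instance and its input); check the `Pipenet` documentation for the exact keys. If you used the STL decomposition transformer, switch to `ClassicSeasonalDecomposition` and adjust any `SeasonalAD` configuration that depended on it.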
gotcha ADTK requires time series data to be in a specific format: a pandas Series with a `DatetimeIndex`, sorted, without `NaN` values, and with a regular frequency. Not adhering to this format or failing to use `adtk.data.validate_series` can lead to unexpected errors or incorrect anomaly detection results.
fix Always pass your time series data through `adtk.data.validate_series()` before using ADTK components. Ensure your `DatetimeIndex` is sorted, free of `NaN`s, and has a consistent frequency or handle these cases explicitly beforehand using pandas utilities.
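The pandas-side preparation described above might look like the following sketch. The sample data, the 60-minute grid, and the choice of time-based interpolation for gaps are assumptions for illustration; the resulting series would then be passed to `adtk.data.validate_series()`:

```python
import pandas as pd

# Made-up raw readings: unsorted, one duplicated timestamp, one missing hour.
timestamps = [
    "2023-01-01 02:00",
    "2023-01-01 00:00",
    "2023-01-01 01:00",
    "2023-01-01 01:00",
    "2023-01-01 04:00",
]
values = [2.0, 0.0, 1.0, 1.5, 4.0]

# Build a series with a DatetimeIndex.
s = pd.Series(values, index=pd.to_datetime(timestamps))

# Keep the first reading for each duplicated timestamp, then sort by time.
s = s[~s.index.duplicated(keep="first")].sort_index()

# Put the series on a regular 60-minute grid; missing slots become NaN.
s = s.asfreq("60min")

# Fill the gaps; time-based linear interpolation is only one possible choice.
s = s.interpolate(method="time")
print(s)
```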
gotcha Version 0.5.3 introduced a temporary requirement for `statsmodels <0.11` to avoid errors, and v0.6.0 later fixed compatibility with `statsmodels` v0.11. Users on `adtk` versions before 0.6.0 may therefore run into problems with newer `statsmodels` releases: versions 0.5.3 through 0.5.5 carry the pin, and versions older than 0.5.3 may lack both the pin and the fix.
fix If experiencing `statsmodels` related issues with `adtk` versions prior to 0.6.0, either upgrade `adtk` to 0.6.0 or later, or pin your `statsmodels` dependency to a version less than 0.11 (e.g., `statsmodels==0.10.2`).
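Before deciding whether to upgrade or pin, it can help to see which versions are actually installed. A small sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_version` is hypothetical:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Print the versions relevant to the compatibility issue described above.
for package in ("adtk", "statsmodels"):
    print(package, installed_version(package) or "not installed")
```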
gotcha ADTK is designed for unsupervised/rule-based anomaly detection and does not automatically select or build an anomaly detection model. Users must understand the specific type of anomaly they want to detect (e.g., outlier, spike, level shift, seasonal pattern violation) and combine the appropriate detectors, transformers, and aggregators to build their model.
fix Familiarize yourself with the different anomaly types described in the ADTK documentation and select the relevant `adtk.detector` and `adtk.transformer` components that address your specific use case. ADTK provides a flexible toolkit, not an automated solution.

This quickstart demonstrates how to use `adtk` to detect seasonal anomalies in a synthetic time series. It involves validating the input series, initializing a `SeasonalAD` detector, fitting it to the data, detecting anomalies, and visualizing the results. Note that for real-world scenarios, you would load your data (e.g., `pd.read_csv`), ensure it has a `DatetimeIndex`, and choose appropriate detector parameters based on your data's characteristics and anomaly types.

import pandas as pd
from adtk.data import validate_series
from adtk.detector import SeasonalAD
from adtk.visualization import plot

# Create a dummy time series with a seasonal pattern and an anomaly
index = pd.date_range(start='2023-01-01', periods=100, freq='H')
data = [i % 24 for i in range(100)] # Daily seasonality
data[50:55] = [50, 51, 52, 53, 54] # Introduce an anomaly
s_train = pd.Series(data, index=index)

# Validate the series (important for ADTK compatibility)
s_train = validate_series(s_train)

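# Initialize a SeasonalAD detector
# freq is the length of one seasonal cycle in number of time points
# (24 hourly points = one day), not a pandas frequency string.
# c is the factor that sets the normal range from the interquartile range;
# larger values make the detector less sensitive.
seasonal_ad = SeasonalAD(freq=24, c=3.0)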

# Fit the detector to the training data and detect anomalies
anomalies = seasonal_ad.fit_detect(s_train)

# Plot the time series with detected anomalies
plot(s_train, anomaly=anomalies, ts_linewidth=1, anomaly_markersize=5, anomaly_color='red', anomaly_tag='marker')

print("Detected anomalies:")
print(anomalies[anomalies].index)