PyOD
PyOD is a comprehensive and scalable Python library for outlier detection (anomaly detection), offering over 50 detection models. It provides a unified API, making it easy to use and compare various algorithms. The library is currently at version 2.1.0, with frequent minor releases addressing compatibility and adding new features, including recent advancements in multi-modal anomaly detection using foundation model embeddings.
Warnings
- breaking PyOD removed all TensorFlow and Keras code, migrating deep learning models entirely to PyTorch.
- breaking Default parameters for some models, notably VAE, have changed (e.g., output activation for VAE to `identity`).
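- gotcha The `contamination` parameter (default 0.1) is the expected proportion of outliers in the training data. It determines `threshold_`, and therefore `labels_` and the output of `predict()`, so an inaccurate value skews the labels even when the raw scores rank samples well.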
- gotcha PyOD releases often change dependency requirements and include fixes for breaking changes in `scikit-learn`; pin both packages if you need reproducible results.
- gotcha The new `EmbeddingOD` framework (v2.1.0+) requires additional, potentially large, third-party libraries (e.g., `sentence-transformers`, `openai`, `transformers`) that are not installed by default.
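The effect of `contamination` on labels can be sketched without PyOD itself. The snippet below assumes the threshold is taken as the `100 * (1 - contamination)` percentile of the training scores, which is how PyOD's base detector is understood to behave; the synthetic scores and the helper name `labels_for` are made up for illustration.

```python
import numpy as np

# Synthetic outlier scores: 180 "normal" points and 20 clearly higher ones.
rng = np.random.default_rng(0)
scores = np.concatenate([
    rng.normal(1.0, 0.2, 180),
    rng.normal(3.0, 0.5, 20),
])


def labels_for(scores, contamination):
    """Hypothetical helper: percentile-based thresholding (assumed PyOD behaviour)."""
    threshold = np.percentile(scores, 100 * (1 - contamination))
    labels = (scores > threshold).astype(int)
    return threshold, labels


# Same scores, different contamination values, different label counts.
for c in (0.05, 0.1, 0.2):
    threshold, labels = labels_for(scores, c)
    print(f"contamination={c}: threshold={threshold:.3f}, flagged={labels.sum()}")
```

The scores never change across the three runs; only the cut-off moves. When the true outlier ratio is unknown, comparing raw scores (for example with ROC AUC) may be more informative than comparing labels.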
Install
- pip install pyod
- pip install pyod[text,image]
Imports
- KNN
from pyod.models.knn import KNN
- generate_data
from pyod.utils.data import generate_data
- EmbeddingOD
from pyod.models.embedding_od import EmbeddingOD
Quickstart
from pyod.models.knn import KNN
from pyod.utils.data import generate_data
import numpy as np
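# Generate 200 training samples with 10% outliers
# (train_only=True makes generate_data return only X_train and y_train)
X_train, y_train = generate_data(n_train=200, n_features=2, contamination=0.1, train_only=True, random_state=42)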
# Initialize and train a kNN detector
clf = KNN(contamination=0.1) # Set contamination based on expected outlier ratio
clf.fit(X_train)
# Get the prediction labels (0: inliers, 1: outliers)
y_train_pred = clf.labels_
# Get the raw outlier scores
y_train_scores = clf.decision_scores_
print(f"Number of training samples: {len(X_train)}")
print(f"Number of predicted outliers: {np.count_nonzero(y_train_pred)}")