fastcluster
1.3.0 verified Fri May 01 auth: no python
Fast hierarchical clustering routines for R and Python. Provides efficient implementations of hierarchical clustering (e.g., single, complete, average linkage) with memory-saving algorithms. Current version 1.3.0, with an irregular release cadence (last release 2022).
pip install fastcluster
Common errors
error ModuleNotFoundError: No module named 'fastcluster'
cause Package not installed in the current Python environment.
fix
Run 'pip install fastcluster' to install the package.
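error ValueError: The condensed distance matrix must be of length (n*(n-1)//2) for n points, but got ...
cause A 1D input to linkage() is treated as a condensed distance matrix, so its length must equal n*(n-1)/2 for some integer n. A flattened square matrix or a truncated array fails this check. A 2D square matrix does not raise this error; it is read as observation vectors instead.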
fix
Convert square matrix to condensed form: from scipy.spatial.distance import squareform; condensed = squareform(square_matrix).
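The length requirement can be checked before calling linkage(). The helper below is a minimal sketch using only the standard library; the name points_from_condensed_length is hypothetical and is not part of fastcluster or scipy.

```python
import math

def points_from_condensed_length(length: int) -> int:
    """Return n such that n*(n-1)//2 == length, or raise ValueError.

    Hypothetical helper for illustration; not a fastcluster function.
    """
    # Solve n*(n-1)/2 = length for n, then verify the result exactly.
    n = (1 + math.isqrt(1 + 8 * length)) // 2
    if n * (n - 1) // 2 != length:
        raise ValueError(f"{length} is not a valid condensed length")
    return n

print(points_from_condensed_length(15))   # 6 points give 15 pairwise distances
```

A flattened 5x5 square matrix has 25 entries, which is not of the form n*(n-1)/2, so the helper raises ValueError for it.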
error TypeError: 'numpy.float64' object cannot be interpreted as an integer
cause Sometimes occurs when using method='centroid' or 'median' with certain data types or versions of numpy.
fix
Ensure input array is contiguous and dtype is float64. Try X = np.ascontiguousarray(X, dtype=np.float64).
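The conversion in the fix can be sketched with NumPy alone. The input array here is made up for illustration (float32, column-major); whether this resolves the TypeError in a given setup is not confirmed by the source.

```python
import numpy as np

# Illustrative input: float32 values stored in Fortran (column-major) order.
X_raw = np.asfortranarray(np.arange(12, dtype=np.float32).reshape(4, 3))

# Copy into a C-contiguous float64 array before clustering.
X = np.ascontiguousarray(X_raw, dtype=np.float64)

print(X.flags['C_CONTIGUOUS'], X.dtype)   # True float64
print(np.array_equal(X, X_raw))           # True: the values are unchanged
```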
Warnings
gotcha The 'method' argument takes lowercase names: 'single', 'complete', 'average', 'weighted', 'ward', 'centroid', 'median'. The Python interface has a single 'ward' option, which corresponds to scipy's 'ward'. The 'ward.D' / 'ward.D2' distinction comes from R's hclust interface, not from scipy.
fix Use method='ward' for Ward linkage. If you need the weighted or centroid methods, note that results may differ from scipy.
gotcha linkage() expects either a 2D array of observations (n_samples, n_features) or a 1D condensed distance matrix. A square distance matrix is 2D, so fastcluster interprets it as observations and returns silently incorrect results.
fix Use scipy.spatial.distance.squareform to convert square matrix to condensed form before passing to linkage().
deprecated fastcluster.linkage with method='centroid' or method='median' may return results that differ from scipy.cluster.hierarchy because of how the distance matrix is updated. These methods are considered deprecated in favor of scipy's implementations.
fix Use scipy.cluster.hierarchy.linkage with method='centroid' or 'median' if you need exact scipy compatibility.
gotcha The 'ward', 'centroid' and 'median' methods are intended for Euclidean data. linkage() accepts a condensed distance matrix with these methods but does not check it, so non-Euclidean or squared distances produce a result that is not Ward's method in the usual sense.
fix Pass the observation vectors with the default metric='euclidean', or a condensed matrix of plain (non-squared) Euclidean distances. scipy.cluster.hierarchy.linkage with method='ward' makes the same Euclidean assumption.
Imports
- fastcluster.linkage
wrong import fastcluster; fastcluster.linkage()
correct from fastcluster import linkage
- linkage_vector
from fastcluster import linkage_vector
- single
from fastcluster import single
Quickstart
import numpy as np
from fastcluster import linkage
# Generate random data: 100 points in 3D
X = np.random.rand(100, 3)
# Perform hierarchical clustering with average linkage
Z = linkage(X, method='average')
print(Z.shape) # (99, 4)