fastcluster

1.3.0 verified Fri May 01 auth: no python

Fast hierarchical clustering routines for R and Python. Provides efficient implementations of hierarchical clustering (e.g., single, complete, average linkage) with memory-saving algorithms. Current version 1.3.0, with an irregular release cadence (last release 2022).

pip install fastcluster
error ModuleNotFoundError: No module named 'fastcluster'
cause Package not installed in the current Python environment.
fix Run 'pip install fastcluster' in the same Python environment that runs your code.
error ValueError: The condensed distance matrix must be of length (n*(n-1)//2) for n points, but got ...
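cause Input to linkage() is a 1-D array whose length is not n*(n-1)/2, typically because a square distance matrix was flattened instead of being converted to condensed form.
fix Convert the square matrix to condensed form: from scipy.spatial.distance import squareform; condensed = squareform(square_matrix).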
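A minimal sketch of the conversion described in the fix above, using only NumPy and SciPy. The data and the variable names (X, square, condensed) are illustrative assumptions, not part of fastcluster.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Illustrative data: 5 points in 3-D (assumed for this sketch)
rng = np.random.default_rng(0)
X = rng.random((5, 3))

# Build a square 5x5 distance matrix, as one might receive from other code
square = squareform(pdist(X))

# squareform also works in the other direction: square -> condensed
condensed = squareform(square)

print(square.shape)     # (5, 5)
print(condensed.shape)  # (10,), i.e. 5*4//2 entries
```

The condensed array holds the upper triangle of the square matrix, row by row, which is the 1-D layout that linkage() accepts as precomputed distances.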
error TypeError: 'numpy.float64' object cannot be interpreted as an integer
cause Sometimes occurs when using method='centroid' or 'median' with certain data types or versions of numpy.
fix Ensure the input array is C-contiguous with dtype float64, e.g. X = np.ascontiguousarray(X, dtype=np.float64).
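The normalization step from the fix above can be sketched as follows. The float32, Fortran-ordered input (X_raw) is a made-up example of an array that needs conversion; whether it would actually trigger the TypeError depends on the installed versions.

```python
import numpy as np

# Hypothetical problematic input: float32 values in Fortran (column-major) order
X_raw = np.asfortranarray(np.arange(12, dtype=np.float32).reshape(4, 3))

# Copy into a C-contiguous float64 array before passing it to linkage()
X = np.ascontiguousarray(X_raw, dtype=np.float64)

print(X_raw.flags['C_CONTIGUOUS'], X_raw.dtype)  # False float32
print(X.flags['C_CONTIGUOUS'], X.dtype)          # True float64
```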
gotcha The 'method' argument takes lowercase names (e.g., 'single', 'complete', 'average', 'ward'). The Python interface has a single 'ward' option; the 'ward.D' / 'ward.D2' distinction belongs to R's hclust interface, not to scipy. fastcluster's 'ward' is equivalent to scipy's 'ward' and is meaningful only for Euclidean distances.
fix Use method='ward' for Ward linkage. If you need weighted or centroid methods, note they may differ from scipy.
gotcha linkage() expects either a 2-D array of observations with shape (n_samples, n_features) or a 1-D condensed distance matrix. If you pass a square distance matrix, fastcluster interprets each row as an observation and silently returns incorrect results.
fix Use scipy.spatial.distance.squareform to convert square matrix to condensed form before passing to linkage().
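The silent failure described in this gotcha can be seen in a small sketch. It assumes fastcluster is installed; if it is not, the example falls back to scipy.cluster.hierarchy.linkage, which accepts the same arguments here and also treats 2-D input as observations (SciPy may additionally emit a warning). The data and variable names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

try:
    from fastcluster import linkage
except ImportError:
    # Fallback for this sketch only; same call signature for these arguments
    from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
X = rng.random((8, 2))          # 8 illustrative points in 2-D
condensed = pdist(X)            # 1-D condensed distances, length 28
square = squareform(condensed)  # 8x8 square distance matrix

# Wrong: the 8x8 matrix is read as 8 observations with 8 features each
Z_wrong = linkage(square, method='average')

# Right: the condensed form is read as precomputed distances
Z_right = linkage(condensed, method='average')

# Both calls succeed and return the same shape, but the merge heights differ
print(Z_wrong.shape, Z_right.shape)
print(np.allclose(Z_wrong[:, 2], Z_right[:, 2]))
```

Because no exception is raised in the wrong case, comparing merge heights against a known-good run is one way to catch the mistake.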
deprecated fastcluster.linkage with method='centroid' or method='median' can return results that differ from scipy.cluster.hierarchy.linkage because of how the distance matrix is updated. These methods are considered deprecated in favor of scipy's implementations.
fix Use scipy.cluster.hierarchy.linkage with method='centroid' or 'median' if you need exact scipy compatibility.
gotcha With method='ward', fastcluster treats any 2-D input as observations and computes Euclidean distances internally, so a square distance matrix passed with Ward's method is silently misread and can cause unexpected memory usage. Ward linkage is meaningful only for Euclidean distances.
fix If you must use precomputed distances with Ward linkage, pass them in condensed form and make sure they are plain (non-squared) Euclidean distances. scipy.cluster.hierarchy.linkage with method='ward' has the same requirement.

Basic usage: create random data, compute linkage matrix using average method.

import numpy as np
from fastcluster import linkage
# Generate random data: 100 points in 3D
X = np.random.rand(100, 3)
# Perform hierarchical clustering with average linkage
Z = linkage(X, method='average')
print(Z.shape)  # (99, 4)
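A possible next step, sketched under the assumption that the linkage matrix follows SciPy's (n-1, 4) layout as the shape above suggests: cut the tree into flat clusters with scipy.cluster.hierarchy.fcluster. The choice of 4 clusters is arbitrary, and the scipy fallback import exists only so the sketch runs without fastcluster.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster

try:
    from fastcluster import linkage
except ImportError:
    # Fallback for this sketch only; same call signature for these arguments
    from scipy.cluster.hierarchy import linkage

# Same setup as the basic example, with a fixed seed for reproducibility
rng = np.random.default_rng(0)
X = rng.random((100, 3))
Z = linkage(X, method='average')

# Cut the dendrogram so that at most 4 flat clusters remain
labels = fcluster(Z, t=4, criterion='maxclust')

print(labels.shape)            # (100,)
print(len(np.unique(labels)))  # number of clusters found, at most 4
```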