Optimal 1D k-means clustering
kmeans1d is a Python package that implements optimal k-means clustering for one-dimensional data. It uses an O(kn + n log n) dynamic programming algorithm, based on work by Xiaolin Wu (1991) and Grønlund et al. (2017), to find a globally optimal partition of the data into k clusters. The core logic is written in C++ for performance and wrapped for use from Python. The library is actively maintained; the current version is 0.5.0.
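To make "globally optimal" concrete, here is a minimal pure-Python sketch of a dynamic program for the same problem. It is not the package's O(kn + n log n) algorithm but a simpler O(k n^2) reference version, and the function name `optimal_kmeans_1d` is an illustration, not part of kmeans1d. It relies on the fact that in one dimension an optimal clustering always consists of contiguous runs of the sorted data.

```python
from typing import List, Sequence, Tuple


def optimal_kmeans_1d(values: Sequence[float], k: int) -> Tuple[List[int], List[float]]:
    """Reference O(k * n^2) dynamic program for optimal 1D k-means.

    Illustration only: slower and simpler than the algorithm kmeans1d uses.
    Returns one cluster label per input value and the k centroids in
    ascending order.
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(values)")

    # Sort once, remembering each value's original position.
    order = sorted(range(n), key=lambda i: values[i])
    xs = [float(values[i]) for i in order]

    # Prefix sums give the within-cluster cost of any sorted run in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(i: int, j: int) -> float:
        """Sum of squared deviations from the mean for xs[i:j]."""
        total = s1[j] - s1[i]
        return max(0.0, (s2[j] - s2[i]) - total * total / (j - i))

    inf = float("inf")
    # best[m][j]: minimal cost of splitting xs[:j] into m contiguous clusters.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    cut[m][j] = i

    # Walk the stored cut points backwards to recover labels and centroids.
    labels = [0] * n
    centroids = [0.0] * k
    j = n
    for m in range(k, 0, -1):
        i = cut[m][j]
        centroids[m - 1] = (s1[j] - s1[i]) / (j - i)
        for pos in range(i, j):
            labels[order[pos]] = m - 1
        j = i
    return labels, centroids
```

The triple loop is what makes this version quadratic in n; it is useful as a cross-check on small inputs rather than as a replacement for the package.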
Warnings
- gotcha This library is designed exclusively for 1-dimensional data. Attempting to use it with multi-dimensional input will result in errors or incorrect behavior, as the underlying algorithm is optimized for the 1D case.
- gotcha The number of clusters `k` must be less than or equal to the number of data points `n`. Providing `k > n` may lead to errors or undefined behavior in the clustering function.
- gotcha Versions before `0.4.0` did not use the Python Limited API, so they may have had stricter build requirements across Python versions and operating systems. `pip install` generally handles this, but custom builds or unusual environments might still run into issues.
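One way to guard against the first two gotchas is to check the input before clustering. The helper below is a hypothetical sketch, not part of kmeans1d: the name `check_1d_input` and its error messages are assumptions made for illustration. It rejects nested (multi-dimensional) input and any `k` larger than the number of points.

```python
from typing import List, Sequence


def check_1d_input(values: Sequence[float], k: int) -> List[float]:
    """Hypothetical pre-flight check; not provided by kmeans1d.

    Returns the data as a flat list of floats, or raises ValueError.
    """
    flat: List[float] = []
    for v in values:
        # Anything with a length (list, tuple, array row, string) means the
        # input is nested rather than a flat sequence of numbers.
        if hasattr(v, "__len__"):
            raise ValueError("expected 1-dimensional data, got a nested element")
        flat.append(float(v))
    if not isinstance(k, int) or k < 1:
        raise ValueError("k must be a positive integer")
    if k > len(flat):
        raise ValueError(f"k={k} exceeds the number of data points ({len(flat)})")
    return flat
```

Failing early with a clear message is a design choice here; it avoids depending on however the underlying function reacts to invalid input.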
Install
pip install kmeans1d
Imports
- cluster
import kmeans1d
clusters, centroids = kmeans1d.cluster(data, k)
Quickstart
import kmeans1d
x = [4.0, 4.1, 4.2, -50.0, 200.2, 200.4, 200.9, 80.0, 100.0, 102.0]
k = 4
clusters, centroids = kmeans1d.cluster(x, k)
print(f"Clusters: {clusters}")
print(f"Centroids: {centroids}")
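A common next step is to collect the input values by cluster. The sketch below assumes that `clusters` holds one integer label per input value, in input order. `group_by_cluster` is a hypothetical helper, and the hard-coded `labels` list is an assumed result for the quickstart data with clusters numbered in ascending centroid order; in real use, pass the list returned by `kmeans1d.cluster`.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def group_by_cluster(values: Sequence[float], labels: Sequence[int]) -> Dict[int, List[float]]:
    """Hypothetical helper: map each cluster label to its member values."""
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    groups: Dict[int, List[float]] = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    # Return a plain dict with labels in ascending order.
    return dict(sorted(groups.items()))


x = [4.0, 4.1, 4.2, -50.0, 200.2, 200.4, 200.9, 80.0, 100.0, 102.0]
# Assumed labels for the quickstart data; replace with the real `clusters` list.
labels = [1, 1, 1, 0, 3, 3, 3, 2, 2, 2]
for label, members in group_by_cluster(x, labels).items():
    print(label, members)
```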