K-Means Constrained
K-Means Constrained is a Python library that implements K-Means clustering with user-defined minimum and maximum cluster size constraints. It is based on the constrained k-means algorithm by Bradley, Bennett, & Demiriz (2000). The current version is 0.9.0, and the project is actively maintained, with new releases typically a few times a year.
Common errors
- ModuleNotFoundError: No module named 'k_means_constrained'
  cause The `k-means-constrained` package is not installed in your Python environment or the import path is incorrect.
  fix Install the package using `pip install k-means-constrained`. Ensure your import statement is `from k_means_constrained import KMeansConstrained`.
- ValueError: Not enough points to satisfy cluster constraints.
  cause The total number of samples (`n_samples`) provided to `fit` is incompatible with the specified `n_clusters`, `size_min`, and `size_max` parameters.
  fix Verify that `n_clusters * size_min <= n_samples <= n_clusters * size_max`. Adjust the number of clusters, min/max sizes, or provide more data points to satisfy the constraints.
- TypeError: size_min must be an integer
  cause The `size_min` or `size_max` parameter was provided as a float or other non-integer type.
  fix Ensure that both `size_min` and `size_max` are integer values when initializing `KMeansConstrained`.
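The last two errors can often be caught before calling `fit`. The following is a minimal sketch of a pre-flight check, not part of the library: the helper name `check_size_constraints` is hypothetical, and it assumes both `size_min` and `size_max` are set. It applies the inequality `n_clusters * size_min <= n_samples <= n_clusters * size_max` and the integer requirement described above.

```python
from numbers import Integral


def check_size_constraints(n_samples, n_clusters, size_min, size_max):
    """Return a list of problems; an empty list means the bounds look feasible.

    Hypothetical pre-flight helper, not part of k-means-constrained.
    Assumes both size_min and size_max are provided.
    """
    problems = []
    for name, value in (("size_min", size_min), ("size_max", size_max)):
        # bool counts as an integer type in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, Integral):
            problems.append(f"{name} must be an integer, got {type(value).__name__}")
    if problems:
        return problems

    if size_min > size_max:
        problems.append("size_min is larger than size_max")
    if n_clusters * size_min > n_samples:
        problems.append(
            f"n_clusters * size_min = {n_clusters * size_min} exceeds n_samples = {n_samples}"
        )
    if n_clusters * size_max < n_samples:
        problems.append(
            f"n_clusters * size_max = {n_clusters * size_max} is below n_samples = {n_samples}"
        )
    return problems


print(check_size_constraints(11, 3, 2, 4))    # feasible: 6 <= 11 <= 12
print(check_size_constraints(11, 3, 4, 5))    # lower bound too high: 12 > 11
print(check_size_constraints(11, 3, 2, 3))    # upper bound too low: 9 < 11
print(check_size_constraints(11, 3, 2.0, 4))  # float instead of int
```

Returning a list of messages rather than raising lets a caller report every problem at once; raising a `ValueError` on the first problem would work equally well.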
Warnings
- gotcha Computational complexity increases significantly with large datasets, many clusters, or very tight cluster size constraints. The constrained K-Means problem is NP-hard.
- gotcha Results are non-reproducible without setting `random_state`. The initialization of cluster centers and subsequent iterative steps can involve randomness.
- gotcha Incompatible cluster constraints (`n_clusters`, `size_min`, `size_max`) can lead to a `ValueError` or an unsolvable problem, as the algorithm cannot partition the data as requested.
Install
pip install k-means-constrained
Imports
- KMeansConstrained
from k_means_constrained import KMeansConstrained
Quickstart
import numpy as np
from k_means_constrained import KMeansConstrained
# Sample data
X = np.array([
    [1, 2], [1.1, 2.1], [0.9, 1.9],
    [10, 11], [10.1, 11.1], [9.9, 10.9],
    [5, 5], [5.1, 5.1], [4.9, 4.9],
    [20, 21], [20.1, 21.1]
])
# Initialize and fit the constrained K-Means model
# 3 clusters, each holding between 2 and 4 of the 11 points
clf = KMeansConstrained(
    n_clusters=3,
    size_min=2,
    size_max=4,
    random_state=0
)
clf.fit(X)
# Print cluster assignments and cluster centers
print("Labels:", clf.labels_)
print("Cluster Centers:\n", clf.cluster_centers_)
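After fitting, it can be worth confirming that every cluster respects the requested size bounds. The sketch below is one way to do that, assuming `labels` is any sequence of integer cluster ids such as `clf.labels_`. The helper names `cluster_sizes` and `sizes_within_bounds` are hypothetical and not part of the library, and the labels used here are illustrative, not the actual output of the quickstart.

```python
from collections import Counter


def cluster_sizes(labels):
    """Count how many points were assigned to each cluster label."""
    counts = Counter(int(label) for label in labels)
    return dict(sorted(counts.items()))


def sizes_within_bounds(labels, size_min, size_max):
    """Return True if every observed cluster has between size_min and size_max points."""
    return all(size_min <= n <= size_max for n in cluster_sizes(labels).values())


# Illustrative labels for 11 points in 3 clusters (not real model output)
example_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2, 1, 2]

print(cluster_sizes(example_labels))                               # {0: 3, 1: 4, 2: 4}
print(sizes_within_bounds(example_labels, size_min=2, size_max=4)) # True
```

With a fitted model, the same check would be `sizes_within_bounds(clf.labels_, 2, 4)`. Note that a cluster with no assigned points never appears in the counts, so this sketch only checks clusters that received at least one point.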