KMeans-PyTorch
kmeans-pytorch implements the K-means clustering algorithm on top of PyTorch, so clustering can run on a GPU for faster computation. The current version is 0.3. Releases are infrequent and tend to follow new features or argument clarifications rather than a fixed schedule.
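For intuition about what the library computes, here is a minimal sketch of standard K-means (Lloyd's algorithm) in plain PyTorch. It is not kmeans-pytorch's source. The helper name `simple_kmeans`, the farthest-first initialization, and the stopping rule (total center movement below `tol`) are assumptions chosen for illustration, and the library's own initialization and convergence test may differ.

```python
import torch


def simple_kmeans(X: torch.Tensor, num_clusters: int, tol: float = 1e-4,
                  max_steps: int = 100, seed: int = 0):
    """Lloyd's K-means in plain PyTorch (illustration, not kmeans-pytorch's code).

    Returns (cluster_ids, cluster_centers).
    """
    gen = torch.Generator().manual_seed(seed)
    # Farthest-first initialization: start from one random sample, then keep
    # adding the sample farthest from all centers chosen so far.
    first = torch.randint(X.shape[0], (1,), generator=gen)
    centers = X[first]
    for _ in range(1, num_clusters):
        nearest = torch.cdist(X, centers).min(dim=1).values
        centers = torch.cat([centers, X[nearest.argmax()].unsqueeze(0)])

    for _ in range(max_steps):
        # Assignment step: index of the closest center for every sample.
        ids = torch.cdist(X, centers).argmin(dim=1)
        # Update step: move each center to the mean of its assigned samples.
        new_centers = centers.clone()
        for k in range(num_clusters):
            members = X[ids == k]
            if members.shape[0] > 0:  # an empty cluster keeps its old center
                new_centers[k] = members.mean(dim=0)
        # Stop when the total distance the centers moved drops below tol.
        shift = (new_centers - centers).norm(dim=1).sum()
        centers = new_centers
        if shift < tol:
            break

    ids = torch.cdist(X, centers).argmin(dim=1)
    return ids, centers
```

Because the sketch uses only tensor operations, moving `X` to a GPU first (for example `X.to('cuda:0')`) runs the same code there, which is the kind of acceleration the library is built around.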
Common errors
- ModuleNotFoundError: No module named 'kmeans_pytorch'
  cause: The `kmeans-pytorch` package is not installed in the current Python environment.
  fix: Install it with pip: `pip install kmeans-pytorch`. Note that the distribution name uses a hyphen while the import name uses an underscore (`kmeans_pytorch`).
- AttributeError: module 'kmeans_pytorch' has no attribute 'kmeans'
  cause: `kmeans` is being accessed as a submodule of `kmeans_pytorch` instead of being imported as a function.
  fix: Import the function directly with `from kmeans_pytorch import kmeans`.
- RuntimeError: Input type (torch.FloatTensor) and weight type (torch.DoubleTensor) should be the same
  cause: The input tensor `X` has a different dtype (e.g., `torch.float`, i.e. float32) from another tensor used in the same computation (e.g., `torch.double`, i.e. float64).
  fix: Cast your tensors to one dtype and use it consistently, e.g. `X = X.to(dtype=torch.float)` everywhere, or `torch.double` everywhere.
- RuntimeError: CUDA error: invalid device ordinal
  cause: The `device` passed to `kmeans` names a GPU that does not exist, typically an index beyond the number of visible GPUs (e.g., `'cuda:1'` on a single-GPU machine).
  fix: Check `torch.cuda.is_available()` and `torch.cuda.device_count()` before choosing a GPU. Use `'cpu'` for the CPU and `'cuda:0'` for the first GPU.
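The dtype and device fixes above can be folded into one setup step that runs before calling `kmeans`. In this sketch `pick_device` and `prepare_input` are hypothetical helpers, float32 is an assumed target dtype, and passing a `torch.device` object (rather than a string such as `'cuda:0'`) as the `device` argument of `kmeans` is assumed to work.

```python
import torch


def pick_device() -> torch.device:
    # Use the first GPU only if PyTorch can actually see one; otherwise use the CPU.
    return torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


def prepare_input(X, dtype=torch.float32, device=None) -> torch.Tensor:
    """Convert X to a tensor with a single dtype on a single device."""
    device = pick_device() if device is None else device
    X = torch.as_tensor(X)  # accepts lists, NumPy arrays, or tensors
    return X.to(device=device, dtype=dtype)
```

A typical call would be `X = prepare_input(X)` followed by `kmeans(X=X, num_clusters=3, device=X.device)`, so the data and the clustering run on the same device with the same dtype.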
Warnings
- gotcha `kmeans` runs on the CPU unless you pass a `device`. On large datasets, forgetting to specify `device='cuda:0'` when a GPU is available can make clustering much slower.
- gotcha Mismatched dtypes (e.g., `torch.float` vs `torch.double`) between the input tensor `X` and internally generated tensors can raise `RuntimeError: Input type (torch.FloatTensor) and weight type (torch.DoubleTensor) should be the same`.
- breaking Although not documented as breaking changes, minor releases may introduce or clarify argument names, types, and default values for the `kmeans` function. For instance, the defaults for `tol` or the `distance` metric, or the name of any argument that limits the number of iterations, might change between releases.
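Because argument names and defaults can shift between releases, one defensive option is to check keyword arguments against the installed function's signature before calling it. This is a sketch using the standard `inspect` module; `filter_kwargs` is a hypothetical helper, not part of kmeans-pytorch.

```python
import inspect
import warnings


def filter_kwargs(func, **kwargs):
    """Keep only the keyword arguments that func's signature declares."""
    params = inspect.signature(func).parameters
    # A function that takes **kwargs accepts any keyword, so pass everything on.
    if any(p.kind == inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return kwargs
    accepted = {k: v for k, v in kwargs.items() if k in params}
    dropped = sorted(set(kwargs) - set(accepted))
    if dropped:
        name = getattr(func, "__name__", repr(func))
        warnings.warn(f"{name} does not accept: {', '.join(dropped)}")
    return accepted
```

`print(inspect.signature(kmeans))` shows the arguments and defaults of the installed version. A call such as `kmeans(X=X, num_clusters=3, **filter_kwargs(kmeans, tol=1e-4, max_iter=500))` then drops any name that version does not declare, with a warning instead of a `TypeError`.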
Install
pip install kmeans-pytorch
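To confirm which release pip actually installed (the current version is given above as 0.3), you can read the package metadata from Python; `pip show kmeans-pytorch` reports the same information from the shell. `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "kmeans-pytorch"):
    """Return the installed version string of dist_name, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None
```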
Imports
- kmeans
import kmeans_pytorch; kmeans_pytorch.kmeans
from kmeans_pytorch import kmeans
- kmeans_predict
from kmeans_pytorch import kmeans_predict
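`kmeans_predict` labels new samples using centers that `kmeans` has already computed. Its signature is not documented here; as an assumption, it is called along the lines of `kmeans_predict(X_new, cluster_centers, distance='euclidean', device=device)`. The idea it relies on, assigning each sample to its nearest center, can be sketched in plain PyTorch. `assign_to_centers` is a hypothetical helper, not part of kmeans-pytorch, and it covers only Euclidean distance.

```python
import torch


def assign_to_centers(X: torch.Tensor, cluster_centers: torch.Tensor) -> torch.Tensor:
    """Return, for each row of X, the index of the nearest center (Euclidean)."""
    # Match dtype and device so the distance computation does not mix types.
    cluster_centers = cluster_centers.to(device=X.device, dtype=X.dtype)
    # (n_samples, num_clusters) matrix of pairwise Euclidean distances.
    distances = torch.cdist(X, cluster_centers)
    return distances.argmin(dim=1)
```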
Quickstart
import torch
from kmeans_pytorch import kmeans
# 0. Generate some random data
num_samples = 1000
num_features = 2
X = torch.randn(num_samples, num_features, device='cpu', dtype=torch.float)
# Offset groups of points so the data forms three clusters
X[:300] += 5
X[300:600] -= 5
X[600:] += torch.tensor([0, 10], dtype=torch.float)
num_clusters = 3
tolerance = 1e-4  # convergence threshold on how much the centers move between iterations
distance_metric = 'euclidean'
# Use the first GPU if PyTorch can see one; otherwise run on the CPU
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
# 1. Run K-means
cluster_ids_x, cluster_centers = kmeans(
    X=X,
    num_clusters=num_clusters,
    distance=distance_metric,
    tol=tolerance,
    device=device
)
print(f"Cluster IDs shape: {cluster_ids_x.shape}")
print(f"Cluster Centers shape: {cluster_centers.shape}")
print(f"First 5 cluster IDs: {cluster_ids_x[:5]}")
print(f"Cluster centers:\n{cluster_centers}")
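After the quickstart runs, a quick sanity check is to count how many samples landed in each cluster and compute the within-cluster sum of squared distances (often called inertia). `summarize_clusters` is a hypothetical helper; it moves everything to the CPU first, since where `kmeans` leaves its outputs is not stated here.

```python
import torch


def summarize_clusters(X, cluster_ids, cluster_centers):
    """Return per-cluster sample counts and the within-cluster sum of squares."""
    centers = cluster_centers.cpu()
    X = X.to(centers.dtype).cpu()
    ids = cluster_ids.long().cpu()
    # Number of samples assigned to each cluster (zero for empty clusters).
    sizes = torch.bincount(ids, minlength=centers.shape[0])
    # Squared distance from each sample to the center it was assigned to, summed.
    inertia = ((X - centers[ids]) ** 2).sum().item()
    return sizes, inertia
```

With the quickstart's variables this would be `sizes, inertia = summarize_clusters(X, cluster_ids_x, cluster_centers)`.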