{"id":9872,"library":"kmeans-pytorch","title":"KMeans-PyTorch","description":"kmeans-pytorch implements K-means clustering on top of PyTorch, so the computation can run on a GPU. The current version is 0.3. Releases are infrequent and follow no fixed schedule.","status":"active","version":"0.3","language":"en","source_language":"en","source_url":"https://github.com/subhadarship/kmeans_pytorch","tags":["machine-learning","clustering","pytorch","gpu"],"install":[{"cmd":"pip install kmeans-pytorch","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Core dependency for tensor operations and GPU acceleration.","package":"torch","optional":false}],"imports":[{"note":"The `kmeans` function is directly exposed at the top level of the `kmeans_pytorch` package, not nested under a module namespace within it.","wrong":"import kmeans_pytorch; kmeans_pytorch.kmeans","symbol":"kmeans","correct":"from kmeans_pytorch import kmeans"},{"note":"Used for predicting cluster IDs for new data based on existing cluster centers.","symbol":"kmeans_predict","correct":"from kmeans_pytorch import kmeans_predict"}],"quickstart":{"code":"import torch\nfrom kmeans_pytorch import kmeans\n\n# 0. Generate some random data\nnum_samples = 1000\nnum_features = 2\nX = torch.randn(num_samples, num_features, device='cpu', dtype=torch.float)\n\n# Shift subsets of the points to form three separated clusters\nX[:300] += 5\nX[300:600] -= 5\nX[600:] += torch.tensor([0, 10], dtype=torch.float)\n\nnum_clusters = 3\ntolerance = 1e-4\ndistance_metric = 'euclidean'\ndevice = 'cpu' # Change to 'cuda:0' if a GPU is available\n\n# 1. Run K-means\ncluster_ids_x, cluster_centers = kmeans(\n    X=X,\n    num_clusters=num_clusters,\n    distance=distance_metric,\n    tol=tolerance,\n    device=device\n)\n\nprint(f\"Cluster IDs shape: {cluster_ids_x.shape}\")\nprint(f\"Cluster Centers shape: {cluster_centers.shape}\")\nprint(f\"First 5 cluster IDs: {cluster_ids_x[:5]}\")\nprint(f\"Cluster centers:\\n{cluster_centers}\")","lang":"python","description":"This example generates sample data, runs `kmeans`, and prints the cluster assignments and final cluster centers. Set `device` to 'cpu' or 'cuda:0' to match your hardware."},"warnings":[{"fix":"Pass `device='cuda:0'` to the `kmeans` function to use the GPU. Ensure your input tensor `X` is also on the correct device (e.g., `X = X.to('cuda:0')`).","message":"`kmeans` runs on the CPU by default, even when a GPU is available. On large datasets this is much slower than running on the GPU.","severity":"gotcha","affected_versions":"0.2+"},{"fix":"Ensure your input tensor `X` has the same `dtype` (e.g., `torch.float` or `torch.double`) as expected by PyTorch operations within the library. 
Explicitly cast `X` if necessary: `X = X.to(dtype=torch.float)`.","message":"Mismatched data types (e.g., `torch.float` vs `torch.double`) between the input tensor `X` and internally generated tensors can cause `RuntimeError: Input type (Float) and weight type (Double) should be the same`.","severity":"gotcha","affected_versions":"0.2+"},{"fix":"Always explicitly pass all desired arguments (e.g., `num_clusters`, `distance`, `tol`, `device`) rather than relying on assumed defaults to ensure consistent behavior across updates.","message":"While not explicitly documented as breaking changes, minor releases might introduce or clarify argument names, types, and default values for the `kmeans` function. For instance, the default values for `tol` or the `distance` metric might change.","severity":"breaking","affected_versions":"Prior to 0.3, possibly minor revisions within 0.3."}],"env_vars":null,"last_verified":"2026-04-17T00:00:00.000Z","next_check":"2026-07-16T00:00:00.000Z","problems":[{"fix":"Install the package using pip: `pip install kmeans-pytorch`","cause":"The `kmeans-pytorch` package is not installed in the current Python environment.","error":"ModuleNotFoundError: No module named 'kmeans_pytorch'"},{"fix":"Use `from kmeans_pytorch import kmeans` to directly import the function.","cause":"Attempting to access `kmeans` as a submodule of `kmeans_pytorch` instead of directly importing it.","error":"AttributeError: module 'kmeans_pytorch' has no attribute 'kmeans'"},{"fix":"Ensure your input tensor `X` is cast to the expected data type, e.g., `X = X.to(dtype=torch.float)` or `X = X.to(dtype=torch.double)` consistently.","cause":"The input data tensor `X` has a different data type (e.g., `float`) than what PyTorch or the library expects for internal calculations (e.g., `double`).","error":"RuntimeError: Input type (torch.FloatTensor) and weight type (torch.DoubleTensor) should be the same"},{"fix":"Verify the `device` string is 
correct and corresponds to an available device. Use `torch.cuda.is_available()` to check for GPU presence. For CPU, use `'cpu'`. For GPU, use `'cuda:0'` for the first GPU.","cause":"The `device` parameter passed to `kmeans` (e.g., 'cuda:0', 'cpu') is either incorrectly formatted or refers to a non-existent device (e.g., 'cuda:0' when no GPU is available, or an invalid GPU index).","error":"RuntimeError: CUDA error: invalid device ordinal"}]}