{"id":6698,"library":"kmeans1d","title":"Optimal 1D k-means clustering","description":"kmeans1d is a Python package providing an implementation of optimal k-means clustering specifically for one-dimensional data. It uses an O(kn + n log n) dynamic programming algorithm, based on research by Wu (1991) and Grønlund et al. (2017), to find the globally optimal k clusters for n data points. The core logic is written in C++ for performance and wrapped for use from Python. The library is actively maintained; the current version is 0.5.0.","status":"active","version":"0.5.0","language":"en","source_language":"en","source_url":"https://github.com/dstein64/kmeans1d","tags":["kmeans","clustering","1d","optimal","data-science","machine-learning"],"install":[{"cmd":"pip install kmeans1d","lang":"bash","label":"Install with pip"}],"dependencies":[],"imports":[{"note":"The primary clustering function is directly accessible via the top-level 'kmeans1d' module.","symbol":"cluster","correct":"import kmeans1d\nclusters, centroids = kmeans1d.cluster(data, k)"}],"quickstart":{"code":"import kmeans1d\n\nx = [4.0, 4.1, 4.2, -50.0, 200.2, 200.4, 200.9, 80.0, 100.0, 102.0]\nk = 4\n\nclusters, centroids = kmeans1d.cluster(x, k)\n\nprint(f\"Clusters: {clusters}\")\nprint(f\"Centroids: {centroids}\")","lang":"python","description":"This example performs 1D k-means clustering on a sample dataset `x` with `k=4` clusters. `kmeans1d.cluster` returns the pair `(clusters, centroids)`: `clusters` holds one integer cluster label per input point, in input order, and `centroids[i]` is the mean of the points assigned to cluster `i`."},"warnings":[{"fix":"Ensure your input data `x` is a flat list or array of numerical values representing a single dimension.","message":"This library is designed exclusively for 1-dimensional data. Attempting to use it with multi-dimensional input will result in errors or incorrect behavior, because the underlying algorithm depends on the data being ordered along a single axis.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Verify that `k <= len(x)` before calling `kmeans1d.cluster(x, k)`.","message":"The number of clusters `k` must be less than or equal to the number of data points `n`. Providing `k > n` may lead to errors or undefined behavior in the clustering function.","severity":"gotcha","affected_versions":"All versions"},{"fix":"For the broadest compatibility across Python versions and simpler builds, use `kmeans1d` version `0.4.0` or newer. If building an older version from source, refer to the release notes for specific compiler or API requirements.","message":"Versions prior to `0.4.0` did not use the Python Limited API, so they may have had more specific build requirements for different Python versions or operating systems. While `pip install` generally handles this, custom builds or unusual environments might encounter issues.","severity":"gotcha","affected_versions":"<0.4.0"}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z","problems":[]}