{"id":5282,"library":"kmodes","title":"kmodes Clustering Library","description":"Python implementations of the k-modes and k-prototypes clustering algorithms for clustering categorical data. It is currently at version 0.12.2 and sees active development with several releases per year.","status":"active","version":"0.12.2","language":"en","source_language":"en","source_url":"https://github.com/nicodv/kmodes","tags":["clustering","categorical data","k-modes","k-prototypes","machine learning","unsupervised learning"],"install":[{"cmd":"pip install kmodes","lang":"bash","label":"Install with pip"}],"dependencies":[{"reason":"Relies on numpy for core array operations and computations.","package":"numpy","optional":false},{"reason":"The code is modeled after scikit-learn's interface and has a minimum version requirement of 0.22 since kmodes 0.11.0.","package":"scikit-learn","optional":false},{"reason":"Used for parallel execution support via the `n_jobs` parameter for multiple initialization runs.","package":"joblib","optional":false}],"imports":[{"symbol":"KModes","correct":"from kmodes.kmodes import KModes"},{"symbol":"KPrototypes","correct":"from kmodes.kprototypes import KPrototypes"}],"quickstart":{"code":"import numpy as np\nfrom kmodes.kmodes import KModes\n\n# Generate random categorical data (e.g., 100 samples, 10 features, 20 unique categories per feature)\ndata = np.random.choice(20, (100, 10))\n\n# Initialize KModes with 4 clusters, Huang initialization, 5 initialization runs\nkm = KModes(n_clusters=4, init='Huang', n_init=5, verbose=1)\n\n# Fit the model and predict clusters\nclusters = km.fit_predict(data)\n\n# Print the cluster centroids\nprint(\"Cluster Centroids:\\n\", km.cluster_centroids_)\nprint(\"Assigned Clusters:\\n\", clusters[:10]) # Display first 10 assigned clusters","lang":"python","description":"Demonstrates basic usage of the KModes algorithm for clustering purely categorical data. 
Initialize the KModes estimator, fit it to your data, and retrieve the cluster assignments and centroids."},"warnings":[{"fix":"Pre-process your data to handle `np.nan` values (e.g., fill with a specific category for categorical features, or remove rows) before passing it to `kmodes`.","message":"Dropped support for missing values (`np.nan`) in the input matrix (X) starting from version 0.11.1, following scikit-learn's approach. Users must now handle missing data manually by imputation or removal.","severity":"breaking","affected_versions":">=0.11.1"},{"fix":"Upgrade your Python environment to at least 3.6. For full compatibility with the latest features, Python 3.10 or newer is recommended.","message":"Python 3.4 support was dropped in version 0.10.2. Official support for Python 3.10 was added in 0.12.0. Ensure your Python environment is compatible (Python 3.6+ is generally safe).","severity":"breaking","affected_versions":">=0.10.2"},{"fix":"Upgrade your `scikit-learn` library to version 0.22 or newer: `pip install --upgrade scikit-learn`.","message":"The minimum `scikit-learn` version was upgraded to 0.22 in kmodes version 0.11.0. Older `scikit-learn` versions may cause compatibility issues or `AttributeError`.","severity":"breaking","affected_versions":">=0.11.0"},{"fix":"Ensure that all numerical columns are consistently typed as numeric (e.g., `float` or `int`) before passing them to `KPrototypes`. 
Convert string representations of numbers to their proper numeric types.","message":"When using `KPrototypes`, one or more of your numerical feature columns may contain string values, leading to `TypeError: '<' not supported between instances of 'str' and 'float'`.","severity":"gotcha","affected_versions":"All"},{"fix":"Pass a list of categorical column indices via the `categorical` argument of `fit` or `fit_predict`, e.g., `KPrototypes(n_clusters=4).fit_predict(X, categorical=[0, 2, 5])`.","message":"For `KPrototypes`, you must explicitly specify which column indices are categorical using the `categorical` argument of `fit` or `fit_predict`; it is not a constructor parameter. If it is omitted, the library does not infer column types and fitting raises an error.","severity":"gotcha","affected_versions":"All"},{"fix":"Rename your local Python script to something other than `kmodes.py` (e.g., `my_script.py`) to avoid module name conflicts.","message":"A `ModuleNotFoundError` (e.g., `No module named 'kmodes.kmodes'`) can occur if your working Python file is named `kmodes.py`, because it shadows the installed `kmodes` package.","severity":"gotcha","affected_versions":"All"},{"fix":"Consider reducing the number of clusters, cleaning or normalizing your data, exploring different initialization methods, or ensuring sufficient data density for the chosen cluster count.","message":"`ValueError: Clustering algorithm could not initialize` usually indicates that the data and chosen parameters (e.g., `n_clusters`, `init` method) are not suitable for each other. It is not necessarily a bug.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-13T00:00:00.000Z","next_check":"2026-07-12T00:00:00.000Z"}