KNN Impute
Version 0.1.0 | verified Fri May 01 | auth: no | python | maintenance
A lightweight Python library for k-Nearest Neighbor imputation of missing values in datasets. The current version, 0.1.0, appears to be an initial release with few subsequent updates; the last commit on GitHub was in 2018. The library is in maintenance mode.
pip install knnimpute
Common errors
error ModuleNotFoundError: No module named 'knn_impute'
cause Incorrect import statement; the package is knnimpute, not knn_impute.
fix Install with 'pip install knnimpute' and import with 'from knnimpute import knn_impute'.
error ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
cause Either the matrix still contains np.nan after imputation, or the data contains infinite values, which the library does not handle.
fix Check that the matrix does not contain inf values. Replace inf with np.nan before imputation: X[np.isinf(X)] = np.nan
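The inf cleanup described in the fix above can be sketched with plain NumPy. The array contents are made up for illustration, and the sketch stops before any call into knnimpute.

```python
import numpy as np

# Example matrix (illustrative values) containing +inf, -inf and NaN.
X = np.array([[1.0, np.inf, 3.0],
              [4.0, 5.0, -np.inf],
              [7.0, np.nan, 9.0]])

# Treat +inf and -inf as missing so only finite values and NaN remain.
X_clean = X.copy()
X_clean[np.isinf(X_clean)] = np.nan

# Sanity check: no infinite entries are left; count what is now missing.
has_inf = bool(np.isinf(X_clean).any())
n_missing = int(np.isnan(X_clean).sum())
```

Working on a copy keeps the original matrix available for comparing which entries were inf and which were already NaN.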
error TypeError: 'numpy.float64' object cannot be interpreted as an integer
cause The k parameter was probably passed as a float (e.g., k=3.0 or a numpy.float64) rather than an integer.
fix Ensure k is an integer: k=int(k) or pass k=3 (not 3.0).
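When k comes out of a computation (for example a NumPy result), a small guard can coerce it before use. The helper name as_neighbor_count is hypothetical and not part of knnimpute; this is only a sketch of one way to apply the fix above.

```python
import operator
import numpy as np

def as_neighbor_count(k):
    """Hypothetical helper: return k as a plain Python int, rejecting non-whole values."""
    if isinstance(k, (float, np.floating)):
        if not float(k).is_integer():
            raise ValueError(f"k must be a whole number, got {k!r}")
        return int(k)
    # operator.index accepts Python ints and NumPy integer types.
    return operator.index(k)

k = as_neighbor_count(np.float64(3.0))  # plain int 3
```

Rejecting values such as 2.5 instead of silently truncating them avoids imputing with a different neighbor count than intended.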
Warnings
breaking The function knn_impute_few_observed is not a drop-in replacement for knn_impute; it uses a different algorithm and signature. Ensure you read its documentation.
fix Use knn_impute for standard tasks; only use knn_impute_few_observed when you have many features with few observed values.
gotcha The library does not handle non-numeric data. All columns must be numeric; categorical data must be encoded beforehand.
fix Convert categorical variables to numeric using one-hot encoding or label encoding before calling knn_impute.
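One-hot encoding can be done with NumPy alone, as in the sketch below. The column names and values are invented for illustration; pandas or scikit-learn encoders would work equally well.

```python
import numpy as np

colors = np.array(["red", "blue", "red", "green"])  # categorical column (illustrative)
sizes = np.array([1.0, np.nan, 3.0, 4.0])           # numeric column with a gap

# Map each category to an integer code, then expand the codes into 0/1 indicator columns.
categories, codes = np.unique(colors, return_inverse=True)
one_hot = np.eye(len(categories))[codes]

# Stack numeric and encoded columns into one float matrix ready for imputation.
X = np.column_stack([sizes, one_hot])
```

np.unique returns the categories in sorted order, so the indicator columns here correspond to blue, green, red.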
gotcha Missing values must be represented as np.nan. Using None or other sentinels will cause incorrect behavior or errors.
fix Ensure missing entries are np.nan. Use np.isnan() to check or convert None to np.nan.
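A minimal sketch of normalizing missing-value markers with NumPy follows. The -999 sentinel is a hypothetical example of a placeholder some datasets use; substitute whatever marker your data actually contains.

```python
import numpy as np

rows = [[1.0, None, 3.0],
        [4.0, 5.0, -999.0],  # -999 is a hypothetical "missing" sentinel
        [7.0, 8.0, 9.0]]

# Building the array with dtype=float turns None into np.nan.
X = np.array(rows, dtype=float)

# Other sentinels have to be mapped to np.nan explicitly.
X[X == -999.0] = np.nan

# Boolean mask of the entries that now count as missing.
missing_mask = np.isnan(X)
```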
Imports
- knn_impute
wrong: from knn_impute import knn_impute
correct: from knnimpute import knn_impute
- knn_impute_few_observed
from knnimpute import knn_impute_few_observed
Quickstart
import numpy as np
from knnimpute import knn_impute
# Simulate data with missing values
X = np.array([[1, 2, np.nan], [4, np.nan, 6], [7, 8, 9]])
# Impute using k=3
X_imputed = knn_impute(X, k=3)
print(X_imputed)
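# Output: a 3x3 array with the np.nan entries replaced by estimates taken from neighboring rows; exact values depend on the library's distance weighting.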
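To make the idea behind k-nearest-neighbor imputation concrete, here is a small NumPy-only sketch. It is not knnimpute's implementation: the function name knn_impute_sketch, the distance (mean squared difference over columns both rows observe), and the unweighted mean of donors are all assumptions chosen for readability.

```python
import numpy as np

def knn_impute_sketch(X, k):
    """Fill each NaN with the mean of the k nearest rows that observe that column."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()
    n_rows = X.shape[0]
    for i in range(n_rows):
        if not missing[i].any():
            continue
        # Distance from row i to every other row over the columns both observe.
        dists = np.full(n_rows, np.inf)
        for j in range(n_rows):
            shared = ~missing[i] & ~missing[j]
            if j != i and shared.any():
                dists[j] = np.mean((X[i, shared] - X[j, shared]) ** 2)
        for col in np.flatnonzero(missing[i]):
            # Donors: comparable rows that have an observed value in this column.
            donors = np.flatnonzero(np.isfinite(dists) & ~missing[:, col])
            if donors.size == 0:
                continue  # nothing to borrow from; leave the NaN in place
            order = np.argsort(dists[donors], kind="stable")[:k]
            filled[i, col] = X[donors[order], col].mean()
    return filled

X = np.array([[1, 2, np.nan], [4, np.nan, 6], [7, 8, 9]])
filled = knn_impute_sketch(X, k=2)
```

With only three rows, each missing entry has at most two donors, so k=2 averages both of them; real datasets would normally have many more rows than k.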