{"id":7410,"library":"missingpy","title":"MissingPy: Missing Data Imputation","description":"MissingPy is a Python library providing tools for missing data imputation, offering an API consistent with scikit-learn. It primarily supports k-Nearest Neighbors (KNNImputer) and Random Forest-based (MissForest) imputation algorithms. The current version is 0.2.0, released in December 2018. Due to infrequent updates since its last release and limited recent activity on its GitHub repository, the library is considered to be in a maintenance state, with no active development or new releases anticipated.","status":"maintenance","version":"0.2.0","language":"en","source_language":"en","source_url":"https://github.com/epsilon-machine/missingpy","tags":["data imputation","missing data","machine learning","scikit-learn-compatible"],"install":[{"cmd":"pip install missingpy","lang":"bash","label":"Install latest PyPI version"}],"dependencies":[{"reason":"Fundamental array operations and data structures.","package":"numpy","optional":false},{"reason":"Scientific computing and advanced mathematical operations.","package":"scipy","optional":false},{"reason":"API compatibility and underlying machine learning utilities.","package":"scikit-learn","optional":false},{"reason":"Data manipulation, especially for handling DataFrames.","package":"pandas","optional":false}],"imports":[{"symbol":"KNNImputer","correct":"from missingpy import KNNImputer"},{"note":"The direct import from 'missingpy' is standard.","wrong":"from missingpy.missforest import MissForest","symbol":"MissForest","correct":"from missingpy import MissForest"},{"note":"Necessary workaround for `ImportError` with newer `scikit-learn` versions due to internal module changes. 
This must be placed before `from missingpy import MissForest`.","symbol":"MissForest (with sklearn workaround)","correct":"import sklearn.neighbors._base\nimport sys\nsys.modules['sklearn.neighbors.base'] = sklearn.neighbors._base\nfrom missingpy import MissForest"}],"quickstart":{"code":"import numpy as np\n# Workaround for scikit-learn compatibility\nimport sklearn.neighbors._base\nimport sys\nsys.modules['sklearn.neighbors.base'] = sklearn.neighbors._base\n\nfrom missingpy import MissForest\n\nnan = np.nan\nX = np.array([\n    [1, 2, nan],\n    [3, 4, 3],\n    [nan, 6, 5],\n    [8, 8, 7]\n])\n\nimputer = MissForest(random_state=42)\nX_imputed = imputer.fit_transform(X)\n\nprint(\"Original Data with NaNs:\\n\", X)\nprint(\"Imputed Data:\\n\", X_imputed)","lang":"python","description":"This quickstart demonstrates how to use `MissForest` to impute missing values (represented by `np.nan`) in a NumPy array. It includes the necessary workaround for `scikit-learn` compatibility that is commonly required. Ensure categorical variables are one-hot encoded before passing them to the imputer."},"warnings":[{"fix":"Use a compatibility workaround by aliasing `sys.modules['sklearn.neighbors.base']` before importing `missingpy` classes. Additionally, pinning `scikit-learn` to an older, compatible version (e.g., `scikit-learn==1.1.2` or lower) and `scipy==1.9.1` is often required.","message":"MissingPy has severe compatibility issues with recent versions of `scikit-learn` (e.g., >=1.0) due to reliance on internal `sklearn.neighbors` modules that have been reorganized or removed. This often leads to `ImportError`.","severity":"breaking","affected_versions":"missingpy==0.2.0 with scikit-learn >= 1.0"},{"fix":"Manually one-hot encode categorical columns using `pandas.get_dummies` or `sklearn.preprocessing.OneHotEncoder` before imputation.","message":"MissingPy's `MissForest` algorithm expects numerical input. 
If your dataset contains categorical variables, they must be explicitly one-hot encoded (dummy encoded) before being passed to the imputer; otherwise it raises an error such as 'could not convert string to float'.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Be aware of potential future incompatibilities with newer Python ecosystem libraries. Consider alternative, actively maintained imputation libraries for long-term projects.","message":"The `missingpy` library is no longer actively maintained. The last PyPI release was in December 2018, and the GitHub repository has shown minimal activity since then. This means there will likely be no official updates for newer Python versions, `scikit-learn` compatibility, or bug fixes.","severity":"gotcha","affected_versions":"All versions (due to lack of future updates)"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Run `pip install missingpy` to install the library. If using a virtual environment or conda, ensure it's activated before installation.","cause":"The `missingpy` package is not installed in the active Python environment.","error":"ModuleNotFoundError: No module named 'missingpy'"},{"fix":"Implement the `scikit-learn` compatibility workaround: add `import sklearn.neighbors._base; import sys; sys.modules['sklearn.neighbors.base'] = sklearn.neighbors._base` before any `missingpy` imports. 
If the error persists with the workaround in place, pin `scikit-learn` to a compatible older version (e.g., `scikit-learn==1.1.2`) and `scipy` (e.g., `scipy==1.9.1`).","cause":"This error occurs when `missingpy` attempts to import internal helpers (here `_check_weights`) from `scikit-learn` that have changed or been removed in newer `scikit-learn` versions (typically >= 1.0).","error":"ImportError: cannot import name '_check_weights' from 'sklearn.neighbors._base'"},{"fix":"One-hot encode or label encode your categorical features into numerical representations before passing the data to `missingpy` imputers. For example, use `pandas.get_dummies()` or `sklearn.preprocessing.OneHotEncoder`.","cause":"You are attempting to use a `missingpy` imputer (like `MissForest`) on a DataFrame or array that contains non-numerical (e.g., string or object) categorical columns without prior encoding.","error":"ValueError: could not convert string to float: 'CategoryName'"}]}