PyGrinder
Version 0.7, verified Mon Apr 27, auth: no, python
A Python toolkit for introducing missing values into datasets under three missingness mechanisms, missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR), in point, sequential, or block patterns. Current version: 0.7. Requires Python >=3.8. Released quarterly.
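To make the MCAR mechanism concrete, here is a minimal pure-Python sketch. It is not PyGrinder's implementation: the helper name mcar_sketch and its parameters (p, seed) are hypothetical and exist only for this illustration. Each value is replaced by NaN independently with probability p, regardless of the data.

```python
import math
import random


def mcar_sketch(rows, p, seed=None):
    """Return a copy of `rows` where each value is NaN with probability `p`.

    Hypothetical helper for illustration only; not part of PyGrinder.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be in [0, 1]")
    rng = random.Random(seed)
    # Every entry gets its own independent draw, which is what makes the
    # missingness "completely at random".
    return [[math.nan if rng.random() < p else v for v in row] for row in rows]


# 100 samples with 10 features each, as plain nested lists.
data = [[float(i * 10 + j) for j in range(10)] for i in range(100)]
corrupted = mcar_sketch(data, p=0.2, seed=0)

n_missing = sum(math.isnan(v) for row in corrupted for v in row)
print(f"missing rate: {n_missing / 1000:.2f}")
```

The observed rate only approximates p, because each entry is drawn independently.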
pip install pygrinder
Common errors
error AttributeError: module 'pygrinder' has no attribute 'calc_misssing_rate'
cause Typo in the function name: 'misssing' is spelled with three s's. The correct name is calc_missing_rate.
fix
Use from pygrinder import calc_missing_rate
error TypeError: mcar() got multiple values for argument 'rate'
cause rate was passed twice: once positionally and once as a keyword, e.g. mcar(X, 0.2, rate=0.2).
fix
Pass rate exactly once, e.g. mcar(X, rate=0.2).
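The error comes from Python's argument binding, not from PyGrinder itself, so it can be reproduced with a stand-in. In the sketch below, mcar_stub is a hypothetical function that only mirrors the (X, rate) parameter order implied by the message above; it does not call the library.

```python
def mcar_stub(X, rate):
    """Stand-in with an (X, rate) parameter order; not PyGrinder's mcar."""
    return X, rate


X = [[1.0, 2.0], [3.0, 4.0]]

try:
    # 0.2 binds to `rate` positionally, then rate=0.2 tries to bind it again.
    mcar_stub(X, 0.2, rate=0.2)
except TypeError as exc:
    message = str(exc)
    print(message)

# Passing the value once, in either style, works.
_, r1 = mcar_stub(X, 0.2)
_, r2 = mcar_stub(X, rate=0.2)
```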
error ValueError: seq_len must be less than n_steps
cause In seq_missing, seq_len equals or exceeds the number of time steps, causing an empty index list.
fix
Set seq_len < n_steps, or upgrade to pygrinder >=0.6.4 where this condition is handled.
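If upgrading is not an option, the condition can be checked before calling the library. The following is a minimal sketch of such a pre-check; check_seq_len is a hypothetical helper, and PyGrinder's own validation may differ.

```python
def check_seq_len(seq_len, n_steps):
    """Raise early if seq_len does not fit strictly inside n_steps.

    Hypothetical pre-check for illustration; not part of PyGrinder.
    """
    if seq_len < 1:
        raise ValueError(f"seq_len must be at least 1, got {seq_len}")
    if seq_len >= n_steps:
        raise ValueError(
            f"seq_len ({seq_len}) must be less than n_steps ({n_steps})"
        )
    return seq_len


n_steps = 24
seq_len = check_seq_len(6, n_steps)  # valid: 6 < 24

try:
    check_seq_len(24, n_steps)  # invalid: equal to n_steps
except ValueError as exc:
    print(exc)
```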
Warnings
breaking In v0.4, all missingness-creating functions (mcar, mar_logistic, mnar_x, mnar_t) changed to return only the corrupted data (with NaN), not a tuple (X_intact, X, mask). Use fill_and_get_mask to get the mask.
fix Update code: data = mcar(X, rate=0.2) instead of X_intact, data, mask = mcar(X, rate=0.2). Use fill_and_get_mask(data) to get mask.
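The mask that the old tuple return carried can be derived from the NaN positions in the corrupted data. The pure-Python sketch below shows that idea only. fill_and_mask_sketch is a hypothetical stand-in, and the mask convention (1 = observed, 0 = missing) is an assumption, so check what fill_and_get_mask returns in your installed version.

```python
import math


def fill_and_mask_sketch(rows, fill_value=0.0):
    """Replace NaN with `fill_value` and return (filled, mask).

    Hypothetical helper for illustration; assumes mask 1 = observed,
    0 = missing. Not PyGrinder's implementation.
    """
    filled, mask = [], []
    for row in rows:
        filled.append([fill_value if math.isnan(v) else v for v in row])
        mask.append([0 if math.isnan(v) else 1 for v in row])
    return filled, mask


corrupted = [[1.0, math.nan, 3.0], [math.nan, 5.0, 6.0]]
filled, mask = fill_and_mask_sketch(corrupted)

# Under the assumed convention, the missing rate is the share of zeros.
n_values = sum(len(row) for row in mask)
missing_rate = 1 - sum(map(sum, mask)) / n_values
print(filled, mask, f"{missing_rate:.2f}")
```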
deprecated In v0.7, mnar_num has been renamed to mnar_nonuniform. The old name mnar_num is removed.
fix Replace mnar_num with mnar_nonuniform.
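Code that has to run against installs from both before and after the rename can look the function up by name. The sketch below is one possible approach, not something PyGrinder provides: resolve_renamed is a hypothetical helper, and the two SimpleNamespace objects merely stand in for an older and a newer install.

```python
from types import SimpleNamespace


def resolve_renamed(module, new_name, old_name):
    """Return module.new_name if present, else fall back to module.old_name."""
    func = getattr(module, new_name, None)
    if func is None:
        func = getattr(module, old_name, None)
    if func is None:
        raise AttributeError(
            f"neither {new_name!r} nor {old_name!r} found on {module!r}"
        )
    return func


# Fake modules standing in for an older and a newer install.
old_install = SimpleNamespace(mnar_num=lambda X: "old")
new_install = SimpleNamespace(mnar_nonuniform=lambda X: "new")

grind_old = resolve_renamed(old_install, "mnar_nonuniform", "mnar_num")
grind_new = resolve_renamed(new_install, "mnar_nonuniform", "mnar_num")
print(grind_old(None), grind_new(None))
```

With the real package, the imported pygrinder module would be passed as the first argument.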
gotcha seq_missing and block_missing require a seq_len argument. Setting seq_len equal to n_steps can trigger an empty step_idx list bug (fixed in v0.6.4), so verify that seq_len < n_steps.
fix Upgrade to >=0.6.4 or ensure seq_len < n_steps.
gotcha mar_logistic had a bug in argument order (v0.6.2 and earlier) that could produce incorrect missingness. Fixed in v0.6.3.
fix Upgrade to >=0.6.3.
deprecated In v0.4, the parameter 'return_masks' was available to return masks alongside corrupted data; it was removed in later versions. Use fill_and_get_mask instead.
fix Switch to fill_and_get_mask to retrieve the mask.
Imports
- mcar
from pygrinder import mcar
- mar_logistic
from pygrinder import mar_logistic
- mnar_x
from pygrinder import mnar_x
- mnar_t
from pygrinder import mnar_t
- mnar_nonuniform
from pygrinder import mnar_nonuniform
- seq_missing
from pygrinder import seq_missing
- block_missing
from pygrinder import block_missing
- rdo
from pygrinder import rdo
- fill_and_get_mask
from pygrinder import fill_and_get_mask
- little_mcar_test
from pygrinder import little_mcar_test
- calc_missing_rate
wrong: from pygrinder import calc_misssing_rate
correct: from pygrinder import calc_missing_rate
Quickstart
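import numpy as np
from pygrinder import mcar, fill_and_get_mask

# Toy dataset: 100 samples with 10 features each.
X = np.random.randn(100, 10)

# Replace roughly 20% of the values with NaN, completely at random.
corrupted_X = mcar(X, rate=0.2)

# Fill the NaNs with 0 and get the accompanying mask.
X_filled, mask = fill_and_get_mask(corrupted_X, fill_value=0)

# The mask's mean is the observed fraction, so 1 - mean is the missing rate.
print(f"Original shape: {X.shape}, missing rate: {1 - mask.mean():.2f}")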