Coarsened Exact Matching (CEM)
1.1.0 verified Sat May 09 auth: no python
Coarsened Exact Matching (CEM) is a Python library for causal inference via matching. It implements the CEM algorithm, which coarsens covariates into bins and then matches treated and control units exactly on the binned values, reducing model dependence. Current version is 1.1.0, with a stable release cadence (last release Dec 2024). Requires Python 3.9-3.12.
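To make the coarsen-then-match idea concrete, here is a minimal pure-Python sketch. It is not the library's implementation: the function names `coarsen` and `cem_sketch` and the fixed bin widths are assumptions chosen for illustration, and real CEM implementations typically choose bins automatically and also compute stratum weights.

```python
from collections import defaultdict


def coarsen(value, width):
    """Map a numeric value to the index of its fixed-width bin."""
    return int(value // width)


def cem_sketch(rows, treatment, widths):
    """Group rows by coarsened covariates and keep strata with both groups.

    rows: list of dicts; treatment: key holding the group label;
    widths: {covariate name: bin width} (hypothetical, hand-picked).
    """
    strata = defaultdict(list)
    for row in rows:
        # The tuple of bin indices identifies the stratum a row falls into.
        signature = tuple(coarsen(row[name], w) for name, w in widths.items())
        strata[signature].append(row)

    matched = []
    for members in strata.values():
        # A stratum is usable only if it contains more than one treatment group.
        if len({m[treatment] for m in members}) > 1:
            matched.extend(members)
    return matched


rows = [
    {"treat": 1, "x1": 5.1, "x2": 3.5},
    {"treat": 1, "x1": 4.9, "x2": 3.0},
    {"treat": 0, "x1": 5.0, "x2": 3.6},
    {"treat": 0, "x1": 6.2, "x2": 2.9},
    {"treat": 1, "x1": 5.5, "x2": 3.7},
    {"treat": 0, "x1": 5.8, "x2": 3.2},
]
matched = cem_sketch(rows, "treat", {"x1": 1.0, "x2": 0.5})
print(len(matched))  # 3 rows share the stratum x1 in [5, 6), x2 in [3.5, 4.0)
```

Wider bins keep more rows but match less similar units; narrower bins do the opposite.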
pip install cem
Common errors
error AttributeError: module 'cem' has no attribute 'cem' ↓
cause Calling `cem.cem()` after `import cem`; the main matching routine is `cem.match()`.
fix Use `cem.match()` instead of `cem.cem()`.
error ValueError: The number of groups must be >1 ↓
cause The treatment variable has only one unique value (all treated or all control).
fix Ensure your treatment column contains both 0 and 1 (or more groups).
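A quick pre-check along these lines can catch the single-group case before matching. This is a sketch using pandas only; `check_treatment` is a hypothetical helper, not part of cem.

```python
import pandas as pd


def check_treatment(df, treatment):
    """Raise early if the treatment column has fewer than two distinct groups."""
    groups = df[treatment].dropna().unique()
    if len(groups) < 2:
        raise ValueError(
            f"Column {treatment!r} has {len(groups)} group(s); need at least 2"
        )
    return sorted(groups.tolist())


df = pd.DataFrame({"treat": [1, 1, 0, 0, 1, 0]})
print(check_treatment(df, "treat"))  # [0, 1]
```

Running it on a filtered subset is useful too, since filtering is a common way to end up with only treated or only control rows.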
error TypeError: match() got an unexpected keyword argument 'coarsen' ↓
cause Using an older version of cem. The `coarsen` parameter was added in version 1.0.0.
fix Upgrade to cem >= 1.0.0 or use `cutpoints` directly.
Warnings
gotcha `cem.match()` does not modify the input DataFrame in place; the returned dict contains a copy of the matched data. Do not rely on side effects on the input. ↓
fix Always capture the returned dict, e.g., `result = cem.match(...)`
deprecated The old top-level import `from cem import cem` is deprecated in favor of `import cem` and calling `cem.match()`. The old import still works but may be removed in a future version. ↓
fix Use `import cem` and `cem.match()`
gotcha Rows with missing values in the treatment or matching variables are dropped without an error. The library only prints a warning and continues, so the loss is easy to miss. ↓
fix Ensure no NaN values in columns used for treatment or coarsening.
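One way to make the row loss explicit is to drop incomplete rows yourself and report the count before matching. This is a sketch using pandas only; `drop_incomplete` is a hypothetical helper, not part of cem.

```python
import pandas as pd


def drop_incomplete(df, columns):
    """Return a copy without rows that have NaN in `columns`, plus the number dropped."""
    clean = df.dropna(subset=columns).copy()
    return clean, len(df) - len(clean)


df = pd.DataFrame({
    "treat": [1, 0, 1, 0],
    "x1": [5.1, None, 5.5, 5.8],
    "x2": [3.5, 3.6, 3.7, 3.2],
})
clean, n_dropped = drop_incomplete(df, ["treat", "x1", "x2"])
print(f"dropped {n_dropped} row(s) with missing values")
```

Passing `clean` rather than `df` to the matching call means the sample size is known up front and does not depend on a printed warning.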
Imports
- cem
wrong from cem import CEM
correct import cem
Quickstart
import cem
import pandas as pd
# Sample data (replace with your own)
df = pd.DataFrame({
    'treat': [1, 1, 0, 0, 1, 0],
    'x1': [5.1, 4.9, 5.0, 6.2, 5.5, 5.8],
    'x2': [3.5, 3.0, 3.6, 2.9, 3.7, 3.2],
})
# Perform CEM (coarsen x1 and x2 automatically)
result = cem.match(data=df, treatment='treat', coarsen=['x1', 'x2'])
print(result['matched_data'].head())