psmpy - Propensity Score Matching for Python
0.3.16 · verified Sat May 09 · auth: no · python
psmpy provides propensity score matching for observational studies. It estimates propensity scores, performs matching (with or without replacement, optionally within a caliper), and draws diagnostic plots (covariate balance, score histograms). Version 0.3.16 is the latest release. Releases are infrequent; the last update was in 2024.
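The matching options mentioned above (replacement and caliper) can be illustrated with a small standard-library sketch. This is not psmpy's implementation: `greedy_match` is a hypothetical helper, the scores are made up, and the greedy nearest-neighbor strategy is only one of several ways to match.

```python
from typing import Dict, Hashable, Optional


def greedy_match(treated: Dict[Hashable, float],
                 control: Dict[Hashable, float],
                 caliper: Optional[float] = None,
                 replace: bool = False) -> Dict[Hashable, Hashable]:
    """Pair each treated unit with its nearest control by propensity score.

    Hypothetical helper for illustration only; not part of psmpy.
    """
    available = dict(control)  # copy so the caller's dict is left untouched
    pairs = {}
    for t_id, t_score in treated.items():
        if not available:
            break  # no controls left (only possible without replacement)
        # Nearest control by absolute difference in propensity score
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if caliper is not None and abs(available[c_id] - t_score) > caliper:
            continue  # nearest control is outside the caliper: leave unmatched
        pairs[t_id] = c_id
        if not replace:
            del available[c_id]  # each control is used at most once
    return pairs


# Made-up propensity scores keyed by unit ID
treated = {"t1": 0.62, "t2": 0.60, "t3": 0.95}
control = {"c1": 0.61, "c2": 0.40, "c3": 0.58}

print(greedy_match(treated, control))                # every treated unit gets a distinct control
print(greedy_match(treated, control, replace=True))  # the same control may be reused
print(greedy_match(treated, control, caliper=0.05))  # distant pairs are dropped
```

Without replacement, the order in which treated units are processed affects the result, because an early unit can take the control a later unit would have preferred. With a caliper, treated units whose nearest control is too far away are left unmatched instead of being paired badly.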
pip install psmpy
Common errors
error TypeError: __init__() got an unexpected keyword argument 'idcol'
cause The 'idcol' parameter was renamed to 'indx' in version 0.3.16.
fix
Use 'indx' parameter instead of 'idcol'.
error AttributeError: module 'psmpy' has no attribute 'Psmpy'
cause The class name is case-sensitive and is spelled 'PsmPy', so the attribute 'Psmpy' does not exist on the package.
fix
Use 'from psmpy import PsmPy'.
Warnings
breaking In version 0.3.15 and earlier, the 'indx' parameter was called 'idcol'. It was renamed in 0.3.16, and passing 'idcol' now raises TypeError.
fix Use 'indx' instead of 'idcol' when initializing PsmPy.
gotcha The 'logistic_model' parameter expects a scikit-learn LogisticRegression instance. Every LogisticRegression solver supports predict_proba, but the default solver ('lbfgs') can emit convergence warnings on small or unscaled datasets; 'liblinear' is better suited to small datasets.
fix If you supply your own model, instantiate it explicitly as LogisticRegression(solver='liblinear').
gotcha The 'exclude' parameter should list every column that is not used for propensity score estimation (e.g., outcome variables, IDs). Forgetting to exclude the treatment column can cause errors.
fix List all columns that are not predictors in 'exclude'.
Imports
- PsmPy
wrong import psmpy
correct from psmpy import PsmPy
- PsmPy
wrong from psmpy.psmpy import PsmPy
correct from psmpy import PsmPy
Quickstart
import pandas as pd
from psmpy import PsmPy

# Sample data: 'patient_id' is a unique row identifier, not a covariate
df = pd.DataFrame({
    'patient_id': [1, 2, 3, 4, 5, 6],
    'treatment': [0, 1, 0, 1, 0, 1],
    'age': [30, 40, 35, 45, 25, 50],
    'income': [50000, 60000, 55000, 65000, 45000, 70000]
})

# Initialize PsmPy: 'indx' names the unique ID column, and 'exclude'
# lists columns to leave out of the propensity model
psm = PsmPy(df, treatment='treatment', indx='patient_id',
            exclude=['income'])

# Compute propensity scores with logistic regression
psm.logistic_ps(balance=True)

# Perform nearest-neighbor matching on the propensity logit, without replacement
psm.knn_matched(matcher='propensity_logit', replacement=False, caliper=None)

# View matched pairs
print(psm.matched_ids)
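Covariate balance, which the balance plots mentioned earlier are meant to visualize, is often summarized with a standardized mean difference. The sketch below uses only the standard library and the 'age' values from the sample data, split by treatment. `standardized_mean_difference` is a hypothetical helper, and the pooled-variance formula is one common definition, not necessarily the one psmpy uses.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """Difference in group means divided by a pooled standard deviation.

    Hypothetical helper for illustration only; not part of psmpy.
    """
    # Pool the two sample variances by simple averaging
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        raise ValueError("both groups are constant; the statistic is undefined")
    return (mean(treated) - mean(control)) / pooled_sd


# Ages from the sample data: treatment == 1 versus treatment == 0
treated_age = [40, 45, 50]
control_age = [30, 35, 25]

smd = standardized_mean_difference(treated_age, control_age)
print(round(smd, 2))  # 3.0
```

A value this large means the two groups barely overlap on age before matching. Recomputing the same statistic on the matched sample shows whether matching reduced the imbalance; values near zero indicate better balance.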