psmpy - Propensity Score Matching for Python

0.3.16 verified Sat May 09 auth: no python

psmpy provides propensity score matching for observational studies: it computes propensity scores, performs matching (with or without replacement, optionally within a caliper), and draws diagnostic plots (covariate balance, score histograms). Version 0.3.16 is the latest release. Releases are infrequent; the last update was in 2024.

pip install psmpy
error TypeError: __init__() got an unexpected keyword argument 'idcol'
cause Parameter 'idcol' renamed to 'indx' in version 0.3.16.
fix Use the 'indx' parameter instead of 'idcol'.
error AttributeError: module 'psmpy' has no attribute 'Psmpy'
cause The class name is misspelled. The class is 'PsmPy' (capital 'P' in 'Py'), and Python names are case-sensitive.
fix Use 'from psmpy import PsmPy'.
breaking In version 0.3.15 and earlier, the 'indx' parameter was called 'idcol'. It was renamed in 0.3.16. Using 'idcol' raises TypeError.
fix Use 'indx' instead of 'idcol' when initializing PsmPy.
gotcha The PsmPy constructor does not accept a 'logistic_model' argument. Propensity scores come from a scikit-learn LogisticRegression that logistic_ps() fits internally. Every LogisticRegression solver supports predict_proba, so the solver choice affects convergence on small datasets, not whether probabilities are available.
fix Drop 'logistic_model' from the constructor call and compute scores with logistic_ps(balance=True).
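gotcha The parameter is named 'exclude', not 'exclude_cols'. It should list every column that must stay out of propensity score estimation (e.g., outcome variables, secondary IDs). The treatment and ID columns are passed separately through 'treatment' and 'indx' and do not need to be listed. An outcome column left out of 'exclude' is used as a predictor.
fix List every non-predictor column, other than the 'treatment' and 'indx' columns, in 'exclude'.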
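
The column bookkeeping in the last gotcha can be sketched in plain Python. The helper name predictor_columns and the 'outcome' and 'pat_id' columns are hypothetical, chosen for illustration; this is not psmpy's internal code. It assumes the treatment and ID columns are also kept out of the predictors.

```python
def predictor_columns(columns, treatment, indx, exclude=()):
    """Return the columns that remain as covariates (hypothetical helper)."""
    # Columns that must never enter the propensity model.
    reserved = {treatment, indx, *exclude}
    missing = [c for c in reserved if c not in columns]
    if missing:
        raise KeyError(f"columns not found: {sorted(missing)}")
    # Preserve the original column order for readability.
    return [c for c in columns if c not in reserved]


columns = ["pat_id", "treatment", "age", "income", "outcome"]
covariates = predictor_columns(
    columns, treatment="treatment", indx="pat_id", exclude=["outcome"]
)
print(covariates)  # ['age', 'income']
```

Checking the resulting covariate list before fitting is a cheap way to catch an outcome or ID column that slipped into the model.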

Basic workflow: initialize with data, compute scores, match, and inspect matches.

import pandas as pd
from psmpy import PsmPy

# Sample data: one row per unit, with a unique ID column
df = pd.DataFrame({
    'pat_id': [1, 2, 3, 4, 5, 6],
    'treatment': [0, 1, 0, 1, 0, 1],
    'age': [30, 40, 35, 45, 25, 50],
    'income': [50000, 60000, 55000, 65000, 45000, 70000]
})

# Initialize PsmPy; 'indx' names the unique ID column, not a covariate
psm = PsmPy(df, treatment='treatment', indx='pat_id', exclude=['income'])

# Compute propensity scores (logistic regression is fitted internally)
psm.logistic_ps(balance=True)

# Perform 1:1 nearest-neighbor matching without replacement
psm.knn_matched(matcher='propensity_logit', replacement=False, caliper=None)

# View matched pairs
print(psm.matched_ids)
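
To make the replacement and caliper options concrete, here is a minimal sketch of greedy 1:1 nearest-neighbor matching on the logit of the propensity score, using only the standard library. The function greedy_match, the unit IDs, and the scores are made up for illustration; psmpy's own matching may differ in ordering and tie-breaking.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def greedy_match(treated, control, caliper=None, replace=False):
    """Pair each treated unit with its nearest control on the score.

    treated, control: dicts mapping unit id -> score.
    caliper: maximum allowed score distance, or None for no limit.
    replace: if True, a control may be reused for several treated units.
    """
    available = dict(control)
    pairs = []
    for t_id, t_score in treated.items():
        if not available:
            break  # no controls left to match
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if caliper is not None and abs(available[c_id] - t_score) > caliper:
            continue  # nearest control is too far away; leave unmatched
        pairs.append((t_id, c_id))
        if not replace:
            del available[c_id]  # each control is used at most once
    return pairs


treated = {"t1": logit(0.70), "t2": logit(0.65), "t3": logit(0.95)}
control = {"c1": logit(0.68), "c2": logit(0.40), "c3": logit(0.60)}

pairs_no_replace = greedy_match(treated, control)
pairs_replace = greedy_match(treated, control, replace=True)
pairs_caliper = greedy_match(treated, control, caliper=0.5)

print(pairs_no_replace)  # [('t1', 'c1'), ('t2', 'c3'), ('t3', 'c2')]
print(pairs_replace)     # [('t1', 'c1'), ('t2', 'c1'), ('t3', 'c1')]
print(pairs_caliper)     # [('t1', 'c1'), ('t2', 'c3')]
```

Without replacement, the last treated unit is forced onto a distant control; with replacement, one control is reused for all three; with a caliper, the poorly matched unit is dropped instead.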