PyStan
PyStan is a Python interface to Stan, a powerful platform for Bayesian inference and high-performance statistical computation. It allows users to define statistical models using Stan's probabilistic programming language and fit them using Hamiltonian Monte Carlo (HMC) methods. Currently at version 3.10.1, PyStan focuses on providing a reliable HMC sampler, with a development cadence that sees frequent updates to minor versions.
Warnings
- breaking PyStan 3 is a complete rewrite and introduces numerous backwards-incompatible changes from PyStan 2.x. Key API changes include `pystan.StanModel()` becoming `stan.build()`, `fit.sampling()` becoming `fit.sample()`, and `iter` parameter changing to `num_samples`.
- breaking Microsoft Windows is no longer officially supported for PyStan 3.x installations. Users on Windows are advised to use Windows Subsystem for Linux 2 (WSL2) to run PyStan, or consider alternative Stan interfaces like CmdStanPy.
- breaking In PyStan 3.x, data and `random_seed` must be passed during the `stan.build()` step (compile time), not during the `fit.sample()` step (sampling time), unlike PyStan 2.x.
- breaking PyStan 3.x significantly reduces its scope. Variational inference, maximization algorithms (e.g., LBFGS), and samplers other than the default HMC sampler are no longer supported. The `stansummary` display and `check_hmc_diagnostics` functions have also been removed.
- gotcha PyStan requires a C++ compiler to be available both during installation and at runtime for compiling Stan models. Lack of a properly configured compiler is a common cause of installation and runtime errors.
- gotcha When running PyStan 3.x in Jupyter notebooks or other environments that manage their own `asyncio` event loops, `RuntimeError: asyncio.run() cannot be called from a running event loop` may occur.
Install
-
pip install pystan -
conda install pystan -c conda-forge
Imports
- stan
import stan
Quickstart
import stan
import numpy as np
schools_code = """
data {
int<lower=0> J; // number of schools
array[J] real y; // estimated treatment effects
array[J] real<lower=0> sigma; // standard error of effect estimates
}
parameters {
real mu; // population treatment effect
real<lower=0> tau; // standard deviation in treatment effects
vector[J] eta; // unscaled deviation from mu by school
}
transformed parameters {
vector[J] theta = mu + tau * eta; // school treatment effects
}
model {
target += normal_lpdf(eta | 0, 1); // prior log-density
target += normal_lpdf(y | theta, sigma); // log-likelihood
}
"""
schools_data = {
"J": 8,
"y": [28, 8, -3, 7, -1, 1, 18, 12],
"sigma": [15, 10, 16, 11, 9, 11, 10, 18],
}
# Build the model (compiles Stan code to C++ and then to executable)
posterior = stan.build(schools_code, data=schools_data, random_seed=1)
# Sample from the posterior distribution
fit = posterior.sample(num_chains=4, num_samples=1000)
# Extract samples for a parameter
mu_samples = fit["mu"]
print(f"Mean of mu samples: {np.mean(mu_samples)}")
# To get a pandas DataFrame (requires pandas installed):
# import pandas as pd
# df = fit.to_frame()
# print(df.head())