Pingouin
Pingouin is an open-source statistical package written in Python 3 and based mostly on Pandas and NumPy. It provides a comprehensive yet user-friendly set of functions for common statistical analyses, including ANOVAs, correlations, regressions, Bayes factors, effect sizes, and reliability analysis. The current stable version is 0.6.1, and the library is under active development with regular releases.
Warnings
- breaking The `plot_shift` function was removed in Pingouin 0.6.0. Any code relying on this function will break.
- breaking The minimum required SciPy version for `compute_bootci` was bumped to 1.10.0 in Pingouin 0.6.0. Ensure your SciPy installation meets this requirement to avoid `ImportError` or unexpected behavior.
- deprecated The `pingouin.gzscore()` function is deprecated and will be removed in a future release. Use `scipy.stats.gzscore()` instead, which computes geometric z-scores (z-scores of log-transformed, strictly positive data).
- gotcha Pingouin functions, especially those involving paired measurements (e.g., paired t-test, correlation, repeated measures ANOVA), automatically perform listwise deletion of missing values: any row with a missing value is removed entirely. With many missing values, this can drastically reduce the sample size.
- gotcha In earlier versions (before 0.6.0; the affected releases date from around March 2022), `pingouin.rm_anova` calculated the eta-squared (n2) effect size incorrectly, returning the same value as partial eta-squared. Double-check any effect sizes previously obtained with `rm_anova` from the affected versions.
- gotcha The `mediation_analysis` function currently only supports continuous outcome variables and does not work with binary or ordinal outcomes. Additionally, the p-value for the indirect effect should be interpreted with caution as it's computed using a bootstrap distribution and not strictly conditioned on a true null hypothesis.
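Two of the warnings above can be checked with a few lines of Python. The sketch below assumes SciPy and pandas are installed. It shows that `scipy.stats.gzscore()` matches the ordinary z-score of log-transformed data (so the input must be strictly positive), and it uses pandas `dropna()` as a stand-in for the listwise deletion Pingouin performs. The `paired` DataFrame and its values are made up for illustration; this is not Pingouin's internal code.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Geometric z-scores via SciPy, the suggested replacement for pingouin.gzscore().
# gzscore works on the log scale, so every value must be strictly positive.
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
gz = stats.gzscore(x)
print("gzscore:", np.round(gz, 3))
print("matches zscore(log(x)):", np.allclose(gz, stats.zscore(np.log(x))))

# Listwise deletion: a row missing a value in either column is dropped entirely.
# Hypothetical before/after measurements for illustration only.
paired = pd.DataFrame({
    "before": [5.1, 4.8, np.nan, 6.0, 5.5, 5.9],
    "after": [5.6, np.nan, 5.0, 6.4, 5.9, np.nan],
})
complete = paired.dropna()  # pandas analogue of listwise deletion
print(f"{len(paired)} rows -> {len(complete)} complete pairs")
```

Here half of the rows are lost even though each column is only missing one or two values, which is the effect the listwise-deletion warning describes.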
Install
- pip install pingouin
- conda install -c conda-forge pingouin
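After installing, it can help to confirm that SciPy meets the 1.10.0 minimum mentioned in the warnings. The sketch below reads installed versions with the standard-library `importlib.metadata`. The helpers `version_tuple` and `meets_minimum` are hypothetical and deliberately simplistic (they keep only the first three numeric components and ignore pre-release tags), so a dedicated parser such as `packaging.version` is preferable in real projects.

```python
import re
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Hypothetical helper: '1.11.0rc1' -> (1, 11, 0). Pre-release tags are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad so that '1.10' and '1.10.0' compare as equal.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Hypothetical helper: True if installed >= minimum under version_tuple."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    for package in ("pingouin", "scipy"):
        try:
            print(package, metadata.version(package))
        except metadata.PackageNotFoundError:
            print(package, "not installed")
    try:
        scipy_version = metadata.version("scipy")
        print("SciPy >= 1.10.0:", meets_minimum(scipy_version, "1.10.0"))
    except metadata.PackageNotFoundError:
        pass
```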
Imports
- pingouin
import pingouin as pg
- ttest
from pingouin import ttest
Quickstart
import pingouin as pg
import numpy as np
import pandas as pd
# Simulate two independent groups of data
np.random.seed(123)
data_group1 = np.random.normal(loc=10, scale=2, size=30)
data_group2 = np.random.normal(loc=12, scale=2.5, size=30)
# Perform an independent samples t-test
result = pg.ttest(data_group1, data_group2, correction='auto')
print(result)
# Example with a DataFrame for ANOVA
df_anova = pd.DataFrame({
    'dv': [10, 12, 11, 13, 15, 14, 16, 18, 17, 19, 20, 22],
    'group': ['A']*4 + ['B']*4 + ['C']*4
})
aov_result = pg.anova(data=df_anova, dv='dv', between='group')
print("\nANOVA Result:")
print(aov_result)
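As a sanity check, both Quickstart results can be reproduced without Pingouin. The sketch below assumes SciPy is available. Because the two simulated groups have equal sizes, `correction='auto'` should leave Pingouin on the classic Student's t-test (Welch's correction is applied automatically only when sample sizes differ), so a pooled-variance t statistic is computed by hand and compared with `scipy.stats.ttest_ind`. The one-way ANOVA F ratio and eta-squared are then computed from the between- and within-group sums of squares of the toy data. The helper names `pooled_t` and `one_way_anova` are illustrative, not part of Pingouin.

```python
import numpy as np
from scipy import stats

# Recreate the Quickstart groups with the same legacy global seed.
np.random.seed(123)
data_group1 = np.random.normal(loc=10, scale=2, size=30)
data_group2 = np.random.normal(loc=12, scale=2.5, size=30)


def pooled_t(x, y):
    """Student's t statistic using a pooled (equal-variance) estimate."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var * (1 / nx + 1 / ny))


def one_way_anova(*groups):
    """Return (F, df_between, df_within, eta_squared) for a one-way between-subjects design."""
    values = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    grand_mean = values.mean()
    # Between-group SS: group size times squared deviation of the group mean from the grand mean.
    ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group SS: squared deviations of observations from their own group mean.
    ss_within = sum(((np.asarray(g, dtype=float) - np.mean(g)) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_squared


t_manual = pooled_t(data_group1, data_group2)
t_scipy = stats.ttest_ind(data_group1, data_group2, equal_var=True).statistic
print(f"t by hand: {t_manual:.4f}  scipy: {t_scipy:.4f}")

# Same values as the 'dv' column of df_anova, split by 'group'.
groups = ([10, 12, 11, 13], [15, 14, 16, 18], [17, 19, 20, 22])
f_manual, df_b, df_w, eta_sq = one_way_anova(*groups)
f_scipy = stats.f_oneway(*groups).statistic
print(f"F({df_b}, {df_w}) by hand: {f_manual:.3f}  scipy: {f_scipy:.3f}  eta-squared: {eta_sq:.3f}")
```

For the toy ANOVA data this gives F ≈ 21.56 with 2 and 9 degrees of freedom and eta-squared ≈ 0.83. In a one-way between-subjects design, eta-squared and partial eta-squared coincide, so these values should line up with the `F` and `np2` columns of `aov_result`.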