{"id":5794,"library":"pingouin","title":"Pingouin","description":"Pingouin is an open-source statistical package written in Python 3 and based mostly on Pandas and NumPy. It provides a comprehensive yet user-friendly set of functions for various statistical tests, including ANOVAs, correlations, regressions, Bayes Factors, effect sizes, and reliability analysis. The current stable version is 0.6.1, and the library maintains a frequent release cadence with ongoing development.","status":"active","version":"0.6.1","language":"en","source_language":"en","source_url":"https://github.com/raphaelvallat/pingouin","tags":["statistics","data analysis","scientific computing","pandas","numpy","biostatistics","psychology"],"install":[{"cmd":"pip install pingouin","lang":"bash","label":"pip"},{"cmd":"conda install -c conda-forge pingouin","lang":"bash","label":"Conda (Conda-Forge)"}],"dependencies":[{"reason":"Core numerical operations.","package":"numpy","optional":false},{"reason":"Underlying statistical functions.","package":"scipy","optional":false},{"reason":"Data manipulation and DataFrame output for results.","package":"pandas","optional":false},{"reason":"Enhances Pandas integration.","package":"pandas_flavor","optional":false},{"reason":"Advanced statistical modeling.","package":"statsmodels","optional":false},{"reason":"Plotting capabilities.","package":"matplotlib","optional":false},{"reason":"Enhanced data visualization.","package":"seaborn","optional":false},{"reason":"Machine learning utilities, e.g., for regression.","package":"scikit-learn","optional":false},{"reason":"Formatting tabular data.","package":"tabulate","optional":false},{"reason":"Additional functionality for some functions, optional.","package":"mpmath","optional":true}],"imports":[{"note":"Standard import for accessing all Pingouin functions.","symbol":"pingouin","correct":"import pingouin as pg"},{"note":"Import specific functions directly to avoid namespace pollution if only a few functions are 
needed.","symbol":"ttest","correct":"from pingouin import ttest"}],"quickstart":{"code":"import pingouin as pg\nimport numpy as np\nimport pandas as pd\n\n# Simulate two independent groups of data\nnp.random.seed(123)\ndata_group1 = np.random.normal(loc=10, scale=2, size=30)\ndata_group2 = np.random.normal(loc=12, scale=2.5, size=30)\n\n# Perform an independent samples t-test\nresult = pg.ttest(data_group1, data_group2, correction='auto')\n\nprint(result)\n\n# Example with a DataFrame for ANOVA\ndf_anova = pd.DataFrame({\n    'dv': [10, 12, 11, 13, 15, 14, 16, 18, 17, 19, 20, 22],\n    'group': ['A']*4 + ['B']*4 + ['C']*4\n})\naov_result = pg.anova(data=df_anova, dv='dv', between='group')\nprint(\"\\nANOVA Result:\")\nprint(aov_result)","lang":"python","description":"This quickstart demonstrates performing an independent samples t-test and a one-way ANOVA using Pingouin. It highlights the library's ability to take raw numerical arrays or Pandas DataFrames and return rich statistical output in a DataFrame format, including T-values, p-values, degrees of freedom, effect sizes (e.g., Cohen's d), and power."},"warnings":[{"fix":"Remove calls to `plot_shift` and use alternative plotting libraries like Matplotlib or Seaborn for similar visualizations, or revert to an older version if absolutely necessary.","message":"The `plot_shift` function was removed in Pingouin 0.6.0. Any code relying on this function will break.","severity":"breaking","affected_versions":">=0.6.0"},{"fix":"Upgrade SciPy to at least version 1.10.0: `pip install --upgrade scipy`.","message":"The minimum required SciPy version for `compute_bootci` was bumped to 1.10.0 in Pingouin 0.6.0. Ensure your SciPy installation meets this requirement to avoid `ImportError` or unexpected behavior.","severity":"breaking","affected_versions":">=0.6.0"},{"fix":"Replace `pg.gzscore()` with `scipy.stats.gzscore()`.","message":"The `pingouin.gzscore()` function is deprecated and will be removed in a future release. 
Use `scipy.stats.gzscore()` instead to compute geometric z-scores.","severity":"deprecated","affected_versions":">=0.5.0"},{"fix":"Check for missing values before running an analysis. Consider imputing them with Pandas, or use a statistical model that natively supports missing values (e.g., a linear mixed-effects model); note that such models are not implemented in Pingouin.","message":"Pingouin functions, especially those involving paired measurements (e.g., paired t-test, correlation, repeated measures ANOVA), automatically perform listwise deletion of missing values: any row containing a missing value is removed entirely, which can discard a large share of the data when many values are missing.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Upgrade to the latest version of Pingouin (0.6.0+) and re-run analyses if concerned about the accuracy of eta-squared values from older versions.","message":"The `pingouin.rm_anova` function had an issue in earlier versions (pre-0.6.0, specifically around March 2022 releases) where the eta-squared (`n2`) effect size was calculated incorrectly and was identical to partial eta-squared. Double-check any effect sizes previously obtained with `rm_anova` in affected versions.","severity":"gotcha","affected_versions":"<0.6.0 (especially pre-March 2022)"},{"fix":"Ensure the outcome variable is continuous. For binary/ordinal outcomes or more advanced mediation models, consider alternatives such as the R packages `lavaan` or `mediation`, or the PROCESS macro for SPSS.","message":"The `mediation_analysis` function currently only supports continuous outcome variables and does not work with binary or ordinal outcomes. 
Additionally, interpret the p-value for the indirect effect with caution: it is computed from the bootstrap distribution and is not constructed conditional on a true null hypothesis.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-14T00:00:00.000Z","next_check":"2026-07-13T00:00:00.000Z"}