statsmodels

0.14.6 · verified Tue May 12 · auth: no · python install: verified

statsmodels is a Python package offering a wide array of statistical models, hypothesis tests, and statistical data exploration tools. It provides classes and functions for estimating many different statistical models, including linear regression, generalized linear models, discrete choice models, and time series models. The current version is 0.14.6. For its dependencies, the project follows a loose, time-based policy, typically raising the minimum supported versions every one and a half to two years. [2, 3, 5, 7]

pip install statsmodels

Common errors

error ModuleNotFoundError: No module named 'statsmodels'

cause The 'statsmodels' package is not installed in the current Python environment.

fix

pip install statsmodels

error ValueError: endog and exog are of different lengths

cause The dependent variable (endog) and independent variables (exog) arrays or series have a different number of observations.

fix

Ensure that the endog and exog arrays/Series passed to the model constructor have the exact same number of rows/observations, often by aligning their indices or handling missing values consistently.

error ValueError: Perfect multicollinearity detected.

cause One independent variable is an exact linear combination of the others, so the design matrix is singular (rank-deficient) and the coefficients cannot be uniquely estimated.

fix

Identify and remove redundant independent variables from your model, e.g., duplicate columns, a constant column when an intercept is already added automatically, or a full set of dummy variables alongside an intercept (the dummy variable trap).

error ConvergenceWarning: Maximum number of iterations has been reached.

cause The iterative optimization algorithm used by the model (e.g., GLM, discrete choice models) failed to converge to a solution within the specified maximum number of iterations.

fix

Increase the maximum number of iterations (e.g., `model.fit(maxiter=1000)`), check for perfect multicollinearity, or try a different optimizer through the `method` argument of `fit` where the model supports it (e.g., `model.fit(method='bfgs')`).

Warnings

breaking The `scikits` namespace was deprecated and eventually removed in versions prior to 0.5.0. Direct imports from `scikits.statsmodels` are no longer valid.

fix Always use `import statsmodels.api as sm` or direct imports from `statsmodels.<submodule>` (e.g., `statsmodels.regression.linear_model`). [29, 31]

breaking The signature of `model.predict` methods changed in versions prior to 0.5.0. Model-level `predict` now explicitly requires the `params` argument (e.g., `model.predict(params, exog)`) instead of assuming the model has already been fit.

fix Either call `predict` on the fitted results object, which supplies the estimated parameters automatically (`results.predict(exog)`), or pass them explicitly to the model method (`model.predict(results.params, exog)`). [29, 31]

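breaking The `statsmodels.tsa.arima_model.ARMA` and `statsmodels.tsa.arima_model.ARIMA` classes were deprecated (emitting a `FutureWarning`) and have since been removed. In 0.14.x, trying to use them raises a `NotImplementedError` that points to the replacement.

fix Migrate to `statsmodels.tsa.arima.model.ARIMA` (note the dot between `arima` and `model`) or to `statsmodels.tsa.statespace.sarimax.SARIMAX`. The new `ARIMA` class is built on the same state space framework as `SARIMAX`. [34]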

gotcha When using the direct `statsmodels.api.OLS(y, X)` interface (without formulas), an intercept term (constant) is NOT automatically added to the `X` (exog) design matrix. This differs from some other statistical software and can lead to incorrect models if an intercept is expected.

fix Explicitly add a constant term using `X = sm.add_constant(X)` from `statsmodels.api` before fitting the model, or use the `statsmodels.formula.api` interface which handles intercepts automatically. [19, 33]

breaking `pandas.stats.ols` (and the rest of the `pandas.stats` module) was removed in pandas 0.20, and the `Panel` object, deprecated in that same release, was removed in pandas 0.25. Users relying on these for OLS or panel data directly from pandas will need to switch.

fix For OLS functionality, `statsmodels.api.OLS` is the recommended replacement. For panel data, Pandas recommends using a `MultiIndex` DataFrame or `xarray`, which can then be used with `statsmodels` models where appropriate (e.g., `MixedLM` for some panel-like structures). [37]
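A sketch of the `MultiIndex` route, assuming a small simulated balanced panel. The model here is a pooled OLS with entity dummies (a fixed-effects-style regression), swapped in for `MixedLM` only to keep the example short:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical balanced panel: 3 entities observed over 20 periods,
# stored in a MultiIndex DataFrame instead of the removed pandas Panel.
rng = np.random.default_rng(4)
index = pd.MultiIndex.from_product([["A", "B", "C"], range(20)], names=["entity", "time"])
entity_effect = np.repeat([0.0, 1.0, -1.0], 20)  # matches the A, B, C block order
x = rng.normal(size=len(index))
y = 2.0 * x + entity_effect + rng.normal(scale=0.1, size=len(index))
panel = pd.DataFrame({"x": x, "y": y}, index=index)

# Reset the index so "entity" becomes a column, then add entity dummies via C().
results = smf.ols("y ~ x + C(entity)", data=panel.reset_index()).fit()
print(results.params["x"])  # close to the true slope of 2
```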

breaking statsmodels 0.14.2 introduced compatibility with NumPy 2.0.0. statsmodels can still run on older NumPy versions, but if you upgrade to NumPy 2.0, every other compiled package in the scientific Python stack (such as SciPy and pandas) must also be built for NumPy 2.0; extensions compiled against NumPy 1.x can fail at import time. This release also raised the minimum Python version to 3.9, matching NumPy 2.0.

fix Ensure your entire scientific Python environment has compatible versions of all libraries when moving to NumPy 2.0. Check dependency release notes for NumPy 2.0 compatibility. [35]
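A quick way to see what an environment actually contains before and after the upgrade. This sketch only reports versions via the standard library; it does not test binary compatibility itself:

```python
import importlib.metadata as md

# Report the installed version of each core scientific package.
versions = {}
for pkg in ("numpy", "scipy", "pandas", "statsmodels"):
    try:
        versions[pkg] = md.version(pkg)
    except md.PackageNotFoundError:
        versions[pkg] = None

for pkg, ver in versions.items():
    print(f"{pkg:12} {ver or 'not installed'}")

numpy_major = int(versions["numpy"].split(".")[0]) if versions["numpy"] else None
if numpy_major is not None and numpy_major >= 2:
    print("NumPy 2.x detected: confirm SciPy, pandas and statsmodels (>= 0.14.2) support it.")
```

Running `pip check` afterwards additionally reports installed packages whose declared requirements are not satisfied.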

Install compatibility · verified · last tested: 2026-05-12

| python | os / libc | status | wheel | install | import | disk |
|---|---|---|---|---|---|---|
| 3.10 | alpine (musl) | | wheel | - | 2.99s | 357.3M |
| 3.10 | alpine (musl) | | - | - | 2.91s | 357.1M |
| 3.10 | slim (glibc) | | wheel | 12.9s | 2.37s | 345M |
| 3.10 | slim (glibc) | | - | - | 2.09s | 344M |
| 3.11 | alpine (musl) | | wheel | - | 4.89s | 384.5M |
| 3.11 | alpine (musl) | | - | - | 5.25s | 384.1M |
| 3.11 | slim (glibc) | | wheel | 12.4s | 4.62s | 370M |
| 3.11 | slim (glibc) | | - | - | 4.15s | 369M |
| 3.12 | alpine (musl) | | wheel | - | 4.53s | 366.2M |
| 3.12 | alpine (musl) | | - | - | 4.72s | 365.9M |
| 3.12 | slim (glibc) | | wheel | 13.7s | 4.82s | 352M |
| 3.12 | slim (glibc) | | - | - | 5.09s | 351M |
| 3.13 | alpine (musl) | | wheel | - | 4.13s | 364.1M |
| 3.13 | alpine (musl) | | - | - | 4.27s | 363.7M |
| 3.13 | slim (glibc) | | wheel | 13.7s | 4.13s | 349M |
| 3.13 | slim (glibc) | | - | - | 4.41s | 349M |
| 3.9 | alpine (musl) | build_error | - | 0.1s | - | - |
| 3.9 | alpine (musl) | | - | - | - | - |
| 3.9 | slim (glibc) | | wheel | 15.3s | 2.58s | 352M |
| 3.9 | slim (glibc) | | - | - | 2.41s | 352M |

Imports

statsmodels.api
```
import statsmodels.api as sm
```
Main entry point for most common models (e.g., OLS, GLM) when using NumPy arrays or pre-processed Pandas DataFrames. It's stable and recommended for direct model fitting. [18, 20]
statsmodels.formula.api
```
import statsmodels.formula.api as smf
```
Provides an R-style formula interface, highly recommended for exploratory data analysis and when working directly with Pandas DataFrames and categorical variables. [3, 18, 20]
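A sketch of two formula features relevant to DataFrames with categorical data, assuming simulated data: a string column is treated as categorical automatically, `C(..., Treatment(reference=...))` chooses the reference level, and functions such as `np.log` can be applied inside the formula:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
df = pd.DataFrame({
    "x": rng.uniform(1, 10, size=90),
    "group": np.repeat(["a", "b", "c"], 30),  # string column
})
effects = df["group"].map({"a": 0.0, "b": 1.0, "c": 2.0})
df["y"] = 1.0 + 2.0 * np.log(df["x"]) + effects + rng.normal(scale=0.2, size=90)

# "c" is used as the reference level; np.log is evaluated inside the formula.
results = smf.ols("y ~ np.log(x) + C(group, Treatment(reference='c'))", data=df).fit()
print(results.params)
```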
Specific Submodule
wrong
```
from statsmodels.tsa.arima_model import ARIMA
```
correct
```
from statsmodels.tsa.arima.model import ARIMA
```
Direct imports from submodules are used for specialized functionality (e.g., time series). Older submodule paths may be deprecated or removed in newer versions; for example, `statsmodels.tsa.arima_model` should be replaced by `statsmodels.tsa.arima.model`. [20, 34]

Quickstart last tested: 2026-04-24

This example fits a simple Ordinary Least Squares (OLS) regression model using the R-style formula interface provided by `statsmodels.formula.api`. It creates sample data, defines the model with a formula, fits it, and prints a comprehensive summary of the results, including coefficients, R-squared, and various statistical tests. [3, 6, 33]

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# 1. Create a sample DataFrame in which y actually depends on x1 and x2
np.random.seed(42)
n = 100
x1 = np.random.rand(n) * 10        # continuous predictor in [0, 10)
x2 = np.random.randint(0, 2, n)    # binary predictor, treated as categorical below
y = 10 + 2 * x1 + 3 * x2 + np.random.randn(n)  # linear signal plus Gaussian noise
df = pd.DataFrame({'y': y, 'x1': x1, 'x2': x2})

# 2. Fit an OLS (Ordinary Least Squares) model using an R-style formula.
#    'y ~ x1 + C(x2)' regresses y on x1 and on x2 treated as a categorical
#    variable; the formula interface adds the intercept automatically.
model = smf.ols('y ~ x1 + C(x2)', data=df)
results = model.fit()

# 3. Print the summary of the regression results
print(results.summary())
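Beyond the printed summary, the individual statistics are available programmatically. This self-contained sketch regenerates similar (hypothetical) data and shows a few of the attributes, plus prediction on new rows, where the formula re-applies `C(x2)` automatically:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Small hypothetical dataset with the same column names as the quickstart.
rng = np.random.default_rng(7)
df = pd.DataFrame({"x1": rng.uniform(0, 10, 100), "x2": rng.integers(0, 2, 100)})
df["y"] = 10 + 2 * df["x1"] + 3 * df["x2"] + rng.normal(size=100)

results = smf.ols("y ~ x1 + C(x2)", data=df).fit()

# Individual pieces of the summary are available as attributes and methods.
print(results.params)      # coefficients, indexed by term name
print(results.pvalues)     # p-values for each coefficient
print(results.rsquared)    # R-squared
print(results.conf_int())  # 95% confidence intervals by default

# Formula models apply the same transformations (including C(x2)) to new data.
new_data = pd.DataFrame({"x1": [1.0, 5.0], "x2": [0, 1]})
print(results.predict(new_data))
```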