statsmodels
0.14.6 verified Tue May 12 auth: no python install: verified
statsmodels is a Python package offering a wide array of statistical models, hypothesis tests, and tools for statistical data exploration. It provides classes and functions for estimating many different statistical models, including linear regression, generalized linear models, discrete choice models, and time series models. The current version is 0.14.6. The project follows a loose, time-based policy for its dependencies, typically raising the minimum supported versions every one and a half to two years. [2, 3, 5, 7]
pip install statsmodels
Common errors
error ModuleNotFoundError: No module named 'statsmodels' ↓
cause The 'statsmodels' package is not installed in the current Python environment.
fix
pip install statsmodels
error ValueError: endog and exog are of different lengths ↓
cause The dependent variable (endog) and independent variables (exog) arrays or series have a different number of observations.
fix
Ensure that the endog and exog arrays/Series passed to the model constructor have exactly the same number of rows/observations, often by aligning their indices or handling missing values consistently.
error ValueError: Perfect multicollinearity detected. ↓
cause Occurs when one independent variable can be perfectly predicted from a linear combination of other independent variables, leading to an ill-conditioned design matrix.
fix
Identify and remove redundant independent variables from your model (e.g., duplicate columns, a constant column when an intercept is automatically added, or dummy variable trap).
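One way to locate redundant columns is to compare the rank of the design matrix with its column count. The sketch below is only an illustration under stated assumptions: the DataFrame `X` and its column names are hypothetical, and the greedy rank check is a generic NumPy technique, not a statsmodels feature.

```python
import numpy as np
import pandas as pd

# Hypothetical design matrix: 'x3' is an exact linear combination of 'x1' and 'x2'.
rng = np.random.default_rng(0)
X = pd.DataFrame({"x1": rng.normal(size=50), "x2": rng.normal(size=50)})
X["x3"] = 2.0 * X["x1"] - X["x2"]

# A rank below the number of columns signals linearly dependent columns.
rank = np.linalg.matrix_rank(X.to_numpy())
print(f"columns: {X.shape[1]}, rank: {rank}")

# Greedily keep only the columns that raise the rank of the matrix built so far.
keep = []
for col in X.columns:
    if np.linalg.matrix_rank(X[keep + [col]].to_numpy()) > len(keep):
        keep.append(col)

X_reduced = X[keep]
print("kept columns:", keep)
```

Which column gets dropped depends on column order, so it is worth deciding on substantive grounds which of the dependent variables should stay in the model.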
error ConvergenceWarning: Maximum number of iterations has been reached. ↓
cause The iterative optimization algorithm used by the model (e.g., GLM, discrete choice models) failed to converge to a solution within the specified maximum number of iterations.
fix
Increase the maximum number of iterations (e.g., model.fit(maxiter=1000)), check for perfect multicollinearity, or try different optimization methods if available for the specific model.
Warnings
breaking The `scikits` namespace was deprecated and eventually removed in versions prior to 0.5.0. Direct imports from `scikits.statsmodels` are no longer valid. ↓
fix Always use `import statsmodels.api as sm` or direct imports from `statsmodels.<submodule>` (e.g., `statsmodels.regression.linear_model`). [29, 31]
breaking The signature of `model.predict` changed in versions prior to 0.5.0. On a model instance, the method takes the parameter vector as its first argument (e.g., `model.predict(params, exog)`) and no longer assumes that the model has already been fit. ↓
fix Either call `results.predict(exog)` on the fitted results object, which supplies the estimated parameters itself, or pass them explicitly to the model: `model.predict(results.params, exog)`. [29, 31]
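deprecated The `statsmodels.tsa.arima_model.ARMA` and `statsmodels.tsa.arima_model.ARIMA` classes were deprecated in 0.12, where using them emits a `FutureWarning`, and removed in 0.13. In current releases such as 0.14.6, instantiating them raises `NotImplementedError`. ↓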
fix Migrate to `statsmodels.tsa.arima.model.ARIMA`. The new API provides more consistent handling and features. [34]
gotcha When using the direct `statsmodels.api.OLS(y, X)` interface (without formulas), an intercept term (constant) is NOT automatically added to the `X` (exog) design matrix. This differs from some other statistical software and can lead to incorrect models if an intercept is expected. ↓
fix Explicitly add a constant term using `X = sm.add_constant(X)` from `statsmodels.api` before fitting the model, or use the `statsmodels.formula.api` interface which handles intercepts automatically. [19, 33]
breaking `pandas.stats.ols` (among others) was removed in Pandas 0.20.1, which also deprecated the `Panel` object; `Panel` itself was removed in Pandas 0.25.0. Users relying on these for panel data or OLS directly from Pandas need to switch. ↓
fix For OLS functionality, `statsmodels.api.OLS` is the recommended replacement. For panel data, Pandas recommends using a `MultiIndex` DataFrame or `xarray`, which can then be used with `statsmodels` models where appropriate (e.g., `MixedLM` for some panel-like structures). [37]
breaking Statsmodels 0.14.2 introduced compatibility with NumPy 2.0.0. While `statsmodels` itself may run on older NumPy versions, if you upgrade to NumPy 2.0, all other Python scientific stack dependencies (like SciPy and Pandas) *must also be NumPy 2.0 compatible* to avoid runtime issues. This release also increased the minimum Python version to 3.9 to match NumPy 2.0. ↓
fix Ensure your entire scientific Python environment has compatible versions of all libraries when moving to NumPy 2.0. Check dependency release notes for NumPy 2.0 compatibility. [35]
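A quick way to start that check is to list the installed versions of the core packages. This sketch uses only the standard library; the package list is an assumption about which dependencies matter in a given environment, and it does not decide compatibility by itself.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages to inspect (an illustrative list; extend it for your environment).
packages = ("numpy", "scipy", "pandas", "statsmodels")

installed = {}
for name in packages:
    try:
        installed[name] = version(name)
    except PackageNotFoundError:
        installed[name] = None  # not installed in this environment

for name, ver in installed.items():
    print(f"{name}: {ver or 'not installed'}")

# Flag environments running NumPy 2.x so the other versions can be reviewed.
numpy_version = installed["numpy"]
if numpy_version is not None and int(numpy_version.split(".")[0]) >= 2:
    print("NumPy 2.x detected: confirm the other packages support it in their release notes.")
```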
Install compatibility verified last tested: 2026-05-12
| python | os / libc | status | wheel | install | import | disk |
|---|---|---|---|---|---|---|
| 3.10 | alpine (musl) | | wheel | - | 2.99s | 357.3M |
| 3.10 | alpine (musl) | | - | - | 2.91s | 357.1M |
| 3.10 | slim (glibc) | | wheel | 12.9s | 2.37s | 345M |
| 3.10 | slim (glibc) | | - | - | 2.09s | 344M |
| 3.11 | alpine (musl) | | wheel | - | 4.89s | 384.5M |
| 3.11 | alpine (musl) | | - | - | 5.25s | 384.1M |
| 3.11 | slim (glibc) | | wheel | 12.4s | 4.62s | 370M |
| 3.11 | slim (glibc) | | - | - | 4.15s | 369M |
| 3.12 | alpine (musl) | | wheel | - | 4.53s | 366.2M |
| 3.12 | alpine (musl) | | - | - | 4.72s | 365.9M |
| 3.12 | slim (glibc) | | wheel | 13.7s | 4.82s | 352M |
| 3.12 | slim (glibc) | | - | - | 5.09s | 351M |
| 3.13 | alpine (musl) | | wheel | - | 4.13s | 364.1M |
| 3.13 | alpine (musl) | | - | - | 4.27s | 363.7M |
| 3.13 | slim (glibc) | | wheel | 13.7s | 4.13s | 349M |
| 3.13 | slim (glibc) | | - | - | 4.41s | 349M |
| 3.9 | alpine (musl) | build_error | - | 0.1s | - | - |
| 3.9 | alpine (musl) | | - | - | - | - |
| 3.9 | slim (glibc) | | wheel | 15.3s | 2.58s | 352M |
| 3.9 | slim (glibc) | | - | - | 2.41s | 352M |
Imports
- statsmodels.api
import statsmodels.api as sm
- statsmodels.formula.api
import statsmodels.formula.api as smf
- Specific Submodule
wrong: from statsmodels.tsa.arima_model import ARIMA
correct: from statsmodels.tsa.arima.model import ARIMA
Quickstart last tested: 2026-04-24
import statsmodels.formula.api as smf
import pandas as pd
import numpy as np
# 1. Create a sample DataFrame
np.random.seed(42)
data = {
    'y': 10 + 2 * np.random.rand(100) + 3 * np.random.randn(100),
    'x1': np.random.rand(100) * 10,
    'x2': np.random.randint(0, 2, 100)  # categorical variable example
}
df = pd.DataFrame(data)
# 2. Fit OLS (Ordinary Least Squares) model using R-style formula
# 'y ~ x1 + C(x2)' means y is dependent on x1 and categorical x2
model = smf.ols('y ~ x1 + C(x2)', data=df)
results = model.fit()
# 3. Print the summary of the regression results
print(results.summary())
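Beyond the printed summary, individual quantities can be read from the results object and used for prediction. The following is a self-contained sketch that rebuilds a similar sample DataFrame; the data-generating values and the `new_data` frame are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Rebuild a sample DataFrame similar to the quickstart (values are illustrative).
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "y": 10 + 2 * rng.random(100) + 3 * rng.normal(size=100),
    "x1": rng.random(100) * 10,
    "x2": rng.integers(0, 2, size=100),  # 0/1 categorical variable
})

results = smf.ols("y ~ x1 + C(x2)", data=df).fit()

# Individual pieces of the summary are available as attributes.
print(results.params)     # estimated coefficients, indexed by term name
print(results.pvalues)    # p-values for each coefficient
print(results.rsquared)   # coefficient of determination

# Predict for new observations; the formula transformations are reapplied.
new_data = pd.DataFrame({"x1": [2.0, 8.0], "x2": [0, 1]})
predictions = results.predict(new_data)
print(predictions)
```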