statsmodels

0.14.6 · active · verified Sun Mar 29

statsmodels is a Python package offering a wide array of statistical models, hypothesis tests, and statistical data exploration tools. It provides classes and functions for estimating many different statistical models, including linear regression, generalized linear models, discrete choice models, and time series models. The current version is 0.14.6. Its dependency policy is loose and time-based: the minimum supported versions of its dependencies are typically raised every one and a half to two years. [2, 3, 5, 7]

Warnings

Install

Imports

Quickstart

This example fits a simple Ordinary Least Squares (OLS) regression model using the R-style formula interface provided by `statsmodels.formula.api`. It creates sample data, defines the model with a formula, fits it, and prints a summary of the results, including the coefficients, R-squared, and several diagnostic statistics. [3, 6, 33]

import statsmodels.formula.api as smf
import pandas as pd
import numpy as np

# 1. Create a sample DataFrame
np.random.seed(42)
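data = {
    # Note: y is drawn independently of x1 and x2, so their fitted coefficients should be near zero
    'y': 10 + 2 * np.random.rand(100) + 3 * np.random.randn(100),
    'x1': np.random.rand(100) * 10,
    'x2': np.random.randint(0, 2, 100)  # binary 0/1 variable, treated as categorical via C(x2)
}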
df = pd.DataFrame(data)

# 2. Fit OLS (Ordinary Least Squares) model using R-style formula
#    'y ~ x1 + C(x2)' means y is dependent on x1 and categorical x2
model = smf.ols('y ~ x1 + C(x2)', data=df)
results = model.fit()

# 3. Print the summary of the regression results
print(results.summary())
