GLUM - High Performance Generalized Linear Models
glum is a Python library providing high-performance implementations of Generalized Linear Models (GLMs), including various distributions and link functions. It focuses on speed and feature richness, supporting regularized fitting (L1, L2, ElasticNet) and cross-validation. The current version is 3.3.0, and it maintains an active release cadence with multiple updates throughout the year.
Common errors
- ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
  cause: In `glum` versions prior to 3.2.3, `InverseGaussianDistribution.log_likelihood` returned NaN, which propagated through the model fitting process.
  fix: Upgrade `glum` to 3.2.3 or newer: `pip install --upgrade glum`.
- TypeError: 'numpy.float64' object cannot be interpreted as an integer
  cause: In `glum` versions before 3.1.3, the `theta` setter for `NegativeBinomialDistribution` incorrectly rejected `numpy.number` types, so assigning a `theta` value derived from NumPy operations failed.
  fix: Upgrade `glum` to 3.1.3 or later: `pip install --upgrade glum`.
- Predictions are inconsistent or incorrect when using alpha_search with categorical features.
  cause: A bug in `glum` versions prior to 3.2.1 produced incorrect predictions when a specific `alpha` value was requested after fitting with `alpha_search=True` on data containing categorical features.
  fix: Upgrade to `glum` 3.2.1 or newer: `pip install --upgrade glum`.
- The `deviance_path_` attribute in `GeneralizedLinearRegressorCV` shows unexpectedly low values.
  cause: Before version 3.1.3, `deviance_path_` was incorrectly scaled down by a factor of `n_folds`, underestimating the deviance.
  fix: Upgrade `glum` to 3.1.3 or newer to get a correctly scaled `deviance_path_`: `pip install --upgrade glum`.
Warnings
- breaking In v3.3.0, the `trust-constr` solver's default Hessian calculation changed from `hess="2-point"` (finite-difference) to `SR1()` (quasi-Newton). This improves performance but may yield slightly different numerical results for models that previously relied on the finite-difference Hessian.
- gotcha Prior to v3.2.3, `InverseGaussianDistribution.log_likelihood` contained an incorrect call, causing it to always return NaN. Models using `InverseGaussianDistribution` would fail or produce invalid results.
- gotcha Versions before 3.2.1 had an error when predicting at a specific `alpha` with categorical features, potentially leading to incorrect or failed predictions.
- gotcha In `GeneralizedLinearRegressorCV`, the `deviance_path_` attribute was incorrectly scaled by `n_folds` in versions prior to 3.1.3, leading to misinterpretation of cross-validation deviance values.
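Given the version-specific bugs above, new projects can sidestep all of them by pinning a floor in `requirements.txt` (a suggested constraint, not an official recommendation):

```text
glum>=3.2.3
```

3.2.3 is the newest fix version named above; raise the floor to 3.3.0 if the SR1 Hessian default is desired.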
Install
- pip install glum
Imports
- GeneralizedLinearRegressor
from glum import GeneralizedLinearRegressor
- GeneralizedLinearRegressorCV
from glum import GeneralizedLinearRegressorCV
- PoissonDistribution
from glum import PoissonDistribution
- GammaDistribution
from glum import GammaDistribution
- LogLink
from glum import LogLink
Quickstart
import numpy as np
import pandas as pd
from glum import GeneralizedLinearRegressor, PoissonDistribution, LogLink
# Generate some synthetic data
np.random.seed(42)
n_samples = 100
X = pd.DataFrame({
    'feature_1': np.random.rand(n_samples) * 10,
    'feature_2': np.random.rand(n_samples) * 5,
})
# True coefficients
beta_0 = 1.0
beta_1 = 0.5
beta_2 = 0.2
# Generate Poisson-distributed target variable using a log link
linear_predictor = beta_0 + beta_1 * X['feature_1'] + beta_2 * X['feature_2']
mu = np.exp(linear_predictor)
y = np.random.poisson(mu)
# Create and fit the GLM
glm = GeneralizedLinearRegressor(
    family=PoissonDistribution(),  # glum takes the distribution via `family`
    link=LogLink(),
    fit_intercept=True,
)
glm.fit(X, y)
print(f"Intercept: {glm.intercept_:.4f}")
print(f"Coefficients: {glm.coef_}")
print(f"D² (fraction of deviance explained): {glm.score(X, y):.4f}")
print("Predictions (first 5 samples):\n", glm.predict(X.head()).round(2))