dask-glm

Version 0.4.0 · verified Fri May 01 · auth: no · language: Python

Generalized Linear Models with Dask. The current version is 0.4.0, and releases have come roughly once a year. It provides GLM implementations (linear, logistic, Poisson, etc.) that operate on Dask arrays and DataFrames.
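To show what "fitting a GLM" means here, the NumPy sketch below writes out the negative log-likelihood and gradient for the logistic (logit link) and Poisson (log link, constant log(y!) term dropped) cases, then minimizes the logistic one with plain gradient descent. This is not dask-glm's code; the function names are hypothetical. dask-glm's solvers minimize objectives of this form (optionally plus a regularization penalty) over chunked arrays.

```python
import numpy as np

def logistic_nll(beta, X, y):
    """Negative log-likelihood of a logistic GLM (logit link) and its gradient."""
    eta = X @ beta                          # linear predictor
    # log(1 + exp(eta)) computed stably as logaddexp(0, eta)
    loss = np.sum(np.logaddexp(0.0, eta) - y * eta)
    mu = 1.0 / (1.0 + np.exp(-eta))         # fitted probabilities
    grad = X.T @ (mu - y)
    return loss, grad

def poisson_nll(beta, X, y):
    """Negative log-likelihood of a Poisson GLM (log link), without the log(y!) constant."""
    eta = X @ beta
    mu = np.exp(eta)                        # fitted means
    loss = np.sum(mu - y * eta)
    grad = X.T @ (mu - y)
    return loss, grad

# Synthetic logistic data with known coefficients
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
true_beta = np.array([1.0, -2.0, 0.5])
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

# Plain gradient descent with a fixed step size
beta = np.zeros(3)
for _ in range(500):
    _, g = logistic_nll(beta, X, y)
    beta -= 0.01 * g
print(beta)  # should land in the neighborhood of true_beta
```

The step size 0.01 is small enough for this data (the summed loss has curvature well below 1/0.01 in every direction), so each iteration decreases the loss.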

pip install dask-glm
error ModuleNotFoundError: No module named 'dask_glm'
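cause The library is not installed in the environment of the interpreter you are running. Note that the distribution name is dask-glm (hyphen), while the import name is dask_glm (underscore).
fix
Run pip install dask-glm in that environment, then import with from dask_glm.estimators import LogisticRegression.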
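A quick way to check which interpreter is running and whether it can see the package, using only the standard library. The helper name module_available is hypothetical. If the check fails even though pip reported success, the install probably went to a different environment; python -m pip install dask-glm ties the install to the interpreter you invoke.

```python
import importlib.util
import sys

def module_available(import_name: str) -> bool:
    """Return True if the current interpreter can locate a top-level module."""
    # find_spec looks the module up on sys.path without importing it
    return importlib.util.find_spec(import_name) is not None

print("interpreter:", sys.executable)
# The distribution is "dask-glm", but the importable package is "dask_glm"
if not module_available("dask_glm"):
    print("dask_glm not found; run: python -m pip install dask-glm")
```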
error AttributeError: module 'dask_glm' has no attribute 'LogisticRegression'
cause LogisticRegression is not exported from the top-level package; it lives in the dask_glm.estimators submodule.
fix
Use from dask_glm.estimators import LogisticRegression.
error ValueError: The computed value of 'chunks' is larger than the number of rows. (Try increasing chunk size?)
cause Dask array chunks are too small, causing issues with solver internals.
fix
Increase chunk size: e.g., X = da.from_array(data, chunks=500) or use X.rechunk(chunks=500).
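A minimal sketch of inspecting and enlarging row chunks with plain Dask (NumPy and dask.array only). The 500-row size mirrors the fix above and is not a tuned recommendation; -1 keeps each column dimension in a single chunk.

```python
import numpy as np
import dask.array as da

data = np.random.randn(1000, 10)

# chunks=100 splits the 1000 rows into 10 blocks; the 10 columns fit in one block
X_small = da.from_array(data, chunks=100)
print(X_small.numblocks)

# Rechunk to 500-row blocks, keeping all columns together (-1 = whole axis)
X = X_small.rechunk((500, -1))
print(X.chunks)
```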
breaking dask-glm 0.4.0 drops support for Python 3.8 and 3.9; requires Python >=3.10.
fix Upgrade Python to 3.10 or later.
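A small guard that fails early with a clear message instead of an obscure install or import error, based on the Python >=3.10 requirement stated for 0.4.0. The helper name python_supported is hypothetical.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum stated for dask-glm 0.4.0

def python_supported(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """True if the (major, minor) part of version_info meets the minimum."""
    return tuple(version_info[:2]) >= minimum

if not python_supported():
    print(f"dask-glm 0.4.0 needs Python >= 3.10, found {sys.version.split()[0]}")
```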
deprecated The old API using functions like `dask_glm.logistic.fit` is deprecated in favor of the object-oriented API with `LogisticRegression().fit()`.
fix Use the class-based API: `model = LogisticRegression(); model.fit(X, y).coef_`.
gotcha When using GPU acceleration with CuPy, ensure CuPy is installed and arrays are cupy-backed Dask arrays. The library does not automatically move data to GPU.
fix Install CuPy and convert to CuPy-backed Dask arrays with `X.map_blocks(cp.asarray)` before fitting.
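A minimal sketch of that conversion, assuming CuPy is importable on a CUDA machine. The helper name to_device_backed is hypothetical, and the NumPy fallback exists only so the snippet runs on a CPU-only machine; it gives no GPU acceleration.

```python
import numpy as np
import dask.array as da

def to_device_backed(X, xp):
    """Return a Dask array whose blocks are xp arrays (xp is cupy on a GPU machine)."""
    # map_blocks lazily applies xp.asarray to every chunk; the chunk layout is unchanged
    return X.map_blocks(xp.asarray, dtype=X.dtype)

try:
    import cupy as xp  # GPU path: needs CuPy and a CUDA device
except ImportError:
    xp = np  # CPU fallback so the sketch still runs without a GPU

X = da.from_array(np.random.randn(1000, 10), chunks=100)
X_dev = to_device_backed(X, xp)
print(X_dev.chunks)  # same layout as X; only the block type changes
```

The conversion is lazy: nothing is copied to the device until the array is computed, for example during fitting.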
conda install -c conda-forge dask-glm

Quick example: fit a logistic regression model on a Dask array.

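import numpy as np
import dask.array as da
from dask_glm.estimators import LogisticRegression

# Create synthetic data: 1,000 rows and 10 features, split into 100-row chunks
np.random.seed(0)
X = da.from_array(np.random.randn(1000, 10), chunks=100)
y = da.from_array((np.random.rand(1000) > 0.5).astype(int), chunks=100)

# Fit logistic regression; fit() returns the estimator, so coef_ can be chained
model = LogisticRegression()
coef = model.fit(X, y).coef_
# coef_ may be a NumPy or a Dask array depending on the solver; np.asarray handles both
print(np.asarray(coef))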
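Once coefficients are available, predicted probabilities follow from the logistic link, P(y=1 | x) = sigmoid(x·coef + intercept). A minimal NumPy sketch; predict_proba here is a hypothetical helper, not the estimator's own method.

```python
import numpy as np

def predict_proba(X, coef, intercept=0.0):
    """Logistic-model probability P(y=1 | x) = sigmoid(X @ coef + intercept)."""
    # np.asarray also materializes a Dask array, so X may be chunked
    eta = np.asarray(X) @ np.asarray(coef) + intercept
    return 1.0 / (1.0 + np.exp(-eta))

def predict_labels(X, coef, intercept=0.0, threshold=0.5):
    """Hard 0/1 predictions by thresholding the probabilities."""
    return (predict_proba(X, coef, intercept) >= threshold).astype(int)
```

With the fitted model from the quick example, and assuming the default intercept is fitted and exposed as model.intercept_, a call would look like predict_proba(X, model.coef_, float(np.asarray(model.intercept_))).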