dask-glm
Version 0.4.0 | verified Fri May 01 | auth: no | language: python
Generalized Linear Models with Dask. The current version is 0.4.0, and releases come roughly once a year. The library provides implementations of GLMs (logistic, Poisson, etc.) that work on Dask arrays and DataFrames.
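As a reminder of what "GLM" means here: every family combines a linear predictor (features times coefficients, plus an intercept) with an inverse link function that maps it to the mean of the response. The NumPy sketch below illustrates only that model math for the logistic (inverse logit) and Poisson (inverse log) cases. It is not dask-glm's implementation; the helper names and coefficient values are hypothetical.

```python
import numpy as np

def linear_predictor(X, coef, intercept=0.0):
    """eta = X @ coef + intercept; shared by every GLM family."""
    return X @ coef + intercept

def logistic_mean(eta):
    """Inverse logit link: maps eta to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-eta))

def poisson_mean(eta):
    """Inverse log link: maps eta to a positive expected count."""
    return np.exp(eta)

X = np.array([[0.0, 1.0], [2.0, -1.0]])
coef = np.array([0.5, -0.25])  # illustrative values, not fitted by dask-glm
eta = linear_predictor(X, coef, intercept=0.1)
print(logistic_mean(eta))
print(poisson_mean(eta))
```

Fitting a GLM means choosing `coef` and `intercept` so these means match the observed responses; dask-glm's solvers do that over chunked Dask data.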
pip install dask-glm
Common errors
error ModuleNotFoundError: No module named 'dask_glm'
cause The package is not installed, or it was installed into a different environment than the interpreter running your code. Note that the distribution name uses a hyphen (dask-glm) while the import name uses an underscore (dask_glm).
fix Run pip install dask-glm in the same environment, then import with from dask_glm.estimators import LogisticRegression.
error AttributeError: module 'dask_glm' has no attribute 'LogisticRegression'
cause LogisticRegression is not exported from the top-level dask_glm package; it lives in the dask_glm.estimators module.
fix Use from dask_glm.estimators import LogisticRegression.
error ValueError: The computed value of 'chunks' is larger than the number of rows. (Try increasing chunk size?)
cause The Dask array chunks are too small, which causes problems inside the solver.
fix Increase the chunk size, e.g. X = da.from_array(data, chunks=500), or rechunk an existing array with X.rechunk(chunks=500).
Warnings
breaking dask-glm 0.4.0 drops support for Python 3.8 and 3.9 and requires Python >=3.10.
fix Upgrade Python to 3.10 or later.
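If a script or environment may still run on an older interpreter, a small guard can fail early with a clear message. This is a minimal sketch; the `MIN_PYTHON` constant and `python_supported` helper are hypothetical, and the (3, 10) minimum is taken from the warning above.

```python
import sys

# Assumption: minimum version taken from the dask-glm 0.4.0 warning above.
MIN_PYTHON = (3, 10)

def python_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True if the running interpreter (or a given version tuple) meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); tuples compare element by element.
    return tuple(version_info[:2]) >= minimum

if not python_supported():
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is below "
          f"{MIN_PYTHON[0]}.{MIN_PYTHON[1]}; upgrade before installing dask-glm 0.4.0")
```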
deprecated The old API using functions like `dask_glm.logistic.fit` is deprecated in favor of the object-oriented API with `LogisticRegression().fit()`.
fix Use the class-based API: `from dask_glm.estimators import LogisticRegression; model = LogisticRegression().fit(X, y); model.coef_`.
gotcha When using GPU acceleration with CuPy, make sure CuPy is installed and that your inputs are CuPy-backed Dask arrays. The library does not move data to the GPU automatically.
fix Install CuPy and convert the blocks of each Dask array to CuPy arrays with `X.map_blocks(cp.asarray)` before fitting.
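The conversion in the fix above can be sketched as follows. `map_blocks` applies the converter to every block and leaves the chunk layout unchanged. The sketch assumes CuPy is the only GPU dependency and falls back to NumPy when CuPy cannot be imported, so it also runs on CPU-only machines; the `to_device` name is hypothetical.

```python
import numpy as np
import dask.array as da

try:
    import cupy as cp
    to_device = cp.asarray  # copy each block into GPU memory
except ImportError:
    cp = None
    to_device = np.asarray  # CPU fallback so the sketch runs without CuPy

np.random.seed(0)
X = da.from_array(np.random.randn(1000, 10), chunks=(100, 10))
y = da.from_array((np.random.rand(1000) > 0.5).astype(int), chunks=100)

# Convert block by block; shapes and chunk boundaries stay the same.
X_dev = X.map_blocks(to_device)
y_dev = y.map_blocks(to_device)

# Inspect the array type of one block (cupy.ndarray on GPU, numpy.ndarray otherwise).
print(type(X_dev.blocks[0].compute()))
```

X_dev and y_dev would then be passed to `fit` in place of the NumPy-backed arrays.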
Install
conda install -c conda-forge dask-glm
Imports
- LogisticRegression
  wrong: from dask_glm import LogisticRegression
  correct: from dask_glm.estimators import LogisticRegression
- PoissonRegression
  from dask_glm.estimators import PoissonRegression
- proximal_grad
  from dask_glm.algorithms import proximal_grad
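When one of these imports fails, it helps to tell "not installed" apart from "wrong import path". The standard-library sketch below checks the installed distribution (hyphenated name) and the importable module (underscored name) separately. The `check_install` helper is hypothetical.

```python
import importlib.util
from importlib import metadata

def check_install(dist_name="dask-glm", import_name="dask_glm"):
    """Return (installed_version_or_None, importable) for a package.

    dist_name is the name given to pip/conda (hyphen); import_name is the
    module name used in import statements (underscore).
    """
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    # find_spec returns None when no top-level module of that name can be found.
    importable = importlib.util.find_spec(import_name) is not None
    return version, importable

print(check_install())
```

If the distribution is installed and importable but `from dask_glm import LogisticRegression` still fails, the problem is the import path rather than the installation.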
Quickstart
import numpy as np
import dask.array as da
from dask_glm.estimators import LogisticRegression
# Create synthetic data: 1000 rows x 10 features, chunked along rows
np.random.seed(0)
X = da.from_array(np.random.randn(1000, 10), chunks=(100, 10))
y = da.from_array((np.random.rand(1000) > 0.5).astype(int), chunks=100)
# Fit logistic regression; fit() returns the fitted estimator
model = LogisticRegression()
coef = model.fit(X, y).coef_
# np.asarray works whether coef_ is a NumPy or a Dask array
print(np.asarray(coef))
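The quickstart chunks both arrays along rows. If you hit the chunk-size ValueError listed under Common errors, one option is to rebuild larger row blocks while keeping every column of X in a single block. This is a sketch under the assumption that row-only chunking suits the solver; `rechunk_rows` is a hypothetical helper, not a dask-glm function, and 250 rows per block is an arbitrary choice.

```python
import numpy as np
import dask.array as da

def rechunk_rows(X, rows_per_chunk):
    """Hypothetical helper: re-block a 1-D or 2-D Dask array along rows only,
    keeping all columns of a 2-D array in one block (-1 = full dimension)."""
    if X.ndim == 1:
        return X.rechunk(rows_per_chunk)
    return X.rechunk({0: rows_per_chunk, 1: -1})

np.random.seed(0)
X = da.from_array(np.random.randn(1000, 10), chunks=(50, 5))  # many small blocks
y = da.from_array((np.random.rand(1000) > 0.5).astype(int), chunks=50)

X = rechunk_rows(X, 250)
y = rechunk_rows(y, 250)
# X and y now share the same row block boundaries
print(X.chunks, y.chunks)
```

Keeping the row boundaries of X and y identical means each block of features lines up with the matching block of labels.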