distogram

3.0.3 verified Fri May 01 auth: no python

A library for computing histograms over streaming data in distributed environments. Current version: 3.0.3. Releases are infrequent; v3 introduced breaking changes from v2 (API redesign).

pip install distogram
error AttributeError: module 'distogram' has no attribute 'distogram'
cause Using the old v2 API (distogram.distogram) with v3 installed.
fix Use from distogram import Distogram, then call Distogram().
error TypeError: distogram() takes 1 positional argument but 2 were given
cause Calling Distogram(bin_count=50), which is v2 syntax; v3 uses max_bins.
fix Use Distogram(max_bins=50).
error ImportError: cannot import name 'Distogram' from 'distogram'
cause The installed distogram is v2.x, which does not have the class, or the import name is mistyped.
fix Upgrade to distogram>=3.0.0 with pip install --upgrade distogram.
breaking distogram v3 replaces the old distogram() function with the Distogram class. Operations such as update, histogram, and merge are module-level functions that take a Distogram object as their first argument.
fix Replace distogram() with Distogram(), and call the module-level functions: from distogram import Distogram, update, histogram.
breaking distogram v3 removes the bin_count parameter from Distogram constructor. Use max_bins instead.
fix Change Distogram(bin_count=50) to Distogram(max_bins=50).
deprecated The old lowercase distogram function was removed in v3. Importing it raises ImportError.
fix Use from distogram import Distogram (capital D).
gotcha distogram stores a compressed approximation of the distribution, so the quantiles it reports are estimates. Do not rely on exact percentile values.
fix Use distogram for approximate histograms and quantile estimates; compute exact quantiles from the full data with another tool.
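
The gotcha above points to other tools for exact quantiles. As a minimal sketch, assuming the full data set fits in memory, Python's standard statistics module can compute them exactly. The sample values and variable names are illustrative, and this snippet does not use distogram at all.

```python
import statistics

# Illustrative sample; in practice this is the complete data set held in memory.
data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]

# Exact median of the full data.
exact_median = statistics.median(data)

# Exact quartiles; method="inclusive" treats the data as the whole population.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

print(exact_median, q1, q2, q3)
```

This needs every value at once, which is the trade-off against a streaming sketch that keeps only a bounded number of bins.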

Create a Distogram, insert data points, then retrieve histogram bins.

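from distogram import Distogram, update, histogram

dist = Distogram()
for v in [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]:
    # Keep the object returned by update() as the current distribution
    dist = update(dist, v)
# Request a 3-bin histogram of the approximated distribution
bins = histogram(dist, bin_count=3)
print(bins)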