fastdigest
Fastdigest is a Python library that provides a fast, Rust-backed implementation of the t-digest data structure. It offers a lightweight suite of online statistics for streaming and distributed data, including estimates of quantiles, the CDF, and the trimmed mean. The current version is 0.12.0, and the library is actively maintained.
Common errors
- AttributeError: 'TDigest' object has no attribute 'mass'
  cause: In versions 0.12.0 and later, `mass` was changed from a property to a method.
  fix: Call `mass` as a method: `digest.mass()`.
- AttributeError: 'TDigest' object has no attribute 'is_empty'
  cause: In versions 0.12.0 and later, `is_empty` was changed from a property to a method.
  fix: Call `is_empty` as a method: `digest.is_empty()`.
- ValueError: max_centroids must be a non-negative integer
  cause: `TDigest` was initialized, or `max_centroids` was set, with a negative value or a non-integer type.
  fix: Ensure that `max_centroids` is always a non-negative integer, e.g., `TDigest(max_centroids=50)`.
- TypeError: object of type 'int' has no len() (or similar when merging non-TDigest objects)
  cause: The `merge_all` method received an iterable containing objects that are not `TDigest` instances.
  fix: Verify that every element in the list or iterable passed to `TDigest.merge_all()` is a `TDigest` object.
- MemoryError: failed to allocate TDigest (possibly during merge operation)
  cause: The `TDigest` attempted to allocate memory for a large number of centroids, exceeding available system memory. This can happen with very large merges or when `max_centroids` is set too high for the data volume.
  fix: Reduce the `max_centroids` parameter to limit memory usage, process data in smaller batches, or make sure enough memory is available for large operations.
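Several of the errors above can be caught before they reach the library. The following is a minimal sketch of such pre-flight checks, not part of fastdigest itself: the helper names (`read_value`, `check_max_centroids`, `check_all_instances`) and the two stand-in classes are hypothetical and exist only so the example runs without the library installed. The `max_centroids` check is deliberately strict; if your installed version accepts other values, relax it accordingly.

```python
from typing import Any, Iterable, List


def read_value(obj: Any, name: str) -> Any:
    """Return obj.<name>, calling it first if it turns out to be a method.

    Lets the same code run whether `mass` / `is_empty` is a property
    or a method on the installed version.
    """
    value = getattr(obj, name)
    return value() if callable(value) else value


def check_max_centroids(value: Any) -> int:
    """Raise ValueError unless value is a non-negative integer."""
    # bool is a subclass of int, so it is rejected explicitly
    # (an assumption of this sketch, stricter than the library may be).
    if isinstance(value, bool) or not isinstance(value, int) or value < 0:
        raise ValueError("max_centroids must be a non-negative integer")
    return value


def check_all_instances(items: Iterable[Any], expected: type) -> List[Any]:
    """Return items as a list, raising TypeError on the first wrong type."""
    checked = list(items)
    for index, item in enumerate(checked):
        if not isinstance(item, expected):
            raise TypeError(
                f"item {index} is {type(item).__name__}, "
                f"expected {expected.__name__}"
            )
    return checked


# Hypothetical stand-ins for the two API styles; NOT the real TDigest.
class PropertyStyleDigest:
    @property
    def mass(self) -> float:
        return 3.0


class MethodStyleDigest:
    def mass(self) -> float:
        return 3.0


for stand_in in (PropertyStyleDigest(), MethodStyleDigest()):
    print(type(stand_in).__name__, read_value(stand_in, "mass"))
```

With the real library, the same helpers would be used as `read_value(digest, "mass")` and `check_all_instances(digests, TDigest)` before calling `TDigest.merge_all(digests)`.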
Warnings
- breaking The `mass` and `is_empty` attributes were converted from properties to methods. Attempting to access them as properties will now raise an `AttributeError`.
- breaking The `max_centroids` setter and constructor argument changed their internal type handling and validation. Attempting to set a negative `max_centroids` now raises a `ValueError` (previously this could cause an `OverflowError` or unexpected behavior).
- gotcha The `std()` method's estimation was significantly improved in v0.12.0. It now estimates population variance via centroid second moments, making it more accurate and faster than the previous MAD-based estimation (from v0.11.0), which was only strictly valid for approximately normal distributions. Results for non-normal distributions will differ.
- gotcha Calling `merge_all` with an iterable containing non-`TDigest` objects will now explicitly raise a `TypeError` instead of potentially panicking or leading to unexpected behavior.
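The `std()` gotcha above is easier to see with numbers. The sketch below is plain Python on raw samples, not fastdigest's centroid-based code, and it assumes the usual MAD-to-sigma scaling constant of 1.4826 for normal data (the library's previous implementation may have differed in detail). It compares a MAD-based estimate with a second-moment estimate on a normal and on a skewed (exponential) sample.

```python
import random
import statistics

# Scaling constant that makes MAD a consistent estimator of the
# standard deviation for normally distributed data.
MAD_TO_STD = 1.4826


def mad_based_std(values):
    """Estimate std as a scaled median absolute deviation."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return MAD_TO_STD * mad


def moment_based_std(values):
    """Estimate population std from the first and second moments."""
    n = len(values)
    mean = sum(values) / n
    second_moment = sum(v * v for v in values) / n
    # Clamp at zero to guard against tiny negative rounding errors.
    return max(second_moment - mean * mean, 0.0) ** 0.5


rng = random.Random(0)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]
skewed_sample = [rng.expovariate(1.0) for _ in range(20000)]

for label, sample in (("normal", normal_sample), ("exponential", skewed_sample)):
    print(
        f"{label}: MAD-based={mad_based_std(sample):.3f}, "
        f"moment-based={moment_based_std(sample):.3f}"
    )
```

On the normal sample the two estimates agree closely; on the exponential sample the MAD-based figure falls well below the moment-based one, which is the kind of difference to expect when comparing `std()` results across the two versions on non-normal data.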
Install
- pip install fastdigest
Imports
- TDigest
from fastdigest import TDigest
Quickstart
from fastdigest import TDigest
import random
# Create a new TDigest instance
digest = TDigest()
# Add values incrementally
for _ in range(10000):
    digest.update(random.random() * 100)
# Or create from a sequence of values (with optional weights)
data = [1.42, 2.71, 3.14, 5.0, 8.0, 13.0]
digest_from_values = TDigest.from_values(data)
# Add a batch of values
digest_from_values.batch_update([1.0, 2.0, 3.0, 4.0])
# Get quantiles
median = digest.quantile(0.5)
percentile_90 = digest.quantile(0.9)
print(f"Median: {median:.2f}")
print(f"90th Percentile: {percentile_90:.2f}")
# Get the number of centroids
print(f"Number of centroids: {len(digest)}")
# Check if empty (as method)
print(f"Is digest empty? {digest.is_empty()}")
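To sanity-check the estimates printed by the quickstart, it can help to compute exact quantiles from the same kind of data. The following sketch uses only the standard library; `exact_quantile` is a hypothetical helper, and its linear-interpolation convention is an assumption that may differ slightly from how the digest interpolates between centroids.

```python
import random


def exact_quantile(values, q):
    """Exact quantile of values using linear interpolation between ranks."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    # Fractional rank of the requested quantile in the sorted data.
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# Same shape of data as the quickstart: uniform values in [0, 100).
rng = random.Random(42)
samples = [rng.random() * 100 for _ in range(10000)]

exact_median = exact_quantile(samples, 0.5)
exact_p90 = exact_quantile(samples, 0.9)
print(f"Exact median: {exact_median:.2f}")
print(f"Exact 90th percentile: {exact_p90:.2f}")
```

Feeding the same `samples` list into a digest and comparing `digest.quantile(0.5)` and `digest.quantile(0.9)` against these values gives a rough picture of the approximation error for a given `max_centroids`.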