DDSketch (Distributed Quantile Sketches)
DDSketch is a Python library implementing distributed quantile sketches: a data structure that estimates quantiles with a guaranteed relative accuracy. With a relative accuracy of 0.01, for example, each estimated quantile is within 1% of the true value. Sketches built on different sources can be merged, and the merged sketch keeps the same accuracy guarantee. The current version is 3.0.1. Releases are made as needed for bug fixes, Python version compatibility, or feature enhancements.
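To make the relative-accuracy and merge claims concrete, here is a minimal pure-Python sketch of the logarithmic bucketing idea. It is a toy illustration, not the library's implementation: `ToyLogSketch` and `demo` are hypothetical names, only positive values are handled, and the bucket formula (gamma = (1 + alpha) / (1 - alpha)) is assumed from the published description of the algorithm.

```python
import math
import random
from collections import Counter


class ToyLogSketch:
    """Toy log-bucketed quantile sketch (hypothetical, positive values only)."""

    def __init__(self, relative_accuracy: float) -> None:
        if not 0.0 < relative_accuracy < 1.0:
            raise ValueError("relative_accuracy must be in (0, 1)")
        self.alpha = relative_accuracy
        # Bucket k covers (gamma**(k-1), gamma**k]
        self.gamma = (1.0 + relative_accuracy) / (1.0 - relative_accuracy)
        self.buckets = Counter()  # bucket key -> number of values
        self.count = 0

    def add(self, value: float) -> None:
        if value <= 0.0:
            raise ValueError("this toy only handles positive values")
        key = math.ceil(math.log(value) / math.log(self.gamma))
        self.buckets[key] += 1
        self.count += 1

    def merge(self, other: "ToyLogSketch") -> None:
        if other.gamma != self.gamma:
            raise ValueError("sketches must share the same relative accuracy")
        self.buckets.update(other.buckets)  # Counter.update adds the counts
        self.count += other.count

    def quantile(self, q: float) -> float:
        if not 0.0 <= q <= 1.0 or self.count == 0:
            raise ValueError("q must be in [0, 1] and the sketch non-empty")
        rank = q * (self.count - 1)
        seen = 0
        key = 0
        for key in sorted(self.buckets):
            seen += self.buckets[key]
            if seen > rank:
                break
        # Representative value whose relative distance to both
        # bucket edges is at most alpha
        return 2.0 * self.gamma ** key / (self.gamma + 1.0)


def demo(alpha: float = 0.01, seed: int = 0) -> float:
    """Merge two shards and return the worst relative error observed."""
    rng = random.Random(seed)
    shard_a = [rng.lognormvariate(0.0, 2.0) for _ in range(5000)]
    shard_b = [rng.lognormvariate(1.0, 1.0) for _ in range(5000)]
    a, b = ToyLogSketch(alpha), ToyLogSketch(alpha)
    for v in shard_a:
        a.add(v)
    for v in shard_b:
        b.add(v)
    a.merge(b)
    exact = sorted(shard_a + shard_b)
    worst = 0.0
    for q in (0.5, 0.9, 0.99):
        truth = exact[int(q * (len(exact) - 1))]
        worst = max(worst, abs(a.quantile(q) - truth) / truth)
    return worst


print(f"worst relative error: {demo():.5f}")
```

Because the merge only adds bucket counts, the merged sketch is identical to one built from all values at once, which is why the accuracy bound survives merging.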
Warnings
- breaking: Support for Python 2.x and Python 3.6 has been dropped. The library now requires Python >= 3.7.
- breaking: The `numpy` dependency was removed. If your project relied on `ddsketch` pulling in `numpy`, add `numpy` as a direct dependency of your project.
- breaking: Many `DDSketch` attributes, including `mapping`, `store`, `negative_store`, `zero_count`, `relative_accuracy`, `min`, and `max`, are no longer directly accessible.
- deprecated: Protobuf serialization is now an optional dependency. If you use `DDSketch.to_proto()` or `DDSketch.from_proto()`, you must install the `serialization` extra.
- gotcha: The import path for `DDSketch` changed from `ddsketch.ddsketch.DDSketch` to `ddsketch.DDSketch` to simplify top-level access.
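The dependency changes listed above can be checked before upgrading with a small preflight script. This is a sketch under stated assumptions: `module_available` and `preflight` are hypothetical helpers, and it assumes the `serialization` extra provides the `google.protobuf` package.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if `name` can be imported in this environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec raises when a parent package (e.g. `google`) is missing
        return False


def preflight(needs_numpy: bool = False, needs_proto: bool = False) -> list:
    """Collect human-readable problems; an empty list means ready to upgrade."""
    problems = []
    if sys.version_info < (3, 7):
        problems.append("Python >= 3.7 is required")
    if needs_numpy and not module_available("numpy"):
        problems.append("numpy is no longer pulled in by ddsketch; install it directly")
    # Assumption: the `serialization` extra installs `google.protobuf`
    if needs_proto and not module_available("google.protobuf"):
        problems.append("install the optional extra: pip install ddsketch[serialization]")
    return problems


for problem in preflight(needs_numpy=True, needs_proto=True):
    print(f"warning: {problem}")
```

Pass `needs_numpy=True` or `needs_proto=True` only if your own code imports `numpy` or uses protobuf serialization.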
Install
- pip install ddsketch
- pip install ddsketch[serialization]
Imports
- DDSketch
from ddsketch import DDSketch
Quickstart
from ddsketch import DDSketch
# Initialize a DDSketch with a desired relative accuracy
s = DDSketch(relative_accuracy=0.01) # 1% relative accuracy
# Add data points
s.add(1.0)
s.add(2.0)
s.add(3.0)
s.add(10.0, weight=5.0) # Add 10.0 with a weight of 5 (counts as five values)
# Estimate quantiles (each result is within the relative accuracy of the true value)
print(f"Value at 0.5 quantile: {s.get_quantile_value(0.5)}")
print(f"Value at 0.9 quantile: {s.get_quantile_value(0.9)}")
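# Estimate min/max via the extreme quantiles (the `min`/`max` attributes are no longer directly accessible)
print(f"Min value: {s.get_quantile_value(0.0)}")
print(f"Max value: {s.get_quantile_value(1.0)}")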
# Merge another sketch
s2 = DDSketch(relative_accuracy=0.01)
s2.add(20.0)
s.merge(s2)
print(f"Value at 0.95 quantile after merge: {s.get_quantile_value(0.95)}")