whylogs-sketching Library

3.4.1.dev3 · active · verified Fri Apr 17

whylogs-sketching is a Python library of probabilistic data structures (sketches) for approximate cardinality estimation (HyperLogLog), frequent-item counting, and histogram generation. It is a core dependency of the `whylogs` data logging and profiling library, which builds on these sketching primitives. It can be used on its own for low-level sketching, but most users reach it through the higher-level `whylogs` APIs. The current PyPI version is 3.4.1.dev3, and development generally follows the `whylogs` release cycle.
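To make the HyperLogLog idea concrete, here is a minimal pure-Python sketch of the technique. It is an illustration only and not the library's implementation: the class name `ToyHll`, the BLAKE2b hash, and the default `lg_k` of 12 are assumptions chosen for this example.

```python
import hashlib
import math


class ToyHll:
    """Illustrative HyperLogLog. Hypothetical class, not part of whylogs-sketching."""

    def __init__(self, lg_k: int = 12) -> None:
        if lg_k < 7:
            # The alpha constant used in estimate() assumes at least 128 registers.
            raise ValueError("lg_k must be >= 7 in this toy version")
        self.lg_k = lg_k
        self.m = 1 << lg_k              # number of registers
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(value: object) -> int:
        # Hash the repr so that 1 and "1" are treated as different items.
        digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def update(self, value: object) -> None:
        h = self._hash64(value)
        width = 64 - self.lg_k
        index = h >> width               # top lg_k bits select a register
        rest = h & ((1 << width) - 1)    # remaining bits feed the rank
        # Rank is the 1-based position of the leftmost 1-bit in the remaining bits.
        rank = width - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw

    def merge(self, other: "ToyHll") -> None:
        if other.lg_k != self.lg_k:
            raise ValueError("this toy version only merges sketches with equal lg_k")
        # The union of two sketches is the register-wise maximum.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]


toy = ToyHll()
for item in ["user_a", "user_b", "user_a", "user_c", 123, 456.78]:
    toy.update(item)
print(f"Toy estimate: {toy.estimate():.2f}")  # approximate distinct count
```

Two properties are worth noting: re-adding an item never changes the registers, and merging is a register-wise maximum, so a merged sketch is identical to one built from the combined stream. With 2**12 registers the typical relative error is on the order of 1.04 / sqrt(4096), about 1.6%.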

Quickstart

This quickstart creates an HLL sketch, adds values of several types, and reads back the approximate number of distinct items. It then combines two sketches through an `hll_union` to estimate the distinct count across both.

import whylogs_sketching as ds

# Create an HLL sketch for approximate distinct counting.
# lg_k sets the number of registers (2**lg_k); larger values trade memory for accuracy.
sketch = ds.hll_sketch(12)

# Add values of several types to the sketch
sketch.update("user_a")
sketch.update("user_b")
sketch.update("user_a")  # duplicate, does not raise the estimate
sketch.update("user_c")
sketch.update(123)
sketch.update(456.78)

# Get the approximate number of distinct items
cardinality_estimate = sketch.get_estimate()
print(f"Approximate unique items: {cardinality_estimate}")

# Build a second sketch that overlaps with the first
sketch2 = ds.hll_sketch(12)
sketch2.update("user_c")  # already present in the first sketch
sketch2.update("user_d")

# HLL sketches are combined through a union object rather than merged in place
union = ds.hll_union(12)
union.update(sketch)
union.update(sketch2)
merged_cardinality = union.get_estimate()
print(f"Approximate unique items after merge: {merged_cardinality}")
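As a sanity check on the quickstart, the exact distinct counts for the same inputs can be computed with plain Python sets. This baseline is an addition for illustration (the variable names are hypothetical); the sketch estimates above should land close to these values.

```python
# Same values as the quickstart, kept as plain lists.
first_stream = ["user_a", "user_b", "user_a", "user_c", 123, 456.78]
second_stream = ["user_c", "user_d"]

# Exact distinct counts: a set drops the repeated "user_a" and "user_c".
exact_first = len(set(first_stream))
exact_merged = len(set(first_stream) | set(second_stream))

print(f"Exact unique items: {exact_first}")               # 5
print(f"Exact unique items after merge: {exact_merged}")  # 6
```

An exact set grows with the number of distinct items, whereas a sketch stays at a fixed size, which is the trade-off that makes approximate counting attractive for large streams.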
