whylogs-sketching Library
whylogs-sketching is a Python library of efficient probabilistic data structures for common data sketching tasks: approximate cardinality estimation (HyperLogLog), frequent item counting, and histogram generation. It is a core dependency of the `whylogs` data logging and profiling library, which builds on its sketching primitives. It can be used on its own for low-level sketching operations, but most users reach its functionality through the higher-level APIs of `whylogs`. The current PyPI version is 3.4.1.dev3, and its development generally aligns with the `whylogs` release cycle.
Common errors
- ModuleNotFoundError: No module named 'whylogs_sketching'
  cause: The `whylogs-sketching` library is not installed in the current Python environment.
  fix: Run `pip install whylogs-sketching` or `pip install whylogs` to install it.
- AttributeError: module 'whylogs_sketching.hll_sketch' has no attribute 'HyperLogLogSketch'
  cause: Incorrect class name. The HyperLogLog class is named `HllSketch`, not `HyperLogLogSketch`.
  fix: Correct the import and instantiation to `from whylogs_sketching.hll_sketch import HllSketch` and `sketch = HllSketch()`.
- TypeError: update() missing 1 required positional argument: 'value'
  cause: An update method was called without the item to be sketched.
  fix: Pass the value to add, e.g., `sketch.update('my_item')`.
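When an import or attribute error appears, it can help to check what the installed package actually exposes before changing code. The following is a minimal diagnostic sketch that uses only the standard library; the helper name `public_names` is hypothetical and not part of whylogs-sketching, and the names it prints depend on the version installed in your environment.

```python
import importlib
import importlib.util
from typing import List


def public_names(module_name: str) -> List[str]:
    """Return the public attribute names of a top-level module.

    Returns an empty list when the module is not installed, which is the
    situation behind a ModuleNotFoundError.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        spec = None
    if spec is None:
        return []
    module = importlib.import_module(module_name)
    return sorted(name for name in dir(module) if not name.startswith("_"))


names = public_names("whylogs_sketching")
if not names:
    print("whylogs_sketching is not installed; run `pip install whylogs-sketching`")
else:
    # Compare this list with the class name used in the failing import.
    print(f"whylogs_sketching exposes {len(names)} public names:")
    for name in names:
        print(" ", name)
```

If the class named in an `AttributeError` is missing from the printed list, the import path or class name in your code does not match the installed version.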
Warnings
- gotcha whylogs-sketching is primarily an internal dependency of the `whylogs` library. While direct usage is possible for low-level operations, most users will interact with its functionalities through `whylogs`'s higher-level APIs, which might offer a more convenient and integrated experience.
- gotcha The latest PyPI version (e.g., 3.4.1.dev3) often reflects a development release. While functional, it might indicate that the library is still under active iteration or that stable releases are primarily tied to `whylogs`'s major versions.
- breaking Major version changes (e.g., from 0.x to 3.x) have introduced significant API changes, particularly in class constructors, method signatures, and internal data representations.
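Because the API differs across major versions and the published version may be a development release, it can be useful to check what is installed before relying on a particular class or signature. A minimal sketch, assuming Python 3.8+ for `importlib.metadata`; the helper names `installed_version` and `is_dev_or_prerelease` are hypothetical, and the regular expression is a simple heuristic rather than a full PEP 440 parser.

```python
import re
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def is_dev_or_prerelease(version: str) -> bool:
    """Heuristic: True if the version ends in a dev, alpha, beta, or rc marker."""
    return re.search(r"(\.dev\d*|(a|b|rc)\d+)$", version) is not None


version = installed_version("whylogs-sketching")
if version is None:
    print("whylogs-sketching is not installed")
elif is_dev_or_prerelease(version):
    print(f"whylogs-sketching {version} is a development or pre-release build")
else:
    print(f"whylogs-sketching {version} is a final release")
```

Pinning the version you tested against in your requirements file avoids surprises from the API changes noted above.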
Install
- pip install whylogs-sketching
- pip install whylogs
Imports
- HllSketch
from whylogs_sketching.hll_sketch import HllSketch
- FrequentItemsSketch
from whylogs_sketching.frequent_items_sketch import FrequentItemsSketch
- Histogram
from whylogs_sketching.histogram import Histogram
- FrequentStringsSketch
from whylogs_sketching.datasketches.frequent_strings_sketch import FrequentStringsSketch
Quickstart
from whylogs_sketching.hll_sketch import HllSketch
# Create an HLL sketch for approximate unique count
sketch = HllSketch()
# Add various values to the sketch
sketch.update("user_a")
sketch.update("user_b")
sketch.update("user_a") # Adding a duplicate
sketch.update("user_c")
sketch.update(123)
sketch.update(456.78)
# Get the approximate number of unique items
cardinality_estimate = sketch.get_estimate()
print(f"Approximate unique items: {cardinality_estimate}")
# You can also merge two sketches
sketch2 = HllSketch()
sketch2.update("user_c") # Another duplicate
sketch2.update("user_d")
sketch.merge(sketch2)
merged_cardinality = sketch.get_estimate()
print(f"Approximate unique items after merge: {merged_cardinality}")
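To illustrate why duplicates do not inflate the estimate and why two sketches can be merged, here is a toy HyperLogLog written in pure Python. It is a teaching sketch only, not the whylogs-sketching implementation: the class name `ToyHll`, the SHA-1 hashing, and the default of 12 index bits are all assumptions chosen for illustration, and it hashes the string form of each value, so `123` and `"123"` count as the same item.

```python
import hashlib
import math


class ToyHll:
    """Teaching-only HyperLogLog; not the whylogs-sketching implementation."""

    def __init__(self, p: int = 12):
        self.p = p                     # number of bits used to pick a register
        self.m = 1 << p                # number of registers
        self.registers = [0] * self.m

    def update(self, value) -> None:
        # Hash the value's string form to a stable 64-bit integer.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                    # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank is the 1-based position of the leftmost 1-bit in `rest`.
        rank = (64 - self.p) - rest.bit_length() + 1
        # Taking the max makes repeated updates with the same value a no-op.
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "ToyHll") -> None:
        if other.p != self.p:
            raise ValueError("sketches must use the same number of registers")
        # A register-wise max equals the sketch of the union of both inputs.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def get_estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)       # bias constant, valid for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


# Same inputs as the quickstart above.
first = ToyHll()
for item in ["user_a", "user_b", "user_a", "user_c", 123, 456.78]:
    first.update(item)

second = ToyHll()
for item in ["user_c", "user_d"]:
    second.update(item)

first.merge(second)
print(f"Toy estimate after merge: {first.get_estimate():.2f}")

# A larger stream shows the approximation error at scale.
big = ToyHll()
for i in range(10_000):
    big.update(f"item-{i}")
print(f"Toy estimate for 10,000 distinct items: {big.get_estimate():.0f}")
```

Each register keeps only a maximum, so adding the same value twice changes nothing, and merging reduces to a register-wise maximum. This is why sketches built on separate data partitions can be combined without revisiting the raw data.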