{"id":10351,"library":"whylogs-sketching","title":"whylogs-sketching Library","description":"whylogs-sketching is a foundational Python library that provides efficient, probabilistic data structures for common data sketching tasks, such as approximate cardinality estimation (HyperLogLog), frequent item counting, and histogram generation. It serves as a core dependency for the `whylogs` data logging and profiling library, providing the underlying sketching primitives. While it can be used independently for low-level sketching operations, most users interact with its functionalities through the higher-level APIs of `whylogs`. The current PyPI version is 3.4.1.dev3, and its development generally aligns with the `whylogs` release cycle.","status":"active","version":"3.4.1.dev3","language":"en","source_language":"en","source_url":"https://github.com/whylabs/whylogs-sketching","tags":["data sketching","probabilistic data structures","hyperloglog","frequent items","histogram","whylogs"],"install":[{"cmd":"pip install whylogs-sketching","lang":"bash","label":"Direct installation"},{"cmd":"pip install whylogs","lang":"bash","label":"Installed as a dependency of whylogs"}],"dependencies":[{"reason":"Numerical operations for sketches","package":"numpy"},{"reason":"Underlying data sketching implementations (e.g., for Frequent Strings Sketch)","package":"datasketches"},{"reason":"Type hinting support for Python < 3.11","package":"typing-extensions","optional":true}],"imports":[{"note":"For HyperLogLog cardinality estimation","symbol":"HllSketch","correct":"from whylogs_sketching.hll_sketch import HllSketch"},{"note":"For approximate frequent item counting","symbol":"FrequentItemsSketch","correct":"from whylogs_sketching.frequent_items_sketch import FrequentItemsSketch"},{"note":"For approximate histogram generation","symbol":"Histogram","correct":"from whylogs_sketching.histogram import Histogram"},{"note":"Specific frequent strings sketch implementation from datasketches subpackage","symbol":"FrequentStringsSketch","correct":"from whylogs_sketching.datasketches.frequent_strings_sketch import FrequentStringsSketch"}],"quickstart":{"code":"from whylogs_sketching.hll_sketch import HllSketch\n\n# Create an HLL sketch for approximate unique count\nsketch = HllSketch()\n\n# Add various values to the sketch\nsketch.update(\"user_a\")\nsketch.update(\"user_b\")\nsketch.update(\"user_a\") # Adding a duplicate\nsketch.update(\"user_c\")\nsketch.update(123)\nsketch.update(456.78)\n\n# Get the approximate number of unique items\ncardinality_estimate = sketch.get_estimate()\nprint(f\"Approximate unique items: {cardinality_estimate}\")\n\n# You can also merge two sketches\nsketch2 = HllSketch()\nsketch2.update(\"user_c\") # Another duplicate\nsketch2.update(\"user_d\")\n\nsketch.merge(sketch2)\nmerged_cardinality = sketch.get_estimate()\nprint(f\"Approximate unique items after merge: {merged_cardinality}\")\n","lang":"python","description":"This quickstart demonstrates how to initialize an `HllSketch`, add various types of elements to it, and retrieve the approximate count of unique items. It also shows how to merge two sketches to combine their unique counts."},"warnings":[{"fix":"Consider if your use case truly requires direct interaction with sketching primitives or if `whylogs` can provide the necessary functionality more easily.","message":"whylogs-sketching is primarily an internal dependency of the `whylogs` library. While direct usage is possible for low-level operations, most users will interact with its functionalities through `whylogs`'s higher-level APIs, which might offer a more convenient and integrated experience.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Monitor the `whylogs` project for stable version announcements or refer to the `whylogs` `pyproject.toml` for the exact `whylogs-sketching` version it depends on.","message":"The latest PyPI version (e.g., 3.4.1.dev3) often reflects a development release. While functional, it might indicate that the library is still under active iteration or that stable releases are primarily tied to `whylogs`'s major versions.","severity":"gotcha","affected_versions":"3.x.x.devY releases"},{"fix":"Always consult the official `whylogs-sketching` (or `whylogs`) GitHub repository for specific API changes between major versions when upgrading. Re-test code against new versions.","message":"Major version changes (e.g., from 0.x to 3.x) have introduced significant API changes, particularly in class constructors, method signatures, and internal data representations.","severity":"breaking","affected_versions":"< 3.0.0 to >= 3.0.0"}],"env_vars":null,"last_verified":"2026-04-17T00:00:00.000Z","next_check":"2026-07-16T00:00:00.000Z","problems":[{"fix":"Run `pip install whylogs-sketching` or `pip install whylogs` to install it.","cause":"The `whylogs-sketching` library has not been installed in the current Python environment.","error":"ModuleNotFoundError: No module named 'whylogs_sketching'"},{"fix":"Correct the import and class instantiation to `from whylogs_sketching.hll_sketch import HllSketch` and `sketch = HllSketch()`.","cause":"Incorrect class name used. The primary class for HyperLogLog is `HllSketch` (lowercase 'll').","error":"AttributeError: module 'whylogs_sketching.hll_sketch' has no attribute 'HyperLogLogSketch'"},{"fix":"Ensure you pass the value to be added, e.g., `sketch.update('my_item')`.","cause":"Attempting to call an update method without providing the item to be sketched.","error":"TypeError: update() missing 1 required positional argument: 'value'"}]}