Madoka: Memory-efficient CountMin Sketch

0.7.2.1 · active · verified Thu Apr 16

Madoka is a Python library that provides a memory-efficient implementation of the CountMin Sketch probabilistic data structure, based on the Madoka C++ library. It's designed for estimating frequencies of items in a data stream with limited memory. The current version is 0.7.2.1, and releases occur infrequently, primarily for maintenance or updates to the underlying C++ library.
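To make the data structure concrete, here is a minimal pure-Python sketch of the general CountMin technique. It is an illustration only and not how Madoka is implemented: the class name `TinyCountMin`, its parameters, and the choice of salted BLAKE2b hashing are all assumptions made for this example.

```python
import hashlib


class TinyCountMin:
    """Illustrative count-min sketch (hypothetical; not Madoka's implementation)."""

    def __init__(self, depth: int = 4, width: int = 1024) -> None:
        self.depth = depth
        self.width = width
        # One row of counters per hash function.
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row: int, item: bytes) -> int:
        # Derive an independent-looking hash per row by salting BLAKE2b
        # with the row number.
        digest = hashlib.blake2b(
            item, digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: bytes, count: int = 1) -> None:
        # Increment one counter in every row.
        for row in range(self.depth):
            self.rows[row][self._index(row, item)] += count

    def query(self, item: bytes) -> int:
        # Hash collisions can only inflate a counter, so the minimum
        # across rows is the tightest available estimate.
        return min(
            self.rows[row][self._index(row, item)] for row in range(self.depth)
        )
```

In this toy version, memory use is fixed at `depth * width` counters regardless of how many distinct items are added, and an estimate can be too high but never too low.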

Quickstart

This example initializes a CountMin Sketch, adds byte-encoded items, queries their approximate frequencies, and then resets the sketch.

import madoka

# Create a CountMin Sketch with desired depth and width (or epsilon/delta)
sketch = madoka.CountMinSketch(depth=4, width=2**20)

# Add items (must be bytes)
sketch.add(b"user_id_1")
sketch.add(b"product_page_A")
sketch.add(b"user_id_1")

# Query approximate counts
count_user1 = sketch.query(b"user_id_1")
count_pageA = sketch.query(b"product_page_A")
count_unknown = sketch.query(b"non_existent_item")

print(f"Approximate count for 'user_id_1': {count_user1}")
print(f"Approximate count for 'product_page_A': {count_pageA}")
print(f"Approximate count for 'non_existent_item': {count_unknown}")

# Reset the sketch
sketch.reset()
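
The quickstart comment mentions choosing depth and width "or epsilon/delta". As a rough guide, the classic CountMin analysis relates the two: with width `ceil(e / epsilon)` and depth `ceil(ln(1 / delta))`, an estimate exceeds the true count by more than `epsilon * N` (where `N` is the total of all counts added) with probability at most `delta`. The helper below, `cms_dimensions`, is a hypothetical function written for this example; whether Madoka accepts epsilon and delta directly, or sizes its tables this way, is not confirmed here.

```python
import math
from typing import Tuple


def cms_dimensions(epsilon: float, delta: float) -> Tuple[int, int]:
    """Return (depth, width) from the textbook count-min error bounds.

    Hypothetical helper for illustration; not part of Madoka.
    """
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must both lie strictly between 0 and 1")
    width = math.ceil(math.e / epsilon)      # controls the size of the overestimate
    depth = math.ceil(math.log(1 / delta))   # controls how often the bound fails
    return depth, width


# Overestimate of at most 0.1% of the stream total, failing at most 1% of the time.
depth, width = cms_dimensions(epsilon=0.001, delta=0.01)
print(depth, width)
```

Tighter error (smaller `epsilon`) grows the width linearly, while higher confidence (smaller `delta`) grows the depth only logarithmically, which is why sketches tend to be wide and shallow.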
