Madoka: Memory-efficient CountMin Sketch
Madoka is a Python library that provides a memory-efficient implementation of the CountMin Sketch probabilistic data structure, based on the Madoka C++ library. It's designed for estimating frequencies of items in a data stream with limited memory. The current version is 0.7.2.1, and releases occur infrequently, primarily for maintenance or updates to the underlying C++ library.
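To make the data structure concrete, here is a minimal pure-Python sketch of the Count-Min idea. It is an illustration only and is not how Madoka stores its counters. The class name `TinyCountMin` and the choice of salted BLAKE2b as the per-row hash are assumptions made for this example.

```python
import hashlib


class TinyCountMin:
    """Illustrative Count-Min Sketch (hypothetical; not Madoka's implementation)."""

    def __init__(self, depth: int = 4, width: int = 1024) -> None:
        self.depth = depth
        self.width = width
        # One row of `width` counters per hash function.
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row: int, key: bytes) -> int:
        # Derive an independent-looking hash per row by salting with the row number.
        salt = row.to_bytes(8, "little")
        digest = hashlib.blake2b(key, digest_size=8, salt=salt).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key: bytes, count: int = 1) -> None:
        # Increment one counter in every row.
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += count

    def query(self, key: bytes) -> int:
        # Collisions can only inflate a counter, so the minimum over rows
        # is the tightest available estimate and never undercounts.
        return min(self.rows[row][self._index(row, key)] for row in range(self.depth))


cms = TinyCountMin(depth=4, width=64)
for key in [b"a", b"b", b"a", b"c", b"a"]:
    cms.add(key)
print(cms.query(b"a"), cms.query(b"b"))
```

Because every update only increments counters, an estimate is always at least the true count; the error comes from other keys hashing to the same cells.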
Common errors
- TypeError: a bytes-like object is required, not 'str'
  cause: Attempted to add or query a key using a Python string instead of a byte string.
  fix: Convert the string key to bytes before passing it to Madoka methods. Example: `sketch.add(my_string.encode('utf-8'))` or `sketch.query(b'my_literal_key')`.
- ModuleNotFoundError: No module named 'madoka'
  cause: The Madoka library is not installed in your current Python environment.
  fix: Install the library using pip: `pip install madoka`. If you use a virtual environment, ensure it is activated.
- error: command 'gcc' failed with exit status 1 (or similar compilation error with MSVC/Clang)
  cause: During installation, pip attempted to compile Madoka from source, but a C++ compiler was not found or was not configured correctly on your system.
  fix: Install the necessary C++ build tools for your operating system. For Linux (Debian/Ubuntu), run `sudo apt-get install build-essential`. For macOS, install Xcode Command Line Tools (`xcode-select --install`). For Windows, install Microsoft Visual C++ Build Tools.
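One way to avoid the `TypeError` above is to normalize keys at a single point before they reach the sketch. The helper below is a sketch under that assumption; `to_key` is a hypothetical name and is not part of Madoka.

```python
def to_key(value, encoding="utf-8"):
    """Return `value` as bytes, encoding str input (hypothetical helper)."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, (bytearray, memoryview)):
        return bytes(value)
    if isinstance(value, str):
        return value.encode(encoding)
    # Refuse anything else rather than guessing a representation.
    raise TypeError(f"unsupported key type: {type(value).__name__}")


print(to_key("user_id_1"))
print(to_key(b"product_page_A"))
```

Using one fixed encoding for both adds and queries matters: the same text encoded two different ways produces two different keys.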
Warnings
- gotcha Madoka requires keys to be `bytes` objects, not strings. Passing a string will result in a `TypeError`.
- gotcha Installation may fail if a C++ compiler is not available on your system, especially if pre-built wheels are not provided for your specific OS and Python version.
- gotcha The `width` parameter (or the width implied by `epsilon`) directly determines memory usage: a Count-Min table holds `depth` × `width` counters, so a very large `width` can consume more memory than the available RAM.
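To get a feel for the memory warning, the textbook Count-Min sizing rules can be evaluated before creating a sketch. The code below uses the standard bounds (width = ⌈e/ε⌉, depth = ⌈ln(1/δ)⌉). Whether Madoka maps `epsilon` and `delta` to dimensions in exactly this way is an assumption, and the 4-byte counter size is a placeholder rather than Madoka's real counter layout.

```python
import math


def cms_dimensions(epsilon, delta):
    """Textbook Count-Min sizing: error <= epsilon * N with probability >= 1 - delta."""
    width = math.ceil(math.e / epsilon)
    depth = math.ceil(math.log(1.0 / delta))
    return width, depth


def estimated_bytes(width, depth, bytes_per_counter=4):
    """Rough table size; bytes_per_counter is an assumed value."""
    return width * depth * bytes_per_counter


width, depth = cms_dimensions(epsilon=0.001, delta=0.01)
print(width, depth, estimated_bytes(width, depth))
print(estimated_bytes(2**20, 4))  # the Quickstart dimensions
```

Halving `epsilon` doubles the width and therefore the memory, while tightening `delta` only adds rows logarithmically.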
Install
- pip install madoka
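Since a source build needs a C++ compiler, a quick preflight check can show whether the module is already importable and which compilers are on `PATH`. This is a sketch only: `preflight` is a hypothetical helper, the compiler names are common defaults rather than a list taken from Madoka's build configuration, and finding a compiler does not guarantee that the build will succeed.

```python
import importlib.util
import shutil


def preflight(module="madoka", compilers=("g++", "gcc", "clang++", "clang", "cl")):
    """Report whether `module` is importable and which compilers are on PATH."""
    installed = importlib.util.find_spec(module) is not None
    found = [name for name in compilers if shutil.which(name)]
    return {"installed": installed, "compilers": found}


report = preflight()
print(report)
```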
Imports
- CountMinSketch
  wrong: import madoka.CountMinSketch
  correct: from madoka import CountMinSketch
Quickstart
import madoka
# Create a CountMin Sketch with desired depth and width (or epsilon/delta)
sketch = madoka.CountMinSketch(depth=4, width=2**20)
# Add items (must be bytes)
sketch.add(b"user_id_1")
sketch.add(b"product_page_A")
sketch.add(b"user_id_1")
# Query approximate counts
count_user1 = sketch.query(b"user_id_1")
count_pageA = sketch.query(b"product_page_A")
count_unknown = sketch.query(b"non_existent_item")
print(f"Approximate count for 'user_id_1': {count_user1}")
print(f"Approximate count for 'product_page_A': {count_pageA}")
print(f"Approximate count for 'non_existent_item': {count_unknown}")
# Reset the sketch
sketch.reset()