pyprobables
pyprobables is a pure-Python library offering implementations of common probabilistic data structures like Bloom filters, Count-Min sketches, Cuckoo filters, and Quotient filters. It provides memory-efficient ways to perform operations such as set membership testing and approximate frequency counting. The library is actively maintained, with its current version being 0.7.0, and receives regular updates including new features, bug fixes, and Python version support changes.
Warnings
- breaking As of v0.7.0, comparing mismatched Bloom filters (e.g., different sizes or hash functions) will now raise a `SimilarityError` instead of returning `None` for comparison operations.
- breaking Python 3.9 support was dropped in v0.7.0. Python 3.8 support was dropped in v0.6.2, and 3.6/3.7 support in v0.5.9. The library now requires Python >=3.10.
- gotcha For better raw performance, especially with high data volumes, consider supplying an alternative hashing algorithm compiled in C, such as those from `mmh3` or `pyhash`.
- gotcha Bloom filters and other probabilistic data structures have a predefined or desired false positive rate based on the estimated number of elements (`est_elements`) during initialization. If the actual number of elements added exceeds this estimate, the false positive rate will increase beyond the desired amount.
Install
-
pip install pyprobables
Imports
- BloomFilter
from probables import BloomFilter
- CountMinSketch
from probables import CountMinSketch
- CuckooFilter
from probables import CuckooFilter
- QuotientFilter
from probables import QuotientFilter
Quickstart
from probables import BloomFilter
# Initialize a Bloom filter for 100,000 elements with a 0.05 (5%) false positive rate
blm = BloomFilter(est_elements=100000, false_positive_rate=0.05)
# Add elements
blm.add('apple')
blm.add('banana')
blm.add('orange')
# Check for membership
print(f"Is 'apple' in the filter? {blm.check('apple')}")
print(f"Is 'grape' in the filter? {blm.check('grape')}")
# Demonstrate false positive possibility (very low with chosen parameters for this small example)
# In a real scenario, with many elements, a non-member might occasionally return True.
if blm.check('nonexistent_fruit'):
print("Warning: A false positive occurred for 'nonexistent_fruit'.")