{"id":6808,"library":"pyprobables","title":"pyprobables","description":"pyprobables is a pure-Python library that implements common probabilistic data structures, including Bloom filters, Count-Min sketches, Cuckoo filters, and Quotient filters. It provides memory-efficient set membership testing and approximate frequency counting. The library is actively maintained; the current version is 0.7.0, and releases regularly bring new features, bug fixes, and changes to supported Python versions.","status":"active","version":"0.7.0","language":"en","source_language":"en","source_url":"https://github.com/barrust/pyprobables","tags":["probabilistic data structures","bloom filter","count-min sketch","quotient filter","cuckoo filter","data structures"],"install":[{"cmd":"pip install pyprobables","lang":"bash","label":"PyPI"}],"dependencies":[],"imports":[{"symbol":"BloomFilter","correct":"from probables import BloomFilter"},{"symbol":"CountMinSketch","correct":"from probables import CountMinSketch"},{"symbol":"CuckooFilter","correct":"from probables import CuckooFilter"},{"symbol":"QuotientFilter","correct":"from probables import QuotientFilter"}],"quickstart":{"code":"from probables import BloomFilter\n\n# Initialize a Bloom filter for 100,000 elements with a 0.05 (5%) false positive rate\nblm = BloomFilter(est_elements=100000, false_positive_rate=0.05)\n\n# Add elements\nblm.add('apple')\nblm.add('banana')\nblm.add('orange')\n\n# Check for membership\nprint(f\"Is 'apple' in the filter? {blm.check('apple')}\")\nprint(f\"Is 'grape' in the filter? 
{blm.check('grape')}\")\n\n# Demonstrate false positive possibility (very low with chosen parameters for this small example)\n# In a real scenario, with many elements, a non-member might occasionally return True.\nif blm.check('nonexistent_fruit'):\n    print(\"Warning: A false positive occurred for 'nonexistent_fruit'.\")","lang":"python","description":"This quickstart demonstrates how to initialize a BloomFilter, add elements to it, and check for element membership. Bloom filters are used for approximate set membership testing, guaranteeing no false negatives but allowing for a configurable rate of false positives."},"warnings":[{"fix":"Ensure that Bloom filters being compared are compatible (e.g., created with the same parameters) or handle the `SimilarityError` exception if comparison is attempted on mismatched filters.","message":"As of v0.7.0, comparing mismatched Bloom filters (e.g., different sizes or hash functions) will now raise a `SimilarityError` instead of returning `None` for comparison operations.","severity":"breaking","affected_versions":">=0.7.0"},{"fix":"Upgrade your Python environment to version 3.10 or newer to use the latest versions of pyprobables.","message":"Python 3.9 support was dropped in v0.7.0. Python 3.8 support was dropped in v0.6.2, and 3.6/3.7 support in v0.5.9. 
The library now requires Python >=3.10.","severity":"breaking","affected_versions":">=0.6.2, >=0.7.0"},{"fix":"Install a C-optimized hashing library (e.g., `pip install mmh3`), wrap its hash function in a callable that matches the signature the library expects (the decorators in `probables.hashes`, such as `hash_with_depth_int`, can help), and pass that callable to the data structure's constructor via the `hash_function` parameter.","message":"For better raw performance, especially at high data volumes, consider supplying an alternative hashing algorithm compiled in C, such as those from `mmh3` or `pyhash`.","severity":"gotcha","affected_versions":"all"},{"fix":"Carefully estimate the maximum number of elements you expect to add and initialize the data structure with a sufficiently large `est_elements` parameter to maintain the desired false positive rate. Some filters, like `ExpandingBloomFilter`, can auto-expand but come with their own considerations.","message":"Bloom filters and other probabilistic data structures are sized at initialization for a desired false positive rate at an estimated number of elements (`est_elements`). If more elements are added than estimated, the actual false positive rate rises above the desired rate.","severity":"gotcha","affected_versions":"all"}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z","problems":[]}