{"id":5864,"library":"bloom-filter2","title":"Bloom Filter 2","description":"bloom-filter2 is a pure Python Bloom filter module, providing a space-efficient, probabilistic set data structure. It supports mmap, in-memory, and disk-seek backends, offering a balance between memory usage and performance. The library automatically calculates optimal Bloom filter parameters from the user-specified maximum number of elements and desired false positive rate. It is compatible with CPython 3.x, PyPy, and Jython, and is actively maintained.","status":"active","version":"2.0.0","language":"en","source_language":"en","source_url":"https://github.com/remram44/python-bloom-filter","tags":["probabilistic","set","data structure","bloom filter","memory efficient"],"install":[{"cmd":"pip install bloom-filter2","lang":"bash","label":"Install latest version"}],"dependencies":[],"imports":[{"symbol":"BloomFilter","correct":"from bloom_filter2 import BloomFilter"}],"quickstart":{"code":"from bloom_filter2 import BloomFilter\n\n# Instantiate BloomFilter with custom settings:\n# max_elements is how many elements you expect the filter to hold.\n# error_rate is the target false positive probability.\n# You can use defaults with `BloomFilter()` without any arguments.\nbloom = BloomFilter(max_elements=10000, error_rate=0.01)\n\n# Test whether the bloom-filter has seen a key:\nassert \"test-key\" not in bloom\n\n# Mark the key as seen\nbloom.add(\"test-key\")\n\n# Now check again\nassert \"test-key\" in bloom\n\n# A key that was never added is reported as absent (barring a rare false positive)\nassert \"another-key\" not in bloom","lang":"python","description":"Initialize a BloomFilter, add elements, and check for membership using the `in` operator. 
The `max_elements` and `error_rate` parameters control the filter's capacity and false positive probability."},"warnings":[{"fix":"Always account for the possibility of false positives in your application logic, especially when querying items that are not expected to be present.","message":"Bloom filters are probabilistic data structures. A membership query (`item in bloom`) returning `True` means the item *might* be in the set (with a specified false positive probability), while `False` means it is *definitely not* in the set. False negatives are not possible.","severity":"gotcha","affected_versions":"All"},{"fix":"Accurately estimate `max_elements` for your use case and consider re-initializing or creating a new Bloom filter if the number of elements grows beyond expectations. Monitor the actual false positive rate if critical.","message":"The false positive rate of a Bloom filter increases as more elements are added, especially if the number of added elements significantly exceeds the `max_elements` specified during initialization. Over-filling the filter will severely degrade its accuracy and utility.","severity":"gotcha","affected_versions":"All"},{"fix":"Uninstall `bloom-filter` and `pip install bloom-filter2`. Update all import statements to `from bloom_filter2 import BloomFilter`.","message":"The previous `bloom-filter` package (without '2') is unmaintained and should not be used. Users migrating from `bloom-filter` must switch to `bloom-filter2` and update import paths from `from bloom_filter import BloomFilter` to `from bloom_filter2 import BloomFilter`.","severity":"breaking","affected_versions":"< 2.0.0 (for `bloom-filter` package)"},{"fix":"Carefully consider the expected maximum number of elements and the tolerable false positive rate for your application to choose optimal parameters. 
The library handles the internal bit array size and hash function count based on these inputs.","message":"The efficiency (memory usage) and accuracy (false positive rate) of the Bloom filter are directly determined by the `max_elements` and `error_rate` parameters provided during instantiation. Poor selection can lead to either excessive memory consumption or an unacceptably high false positive rate.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-14T00:00:00.000Z","next_check":"2026-07-13T00:00:00.000Z"}