Bloom Filter 2
bloom-filter2 is a pure Python Bloom filter module, providing a space-efficient and probabilistic set data structure. It supports mmap, in-memory, and disk-seek backends, offering a balance between memory usage and performance. The library automatically calculates optimal Bloom filter parameters based on user-specified maximum elements and desired false positive rate. It is compatible with CPython 3.x, Pypy, and Jython and is actively maintained.
Warnings
- gotcha Bloom filters are probabilistic data structures. A membership query (`item in bloom`) returning `True` means the item *might* be in the set (with a specified false positive probability), while `False` means it is *definitely not* in the set. False negatives are not possible.
- gotcha The false positive rate of a Bloom filter increases as more elements are added, especially if the number of added elements significantly exceeds the `max_elements` specified during initialization. Over-filling the filter will severely degrade its accuracy and utility.
- breaking The previous `bloom-filter` package (without '2') is unmaintained and should not be used. Users migrating from `bloom-filter` must switch to `bloom-filter2` and update import paths from `from bloom_filter import BloomFilter` to `from bloom_filter2 import BloomFilter`.
- gotcha The efficiency (memory usage) and accuracy (false positive rate) of the Bloom filter are directly determined by the `max_elements` and `error_rate` parameters provided during instantiation. Poor selection can lead to either excessive memory consumption or an unacceptably high false positive rate.
Install
-
pip install bloom-filter2
Imports
- BloomFilter
from bloom_filter2 import BloomFilter
Quickstart
from bloom_filter2 import BloomFilter
# Instantiate BloomFilter with custom settings:
# max_elements is how many elements you expect the filter to hold.
# error_rate defines accuracy (false positive probability).
# You can use defaults with `BloomFilter()` without any arguments.
bloom = BloomFilter(max_elements=10000, error_rate=0.01)
# Test whether the bloom-filter has seen a key:
assert "test-key" not in bloom
# Mark the key as seen
bloom.add("test-key")
# Now check again
assert "test-key" in bloom
# Example with a different item (should be False initially)
assert "another-key" not in bloom