pybloom-live: Bloom Filter Implementation
pybloom-live is a Python library providing an efficient implementation of the Bloom filter probabilistic data structure. It also offers a Scalable Bloom Filter, which can dynamically grow its capacity. Currently at version 4.0.0, it is a fork of the original `pybloom` project, with improvements like a consistent tightening ratio. It aims to provide fast, space-efficient membership testing for large datasets where a small probability of false positives is acceptable.
Warnings
- breaking Version 3.0.0 dropped support for Python 2.6. Users on older Python 2.x versions might need to use a prior version of pybloom-live or migrate their Python environment.
- gotcha Bloom filters are probabilistic data structures that can produce 'false positives'. This means they might indicate an element is present when it's not, but they will never produce 'false negatives' (they won't say an element is absent if it was actually added).
- gotcha The `BloomFilter` (non-scalable) is sized up front from an estimated `capacity` and a target `error_rate`. The specified `error_rate` only holds up to that `capacity`; a fixed-size Bloom filter that holds significantly more elements than it was sized for has a much higher false positive rate. Use `ScalableBloomFilter` when the final number of elements is not known in advance.
- gotcha `pybloom-live` is a fork of the original `pybloom` library. While `pybloom-live` is actively maintained and has improvements, ensure you are importing from `pybloom_live` (e.g., `from pybloom_live import BloomFilter`) to use the correct version and features. Accidental imports from an older `pybloom` might lead to unexpected behavior or missing features.
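The two probabilistic gotchas above can be illustrated with a small standard-library sketch. This is a toy filter written for this note, not pybloom-live's implementation: `optimal_size`, `ToyBloomFilter`, and `false_positive_rate` are hypothetical names, the sizing uses the textbook Bloom filter formulas, and the hashing scheme (double hashing over one SHA-256 digest) is an assumption chosen for brevity.

```python
import hashlib
import math


def optimal_size(capacity, error_rate):
    """Return (bits, hash_count) from the textbook Bloom filter formulas."""
    bits = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
    hash_count = max(1, round(bits / capacity * math.log(2)))
    return bits, hash_count


class ToyBloomFilter:
    """Illustrative only; not how pybloom-live is implemented."""

    def __init__(self, capacity, error_rate):
        self.num_bits, self.num_hashes = optimal_size(capacity, error_rate)
        self.bits = bytearray(self.num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive all positions from two 64-bit halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def false_positive_rate(n_added, capacity=1000, error_rate=0.01, probes=20000):
    """Fill a filter sized for `capacity` with `n_added` items, then probe it."""
    bf = ToyBloomFilter(capacity, error_rate)
    for i in range(n_added):
        bf.add(f"member-{i}")
    # No false negatives: every added element is reported as present.
    assert all(f"member-{i}" in bf for i in range(n_added))
    # Probe with keys that were never added; any hit is a false positive.
    hits = sum(f"absent-{i}" in bf for i in range(probes))
    return hits / probes


at_capacity = false_positive_rate(1000)  # filled exactly to its design capacity
overfilled = false_positive_rate(5000)   # five times the design capacity
print(f"at capacity: {at_capacity:.4f}")
print(f"5x capacity: {overfilled:.4f}")
```

The first figure should land near the requested 1% and the second far above it, which is the behaviour the capacity warning describes for a fixed-size filter.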
Install
pip install pybloom-live
Imports
- BloomFilter
from pybloom_live import BloomFilter
- ScalableBloomFilter
from pybloom_live import ScalableBloomFilter
Quickstart
from pybloom_live import BloomFilter
# Initialize a Bloom filter with a capacity of 1000 elements
# and an acceptable false positive rate of 0.01 (1%)
bloom = BloomFilter(capacity=1000, error_rate=0.01)
# Add elements
bloom.add("apple")
bloom.add("banana")
bloom.add("orange")
# Check for membership
print(f"Is 'apple' in the filter? {'apple' in bloom}") # Expected: True
print(f"Is 'grape' in the filter? {'grape' in bloom}") # Expected: False (almost certainly)
# Note: because the filter is probabilistic, 'grape' could return True with a
# small probability (a false positive). An element that was actually added,
# such as 'apple', never returns False.