Bloomfilter-py
Bloomfilter-py is a Python library that implements a Bloom filter and serializes it in the same format as Java's Guava library, so a filter written by a Python application can be read by a Java application and vice versa. The current version is 1.1.0. The library has a moderate release cadence, and its latest update was in August 2024.
Common errors
- False positives are too high, or the Bloom filter seems to always return True.
  cause: The Bloom filter was initialized with `expected_insertions` too low or `err_rate` too high for the actual number of items inserted, or the `expected_insertions` limit was exceeded.
  fix: Re-evaluate your expected number of insertions and your acceptable error rate. Initialize the `BloomFilter` with a larger `expected_insertions` or a smaller `err_rate`. Remember that increasing capacity or lowering the error rate consumes more memory.
- AttributeError: module 'bloomfilter' has no attribute 'BloomFilter'
  cause: Incorrect import statement for the `BloomFilter` class.
  fix: The correct import path is `from bloomfilter import BloomFilter`.
- Data serialized by another Bloom filter implementation (e.g., Python `pybloom` or a custom C++ implementation) is not recognized or yields incorrect results when loaded by `bloomfilter-py`, or vice versa.
  cause: This library is specifically designed for compatibility with Java's Guava Bloom filter serialization format. Other implementations may use different hashing functions, bit array structures, or serialization schemes.
  fix: Verify that both the sender and the receiver of the Bloom filter data use, or are compatible with, Java Guava's Bloom filter serialization format. If not, you will need a different Bloom filter library that provides a common serialization format, or custom conversion logic.
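For the first error above, it can help to estimate how much memory a given `expected_insertions` and `err_rate` imply before creating the filter. The sketch below uses the textbook Bloom filter sizing formulas. The helper names (`optimal_num_bits`, `optimal_num_hashes`, `estimated_fpr`) are hypothetical and not part of `bloomfilter-py`, and the library's own rounding may differ slightly.

```python
import math


def optimal_num_bits(expected_insertions, err_rate):
    """Textbook bit count m = -n * ln(p) / (ln 2)^2, rounded up."""
    return math.ceil(-expected_insertions * math.log(err_rate) / (math.log(2) ** 2))


def optimal_num_hashes(expected_insertions, num_bits):
    """Textbook hash count k = (m / n) * ln 2, at least 1."""
    return max(1, round(num_bits / expected_insertions * math.log(2)))


def estimated_fpr(inserted, num_bits, num_hashes):
    """Approximate false positive rate (1 - e^(-k*n/m))^k after `inserted` items."""
    return (1.0 - math.exp(-num_hashes * inserted / num_bits)) ** num_hashes


# Same parameters as the Quickstart: 1000 expected insertions, 1% error rate.
n, p = 1000, 0.01
m = optimal_num_bits(n, p)
k = optimal_num_hashes(n, m)
print(f"bits={m} (~{m / 8 / 1024:.2f} KiB), hashes={k}")

# Estimate how the false positive rate degrades once the filter is overfilled.
for inserted in (n, 2 * n, 5 * n):
    print(f"{inserted} items -> estimated FPR {estimated_fpr(inserted, m, k):.4f}")
```

Running this for your own workload shows roughly how far past `expected_insertions` you can go before the estimated rate is no longer acceptable.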
Warnings
- gotcha Bloom filters inherently have a false positive rate, meaning `element in bloom_filter` might return True for elements not actually added. There are no false negatives. The `err_rate` parameter sets the target false positive rate; a lower value uses more memory.
- gotcha Standard Bloom filters, including this implementation, do not support deletion of elements. Attempting to remove an element would compromise the integrity of other stored elements by clearing shared bits.
- gotcha This library's primary feature is compatibility with Java's Guava Bloom filter serialization format. Its `dumps()` and `loads()` methods are specifically designed for this. It is unlikely to be compatible with Bloom filters serialized by other Python libraries or non-Guava Java implementations.
- gotcha Exceeding the `expected_insertions` provided during initialization will cause the false positive rate to increase dramatically beyond the specified `err_rate`.
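The first two warnings can be illustrated with a toy filter written in pure Python. This is a minimal sketch, not the `bloomfilter-py` implementation and not Guava's bit layout: the `ToyBloomFilter` class, its SHA-256 based hashing, and the `naive_remove` method are all assumptions made for illustration. The filter is deliberately tiny so that items share bits.

```python
import hashlib


class ToyBloomFilter:
    """Illustrative toy; NOT the bloomfilter-py implementation or Guava's bit layout."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive one bit position per hash by salting SHA-256 with the hash index.
        positions = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.num_bits)
        return positions

    def put(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

    def naive_remove(self, item):
        # Hypothetical operation, shown only to demonstrate why it is unsafe.
        for pos in self._positions(item):
            self.bits[pos] = False


items = [f"item-{i}" for i in range(100)]
toy = ToyBloomFilter(num_bits=16, num_hashes=3)  # deliberately tiny so bits are shared
for item in items:
    toy.put(item)

# Every inserted item is reported as present: no false negatives.
all_present = all(item in toy for item in items)

# Clearing the bits of one item also clears bits that other items rely on.
toy.naive_remove("item-0")
lost = [item for item in items[1:] if item not in toy]
print(f"all present before removal: {all_present}")
print(f"items lost after removing 'item-0': {len(lost)}")
```

Any item counted in `lost` was inserted but is now reported as absent, which is exactly the false negative that a Bloom filter without deletion never produces.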
Install
pip install bloomfilter-py
Imports
- BloomFilter
from bloomfilter import BloomFilter
Quickstart
from bloomfilter import BloomFilter
# Initialize a Bloom filter with expected insertions and desired error rate
bloom_filter = BloomFilter(expected_insertions=1000, err_rate=0.01)
# Add elements
bloom_filter.put("apple")
bloom_filter.put("banana")
bloom_filter.put("orange")
# Check for membership
print(f"Is 'apple' in filter? { 'apple' in bloom_filter }")
print(f"Is 'grape' in filter? { 'grape' in bloom_filter }")
# Serialize to bytes (Guava compatible)
serialized_data = bloom_filter.dumps()
# Deserialize from bytes
loaded_filter = BloomFilter.loads(serialized_data)
print(f"Is 'banana' in loaded filter? { 'banana' in loaded_filter }")