rBloom
rBloom is a highly optimized Bloom filter library for Python, implemented in Rust. It provides a fast, simple, and lightweight probabilistic data structure whose API closely mimics Python's built-in `set`. Currently at version 1.5.4, it is designed for high-performance set membership testing with a low memory footprint. It is updated regularly, often driven by updates to the underlying PyO3 bindings.
Common errors
- TypeError: unhashable type: 'list'
cause Attempting to add an unhashable Python object (such as a list or dictionary) to the Bloom filter. Bloom filters, like Python sets, require elements to be hashable.
fix Ensure that all objects added to the `Bloom` filter are hashable (e.g., strings, numbers, tuples, immutable custom objects). Convert mutable objects to an immutable representation before adding them.
- `'item' in bf` returns False for an item that was added, or `bf1 == bf2` returns False despite the filters holding the same elements, especially after loading from bytes or in another process.
cause Using Python's default `hash()` function, which generates different hash values across Python process invocations because of a random salt. This breaks consistency for serialized filters and for filters used in distributed systems.
fix When creating the `Bloom` filter, provide a custom, deterministic hash function (e.g., one that serializes the object to bytes and hashes them with `hashlib.sha256`) so that hashes stay consistent for persistence and cross-process use. Example: `bf = Bloom(capacity, error_rate, hash_func=my_stable_hash_function)`.
- ImportError: cannot import name 'Bloom' from 'rbloom' (/path/to/rbloom/__init__.py)
cause The `rbloom` package is not correctly installed, or it is shadowed by another package or local module named `rbloom`.
fix Verify that `rbloom` is installed with `pip show rbloom`. If it is not, run `pip install rbloom`. If the problem persists, check which Python environment (virtual environment) is active, or reinstall in a clean environment.
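The fix for the `TypeError` entry above can be sketched as a small helper that converts mutable containers to hashable equivalents before they are added. `to_hashable` is a hypothetical name, not part of rbloom, and the conversion rules (lists to tuples, dicts to sorted tuples of pairs, sets to frozensets) are only one possible convention.

```python
def to_hashable(obj):
    """Recursively convert common mutable containers to hashable equivalents.

    Hypothetical helper for illustration; not part of rbloom.
    """
    if isinstance(obj, (list, tuple)):
        return tuple(to_hashable(item) for item in obj)
    if isinstance(obj, dict):
        pairs = ((to_hashable(k), to_hashable(v)) for k, v in obj.items())
        # Sort by repr so that insertion order does not affect the result
        return tuple(sorted(pairs, key=repr))
    if isinstance(obj, (set, frozenset)):
        return frozenset(to_hashable(item) for item in obj)
    return obj  # assumed to be hashable already (str, int, float, ...)


record = {"id": 7, "tags": ["a", "b"]}
key = to_hashable(record)
hash(key)  # works; hash(record) would raise TypeError: unhashable type: 'dict'
```

The resulting `key` is what would be passed to `bf.add(...)`. The same conversion has to be applied before membership tests, and note that under this convention a list and a tuple with the same contents map to the same key.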
Warnings
- gotcha When serializing `Bloom` filters (e.g., to bytes) or comparing them across different Python process invocations, you must provide a custom, stable hash function. Python's built-in `hash()` function's salt changes between invocations, leading to inconsistent hashes and incorrect `__contains__` or comparison results for deserialized or cross-process filters. The default `Bloom` filter without a custom hash function is only reliable within a single Python process where object hashes are consistent.
- gotcha For `Bloom` filter set operations (union `|`, intersection `&`) and comparisons (`issubset`, `issuperset`, `==`, `!=`), all participating filters must have identical parameters (capacity, false positive rate, and the exact same hash function object) to ensure correct behavior. Difference and symmetric difference are not meaningful for Bloom filters, because individual elements cannot be removed from the underlying bit array.
- gotcha If a pre-built wheel is not available for your platform or Python version, `rbloom` will attempt to build from source. This requires a Rust toolchain to be installed, including `cargo` and `maturin`, which can be a dependency hurdle for some environments.
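For the first warning above, a deterministic hash function can be built from the standard library alone. The sketch below is illustrative: `stable_hash` is a made-up name, it assumes that `repr(obj)` is identical in every process (true for strings, bytes, integers, and tuples of those, but not for sets or for objects whose repr contains a memory address), and it assumes the filter accepts any integer in the signed 128-bit range.

```python
import hashlib


def stable_hash(obj):
    """Deterministic replacement for hash(); illustrative, not part of rbloom.

    Assumes repr(obj) is the same in every process, which holds for
    str, bytes, int, and tuples of those.
    """
    digest = hashlib.sha256(repr(obj).encode("utf-8")).digest()
    # Keep the first 16 bytes and read them as a signed 128-bit integer
    return int.from_bytes(digest[:16], "big", signed=True)


h = stable_hash("hello")
assert h == stable_hash("hello")   # same input, same value
assert -2**127 <= h < 2**127       # fits in a signed 128-bit integer
```

Such a function would be passed when the filter is created, for example `Bloom(200, 0.01, hash_func=stable_hash)`. Reusing the same function object for every filter also satisfies the requirement in the second warning that combined or compared filters share one hash function.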
Install
pip install rbloom
Imports
- Bloom
from rbloom import Bloom
Quickstart
from rbloom import Bloom
# Initialize a Bloom filter for 200 items with a 1% false positive rate
bf = Bloom(200, 0.01)
# Add items
bf.add("hello")
bf.add("world")
# Check for membership
print(f"'hello' in bf: {'hello' in bf}")
print(f"'python' in bf: {'python' in bf}")
# Update with multiple items
bf.update(["rust", "fast"])
# Set-like operations
other_bf = Bloom(200, 0.01)
other_bf.add("rust")
union_bf = bf | other_bf # Union of filters
print(f"'rust' in union_bf after union: {'rust' in union_bf}")