Google FarmHash Bindings for Python
`pyfarmhash` provides fast Python bindings for Google's FarmHash, a family of non-cryptographic hash functions optimized for large datasets. The current version is 0.4.0, released on August 27, 2024, and the project is actively maintained.
Warnings
- gotcha Many `pyfarmhash` functions, especially the `fingerprint` variants, require byte strings (`bytes`) as input. Passing a regular Python string (`str`) raises a `TypeError`. Encode strings first (e.g., `my_string.encode('utf-8')`) before passing them to these functions.
- gotcha When installing `pyfarmhash` from source (not using pre-built wheels), a C++ compiler is required (`g++` on Linux/macOS, Microsoft Visual C++ Compiler on Windows). Setting up the build environment on Windows can be particularly complex.
- gotcha FarmHash is a non-cryptographic hash function. It is optimized for speed and even distribution, not for resistance to deliberately crafted collisions, and should NOT be used for security-sensitive applications such as password hashing or data integrity verification where cryptographic strength is needed.
- gotcha Results can differ from Google BigQuery's `FARM_FINGERPRINT` function because of signed versus unsigned 64-bit integer handling. BigQuery returns a signed `INT64`, whereas `pyfarmhash` returns an unsigned value, so any fingerprint at or above 2^63 appears as a negative number in BigQuery. Convert one representation to the other before comparing.
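The signed versus unsigned mismatch in the last warning comes down to two's-complement reinterpretation of the same 64 bits. The sketch below shows one way to convert between the two forms in pure Python. The helper names `to_signed64` and `to_unsigned64` are hypothetical and are not part of `pyfarmhash`.

```python
# Hypothetical helpers (not part of pyfarmhash) for moving between the
# unsigned 64-bit form and the signed INT64 form of the same bit pattern.

TWO_63 = 1 << 63
TWO_64 = 1 << 64


def to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a signed one."""
    if not 0 <= value < TWO_64:
        raise ValueError("value does not fit in an unsigned 64-bit integer")
    # Values with the top bit set wrap around to the negative range.
    return value - TWO_64 if value >= TWO_63 else value


def to_unsigned64(value: int) -> int:
    """Reinterpret a signed 64-bit integer as an unsigned one."""
    if not -TWO_63 <= value < TWO_63:
        raise ValueError("value does not fit in a signed 64-bit integer")
    # Python's modulo always returns a non-negative result here.
    return value % TWO_64
```

Assuming `farmhash.fingerprint64` returns an unsigned value as described above, `to_signed64(farmhash.fingerprint64(data))` would be the number to compare against a BigQuery result for the same bytes.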
Install
pip install pyfarmhash
Imports
- farmhash
import farmhash
- hash64
import farmhash
farmhash.hash64('input')
- fingerprint64
import farmhash
farmhash.fingerprint64(b'input')
Quickstart
import farmhash
# Hashing a standard string (utf-8 encoded by default for hash functions)
text_input = 'Hello, FarmHash!'
hash_value_64 = farmhash.hash64(text_input)
hash_value_32 = farmhash.hash32(text_input)
print(f"Hash64 for '{text_input}': {hash_value_64}")
print(f"Hash32 for '{text_input}': {hash_value_32}")
# Hashing a byte string (required for fingerprint functions)
bytes_input = b'Another test string'
fingerprint_64 = farmhash.fingerprint64(bytes_input)
fingerprint_32 = farmhash.fingerprint32(bytes_input)
print(f"Fingerprint64 for '{bytes_input.decode()}': {fingerprint_64}")
print(f"Fingerprint32 for '{bytes_input.decode()}': {fingerprint_32}")
# Hashing with a seed
seeded_hash = farmhash.hash64withseed(text_input, 12345)
print(f"Seeded Hash64 for '{text_input}' with seed 12345: {seeded_hash}")