CityHash and FarmHash Python Bindings
CityHash is a family of fast non-cryptographic hash functions for strings, originally developed by Google. FarmHash is a successor designed for improved performance and collision resistance on modern CPUs. This library, `python-cityhash`, provides Python bindings for both CityHash and FarmHash, enabling high-performance hashing in Python applications. It is currently at version 0.4.10 and receives updates for Python version compatibility and Cython build fixes.
Warnings
- breaking Version 0.4.0 dropped support for Python 2. Projects targeting Python 2 must use an older version of the library (e.g., <0.4.0).
- gotcha CityHash and FarmHash functions operate on bytes, not Python `str` (unicode) objects. Passing a `str` directly may result in a TypeError or incorrect results. Always encode strings to bytes (e.g., `my_string.encode('utf-8')`) before passing them to hashing functions.
- gotcha When hashing integers, convert them to a fixed-size byte representation with a fixed byte order (e.g., `n.to_bytes(8, 'big')`, 8 bytes for CityHash64) for consistent and reproducible results. Variable-length byte representations can lead to inconsistent hashes.
- gotcha These bindings for CityHash and FarmHash do not support incremental hashing: each call hashes one complete input. They are therefore not suitable for hashing long character streams or data that arrives in chunks. For incremental hashing, consider libraries like MetroHash or xxHash.
- gotcha CityHash and FarmHash are *non-cryptographic* hash functions. They are optimized for speed and good distribution, but they offer no protection against deliberately constructed collisions and should NOT be used for security-sensitive applications such as password storage or digital signatures.
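- gotcha When hashing NumPy arrays or other objects that expose the Python buffer protocol, ensure the data is contiguous in memory. Non-contiguous arrays (for example, strided slices) might lead to unexpected results; make a contiguous copy first, e.g. with `numpy.ascontiguousarray(arr)`.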
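The input-related warnings above (bytes rather than `str`, fixed-size integers, contiguous buffers) can be handled in one place before hashing. The following is a minimal sketch using only the standard library; the helper name `to_hashable_bytes` and its `int_width` parameter are hypothetical and not part of `python-cityhash`. Its return value is what would then be passed to a function such as `CityHash64`.

```python
def to_hashable_bytes(value, int_width=8):
    """Normalize a value to bytes before hashing (hypothetical helper)."""
    if isinstance(value, str):
        # Text must be encoded explicitly; UTF-8 is assumed here.
        return value.encode("utf-8")
    if isinstance(value, int):
        # Fixed width and fixed byte order keep the result reproducible.
        # Negative or too-large values raise OverflowError.
        return value.to_bytes(int_width, "big")
    # Anything else must expose the buffer protocol (bytes, bytearray,
    # memoryview, array.array, ...). tobytes() always returns a
    # contiguous copy, even when the source view is strided.
    return memoryview(value).tobytes()


strided = memoryview(b"abcdef")[::2]   # a non-contiguous view
print(strided.contiguous)              # False
print(to_hashable_bytes(strided))      # b'ace'
print(to_hashable_bytes("hello"))      # b'hello'
print(len(to_hashable_bytes(42)))      # 8
```

Copying through `tobytes()` costs one extra allocation per call, which is usually acceptable for small inputs; for large arrays it may be preferable to make the data contiguous once, up front.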
Install
pip install cityhash
Imports
- CityHash32
from cityhash import CityHash32
- CityHash64
from cityhash import CityHash64
- CityHash128
from cityhash import CityHash128
- FarmHash32
from farmhash import FarmHash32
- FarmHash64
from farmhash import FarmHash64
- FarmHash128
from farmhash import FarmHash128
- Fingerprint128
from farmhash import Fingerprint128
- CityHashCrc128
from cityhash.cityhashcrc import CityHashCrc128
- CityHashCrc256
from cityhash.cityhashcrc import CityHashCrc256
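If it is unclear whether a given installation exposes every module listed above, a quick check with the standard library can help. This is a sketch only: `module_available` is a hypothetical helper, and the module names are simply the ones used in the import lines above.

```python
import importlib.util


def module_available(name):
    """Return True if `name` can be located by the import system."""
    try:
        # For dotted names, find_spec imports the parent module first,
        # so a missing parent surfaces as ModuleNotFoundError.
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


for name in ("cityhash", "farmhash", "cityhash.cityhashcrc"):
    status = "available" if module_available(name) else "not importable"
    print(f"{name}: {status}")
```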
Quickstart
import cityhash
import farmhash
# Hashing a string (must be encoded to bytes)
data_string = "hello world"
hash_64 = cityhash.CityHash64(data_string.encode('utf-8'))  # returns an integer
print(f"CityHash64 of '{data_string}': {hash_64}")
# Hashing bytes directly
data_bytes = b"another example"
hash_128 = cityhash.CityHash128(data_bytes)
print(f"CityHash128 of '{data_bytes.decode()}': {hash_128}")
# Hashing with FarmHash
farm_hash_64 = farmhash.FarmHash64(b"farmhash test")
print(f"FarmHash64 of 'farmhash test': {farm_hash_64}")
# Hashing an integer (must be converted to fixed-size bytes)
int_data = 123456789
hashed_int = cityhash.CityHash64(int_data.to_bytes(8, 'big'))
print(f"CityHash64 of integer {int_data}: {hashed_int}")
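Because each call hashes one complete input, data that arrives in pieces has to be assembled before it is hashed. A minimal sketch of that pattern follows; `hash_chunks` is a hypothetical helper, and `zlib.crc32` is used purely as a stand-in so the snippet runs without the package installed. In practice a function such as `cityhash.CityHash64` would be passed as `hash_func`.

```python
import zlib


def hash_chunks(chunks, hash_func):
    """Join an iterable of bytes chunks and hash the result in one call."""
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
    # The whole payload is held in memory; this is not incremental hashing.
    return hash_func(bytes(buffer))


chunks = [b"hello ", b"wor", b"ld"]
digest = hash_chunks(chunks, zlib.crc32)  # stand-in hash function

# Chunk boundaries do not affect the result.
assert digest == zlib.crc32(b"hello world")
```

This keeps the full input in memory, so it only suits payloads that fit comfortably in RAM; for true streaming, an incremental hash library remains the better choice.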