{"id":5020,"library":"pyfarmhash","title":"Google FarmHash Bindings for Python","description":"Pyfarmhash provides fast Python bindings for Google's FarmHash, a non-cryptographic hashing algorithm optimized for large datasets. It is currently at version 0.4.0 and sees active maintenance, with the latest release on August 27, 2024.","status":"active","version":"0.4.0","language":"en","source_language":"en","source_url":"https://github.com/veelion/python-farmhash","tags":["hashing","farmhash","google","bindings","performance","non-cryptographic"],"install":[{"cmd":"pip install pyfarmhash","lang":"bash","label":"Stable release"}],"dependencies":[{"reason":"Required for building from source on Linux/macOS.","package":"g++","optional":true},{"reason":"Required for building from source on Windows (though pre-built wheels are available).","package":"Microsoft Visual C++ Compiler","optional":true}],"imports":[{"note":"The primary module containing all hashing functions.","symbol":"farmhash","correct":"import farmhash"},{"note":"One of the main 64-bit hashing functions.","symbol":"hash64","correct":"import farmhash\nfarmhash.hash64('input')"},{"note":"Functions like `fingerprint64` expect byte strings as input; passing a regular Python string will raise a TypeError.","wrong":"farmhash.fingerprint64('input')","symbol":"fingerprint64","correct":"import farmhash\nfarmhash.fingerprint64(b'input')"}],"quickstart":{"code":"import farmhash\n\n# Hashing a standard string (utf-8 encoded by default for hash functions)\ntext_input = 'Hello, FarmHash!'\nhash_value_64 = farmhash.hash64(text_input)\nhash_value_32 = farmhash.hash32(text_input)\nprint(f\"Hash64 for '{text_input}': {hash_value_64}\")\nprint(f\"Hash32 for '{text_input}': {hash_value_32}\")\n\n# Hashing a byte string (required for fingerprint functions)\nbytes_input = b'Another test string'\nfingerprint_64 = farmhash.fingerprint64(bytes_input)\nfingerprint_32 = farmhash.fingerprint32(bytes_input)\nprint(f\"Fingerprint64 for '{bytes_input.decode()}': {fingerprint_64}\")\nprint(f\"Fingerprint32 for '{bytes_input.decode()}': {fingerprint_32}\")\n\n# Hashing with a seed\nseeded_hash = farmhash.hash64withseed(text_input, 12345)\nprint(f\"Seeded Hash64 for '{text_input}' with seed 12345: {seeded_hash}\")","lang":"python","description":"This quickstart demonstrates basic usage of `pyfarmhash` for 32-bit and 64-bit hashing, including using seeds and the critical distinction between string and byte string inputs for different function types."},"warnings":[{"fix":"Always encode string inputs to bytes using `.encode('utf-8')` (or another appropriate encoding) before calling byte-specific hashing functions like `farmhash.fingerprint64()`.","message":"Many `pyfarmhash` functions, especially `fingerprint` variants, strictly require byte strings (`bytes`) as input. Passing a regular Python string (`str`) will result in a `TypeError`. Remember to encode strings (e.g., `my_string.encode('utf-8')`) before passing them to these functions.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Prefer `pip install pyfarmhash` to leverage pre-built wheels. If building from source, ensure your C++ compiler and associated build tools are correctly installed and configured for your Python environment. For Windows, consider the detailed (and often 'hacky') instructions in the GitHub README if pre-built wheels are not an option.","message":"When installing `pyfarmhash` from source (not using pre-built wheels), a C++ compiler is required (`g++` on Linux/macOS, Microsoft Visual C++ Compiler on Windows). Setting up the build environment on Windows can be particularly complex.","severity":"gotcha","affected_versions":"All versions (when building from source)"},{"fix":"Use cryptographic hash functions (e.g., from Python's `hashlib` module) for security-critical tasks. FarmHash is suitable for tasks like data deduplication, partitioning, and unique identifier generation in non-security contexts.","message":"FarmHash is a non-cryptographic hash function. This means it is optimized for speed and distribution, but it is not designed to be collision-resistant and should NOT be used for security-sensitive applications like password hashing or data integrity verification where cryptographic strength is needed.","severity":"gotcha","affected_versions":"All versions"},{"fix":"If matching BigQuery's `FARM_FINGERPRINT` output is critical, explicitly convert the `pyfarmhash` output to a signed 64-bit integer using `numpy.uint64(hash_value).astype('int64')`.","message":"Discrepancies can occur when comparing `pyfarmhash` output with Google BigQuery's `FARM_FINGERPRINT` function, particularly regarding the handling of unsigned 64-bit integers vs. signed integers in Python. BigQuery might return a signed representation where `pyfarmhash` returns an unsigned one.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-12T00:00:00.000Z","next_check":"2026-07-11T00:00:00.000Z"}