{"id":8013,"library":"clickhouse-cityhash","title":"ClickHouse CityHash Bindings","description":"clickhouse-cityhash provides Python bindings for a specific, older version of Google's CityHash algorithm (v1.0.2). It exists mainly for compatibility with ClickHouse servers, which use this CityHash version internally, for example to checksum compressed data blocks in the native protocol. It is a fork of the more general `python-cityhash` library, tailored for the ClickHouse ecosystem. The current version is 1.0.2.5, and it receives updates for compatibility and bug fixes.","status":"active","version":"1.0.2.5","language":"en","source_language":"en","source_url":"https://github.com/xzkostyan/clickhouse-cityhash","tags":["hashing","non-cryptographic","clickhouse","data-processing","cityhash"],"install":[{"cmd":"pip install clickhouse-cityhash","lang":"bash","label":"Install stable version"}],"dependencies":[],"imports":[{"note":"The functions live in the `clickhouse_cityhash.cityhash` submodule, not at the top level of the `clickhouse_cityhash` package. A top-level `cityhash` module belongs to the separate `python-cityhash` package.","wrong":"import clickhouse_cityhash; clickhouse_cityhash.CityHash64(...)","symbol":"CityHash64","correct":"from clickhouse_cityhash.cityhash import CityHash64"},{"note":"As with CityHash64, the 128-bit hash function is imported from the `clickhouse_cityhash.cityhash` submodule.","wrong":"import clickhouse_cityhash; clickhouse_cityhash.CityHash128(...)","symbol":"CityHash128","correct":"from clickhouse_cityhash.cityhash import CityHash128"}],"quickstart":{"code":"from clickhouse_cityhash.cityhash import CityHash64, CityHash128\n\ndata_string = 'hello world'\ndata_bytes = data_string.encode('utf-8')\n\nhash64 = CityHash64(data_bytes)\nhash128 = CityHash128(data_bytes)\n\nprint(f\"CityHash64 for '{data_string}': {hash64}\")\nprint(f\"CityHash128 for '{data_string}': {hash128}\")\n\n# Hashing an integer (must be converted to bytes for consistent results)\ninteger_data = 123456789\ninteger_bytes = integer_data.to_bytes(8, 'big') # fixed 8-byte big-endian encoding\nhash64_int = 
CityHash64(integer_bytes)\nprint(f\"CityHash64 for integer {integer_data}: {hash64_int}\")","lang":"python","description":"This quickstart demonstrates how to import and use the `CityHash64` and `CityHash128` functions. It highlights the crucial step of encoding Python strings to bytes before hashing, as CityHash operates on byte strings. It also shows how to hash integers consistently by converting them to a fixed-size byte representation."},"warnings":[{"fix":"Always verify which CityHash version (and thus hash output) is expected by your downstream system. For general non-ClickHouse use cases, consider `python-cityhash` or `farmhash` which implement newer algorithms. Ensure consistent hashing across systems if comparing hash values.","message":"This library implements CityHash v1.0.2. Modern versions of CityHash (and the general `python-cityhash` library) produce different hash values. This library is specifically for compatibility with ClickHouse's internal hashing, not for general-purpose latest CityHash usage.","severity":"breaking","affected_versions":"All versions"},{"fix":"For cryptographic security requirements, use standard library `hashlib` functions (e.g., SHA256, Blake2b).","message":"CityHash is a *non-cryptographic* hash function. It is optimized for speed and good distribution, but it is not designed to be collision-resistant against malicious input. Do NOT use it for security-sensitive applications like password storage, digital signatures, or integrity checks where adversarial input is possible.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always explicitly encode strings to bytes using a consistent encoding (e.g., `my_string.encode('utf-8')`) before passing them to hashing functions.","message":"CityHash functions operate strictly on byte strings (`bytes`), not Python unicode strings (`str`). 
Passing a `str` directly will result in a `TypeError` or incorrect results if Python implicitly attempts a conversion.","severity":"gotcha","affected_versions":"All versions"},{"fix":"For stream processing or incremental hashing needs, use a library that explicitly supports this feature (e.g., `MetroHash` or `xxHash`).","message":"This implementation of CityHash does not support incremental hashing. It is not suitable for hashing long data streams or data that arrives in chunks, as the entire input must be provided at once.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Convert integers to a fixed-size byte sequence, e.g. `integer_value.to_bytes(8, 'big')` for values that fit in 64 bits. Choose the width and endianness to match the system you compare hashes with.","message":"When hashing integers, convert them to a fixed-size byte representation for consistent and reproducible results across different environments or Python versions. Variable-length byte representations can lead to inconsistent hashes.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Encode the string to bytes first: `CityHash64(my_string.encode('utf-8'))`.","cause":"Attempting to hash a Python `str` object directly with `CityHash64` or `CityHash128`.","error":"TypeError: argument 1 must be bytes, not str"},{"fix":"Ensure you are using `clickhouse-cityhash` for hashing data intended for ClickHouse, as it specifically implements the compatible CityHash v1.0.2 algorithm.","cause":"Using a different CityHash implementation or version (e.g., the more general `python-cityhash` library) which produces different hashes than ClickHouse's internal v1.0.2 CityHash.","error":"Hash mismatch between Python application and ClickHouse server for the same input."},{"fix":"The functions are exposed under the `clickhouse_cityhash.cityhash` submodule. 
Use `from clickhouse_cityhash.cityhash import CityHash64`.","cause":"Trying to import `CityHash64` or `CityHash128` directly from the top-level `clickhouse_cityhash` package instead of its `cityhash` submodule.","error":"ImportError: cannot import name 'CityHash64' from 'clickhouse_cityhash' (...)"}]}