Fingerprints
The `fingerprints` library generates stable, deterministic hashes (fingerprints) for entities from identifying attributes such as names, addresses, and identifiers. It is commonly used for data matching and deduplication, particularly within the 'opensanctions' ecosystem. The current version is 1.3.1; releases are irregular and typically correspond to bug fixes or minor feature enhancements.
Common errors
- ModuleNotFoundError: No module named 'fingerprints'
  cause: The `fingerprints` library is not installed in your Python environment or is not accessible from the current path.
  fix: Install the library using pip: `pip install fingerprints`
- TypeError: generate() missing 1 required positional argument: 'data'
  cause: The `generate` function was called without any keyword arguments. It expects identifying attributes to be passed as `key=value` pairs (e.g., `generate(name="Alice", country="US")`).
  fix: Provide identifying attributes as keyword arguments, for example: `from fingerprints import generate; generate(name="Example Entity", country="ZZ")`.
- ValueError: Could not normalize country: <INVALID_CODE>
  cause: The `country` argument provided to `generate` could not be normalized by the underlying `normality` library. This typically happens when an invalid or unresolvable country code or name is passed.
  fix: Ensure that country values are valid ISO 3166-1 alpha-2 codes (e.g., 'US', 'DE', 'GB') or common country names that `normality` can resolve. Consult `normality`'s documentation for supported country formats.
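For the `ModuleNotFoundError` case, it can help to check whether the active interpreter can locate the package before importing it. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of `fingerprints`.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the import system can locate a top-level module.

    Hypothetical helper for illustration; it does not import the module.
    """
    return importlib.util.find_spec(module_name) is not None


# Report a missing installation instead of failing with ModuleNotFoundError.
if not is_installed("fingerprints"):
    print("fingerprints not found; try: pip install fingerprints")
```

If the check fails even after installation, the package was probably installed into a different environment than the one running the script.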
Warnings
- gotcha Input keys for `generate` are specific and others are ignored. The `generate` function processes a predefined set of keys (e.g., `name`, `address`, `country`, `id_number`, `email`). Providing other, unrecognized keys will not cause an error but will be silently ignored, potentially leading to less distinct or identical fingerprints for entities that differ only by ignored attributes.
- gotcha Fingerprints are deterministic hashes, not fuzzy matches. This library generates exact cryptographic hashes (SHA-1) based on *normalized* input. It is not designed for fuzzy matching (e.g., matching 'John Doe' to 'Jon Doh'). Any difference that survives normalization, such as a typo, results in a completely different fingerprint.
- gotcha Dependency on `normality` version and its behavior. The `fingerprints` library relies heavily on `normality` for data cleaning and standardization. Changes, bug fixes, or new normalization rules in newer versions of `normality` can subtly alter how data is processed, potentially leading to different fingerprints for the same input across different `normality` versions or environments.
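The difference between exact hashing and fuzzy matching can be illustrated without the library. The sketch below is an assumption-laden stand-in: `toy_normalize` and `toy_fingerprint` are hypothetical names, and the normalization (lowercasing and collapsing whitespace) is far simpler than whatever `normality` does. It only shows why a hash of normalized text tolerates formatting noise but not typos, while a similarity score from `difflib` still rates the two spellings as close.

```python
import hashlib
from difflib import SequenceMatcher


def toy_normalize(text: str) -> str:
    # Hypothetical stand-in for real normalization:
    # lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())


def toy_fingerprint(text: str) -> str:
    # SHA-1 hex digest of the normalized text (illustrative only).
    return hashlib.sha1(toy_normalize(text).encode("utf-8")).hexdigest()


# Formatting differences disappear during normalization, so the hashes match.
same = toy_fingerprint("John Doe") == toy_fingerprint("  john   DOE ")

# A typo survives normalization, so the hashes differ completely.
different = toy_fingerprint("John Doe") != toy_fingerprint("Jon Doh")

# A fuzzy measure still sees the two spellings as similar.
similarity = SequenceMatcher(
    None, toy_normalize("John Doe"), toy_normalize("Jon Doh")
).ratio()

print(same, different, round(similarity, 2))
```

When near-matches matter, a similarity measure like this would need to run alongside fingerprinting rather than replace it.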
Install
- `pip install fingerprints`
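Because fingerprints may change when `normality` changes, it can be useful to record which versions produced a given set of fingerprints. This is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional


def installed_versions(*dist_names: str) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in dist_names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Store this alongside generated fingerprints so results can be reproduced.
print(installed_versions("fingerprints", "normality"))
```

Pinning both packages in a requirements file is a common way to keep those recorded versions stable across environments.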
Imports
- generate: `from fingerprints import generate`
Quickstart
from fingerprints import generate
# Generate a fingerprint for a person
person_fp = generate(name="Angela Merkel", country="de", birth_date="1954-07-17")
print(f"Person Fingerprint: {person_fp}")
# Generate a fingerprint for an organization with an address
org_fp = generate(
    name="Global Corp Inc.",
    address="123 Main Street, Cityville, Countryland",
    country="us",
    url="http://globalcorp.com",
)
print(f"Organization Fingerprint: {org_fp}")
# Fingerprints are deterministic for identical, normalized inputs
same_person_fp = generate(name="angela merkel", country="germany", birth_date="1954-07-17")
print(f"Same Person Fingerprint: {same_person_fp}")
assert person_fp == same_person_fp
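For deduplication, records that share a fingerprint can be grouped into candidate duplicate sets. The sketch below is library-agnostic: `group_by_fingerprint` and `toy_key` are hypothetical names, and `toy_key` is a simple stand-in for whatever function produces the fingerprint in practice.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, str]


def group_by_fingerprint(
    records: Iterable[Record],
    fingerprint_fn: Callable[[Record], Optional[str]],
) -> Dict[str, List[Record]]:
    """Group records under the key returned by fingerprint_fn."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        key = fingerprint_fn(record)
        if key is None:
            continue  # skip records that yield no usable fingerprint
        groups[key].append(record)
    return dict(groups)


def toy_key(record: Record) -> Optional[str]:
    # Hypothetical stand-in for a real fingerprint function:
    # lowercase the name and collapse whitespace.
    name = record.get("name")
    return " ".join(name.lower().split()) if name else None


records = [
    {"name": "Angela Merkel", "source": "list_a"},
    {"name": "angela  merkel", "source": "list_b"},
    {"name": "Global Corp Inc.", "source": "list_a"},
    {"source": "list_c"},  # no name, so it is skipped
]

# Keep only keys shared by more than one record.
duplicates = {
    key: group
    for key, group in group_by_fingerprint(records, toy_key).items()
    if len(group) > 1
}
print(duplicates)
```

Passing the fingerprint function as a parameter keeps the grouping logic independent of any particular normalization or hashing scheme.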