{"id":4362,"library":"jaro-winkler","title":"Jaro-Winkler String Similarity","description":"The `jaro-winkler` library provides Python implementations of the Jaro and Jaro-Winkler string similarity metrics. It allows for comparison of two strings, returning a score from 0 (no match) to 1 (perfect match). The current version is 2.0.3, offering standard and customizable versions of the functions. While not explicitly stated, the project's release cadence appears to be moderate, with major updates occurring over several years.","status":"active","version":"2.0.3","language":"en","source_language":"en","source_url":"https://github.com/richmilne/JaroWinkler.git","tags":["string similarity","jaro-winkler","fuzzy matching","text processing","record linkage"],"install":[{"cmd":"pip install jaro-winkler","lang":"bash","label":"Install stable version"}],"dependencies":[],"imports":[{"note":"The `jaro-winkler` package (with a hyphen) installs a module named `jaro`. There is a separate, similarly named package `jarowinkler` (no hyphen) that provides a `jarowinkler` module and `jaro_winkler_similarity` function. 
Importing from the wrong module is a common mistake.","wrong":"from jarowinkler import jaro_winkler_similarity","symbol":"jaro_winkler_metric","correct":"from jaro import jaro_winkler_metric"},{"note":"Imports the standard Jaro string metric.","symbol":"jaro_metric","correct":"from jaro import jaro_metric"},{"note":"Imports the Jaro metric matching the reference C code, including typo tables and longer string adjustments.","symbol":"original_metric","correct":"from jaro import original_metric"}],"quickstart":{"code":"import jaro\n\n# Calculate Jaro-Winkler similarity\nscore_winkler = jaro.jaro_winkler_metric('SHACKLEFORD', 'SHACKELFORD')\nprint(f\"Jaro-Winkler Similarity: {score_winkler}\")\n\n# Calculate Jaro similarity\nscore_jaro = jaro.jaro_metric('MARTHA', 'MARHTA')\nprint(f\"Jaro Similarity: {score_jaro}\")","lang":"python","description":"This quickstart demonstrates how to import the `jaro` module and use its `jaro_winkler_metric` and `jaro_metric` functions to calculate string similarity scores. Scores range from 0 (no similarity) to 1 (identical)."},"warnings":[{"fix":"Ensure you `pip install jaro-winkler` if you intend to `import jaro`, or `pip install jarowinkler` if you intend to `import jarowinkler`. Do not mix imports from the two different packages.","message":"There are two distinct Python packages with very similar names: `jaro-winkler` (this library, which imports as `jaro`) and `jarowinkler` (a different, often faster implementation by maxbachmann, which imports as `jarowinkler`). Users often confuse them, which leads to `ModuleNotFoundError` or unexpected behavior when they `pip install` one package but `import` the other.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Be aware of the prefix bias when interpreting similarity scores. 
If this bias is undesirable, consider an alternative string similarity algorithm such as Levenshtein distance, which treats all character positions equally.","message":"The Jaro-Winkler algorithm, by design, gives a higher weight to matching prefixes. This means strings with a common beginning will naturally score higher, even if other parts of the strings are very different. This behavior is usually desirable for name matching but can be unexpected in other contexts.","severity":"gotcha","affected_versions":"All versions"},{"fix":"This is a theoretical characteristic; for most practical applications, it does not pose a problem. However, if you are building systems that rely on strict metric properties (e.g., in graph theory or certain clustering algorithms), you should be aware of this limitation.","message":"Although it is often called a distance metric, the Jaro-Winkler 'distance' (1 - similarity) is not a metric in the strict mathematical sense, because it may not satisfy the triangle inequality.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Evaluate both `jaro-winkler` and `jarowinkler` if performance is critical. The `jarowinkler` library explicitly uses a C-API and bit-parallelism for speed.","message":"When performance matters, especially with very large datasets or when integrating with tools like RapidFuzz, the `jarowinkler` (no hyphen) package by maxbachmann (which implements the RapidFuzz C-API) may be significantly faster than this `jaro-winkler` library.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-12T00:00:00.000Z","next_check":"2026-07-11T00:00:00.000Z"}