{"id":8241,"library":"jarowinkler","title":"JaroWinkler String Similarity","description":"JaroWinkler is a high-performance Python library for approximate string matching, implementing Jaro and Jaro-Winkler similarity algorithms. Currently at version 2.0.1, it leverages the `rapidfuzz` library for its core implementations, offering significant speed advantages over alternatives. The project maintains an active development cycle, with a focus on optimization and ease of integration.","status":"active","version":"2.0.1","language":"en","source_language":"en","source_url":"https://github.com/maxbachmann/JaroWinkler","tags":["string matching","similarity","fuzzy matching","jaro-winkler","rapidfuzz"],"install":[{"cmd":"pip install jarowinkler","lang":"bash","label":"Install stable release"}],"dependencies":[{"reason":"Core dependency since v2.0.0 for underlying string metric implementations and performance optimizations.","package":"rapidfuzz","optional":false},{"reason":"Required for building from source distribution (sdist), though pre-compiled wheels are typically available.","package":"cmake","optional":true},{"reason":"Recommended build tool for source distributions to optimize compilation speed.","package":"ninja","optional":true}],"imports":[{"symbol":"jarowinkler_similarity","correct":"from jarowinkler import jarowinkler_similarity"},{"symbol":"jaro_similarity","correct":"from jarowinkler import jaro_similarity"},{"note":"Older or other Jaro-Winkler libraries might use 'jarowinkler_metric'. 
The `jarowinkler` library (maxbachmann) uses 'jarowinkler_similarity'.","wrong":"from jarowinkler import jarowinkler_metric","symbol":"jarowinkler_metric","correct":"from jarowinkler import jarowinkler_similarity"}],"quickstart":{"code":"from jarowinkler import jaro_similarity, jarowinkler_similarity\n\n# Calculate Jaro Similarity\nsim_jaro = jaro_similarity(\"Johnathan\", \"Jonathan\")\nprint(f\"Jaro Similarity: {sim_jaro:.4f}\")\n\n# Calculate Jaro-Winkler Similarity\nsim_jw = jarowinkler_similarity(\"Johnathan\", \"Jonathan\")\nprint(f\"Jaro-Winkler Similarity: {sim_jw:.4f}\")\n\n# Using with a score cutoff\nsim_jw_cutoff = jarowinkler_similarity(\"apple\", \"aple\", score_cutoff=0.9)\nprint(f\"Jaro-Winkler with cutoff (0.9): {sim_jw_cutoff:.4f}\")\n\n# Can also be used with sequences of hashable objects\nlist1 = [\"this\", \"is\", \"an\", \"example\"]\nlist2 = [\"this\", \"is\", \"a\", \"example\"]\nsim_list = jarowinkler_similarity(list1, list2)\nprint(f\"Similarity of lists: {sim_list:.4f}\")","lang":"python","description":"Demonstrates how to calculate Jaro and Jaro-Winkler similarity scores between strings, including the use of an optional `score_cutoff` and its application to sequences of hashable objects."},"warnings":[{"fix":"Upgrade to Python 3.8 or newer, or use `pip install 'jarowinkler<2.0.0'`.","message":"Version 2.0.0 dropped support for Python 3.6 and Python 3.7. Users on these Python versions must either upgrade Python or pin `jarowinkler` to `<2.0.0`.","severity":"breaking","affected_versions":">=2.0.0"},{"fix":"Ensure `rapidfuzz` is installed alongside `jarowinkler`. Review performance benchmarks if migrating from older versions.","message":"Since v2.0.0, the library's internal implementations are deduplicated and now rely on `rapidfuzz`. While the API aims to be consistent, `rapidfuzz` is effectively a required runtime dependency. 
This change might subtly alter behavior or performance characteristics compared with pre-2.0.0 versions, which used standalone C++ implementations.","severity":"breaking","affected_versions":">=2.0.0"},{"fix":"Be aware of the prefix bias in Jaro-Winkler. For applications where prefix matching is less critical, consider using Jaro similarity or other string metrics. To reduce the bias, lower the `prefix_weight` argument of `jarowinkler_similarity` (default 0.1); `rapidfuzz.distance.JaroWinkler.similarity` accepts the same parameter.","message":"Jaro-Winkler similarity, by design, gives a higher weight to matching prefixes. This can lead to unexpectedly high similarity scores for strings that share a common prefix but are otherwise quite different, and to comparatively low scores for strings that are otherwise similar but share no common prefix.","severity":"gotcha","affected_versions":"all"},{"fix":"When using `jarowinkler` with sequences, ensure that elements within the sequences are consistently hashable and comparable. If comparing custom objects, verify their `__hash__` and `__eq__` implementations.","message":"The functions `jaro_similarity` and `jarowinkler_similarity` can operate on any sequence of hashable objects, not just strings. While powerful, comparing sequences of mixed types or non-comparable hashables can yield unexpected results or `TypeError`s if `__hash__` or `__eq__` methods are not consistently defined.","severity":"gotcha","affected_versions":"all"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Run `pip install jarowinkler` to install the library.","cause":"The `jarowinkler` library is not installed in the active Python environment.","error":"ModuleNotFoundError: No module named 'jarowinkler'"},{"fix":"The correct function in this library is `jarowinkler_similarity`. 
Update your code to `from jarowinkler import jarowinkler_similarity` and use `jarowinkler_similarity(str1, str2)`.","cause":"Calling a function (`jaro_winkler_metric`) that belongs to a different Jaro-Winkler library (e.g., `jaro-winkler` or `pyjarowinkler`) and is not part of this `jarowinkler` package.","error":"AttributeError: module 'jarowinkler' has no attribute 'jaro_winkler_metric'"},{"fix":"Ensure both arguments passed to `jaro_similarity` or `jarowinkler_similarity` are strings or sequences of hashable objects (e.g., lists of strings or numbers). For example, `jarowinkler_similarity('test', 123)` will fail; use `jarowinkler_similarity('test', '123')` or `jarowinkler_similarity('test', ['1','2','3'])` instead.","cause":"One of the input arguments to `jaro_similarity` or `jarowinkler_similarity` is not a string or a sequence of hashable objects.","error":"TypeError: 'float' object cannot be interpreted as an integer (when passing non-string/non-sequence to similarity function)"},{"fix":"Ensure `prefix_weight` is set to a float between 0.0 and 0.25, inclusive. For example: `jarowinkler_similarity('foo', 'bar', prefix_weight=0.15)`.","cause":"The `prefix_weight` parameter of `jarowinkler_similarity` (or of the underlying `rapidfuzz` call) was given a value outside its valid range.","error":"ValueError: prefix_weight has to be between 0 and 0.25 (inclusive)"}]}