JaroWinkler String Similarity

2.0.1 · active · verified Thu Apr 16

JaroWinkler is a Python library for approximate string matching that implements the Jaro and Jaro-Winkler similarity algorithms. As of version 2.0.1, it delegates the core computations to the `rapidfuzz` library, whose compiled implementation is typically much faster than pure-Python alternatives. The project is actively maintained, with a focus on performance and ease of integration.

Common errors

Warnings

Install

Imports

Quickstart

The example below calculates Jaro and Jaro-Winkler similarity scores between strings, applies the optional `score_cutoff` argument, and compares sequences of hashable objects.

from jarowinkler import jaro_similarity, jarowinkler_similarity

# Calculate Jaro Similarity
sim_jaro = jaro_similarity("Johnathan", "Jonathan")
print(f"Jaro Similarity: {sim_jaro:.4f}")

# Calculate Jaro-Winkler Similarity
sim_jw = jarowinkler_similarity("Johnathan", "Jonathan")
print(f"Jaro-Winkler Similarity: {sim_jw:.4f}")

# With a score cutoff: a score below score_cutoff is returned as 0.0
sim_jw_cutoff = jarowinkler_similarity("apple", "aple", score_cutoff=0.9)
print(f"Jaro-Winkler with cutoff (0.9): {sim_jw_cutoff:.4f}")

# Can also be used with sequences of hashable objects
list1 = ["this", "is", "an", "example"]
list2 = ["this", "is", "a", "example"]
sim_list = jarowinkler_similarity(list1, list2)
print(f"Similarity of lists: {sim_list:.4f}")
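
To make the scores above easier to interpret, here is a minimal pure-Python sketch of the textbook Jaro and Jaro-Winkler definitions. It is not the library's implementation (which lives in `rapidfuzz`), and the function names `jaro` and `jaro_winkler` are hypothetical names chosen for illustration. The sketch assumes a prefix of at most 4 items, a boost applied only when the Jaro score exceeds 0.7, a default `prefix_weight` of 0.1, and a score of 1.0 for two empty sequences; the library may differ in edge cases.

```python
def jaro(s1, s2):
    """Textbook Jaro similarity for two sequences of hashable items (sketch)."""
    len1, len2 = len(s1), len(s2)
    if len1 == 0 and len2 == 0:
        return 1.0  # assumption: two empty sequences count as identical
    if len1 == 0 or len2 == 0:
        return 0.0

    # Items match only if they are equal and at most `window` positions apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i in range(len1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s1[i] == s2[j]:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched items that line up out of order; each pair is one transposition.
    seq1 = [s1[i] for i in range(len1) if matched1[i]]
    seq2 = [s2[j] for j in range(len2) if matched2[j]]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (matches / len1
            + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_weight=0.1, score_cutoff=None):
    """Jaro score boosted for a shared prefix of up to 4 items (sketch)."""
    sim = jaro(s1, s2)

    # Length of the common prefix, capped at 4 items.
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1

    # Assumption: the boost applies only to already-similar pairs.
    if sim > 0.7:
        sim += prefix * prefix_weight * (1.0 - sim)

    # Mirror the cutoff behavior: anything below the threshold becomes 0.0.
    if score_cutoff is not None and sim < score_cutoff:
        return 0.0
    return sim


print(f"{jaro_winkler('MARTHA', 'MARHTA'):.4f}")                  # 0.9611
print(f"{jaro_winkler('apple', 'aple', score_cutoff=0.95):.4f}")  # 0.0000
```

Because the sketch only uses `len`, indexing, slicing, and equality, it works on lists of hashable objects as well as strings, which is the same property the quickstart relies on. For real workloads the library functions remain the better choice, since this version is written for readability rather than speed.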
