String Similarity and Distance Measures

0.2.1 · active · verified Wed Apr 15

strsimpy is a Python library that implements a range of string similarity and distance measures, including Levenshtein, Jaro-Winkler, n-gram, cosine similarity, and the Jaccard index. It is designed to be straightforward to use for text analysis and data matching tasks. The current version is 0.2.1. Releases are infrequent and typically fix bugs or add new algorithms.

Warnings

Install

pip install strsimpy

Imports

Quickstart

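This quickstart calculates the Levenshtein distance and the Jaro-Winkler similarity between pairs of strings. Most algorithms follow the same pattern: instantiate the algorithm's class, then call its `distance()` or `similarity()` method with the two strings to compare. For a distance, lower values mean the strings are closer; for a similarity, higher values do.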

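from strsimpy.jaro_winkler import JaroWinkler
from strsimpy.levenshtein import Levenshtein

# Levenshtein distance: the number of single-character edits
# (insertions, deletions, substitutions) needed to turn one string into the other.
levenshtein = Levenshtein()

# Non-ASCII input works the same way; these strings differ only in the last character.
s0 = "안녕하세요"
s1 = "안녕하세유"
distance = levenshtein.distance(s0, s1)
print(f"Levenshtein distance between '{s0}' and '{s1}': {distance}")

# "aple" is "apple" with one character deleted.
s2 = "apple"
s3 = "aple"
distance2 = levenshtein.distance(s2, s3)
print(f"Levenshtein distance between '{s2}' and '{s3}': {distance2}")

# Jaro-Winkler similarity: a score between 0 and 1, where 1 means identical strings.
jaro_winkler = JaroWinkler()
similarity = jaro_winkler.similarity(s2, s3)
print(f"Jaro-Winkler similarity between '{s2}' and '{s3}': {similarity}")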
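
To make clear what the Levenshtein calls above measure, here is a minimal pure-Python sketch of the classic dynamic-programming edit distance. It is an illustration only, not strsimpy's own implementation, and the function name `levenshtein_distance` is hypothetical. For the two pairs used in the quickstart it gives a distance of 1, since each pair differs by a single edit.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions.

    Illustrative sketch only; this is not strsimpy's code.
    """
    if a == b:
        return 0
    if not a:
        return len(b)
    if not b:
        return len(a)

    # previous[j] is the distance between the prefix of `a` handled so far
    # and b[:j]. Before any character of `a` is read, that is simply j.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        # Turning a[:i] into the empty string takes i deletions.
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


print(levenshtein_distance("안녕하세요", "안녕하세유"))  # one substitution
print(levenshtein_distance("apple", "aple"))            # one deletion
```

The sketch keeps only two rows of the table at a time, so memory grows with the length of the second string rather than with the product of both lengths.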
