TextDistance

4.6.3 · active · verified Sat Apr 11

TextDistance is a Python library offering more than 30 algorithms for computing the distance or similarity between two or more sequences. It exposes them through a common interface and covers edit-based, token-based, and phonetic algorithms, among others. The library is actively maintained; the current version is 4.6.3. [2, 5, 8]

Warnings

Install

Imports

Quickstart

This quickstart shows how to compute distance and similarity metrics with the `textdistance` library. It calls methods on an algorithm object (e.g., `textdistance.levenshtein.distance`) and also calls an algorithm object directly (`textdistance.jaro_winkler(...)`), which for Jaro-Winkler returns a similarity score. [2, 3, 6, 7]

import textdistance

# Calculate Levenshtein distance
str1 = "kitten"
str2 = "sitting"
distance = textdistance.levenshtein.distance(str1, str2)
similarity = textdistance.levenshtein.similarity(str1, str2)
normalized_distance = textdistance.levenshtein.normalized_distance(str1, str2)
normalized_similarity = textdistance.levenshtein.normalized_similarity(str1, str2)

print(f"Strings: '{str1}', '{str2}'")
print(f"Levenshtein Distance: {distance}")
print(f"Levenshtein Similarity: {similarity}")
print(f"Levenshtein Normalized Distance: {normalized_distance:.2f}")
print(f"Levenshtein Normalized Similarity: {normalized_similarity:.2f}")

# Example with another algorithm (Jaro-Winkler); calling the object directly returns its similarity
str3 = "martha"
str4 = "marhta"
jaro_winkler_similarity = textdistance.jaro_winkler(str3, str4)
print(f"\nJaro-Winkler Similarity between '{str3}' and '{str4}': {jaro_winkler_similarity:.2f}")
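The four Levenshtein values printed in the quickstart are closely related. The following is a minimal pure-Python sketch of that relationship, not the library's implementation: it assumes similarity is the longer length minus the distance, and that normalization divides by the longer length. The helper names `levenshtein_distance` and `derived_scores` are hypothetical and exist only for illustration.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def derived_scores(a: str, b: str) -> dict:
    """Derive similarity and normalized values from the raw distance.

    Assumption: the maximum possible distance is the longer string's length.
    """
    distance = levenshtein_distance(a, b)
    maximum = max(len(a), len(b))
    normalized_distance = distance / maximum if maximum else 0.0
    return {
        "distance": distance,
        "similarity": maximum - distance,
        "normalized_distance": normalized_distance,
        "normalized_similarity": 1 - normalized_distance,
    }


scores = derived_scores("kitten", "sitting")
print(scores)
```

For "kitten" and "sitting" the distance is 3 and the longer string has 7 characters, so under these assumptions the similarity is 4 and the normalized distance is 3/7. Comparing the sketch's output against the quickstart's printed values is a quick way to check that the assumed normalization matches the installed version.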
