jellyfish - Approximate and Phonetic String Matching

1.2.1 · active · verified Sun Apr 05

Jellyfish is a Python library for approximate and phonetic string matching. It provides edit distances (Levenshtein and Damerau-Levenshtein), similarity scores (Jaro and Jaro-Winkler), and phonetic encodings (American Soundex, Metaphone, NYSIIS, and Match Rating Codex). These make it useful for data cleaning, typo correction, and record linkage. The library is actively maintained; the current version is 1.2.1, and releases typically focus on bug fixes and performance improvements.
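To make the difference between an edit distance and a similarity score concrete, here is a minimal pure-Python sketch of Levenshtein distance and Jaro similarity following their textbook definitions. It is not jellyfish's own implementation, and the function names `levenshtein` and `jaro` are hypothetical. On simple ASCII inputs such as the quickstart's, we assume they agree with `jellyfish.levenshtein_distance` and `jellyfish.jaro_similarity`, but use the library functions in practice.

```python
def levenshtein(a: str, b: str) -> int:
    """Hypothetical sketch: minimum number of single-character insertions,
    deletions and substitutions that turn a into b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def jaro(s1: str, s2: str) -> float:
    """Hypothetical sketch: Jaro similarity in [0, 1], where 1.0 means identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only "match" if they are equal and within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk both sets of matched characters in order; each mismatch is half a transposition.
    k = 0
    half_transpositions = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3
```

With the quickstart's inputs, `levenshtein("jellyfish", "smellyfish")` is 2 (insert a leading "s", substitute "j" with "m"), while `jaro("jellyfish", "smellyfish")` is about 0.8963: 8 characters match with no transpositions, giving (8/9 + 8/10 + 8/8) / 3. Jaro-Winkler builds on the Jaro score by boosting pairs that share a common prefix.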

Quickstart

This quickstart demonstrates common string comparison algorithms and phonetic encoding functions available in the `jellyfish` library.

import jellyfish

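# String comparison: edit distances count edits (lower means more similar);
# Jaro similarity ranges from 0.0 to 1.0 (higher means more similar)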
s1 = "jellyfish"
s2 = "smellyfish"

lev_dist = jellyfish.levenshtein_distance(s1, s2)
jaro_sim = jellyfish.jaro_similarity(s1, s2)
# "jellyfihs" differs from "jellyfish" by one adjacent transposition ("hs" vs "sh")
dam_lev_dist = jellyfish.damerau_levenshtein_distance("jellyfihs", "jellyfish")

print(f"Levenshtein Distance: {lev_dist}")
print(f"Jaro Similarity: {jaro_sim}")
print(f"Damerau-Levenshtein Distance: {dam_lev_dist}")

# Phonetic encoding: words that sound alike tend to map to the same code
metaphone_code = jellyfish.metaphone("Jellyfish")
soundex_code = jellyfish.soundex("Jellyfish")
nysiis_code = jellyfish.nysiis("Jellyfish")

print(f"Metaphone: {metaphone_code}")
print(f"Soundex: {soundex_code}")
print(f"NYSIIS: {nysiis_code}")
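Phonetic codes are easiest to understand by building one. The sketch below implements the classic American Soundex rules in pure Python: keep the first letter, map the remaining consonants to digits, collapse adjacent letters that share a digit (H and W do not separate them, vowels do), and pad or truncate to four characters. The name `soundex_sketch` is hypothetical and the sketch assumes ASCII input; `jellyfish.soundex` is the function to use in practice and may handle edge cases such as empty or non-ASCII strings differently.

```python
# Standard American Soundex digit groups; vowels, Y, H and W get no digit.
_SOUNDEX_DIGITS = {
    letter: digit
    for letters, digit in (
        ("BFPV", "1"),
        ("CGJKQSXZ", "2"),
        ("DT", "3"),
        ("L", "4"),
        ("MN", "5"),
        ("R", "6"),
    )
    for letter in letters
}


def soundex_sketch(name: str) -> str:
    """Hypothetical American Soundex sketch; assumes ASCII input."""
    letters = [c for c in name.upper() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    code = [letters[0]]  # the first letter is kept as-is
    prev = _SOUNDEX_DIGITS.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            # H and W do not separate consonants that share a digit.
            continue
        digit = _SOUNDEX_DIGITS.get(ch, "")
        if digit and digit != prev:
            code.append(digit)
            if len(code) == 4:
                break
        # A vowel (or Y) resets prev, so a repeated digit after it is kept.
        prev = digit
    return "".join(code).ljust(4, "0")
```

For the quickstart's input, `soundex_sketch("Jellyfish")` gives `J412`: J is kept, the vowels and Y are dropped, the double L collapses to a single 4, F becomes 1 and S becomes 2, at which point the code is full and the trailing H is never reached. Similar-sounding names such as "Robert" and "Rupert" both encode to `R163`, which is what makes Soundex useful for record linkage.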
