thefuzz (Fuzzy String Matching)
thefuzz is a Python library for fuzzy string matching based on Levenshtein distance. It provides a simple API for scoring the similarity of two strings and for extracting the best matches from a collection. The current version is 0.22.1, and new releases appear periodically.
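For intuition about the underlying metric, here is a minimal pure-Python sketch of Levenshtein (edit) distance. It is illustrative only: the function name `levenshtein` is hypothetical, and thefuzz's scorers report a normalized 0-100 similarity rather than a raw distance, so this is not how the library itself is implemented.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # previous[j] holds the distance between a[:i-1] and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

A smaller distance means more similar strings; similarity scores such as those returned by `fuzz.ratio` invert and rescale this idea so that 100 means identical.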
Warnings
- breaking The library was renamed from `fuzzywuzzy` to `thefuzz`. Code that imports `fuzzywuzzy` fails when only `thefuzz` is installed, so update imports to `from thefuzz import ...`. `python-Levenshtein` is now an optional dependency.
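- gotcha In older releases, performance degrades significantly without the optional `python-Levenshtein` dependency (often referred to as 'speedup'): the library falls back to a pure Python implementation, which is much slower. Recent releases use `rapidfuzz` as the matching backend instead, so this mainly affects older installations.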
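- gotcha `process.extractOne` returns a single `(match, score)` tuple, or `None` if no choice reaches `score_cutoff`, while `process.extract` returns a list of such tuples. When `choices` is a dict, each tuple gains a third element, the key of the matched value. Be careful when destructuring the results.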
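- gotcha The ratio functions suit different scenarios: `fuzz.ratio` compares the strings as a whole, `fuzz.partial_ratio` scores the best-matching substring, `fuzz.token_sort_ratio` ignores word order, and `fuzz.token_set_ratio` ignores word order and tolerates repeated words. Using the wrong one can lead to unintuitive results.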
Install
- pip install thefuzz
- pip install thefuzz[speedup]
Imports
- fuzz
from thefuzz import fuzz
- process
from thefuzz import process
- fuzzywuzzy (old package name; import from `thefuzz` instead)
from thefuzz import fuzz, process
Quickstart
from thefuzz import fuzz
from thefuzz import process
# Basic string comparison
score = fuzz.ratio("this is a test", "this is a test!")
print(f"Ratio score: {score}")
# Find the best match in a list
choices = ["apple pie", "grapefruit", "apple tree"]
query = "apple"
best_match, best_score = process.extractOne(query, choices)
print(f"Best match for '{query}': '{best_match}' with score {best_score}")
# Get top N matches
top_matches = process.extract(query, choices, limit=2)
print(f"Top matches for '{query}': {top_matches}")