FuzzyWuzzy
FuzzyWuzzy is a Python library for fuzzy string matching: it scores how similar two strings are on a scale from 0 to 100, using ratios based on Levenshtein distance. The current version is 0.18.0, and releases have been infrequent in recent years.
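As a rough illustration of how such a score comes out, the sketch below uses the standard library's `difflib.SequenceMatcher`, which FuzzyWuzzy falls back to when `python-levenshtein` is not installed. The helper name `simple_ratio` is hypothetical and not part of FuzzyWuzzy; it only approximates what `fuzz.ratio` reports.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Hypothetical helper: similarity of two strings as an integer from 0 to 100."""
    # SequenceMatcher.ratio() returns 2*M/T, where M is the number of matched
    # characters and T is the combined length of both strings.
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


# 14 matched characters out of 14 + 15 = 29 total: 2*14/29 = 0.9655...
print(simple_ratio("this is a test", "this is a test!"))  # 97
```

Identical strings score 100 and strings with no characters in common score 0.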
Warnings
- gotcha Without the optional `python-levenshtein` dependency, FuzzyWuzzy falls back to a pure Python implementation (based on `difflib.SequenceMatcher`) and emits a warning on import. The fallback is significantly slower, which matters for large datasets or frequent comparisons.
- gotcha FuzzyWuzzy's default string preprocessing (lowercasing, replacing non-alphanumeric characters with whitespace, and stripping leading and trailing whitespace) can produce unexpected matches when case or punctuation matters to your matching logic.
- deprecated The library appears to be in a low-maintenance state: the last release (0.18.0) dates from 2020. While it remains functional, active development and new features are unlikely.
- gotcha When using `process.extract` or `process.extractOne`, ensure your `choices` list is not empty, as this can lead to errors or unexpected behavior depending on the FuzzyWuzzy version and specific call.
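One way to sidestep the empty-`choices` gotcha is to check the collection before calling the library at all. The wrapper below is a minimal sketch; the name `best_match` is hypothetical, and returning `None` for an empty input is an assumption about what the caller wants, not documented FuzzyWuzzy behavior.

```python
from typing import Optional, Sequence, Tuple


def best_match(query: str, choices: Sequence[str]) -> Optional[Tuple[str, int]]:
    """Hypothetical wrapper: return (choice, score), or None when there is nothing to match."""
    if not choices:
        # Short-circuit so the library is never called with an empty collection.
        return None
    # Imported here so the guard above does not depend on the library being loaded.
    from fuzzywuzzy import process
    return process.extractOne(query, choices)


print(best_match("apple", []))  # None
```

With a non-empty list the wrapper simply delegates to `process.extractOne`.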
Install
- pip install fuzzywuzzy
- pip install fuzzywuzzy[speedup]
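To confirm whether the `speedup` extra actually took effect in the current environment, a quick check like the following may help. It assumes the `python-levenshtein` package exposes a top-level module named `Levenshtein`; the function name `has_levenshtein_speedup` is hypothetical.

```python
import importlib.util


def has_levenshtein_speedup() -> bool:
    """Return True if the optional Levenshtein module can be found."""
    # find_spec looks the module up without importing it.
    return importlib.util.find_spec("Levenshtein") is not None


print("speedup installed:", has_levenshtein_speedup())
```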
Imports
- fuzz
from fuzzywuzzy import fuzz
- process
from fuzzywuzzy import process
Quickstart
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
# Simple Ratio
print(fuzz.ratio("this is a test", "this is a test!"))
# Partial Ratio
print(fuzz.partial_ratio("this is a test", "this is a test!"))
# Token Sort Ratio
print(fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))
# Token Set Ratio
print(fuzz.token_set_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))
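# Rank the choices against a query; returns up to 5 (choice, score) tuples by default
choices = ["apple jack", "apple mac", "apple sauce", "orange juice"]
print(process.extract("apple", choices, scorer=fuzz.ratio))
# Return only the single best (choice, score) match
print(process.extractOne("apple goop", choices))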
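The `process` calls above run each string through the default preprocessing described under Warnings. The sketch below approximates that step with the standard library to show why distinct inputs can become indistinguishable. The function `approx_default_process` is hypothetical and is not the library's own code; the exact character classes FuzzyWuzzy uses may differ.

```python
import re


def approx_default_process(s: str) -> str:
    """Hypothetical approximation of the default preprocessing, not library code."""
    # Replace each run of non-alphanumeric characters with a single space.
    s = re.sub(r"[^0-9A-Za-z]+", " ", s)
    # Lowercase and strip leading and trailing whitespace.
    return s.lower().strip()


for raw in ["C++", "C#", "  Apple-Jack! "]:
    print(repr(raw), "->", repr(approx_default_process(raw)))
```

Because "C++" and "C#" both collapse to "c" under this approximation, they would look identical to any scorer that only sees the processed text.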