FuzzyWuzzy

0.18.0 · active · verified Wed Apr 08

FuzzyWuzzy is a Python library for fuzzy string matching: it scores the similarity of two strings on a scale from 0 to 100. It uses Levenshtein distance to calculate these ratios. The current version is 0.18.0, and releases have been infrequent in recent years.
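As a rough illustration of how such a ratio can be computed, the sketch below uses the standard library's `difflib.SequenceMatcher`, which FuzzyWuzzy falls back to when `python-levenshtein` is not installed. The helper name `simple_ratio` is hypothetical, and this approximates only the basic ratio, not the token-based scorers.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Hypothetical helper: similarity of two strings as an integer from 0 to 100."""
    # SequenceMatcher.ratio() returns 2 * matches / (len(a) + len(b)) as a float in [0, 1].
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


print(simple_ratio("this is a test", "this is a test!"))  # 97
print(simple_ratio("abc", "abc"))                         # 100
```

The first pair shares 14 matching characters out of 29 in total, so the score is 2 * 14 / 29, which rounds to 97.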

Warnings

gotcha Without the optional `python-levenshtein` dependency, FuzzyWuzzy can be very slow for large datasets or frequent comparisons. The pure Python implementation is significantly less performant.
Fix: Install with `pip install fuzzywuzzy[speedup]` to pull in the compiled `python-levenshtein` extension.
gotcha FuzzyWuzzy's default string preprocessing (lowercasing, replacing non-alphanumeric characters with spaces, and stripping surrounding whitespace) can produce unexpected results when case or punctuation matters to your matching logic.
Fix: Pass a custom `processor` to functions like `process.extract` when the default behavior is not suitable. When calling `fuzz` functions directly, you might need to preprocess the strings yourself before passing them.
deprecated The library appears to be in a low-maintenance state, with the last release (0.18.0) in 2020. While functional, active development and new features are unlikely.
Fix: If long-term support or advanced features are critical, consider `thefuzz` (the renamed successor published by the same maintainers) or `rapidfuzz`, both of which are actively developed and often faster.
gotcha When using `process.extract` or `process.extractOne`, ensure your `choices` list is not empty, as this can lead to errors or unexpected behavior depending on the FuzzyWuzzy version and specific call.
Fix: Always check that the list of choices passed to `process` functions is not empty before making the call.
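One way to apply the last fix is a small wrapper, sketched below. `safe_extract_one` is a hypothetical helper name, not part of FuzzyWuzzy, and returning `None` for empty input is an assumption about what the caller wants.

```python
def safe_extract_one(query, choices):
    """Hypothetical helper: return None instead of calling the library with no choices."""
    if not choices:
        return None
    # Imported here so the empty-input path does not require the library to be loaded.
    from fuzzywuzzy import process
    return process.extractOne(query, choices)


print(safe_extract_one("apple", []))  # None
```

With a non-empty list, the helper simply returns whatever `process.extractOne` returns.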

Install

Basic installation: `pip install fuzzywuzzy`
With the optional speedup (python-levenshtein): `pip install fuzzywuzzy[speedup]`
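
To check whether the speedup is available in the current environment, one option is to look for the `Levenshtein` module that `python-levenshtein` installs. This is a minimal sketch using only the standard library. `has_levenshtein_speedup` is a hypothetical helper name.

```python
import importlib.util


def has_levenshtein_speedup() -> bool:
    """Hypothetical helper: True if the compiled `Levenshtein` module can be imported."""
    # find_spec returns None when a top-level module is not installed.
    return importlib.util.find_spec("Levenshtein") is not None


print("speedup available:", has_levenshtein_speedup())
```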

Imports

fuzz
```
from fuzzywuzzy import fuzz
```
process
```
from fuzzywuzzy import process
```
The `process` module, which contains functions like `extract`, is imported directly from the `fuzzywuzzy` package; it is not nested under `fuzz`.

Quickstart

Demonstrates basic usage of `fuzz` for various ratio calculations and `process` for extracting best matches from a list of choices.

from fuzzywuzzy import fuzz
from fuzzywuzzy import process

# Simple Ratio
print(fuzz.ratio("this is a test", "this is a test!"))

# Partial Ratio
print(fuzz.partial_ratio("this is a test", "this is a test!"))

# Token Sort Ratio
print(fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))

# Token Set Ratio
print(fuzz.token_set_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))

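# Rank the choices against a query; returns (choice, score) tuples, best first
choices = ["apple jack", "apple mac", "apple sauce", "orange juice"]
print(process.extract("apple", choices, scorer=fuzz.ratio))

# Return only the single best match, using the default scorer
print(process.extractOne("apple goop", choices))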
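
The `process.extract` call above runs each string through a processor before scoring. The sketch below shows one way to keep case and punctuation, assuming that is what your matching logic needs. `keep_case` and `extract_case_sensitive` are hypothetical names. It pairs the custom processor with `fuzz.ratio`, which does not apply the default preprocessing itself.

```python
import re


def keep_case(text: str) -> str:
    """Hypothetical processor: collapse whitespace but keep case and punctuation."""
    return re.sub(r"\s+", " ", text).strip()


def extract_case_sensitive(query, choices, limit=3):
    """Hypothetical helper: rank choices without lowercasing or stripping punctuation."""
    # Imported here so keep_case can be used on its own.
    from fuzzywuzzy import fuzz, process
    return process.extract(
        query, choices, processor=keep_case, scorer=fuzz.ratio, limit=limit
    )


print(keep_case("  C++   Primer "))  # C++ Primer
```

Calling `extract_case_sensitive("iPhone", ["iPhone", "IPHONE", "i-phone"])` would then score the three spellings differently instead of treating them as near-identical.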
