Fuzzysearch
Fuzzysearch is a Python library for finding approximate subsequence matches within long texts or data. It measures closeness with the Levenshtein distance, with configurable limits on substitutions, insertions and deletions, so it can locate patterns despite typos or minor variations. For speed it ships optional C and Cython extensions, with pure-Python fallbacks when those are unavailable. It is currently at version 0.8.1.
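To make "approximate subsequence search" concrete, here is a minimal pure-Python sketch of the classic dynamic-programming approach (Sellers' algorithm), in which a match may begin anywhere in the text. It is not fuzzysearch's own implementation, and the function name `fuzzy_find_ends` is hypothetical.

```python
def fuzzy_find_ends(pattern, text, max_dist):
    """Return (end, dist) pairs for every text position where some substring
    ending there is within Levenshtein distance max_dist of pattern.

    Hypothetical helper: a sketch of Sellers' algorithm, not fuzzysearch's code.
    `end` is exclusive, following Python slice conventions.
    """
    m = len(pattern)
    # prev[i]: best distance between pattern[:i] and a substring ending just
    # before the current text character. Before any text is read, matching
    # pattern[:i] against the empty string costs i deletions.
    prev = list(range(m + 1))
    results = []
    for end, ch in enumerate(text, start=1):
        # curr[0] = 0: an empty pattern prefix matches anywhere, so a match
        # may start at any position in the text.
        curr = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            curr[i] = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # text has an extra character
                curr[i - 1] + 1,     # text is missing a pattern character
            )
        if curr[m] <= max_dist:
            results.append((end, curr[m]))
        prev = curr
    return results


print(fuzzy_find_ends("PATTERN", "---PATERN---", max_dist=1))  # [(9, 1)]
```

The sketch reports end positions only; recovering start positions and deciding how to report overlapping candidates are left out for brevity.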
Warnings
- breaking Support for older Python versions has been dropped. As of version 0.8.1, fuzzysearch officially supports Python 3.8+ and PyPy 3.9 and 3.10. Older versions (e.g., Python 2.x, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7) are no longer supported.
- gotcha This library is designed for fuzzy *search* (finding approximate *subsequences* within longer texts/data) rather than fuzzy *comparison* of two strings. If you need to compare two strings for similarity (e.g., 'apple' vs 'aple'), consider libraries like FuzzyWuzzy or RapidFuzz.
- gotcha Versions prior to 0.7.3 could crash with segmentation faults or show undefined behavior because their C extensions mishandled bytes-like inputs. This has been fixed; if you are on an older release, upgrade, and always pass the input types the functions expect.
- gotcha `find_near_matches` does not restrict which characters a match may contain: within the allowed distance it will match spaces, symbols or digits just as readily as letters. When searching for specific words or patterns this can produce false positives, so keep the distance limits tight and filter the results if necessary.
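One way to act on the last warning is to keep the search limits tight (fuzzysearch accepts `max_substitutions`, `max_insertions` and `max_deletions` as well as `max_l_dist`) and then post-filter the results. The sketch below assumes you want only whole, purely alphabetic words. The helper names and the `Span` stand-in are hypothetical, and the candidate spans are written by hand rather than produced by the library; only the `start`/`end` attributes used in the Quickstart are relied on.

```python
from collections import namedtuple

# Stand-in for fuzzysearch match objects: only the start/end attributes
# (end is exclusive) are used by the filter below.
Span = namedtuple("Span", ["start", "end"])


def is_word_char(c):
    """Treat letters, digits and underscore as word characters."""
    return c.isalnum() or c == "_"


def keep_whole_alpha_words(matches, text):
    """Keep matches that cover a whole word made only of letters.

    Hypothetical post-filter, not part of fuzzysearch.
    """
    kept = []
    for m in matches:
        starts_word = m.start == 0 or not is_word_char(text[m.start - 1])
        ends_word = m.end == len(text) or not is_word_char(text[m.end])
        if starts_word and ends_word and text[m.start:m.end].isalpha():
            kept.append(m)
    return kept


text = "my c4t and cat"
candidates = [Span(3, 6), Span(11, 14)]  # 'c4t' and 'cat'
print(keep_whole_alpha_words(candidates, text))  # [Span(start=11, end=14)]
```

Here 'c4t' is a whole word but contains a digit, so it is dropped; adjust the predicate to whatever "unexpected character" means for your data.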
Install
pip install fuzzysearch
Imports
- find_near_matches
from fuzzysearch import find_near_matches
- find_near_matches_in_file
from fuzzysearch import find_near_matches_in_file
Quickstart
from fuzzysearch import find_near_matches
# Search for 'PATTERN' with a maximum Levenshtein Distance of 1
matches = find_near_matches('PATTERN', '---PATERN---', max_l_dist=1)
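for match in matches:
    # match.end is exclusive: '---PATERN---'[3:9] == 'PATERN'
    print(f"Found match: '{match.matched}' at index {match.start}-{match.end} with distance {match.dist}")
# Prints: Found match: 'PATERN' at index 3-9 with distance 1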
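The distance reported in the Quickstart is the edit distance between the pattern and the matched substring, not the whole text. A short, standard Levenshtein (Wagner-Fischer) implementation in pure Python makes the difference visible, which is also why the search-versus-comparison warning matters. The `levenshtein` helper is illustrative and not part of fuzzysearch.

```python
def levenshtein(a, b):
    """Classic Wagner-Fischer edit distance between two whole strings."""
    # prev[j] holds the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # a[:i] against the empty string: i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match (0) or substitute (1)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("PATTERN", "PATERN"))        # 1: the matched substring
print(levenshtein("PATTERN", "---PATERN---"))  # 7: the whole text
```

Comparing against the whole text charges for every surrounding '-', which is exactly what a substring search avoids.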