Levenshtein String Distance
The `levenshtein` package is a Python C extension that provides fast functions for computing Levenshtein (edit) distance, string similarity, and related metrics. The current version is 0.27.3, and releases are frequent, typically adding support for newer Python versions.
Warnings
- breaking The official package name on PyPI was changed from `python-Levenshtein` to `levenshtein`. While `python-Levenshtein` still exists and depends on the `levenshtein` package, using the new name directly is recommended for clarity and to ensure future compatibility.
- breaking Support for older Python versions is periodically dropped. For instance, Python 3.8 support was removed in version 0.26.0, and Python 3.9 support was removed in version 0.27.2.
- gotcha `Levenshtein.ratio()` returns a normalized Indel similarity, in which a substitution counts as a deletion followed by an insertion (2 edit operations) rather than a single substitution (1 edit operation). The result can therefore differ from a ratio derived from `Levenshtein.distance()`, which counts a substitution as one edit.
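To make the last warning concrete, here is a minimal pure-Python sketch of an Indel-style similarity. It is not the library's implementation. The helper names (`lcs_length`, `indel_distance`, `indel_similarity`) are hypothetical, and normalizing by the combined length of both strings is an assumption about how the ratio is scaled.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_distance(a: str, b: str) -> int:
    """Insertions plus deletions needed to turn a into b (no substitutions)."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


def indel_similarity(a: str, b: str) -> float:
    """Indel distance normalized by the combined length (assumed scaling)."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - indel_distance(a, b) / total


# "kitten" -> "sitting" needs 3 edits when a substitution costs 1,
# but 5 edits when each substitution is a delete plus an insert.
print(indel_distance("kitten", "sitting"))              # 5
print(round(indel_similarity("kitten", "sitting"), 3))  # 0.615
```

Under these assumptions, the two substitutions in the "kitten"/"sitting" pair each cost 2, which is why the Indel distance (5) is larger than the unit-cost edit distance (3).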
Install
- pip install levenshtein
- pip install python-Levenshtein
Imports
- Levenshtein: `import Levenshtein`
Quickstart
import Levenshtein
string1 = "kitten"
string2 = "sitting"
# Calculate Levenshtein distance
distance = Levenshtein.distance(string1, string2)
print(f"Levenshtein distance between '{string1}' and '{string2}': {distance}")
# Calculate the similarity ratio (normalized Indel similarity; see Warnings)
ratio = Levenshtein.ratio(string1, string2)
print(f"Levenshtein ratio between '{string1}' and '{string2}': {ratio:.2f}")
# Get edit operations
edit_ops = Levenshtein.editops(string1, string2)
print(f"Edit operations: {edit_ops}")
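The edit operations are easier to read once you see how they transform the source string. The sketch below assumes each operation is a triple `(operation, source_index, destination_index)` with operations `'replace'`, `'insert'`, and `'delete'`, sorted by source position. The `apply_ops` helper and the hand-written operation lists are illustrative and hypothetical, not output captured from the library.

```python
def apply_ops(ops, source: str, destination: str) -> str:
    """Rebuild the destination from source by replaying (op, spos, dpos) triples."""
    result = []
    src_i = 0  # next unread position in source
    for op, spos, dpos in ops:
        # Copy the untouched characters that precede this operation.
        result.extend(source[src_i:spos])
        src_i = spos
        if op == "replace":
            result.append(destination[dpos])  # swap in the destination character
            src_i += 1                        # and consume the source character
        elif op == "insert":
            result.append(destination[dpos])  # add a character, consume nothing
        elif op == "delete":
            src_i += 1                        # skip the source character
        else:
            raise ValueError(f"unknown operation: {op!r}")
    result.extend(source[src_i:])  # copy whatever remains after the last op
    return "".join(result)


# Hand-written operations in the assumed triple format.
ops = [("replace", 0, 0), ("replace", 4, 4), ("insert", 6, 6)]
print(apply_ops(ops, "kitten", "sitting"))  # sitting
```

If the assumed format matches what `Levenshtein.editops()` returns, passing its result to a helper like this should reproduce the second string.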