Distance
The `distance` library provides utilities for comparing arbitrary sequences, implementing metrics such as Levenshtein, Hamming, Jaccard, and Sorensen distances. It ships a pure-Python implementation along with an optional C extension for speed. Its last release (0.1.3) came out in November 2013, and the library is no longer actively maintained.
Warnings
- breaking: The `distance` library has not been updated since November 2013 (version 0.1.3) and is considered unmaintained. It nominally supports Python up to 3.3. On later Python versions and newer system architectures it may have compatibility issues or unaddressed bugs. Other libraries, such as `textdistance`, have explicitly removed support for `distance` because it is unmaintained.
- gotcha: Installing the optional C extension requires a C compiler and Python development headers. Without them, `pip install` with `--with-c` fails. On modern Python versions the performance benefit of the C extension is uncertain: it was written against the 2013-era C API, so any speedup may be negligible or even negative.
- gotcha: The library's 2013 changelog mentions switching 'back to using the to-be-deprecated Python unicode api' for Python 2.7+ compatibility and fixing 'variable interversions in (C) levenshtein which produced sometimes strange results'. Both entries point to past problems with string handling, especially Unicode, that could resurface as unexpected results in edge cases with modern Python 3 `str` objects.
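Given the maintenance status and the Unicode history above, a dependency-free reference implementation can be handy for cross-checking results or as a fallback. The following is a minimal sketch of the standard dynamic-programming Levenshtein distance; the name `levenshtein_ref` is hypothetical and is not part of the `distance` package. It also illustrates one Unicode pitfall: visually identical strings in NFC and NFD form differ by two edits at the code-point level.

```python
from typing import Sequence
import unicodedata


def levenshtein_ref(a: Sequence[object], b: Sequence[object]) -> int:
    """Levenshtein distance with unit-cost insert, delete, and substitute.

    Hypothetical helper, not part of the `distance` package. Works on any
    sequences whose elements support ==, e.g. str or a list of word tokens.
    """
    # The distance is symmetric, so keep the shorter sequence in `b`
    # to keep the rolling row small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the prefix of `a` processed
    # so far and b[:j].
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


# Precomposed "é" (U+00E9) vs. "e" followed by U+0301 (combining acute):
# they render identically but are different code-point sequences.
nfc = unicodedata.normalize("NFC", "caf\u00e9")
nfd = unicodedata.normalize("NFD", "caf\u00e9")

print(levenshtein_ref("kitten", "sitting"))  # 3
print(levenshtein_ref(nfc, nfd))             # 2
print(levenshtein_ref(["the", "cat"], ["the", "hat"]))  # 1 (token level)
```

Normalizing both inputs to the same form (for example NFC) before calling any edit-distance function avoids this class of surprise, whichever implementation is used.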
Install
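- `pip install Distance` installs the pure-Python implementation.
- `pip install Distance --global-option="--with-c"` additionally builds the optional C extension (requires a C compiler and Python development headers).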
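After installing, one way to confirm which release is present (the last one is 0.1.3) is to query the package metadata with the standard library. This is a sketch with a hypothetical helper name; it assumes the distribution is registered under the name `Distance`, and it only reports the version, not whether the optional C extension was built.

```python
from importlib import metadata  # standard library, Python 3.8+
from typing import Optional


def installed_distance_version() -> Optional[str]:
    """Return the installed version of the "Distance" distribution, or None.

    Hypothetical helper; relies only on importlib.metadata.
    """
    try:
        return metadata.version("Distance")
    except metadata.PackageNotFoundError:
        return None


version = installed_distance_version()
if version is None:
    print("Distance is not installed")
else:
    print(f"Distance {version} is installed")
```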
Imports
All functions live in the top-level module (`import distance`):
- levenshtein: `distance.levenshtein('string1', 'string2')`
- hamming: `distance.hamming('string1', 'string2')`
- jaccard: `distance.jaccard('seq1', 'seq2')`
- sorensen: `distance.sorensen('seq1', 'seq2')`
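To make clear what the Hamming, Jaccard, and Sorensen functions compute, here are dependency-free sketches of their standard textbook definitions. They are not the library's own code: the `_ref` names are hypothetical, and returning 0.0 for two empty inputs is an assumption that may not match `distance`. Jaccard and Sorensen compare the *sets* of elements, so order and repetition are ignored ("apple" becomes {a, p, l, e}), whereas Hamming compares position by position.

```python
from typing import Hashable, Sequence


def hamming_ref(a: Sequence[object], b: Sequence[object]) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is undefined for unequal lengths")
    return sum(x != y for x, y in zip(a, b))


def jaccard_ref(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """1 - |A & B| / |A | B|, treating each sequence as a set of elements."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0  # assumption: two empty inputs count as identical
    return 1 - len(sa & sb) / len(sa | sb)


def sorensen_ref(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """1 - 2|A & B| / (|A| + |B|), treating each sequence as a set."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0  # assumption: two empty inputs count as identical
    return 1 - 2 * len(sa & sb) / (len(sa) + len(sb))


print(hamming_ref("karolin", "kathrin"))  # 3
print(jaccard_ref("apple", "apply"))      # 0.4
print(sorensen_ref("apple", "apply"))     # 0.25
```

Because the set-based metrics discard order, "listen" and "silent" have a Jaccard distance of 0 even though their Levenshtein distance is not 0.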
Quickstart
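import distance

# Levenshtein distance: minimum number of single-element insertions,
# deletions, and substitutions needed to turn one sequence into the other
word1 = "kitten"
word2 = "sitting"
lev_dist = distance.levenshtein(word1, word2)
print(f"Levenshtein distance between '{word1}' and '{word2}': {lev_dist}")  # ...: 3

# Hamming distance: number of differing positions
# (only defined for sequences of equal length)
seq1 = "karolin"
seq2 = "kathrin"
ham_dist = distance.hamming(seq1, seq2)
print(f"Hamming distance between '{seq1}' and '{seq2}': {ham_dist}")  # ...: 3

# Jaccard distance: each string is compared as a set of characters
seq_a = "apple"
seq_b = "apply"
jacc_dist = distance.jaccard(seq_a, seq_b)
print(f"Jaccard distance between '{seq_a}' and '{seq_b}': {jacc_dist}")

# Sorensen distance on the same pair, also set-based
sor_dist = distance.sorensen(seq_a, seq_b)
print(f"Sorensen distance between '{seq_a}' and '{seq_b}': {sor_dist}")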
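The quickstart values can be checked by hand. The Levenshtein distance of 3 corresponds to the edit path kitten → sitten → sittin → sitting (two substitutions, one insertion). For the set-based metrics, assuming the library uses the standard definitions, "apple" and "apply" become the character sets A = {a, p, l, e} and B = {a, p, l, y}, which gives the Jaccard distance and, for comparison, the Sorensen distance listed under Imports:

```latex
\begin{aligned}
d_J(A, B) &= 1 - \frac{|A \cap B|}{|A \cup B|}
           = 1 - \frac{|\{a, p, l\}|}{|\{a, p, l, e, y\}|}
           = 1 - \frac{3}{5} = 0.4 \\
d_S(A, B) &= 1 - \frac{2\,|A \cap B|}{|A| + |B|}
           = 1 - \frac{2 \cdot 3}{4 + 4} = 0.25
\end{aligned}
```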