{"id":3293,"library":"textdistance","title":"TextDistance","description":"TextDistance is a Python library offering over 30 algorithms to compute the similarity or distance between two or more sequences. It provides a common interface for various string metrics, including edit-based, token-based, and phonetic algorithms. The library is actively maintained; the current version is 4.6.3. [2, 5, 8]","status":"active","version":"4.6.3","language":"en","source_language":"en","source_url":"https://github.com/orsinium/textdistance","tags":["text analysis","string similarity","distance metrics","NLP","fuzzy matching"],"install":[{"cmd":"pip install textdistance","lang":"bash","label":"Basic installation"},{"cmd":"pip install textdistance[extras]","lang":"bash","label":"Installation with optional faster dependencies (recommended for production)"}],"dependencies":[{"reason":"Optional, used for optimized performance in some algorithms.","package":"numpy","optional":true},{"reason":"Optional, provides faster implementations for several algorithms. Included with '[extras]'. [4, 5, 12]","package":"rapidfuzz","optional":true}],"imports":[{"note":"Most algorithms are exposed as ready-to-use objects on the `textdistance` module, each providing `distance` and `similarity` methods. [2, 3]","symbol":"levenshtein","correct":"import textdistance\ndistance = textdistance.levenshtein.distance('text', 'test')"},{"note":"Algorithms can also be imported as classes and instantiated with custom parameters; the module-level objects are the usual choice when the default settings suffice. 
[3, 14]","symbol":"JaroWinkler","correct":"from textdistance import JaroWinkler\njw = JaroWinkler()\ndistance = jw.distance('martha', 'marhta')"}],"quickstart":{"code":"import textdistance\n\n# Calculate Levenshtein distance\nstr1 = \"kitten\"\nstr2 = \"sitting\"\ndistance = textdistance.levenshtein.distance(str1, str2)\nsimilarity = textdistance.levenshtein.similarity(str1, str2)\nnormalized_distance = textdistance.levenshtein.normalized_distance(str1, str2)\nnormalized_similarity = textdistance.levenshtein.normalized_similarity(str1, str2)\n\nprint(f\"Strings: '{str1}', '{str2}'\")\nprint(f\"Levenshtein Distance: {distance}\")\nprint(f\"Levenshtein Similarity: {similarity}\")\nprint(f\"Levenshtein Normalized Distance: {normalized_distance:.2f}\")\nprint(f\"Levenshtein Normalized Similarity: {normalized_similarity:.2f}\")\n\n# Example with another algorithm (Jaro-Winkler)\nstr3 = \"martha\"\nstr4 = \"marhta\"\njaro_winkler_similarity = textdistance.jaro_winkler(str3, str4)\nprint(f\"\\nJaro-Winkler Similarity between '{str3}' and '{str4}': {jaro_winkler_similarity:.2f}\")","lang":"python","description":"This quickstart computes the Levenshtein distance and similarity, both raw and normalized, through method calls on the algorithm object (e.g., `textdistance.levenshtein.distance`). It then calls the `textdistance.jaro_winkler` object directly and treats the result as a similarity score. [2, 3, 6, 7]"},"warnings":[{"fix":"Migrate to `rapidfuzz` or other external libraries for performance, or use `textdistance`'s pure Python implementations.","message":"Support for the `abydos` library was dropped in version 4.6.0. Code that relied on the `textdistance` integration with `abydos` will break.","severity":"breaking","affected_versions":"4.6.0 and later"},{"fix":"Upgrade your Python environment to Python 3.6 or newer.","message":"Python 2 support was dropped in version 4.2.0. 
The library now explicitly supports Python 3.6+.","severity":"breaking","affected_versions":"4.2.0 and later"},{"fix":"Install with `pip install textdistance[extras]` to leverage faster compiled implementations from external libraries. The library automatically prioritizes faster external libraries if found. [4, 14]","message":"For best performance, especially in production, install `textdistance` with the `[extras]` option. Without the optional dependencies (such as `rapidfuzz` and `numpy`), the pure Python implementations are significantly slower. [5, 7, 10]","severity":"gotcha","affected_versions":"All versions where external libraries provide faster implementations"},{"fix":"Verify that your code correctly handles integer return types for Levenshtein distance calculations.","message":"The `Levenshtein` algorithm was fixed in version 4.6.2 to ensure its return type is consistently `int`. If your application implicitly handled non-integer return values for Levenshtein distance prior to this version, its behavior might subtly change.","severity":"gotcha","affected_versions":"4.6.2 and later"},{"fix":"If you experience unexpected performance or behavior, inspect the `libraries.json` file in the `textdistance` package directory, or pass `external=False` when instantiating an algorithm class to force the pure Python implementation.","message":"By default, `textdistance` may try to use external libraries (like `rapidfuzz`) if they are installed and provide faster implementations for a given algorithm. This behavior is controlled by an internal `libraries.json` file. If you need to explicitly control which implementation is used or troubleshoot performance, be aware of this mechanism and the `external` argument. 
[4, 14]","severity":"gotcha","affected_versions":"All versions with external library support"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}