Edlib - Sequence Alignment
Edlib is a lightweight, fast C/C++ library for sequence alignment based on edit (Levenshtein) distance, with official Python bindings. It supports three alignment modes: global (`NW`), prefix (`SHW`), and infix (`HW`). It can return the edit distance alone, the alignment locations, or the full alignment path. The current version is 1.3.9.post1; releases are driven by bug fixes and minor improvements rather than a fixed schedule.
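To make the difference between the modes concrete, here is a minimal pure-Python sketch of what each mode computes. It is a plain dynamic-programming reference, not edlib's actual bit-parallel algorithm, and the function name `reference_edit_distance` is hypothetical. It assumes the usual definitions: `NW` aligns the whole query to the whole target, `SHW` lets the target's trailing characters go unaligned for free, and `HW` lets both the leading and trailing target characters go unaligned for free.

```python
def reference_edit_distance(query, target, mode="NW"):
    """Slow reference DP for the NW, SHW and HW modes (illustration only)."""
    if mode not in ("NW", "SHW", "HW"):
        raise ValueError(f"unknown mode: {mode!r}")

    # Row 0: cost of aligning an empty query against target[:j].
    # In HW mode, skipping a prefix of the target is free.
    prev = [0 if mode == "HW" else j for j in range(len(target) + 1)]

    for i, q_char in enumerate(query, start=1):
        curr = [i]  # i query characters against an empty target
        for j, t_char in enumerate(target, start=1):
            curr.append(min(
                prev[j] + 1,                       # query character left unmatched
                curr[j - 1] + 1,                   # target character left unmatched
                prev[j - 1] + (q_char != t_char),  # match or substitution
            ))
        prev = curr

    # NW must consume the whole target; SHW and HW may stop anywhere in it.
    return prev[-1] if mode == "NW" else min(prev)
```

For example, `reference_edit_distance("pple", "apple", "HW")` is 0 because the query occurs inside the target, while the same pair in `NW` mode costs 1.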
Common errors
- TypeError: argument 1 must be str, not bytes
  cause: You are passing byte strings (`bytes`) instead of Unicode strings (`str`) to `edlib.align()`.
  fix: Make sure the query and target sequences are Python `str` objects. Decode byte strings where necessary, e.g., `seq.decode('utf-8')`.
- ValueError: Query and target sequences must be non-empty.
  cause: One or both of the sequences are empty strings. Earlier versions of edlib did not handle empty input gracefully, and more recent versions can reject it when explicit validation is enabled.
  fix: Ensure both the query and target sequences passed to `edlib.align()` are non-empty strings. If you are on a version older than 1.2.5, consider upgrading.
- RuntimeError: no solution found
  cause: The `k` parameter (maximum edit distance) is typically set too low for the given sequences, so no alignment with `editDistance <= k` exists. It can also occur in very rare edge cases due to internal bugs in older versions (e.g., v1.1.1).
  fix: Increase `k`, or omit it entirely if you want an alignment regardless of distance. Use a recent version of edlib to avoid the known internal bugs. Note that the Python binding also signals an exceeded `k` by returning `editDistance == -1`, so check for that value as well.
- The program freezes or hangs indefinitely when calling `edlib.align()`.
  cause: Versions prior to 1.2.7 had a bug that could cause a freeze when the input sequences used an alphabet of exactly 256 unique characters.
  fix: Upgrade `edlib` to version 1.2.7 or newer. If upgrading isn't an option, avoid input whose alphabet has exactly 256 unique characters.
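The first two errors can be avoided by normalizing input before it reaches `edlib.align()`. The following is a minimal sketch of such a guard; the helper name `prepare_sequence` and its `encoding` parameter are hypothetical and not part of edlib.

```python
def prepare_sequence(seq, encoding="utf-8"):
    """Return seq as a non-empty str, decoding bytes if necessary."""
    if isinstance(seq, (bytes, bytearray)):
        # Decode byte strings so the aligner receives Unicode text.
        seq = seq.decode(encoding)
    if not isinstance(seq, str):
        raise TypeError(f"expected str or bytes, got {type(seq).__name__}")
    if not seq:
        # Reject empty input up front instead of relying on library behaviour.
        raise ValueError("sequence must be non-empty")
    return seq
```

Call it on both arguments, e.g. `edlib.align(prepare_sequence(query), prepare_sequence(target))`, so that bad input fails early with a clear message.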
Warnings
- gotcha Older versions (pre-1.2.7) could freeze or exhibit incorrect behavior when the input alphabet was exactly 256 unique characters. While fixed, be mindful of extremely large or unusual character sets in older installations.
- gotcha When only the edit distance is needed, explicitly set `task='distance'` for best performance. `task='locations'` and `task='path'` do additional work to locate or trace back the alignment, which is slower.
- gotcha Providing empty strings as query or target sequences in versions prior to 1.2.5 could lead to incorrect results or crashes. Modern versions handle this correctly.
- gotcha Setting the `k` parameter (maximum edit distance) too low can produce `no solution found` even though an alignment exists at a slightly higher distance. This was particularly buggy in v1.1.1, but it remains a design consideration in current versions.
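The `task` and `k` warnings can be combined into one small wrapper. This is a sketch under stated assumptions: it relies on the binding reporting an exceeded threshold as `editDistance == -1`, and the helper name `bounded_distance` and its `align` parameter are hypothetical. The `align` parameter exists only so that the helper can be exercised with a stand-in function; by default it uses `edlib.align`.

```python
def bounded_distance(query, target, k, align=None):
    """Return the global edit distance if it is <= k, otherwise None."""
    if align is None:
        import edlib  # imported lazily so a stand-in can be injected in tests
        align = edlib.align

    # task="distance" skips the traceback work needed for locations or paths.
    result = align(query, target, mode="NW", task="distance", k=k)

    distance = result["editDistance"]
    # Assumption: -1 means no alignment within k edits was found.
    return None if distance == -1 else distance
```

A caller can then retry with a larger `k` when `None` comes back, rather than treating the result as a valid distance.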
Install
- pip install edlib
Imports
- align
  from edlib import align
  import edlib
  edlib.align(...)
Quickstart
import edlib
# Global alignment (Needleman-Wunsch-like)
result = edlib.align("apple", "aple", mode="NW", task="distance")
print(f"NW Distance: {result['editDistance']}")
# Prefix alignment (SHW): unaligned characters at the end of the target are free
result = edlib.align("apple", "pple", mode="SHW", task="distance")
print(f"SHW Distance: {result['editDistance']}")
# Global alignment with path (more computationally intensive)
result = edlib.align("apple", "apply", mode="NW", task="path")
print(f"NW Distance with path: {result['editDistance']}")
# The result dictionary stores the alignment path as a CIGAR string under 'cigar'
print(f"Alignment path (CIGAR): {result['cigar']}")
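With `task="path"`, the path is returned as an extended CIGAR string, where `=` is a match, `X` a mismatch, and `I` and `D` an insertion and a deletion. The sketch below shows one way to turn such a string into per-operation counts. The helper name `summarize_cigar` is hypothetical, and the strings in the usage note are illustrative samples in that format rather than captured edlib output.

```python
import re

# One run in an extended CIGAR string: a count followed by an operation code.
_CIGAR_RUN = re.compile(r"(\d+)([=XID])")

def summarize_cigar(cigar):
    """Count alignment columns per operation in an extended CIGAR string."""
    if not re.fullmatch(r"(?:\d+[=XID])*", cigar):
        raise ValueError(f"not an extended CIGAR string: {cigar!r}")

    counts = {"=": 0, "X": 0, "I": 0, "D": 0}
    for length, op in _CIGAR_RUN.findall(cigar):
        counts[op] += int(length)
    return counts
```

For a sample string such as `"4=1X"`, the helper reports four matches and one mismatch. Note that `result['cigar']` is only populated when `task="path"` is requested, so check for `None` before parsing.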