APTED Algorithm for Tree Edit Distance
APTED (All Path Tree Edit Distance) is a Python library that implements the state-of-the-art algorithm for computing the tree edit distance between two ordered, labeled trees. It is a pure Python port of the original Java implementation. The current stable version is 1.0.3, with updates primarily focusing on maintenance due to its long-standing stability.
Common errors
-
AttributeError: module 'apted' has no attribute 'StartJVM'
cause Attempting to use a `StartJVM()` function that exists in Java-dependent APTED implementations but not in the pure Python `apted` library.fixRemove any `import jpype` or calls to `apted.StartJVM()`. The Python `apted` library does not require a JVM. -
apted.parser.errors.ParseError: Unexpected character 'X' at position Y
cause The input tree string does not conform to the expected bracket notation. Common issues include unescaped curly braces within labels, unmatched braces, or incorrect nesting.fixCarefully review your tree string for correct bracket notation syntax. Ensure all labels are properly enclosed (or not) and all braces are matched. If you have labels containing '{' or '}', they must be escaped as '\{' or '\}'. -
TypeError: __init__() missing 1 required positional argument: 'config'
cause The `APTED` constructor requires a `config` object (even if it's the default unit cost configuration). You might have forgotten to pass it or tried to initialize `APTED` with only the trees.fixPass a configuration object to `APTED`. You can use the default `apted.config.Config()` for unit costs, or define your own custom class inheriting from `object` (or `Config`) and implement `rename`, `cost_insert`, and `cost_delete` methods.
Warnings
- gotcha The `apted` Python library is a pure Python implementation and does not rely on Java or `jpype`. Older examples or discussions might refer to a `StartJVM()` function, which is not applicable to this Python package and will result in errors.
- gotcha The library primarily supports tree input in a specific 'bracket notation' string format (e.g., `{A{B}{C}}`). Using malformed strings or attempting other formats directly will lead to parsing errors.
- gotcha The `apted` library's primary Python package has not been updated since November 2017, meaning new features or significant performance improvements found in the actively developed Java version might not be present. While stable, development is slow for the Python port.
Install
-
pip install apted
Imports
- APTED
from apted import APTED
- Node
from apted import Node
from apted.nodes import Node
- BracketStringInputParser
from apted.parser import BracketStringInputParser
- StartJVM
import jpype; apted.StartJVM()
Quickstart
from apted import APTED
from apted.nodes import Node
# Define a custom configuration for costs (optional, unit costs are default)
class CustomConfig(object):
def rename(self, node1, node2):
return 0 if node1.label == node2.label else 1
def cost_insert(self, node):
return 1
def cost_delete(self, node):
return 1
# Create trees using the bracket notation string format
tree1 = Node.from_string('{a{b}{c}}')
tree2 = Node.from_string('{a{b{d}}}')
# Initialize APTED with the two trees and the custom config
apted = APTED(tree1, tree2, CustomConfig())
# Compute the tree edit distance
distance = apted.compute_edit_distance()
print(f"Tree 1: {tree1.root.to_string()}")
print(f"Tree 2: {tree2.root.to_string()}")
print(f"Edit distance: {distance}")