{"id":9428,"library":"zss","title":"Zhang-Shasha: Tree Edit Distance in Python","description":"ZSS is a Python library that implements the Zhang-Shasha algorithm for computing the tree edit distance between two ordered labeled trees. It is currently at version 1.2.0 and, while not frequently updated, provides a stable and specialized tool for comparing tree structures. The library can be extended with custom node formats and distance metrics. It optionally uses `editdist` for graded label comparison and `numpy` for its internal matrices.","status":"active","version":"1.2.0","language":"en","source_language":"en","source_url":"https://www.github.com/timtadh/zhang-shasha","tags":["tree edit distance","algorithm","graph theory","data structures","string metrics"],"install":[{"cmd":"pip install zss","lang":"bash","label":"Base installation"},{"cmd":"pip install zss editdist numpy","lang":"bash","label":"With optional dependencies for string-distance label comparison and numpy-backed matrices"}],"dependencies":[{"reason":"When installed, its string edit distance becomes the default `label_dist` in `simple_distance`, so relabeling costs reflect how different two string labels are instead of a flat equal/not-equal check.","package":"editdist","optional":true},{"reason":"Used for the internal distance matrices when available; without it, zss falls back to nested Python lists, which may be slower for larger trees. Requires numpy >= 1.7.","package":"numpy","optional":true}],"imports":[{"symbol":"simple_distance","correct":"from zss import simple_distance"},{"symbol":"Node","correct":"from zss import Node"},{"note":"`simple_distance` derives its insert, remove, and update costs from a single `label_dist` function. 
For custom or asymmetric costs, use `zss.distance` with explicit cost functions.","wrong":"from zss import simple_distance (for complex cost models)","symbol":"distance","correct":"from zss import distance"}],"quickstart":{"code":"from zss import simple_distance, Node\n\nA = (\n    Node(\"f\")\n    .addkid(Node(\"a\")\n        .addkid(Node(\"h\"))\n        .addkid(Node(\"c\")\n            .addkid(Node(\"l\"))\n        )\n    )\n    .addkid(Node(\"e\"))\n)\n\nB = (\n    Node(\"f\")\n    .addkid(Node(\"a\")\n        .addkid(Node(\"d\"))\n        .addkid(Node(\"c\")\n            .addkid(Node(\"b\"))\n        )\n    )\n    .addkid(Node(\"e\"))\n)\n\n# The trees differ by two relabels: h -> d and l -> b\ndist = simple_distance(A, B)\nprint(f\"Tree edit distance: {dist}\")\n# Expected: 2","lang":"python","description":"This example defines two small trees with the built-in `Node` class and computes their edit distance with `simple_distance()`. `addkid` appends a child and returns the parent node, so calls can be chained to build the tree."},"warnings":[{"fix":"Use `zss.distance(A, B, zss.Node.get_children, insert_cost=my_insert_cost, remove_cost=my_remove_cost, update_cost=my_update_cost)`, where `my_insert_cost(node)` and `my_remove_cost(node)` return the cost of adding or deleting a node and `my_update_cost(a, b)` returns the cost of relabeling `a` as `b`. Unlike `simple_distance`, `distance` requires `get_children` as its third argument.","message":"The `simple_distance` function derives every cost from `label_dist`: inserting a node costs `label_dist('', label)`, removing one costs `label_dist(label, '')`, and relabeling costs `label_dist(label_a, label_b)`. If your application requires specific, non-uniform, or asymmetric costs, use the more general `zss.distance` function and provide custom `insert_cost`, `remove_cost`, and `update_cost` functions.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Implement functions `get_children(node)`, `get_label(node)`, and `label_dist(label_a, label_b)` and pass them as arguments: `simple_distance(A, B, get_children=my_get_children, get_label=my_get_label, label_dist=my_label_dist)`.","message":"By default, `simple_distance` expects `zss.Node` objects (it reads `node.children` and `node.label`) and compares labels with a string distance. 
When using custom tree structures or non-string labels, pass custom `get_children`, `get_label`, and `label_dist` functions to `simple_distance`; `distance` takes `get_children` plus explicit cost functions instead.","severity":"gotcha","affected_versions":"All versions"},{"fix":"For graded string-based label comparisons and numpy-backed matrices, install the optional packages: `pip install zss editdist numpy`.","message":"The `editdist` and `numpy` packages are optional 'soft requirements'. Without `editdist`, label comparisons fall back to a simple equality check (0 if equal, 1 if not). Without `numpy`, zss builds its internal matrices from nested Python lists instead, which may be slower for large trees.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Ensure the library is installed: `pip install zss`. If using a virtual environment, activate it before running your script.","cause":"The 'zss' package is not installed in the current Python environment, or the environment where it's installed is not active.","error":"ModuleNotFoundError: No module named 'zss'"},{"fix":"Define functions to extract children and labels from your custom nodes, and a function to compare their labels. Then pass them to the distance calculation: `simple_distance(A, B, get_children=my_get_children_func, get_label=my_get_label_func, label_dist=my_label_distance_func)`. For `zss.distance`, pass your `get_children` function plus cost functions instead.","cause":"A custom node class is used instead of `zss.Node`, but the matching `get_children`, `get_label`, and/or `label_dist` functions were not passed to `zss.simple_distance` (or `get_children` to `zss.distance`). 
The default accessors read the `children` and `label` attributes that `zss.Node` defines, which a custom class may lack or structure differently.","error":"TypeError: object of type 'MyCustomNode' has no len()"},{"fix":"When building trees with `zss.Node`, use the `addkid()` method, which returns the parent so calls can be chained: `Node('parent').addkid(Node('child'))`. Alternatively, pass a list of children to the constructor: `Node('parent', [Node('child')])`.","cause":"The `zss.Node` class uses `addkid()` to add children, not the `add_child()` or `append()` methods common in other tree implementations.","error":"AttributeError: 'Node' object has no attribute 'add_child'"}]}