{"id":8664,"library":"spark-parser","title":"SPARK Parser Toolkit","description":"SPARK is a lightweight, pure-Python Earley-Algorithm context-free grammar parser toolkit. It enables developers to build parsers and scanners for custom languages or data formats using grammar rules defined as Python docstrings. The current version is 1.9.0, with releases occurring periodically to address Python compatibility and improve internal mechanics.","status":"active","version":"1.9.0","language":"en","source_language":"en","source_url":"https://github.com/rocky/python-spark/","tags":["parser","grammar","earley","compiler","toolkit","lexer"],"install":[{"cmd":"pip install spark-parser","lang":"bash","label":"Install stable version"}],"dependencies":[],"imports":[{"note":"The package name is `spark-parser`, leading to the import path `spark_parser`. Many users mistakenly try to import from `spark` (without `_parser`), possibly confusing it with Apache Spark.","wrong":"from spark import GenericParser","symbol":"GenericParser","correct":"from spark_parser import GenericParser"},{"symbol":"GenericScanner","correct":"from spark_parser import GenericScanner"}],"quickstart":{"code":"from spark_parser import GenericParser, GenericScanner\n\n# 1. 
Define your scanner (lexer) by subclassing GenericScanner\nclass SimpleCalcScanner(GenericScanner):\n    # A hand-written tokenize() keeps this example short; GenericScanner also\n    # supports regex token rules given as docstrings of methods named t_*\n    def tokenize(self, input_string):\n        tokens = []\n        i = 0\n        while i < len(input_string):\n            char = input_string[i]\n            if char.isspace():\n                i += 1\n                continue\n            if char.isdigit():\n                num_str = \"\"\n                while i < len(input_string) and input_string[i].isdigit():\n                    num_str += input_string[i]\n                    i += 1\n                # Each token is a (kind, value) tuple\n                tokens.append(('NUMBER', int(num_str)))\n            elif char in \"+-*/()\":\n                # Operators use their own character as the token kind\n                tokens.append((char, char))\n                i += 1\n            else:\n                raise ValueError(f\"Invalid character: {char!r}\")\n        return tokens\n\n# 2. Define your parser (grammar rules) by subclassing GenericParser\nclass SimpleCalcParser(GenericParser):\n    def __init__(self, start_symbol='expr'):\n        super().__init__(start_symbol)\n\n    # Map each token to the grammar terminal it matches\n    # (here its kind, the first element of the tuple)\n    def typestring(self, token):\n        return token[0]\n\n    # Grammar rules come from the docstrings of methods starting with 'p_';\n    # a method's return value becomes the value of the rule's left-hand side.\n    # This grammar only covers '+'; the other operator tokens have no rules yet.\n    def p_expr_add(self, args):\n        '''\n        expr ::= expr + term\n        '''\n        return args[0] + args[2]\n\n    def p_expr_term(self, args):\n        '''\n        expr ::= term\n        '''\n        return args[0]\n\n    def p_term_num(self, args):\n        '''\n        term ::= NUMBER\n        '''\n        # args[0] is the ('NUMBER', value) token; return its integer value\n        return args[0][1]\n\n# 3. Instantiate scanner and parser, then tokenize and parse\nscanner = SimpleCalcScanner()\nparser = SimpleCalcParser()\n\ntext_to_parse = \"10 + 5\"\ntokens = scanner.tokenize(text_to_parse)\nresult = parser.parse(tokens)\n\nprint(f\"Parsed result for '{text_to_parse}': {result}\")  # Expected: 15\nassert result == 15, \"Parsing failed!\"","lang":"python","description":"This quickstart demonstrates how to define a simple arithmetic scanner and parser using `GenericScanner` and `GenericParser`. 
It tokenizes an input string and then parses it according to the defined grammar rules to calculate the result."},"warnings":[{"fix":"Review any code that directly interacts with the internal `BuildTree` mechanism. For most users, this change is a beneficial performance and stability improvement that fixes a common `RecursionError`.","message":"Starting with version 1.9.0, the internal `BuildTree` mechanism was rewritten from recursive to iterative. While this fixes `RecursionError` for large trees, it is a significant internal change that might subtly affect highly specialized code that depends on `BuildTree`'s structure or performance characteristics.","severity":"gotcha","affected_versions":"1.9.0+"},{"fix":"Use Python 3.7.4 or newer with `spark-parser` where possible. If you must target an older Python, check the release notes on the project's GitHub page for a `spark-parser` version that supports it.","message":"Support for older Python versions (e.g., below 3.7) varies between `spark-parser` releases, so an older interpreter may require a specific older `spark-parser` release. Recent releases (1.9.0+) adopt a more modern Python style (e.g., type annotations, `pyproject.toml`).","severity":"gotcha","affected_versions":"all"},{"fix":"Thoroughly test your grammar with diverse inputs. 
The `GenericParser` constructor also accepts a `debug` argument (in recent releases, a dictionary of flags such as `'rules'`, `'transition'`, and `'reduce'`) that turns on detailed output about rule application and parser transitions; check the `spark_parser` source for the exact flags your version supports.","message":"Incorrectly defined grammar rules (e.g., ambiguity, infinite recursion, unreachable productions) can lead to unexpected parse results, syntax errors (reported through the parser's `error` method), or infinite loops.","severity":"gotcha","affected_versions":"all"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Change your import statements from `from spark import ...` to `from spark_parser import ...`.","cause":"Attempting to import from `spark` instead of `spark_parser`.","error":"ModuleNotFoundError: No module named 'spark'"},{"fix":"Upgrade to `spark-parser` version 1.9.0 or higher. If the issue persists with a very complex grammar, review your grammar for excessively deep or recursive rules that could contribute to the problem.","cause":"Prior to version 1.9.0, `spark-parser`'s internal `BuildTree` used recursion, which could hit Python's recursion limit with large or deeply nested parse trees. Complex grammars can also indirectly lead to this.","error":"RecursionError: maximum recursion depth exceeded"},{"fix":"Review the `tokenize` method of your `GenericScanner` subclass. Make sure it handles every character that can appear in the input (whitespace, digits, operator symbols, and any other valid tokens), and either add a rule for unknown characters or skip them explicitly.","cause":"A hand-written `tokenize` method (as in the quickstart) raises this error itself when it reaches a character it has no rule for, typically because whitespace handling is missing, a special character is unhandled, or the input is malformed.","error":"ValueError: Invalid character: '<char>'"}]}