Bashlex: Bash Parser
Bashlex is a Python library that parses bash commands. It takes a command string and converts it into an abstract syntax tree (AST), which allows programmatic inspection and manipulation of shell commands. The current version is 0.18, with new releases roughly every one to two years.
Warnings
- gotcha Bashlex is strictly a *syntax parser* for bash commands, not a shell emulator. It does not perform parameter expansion (`$VAR`), tilde expansion (`~`), command substitution (`$(cmd)`), or globbing (`*`), and it does not evaluate control flow. These constructs appear in the AST as unevaluated nodes or as literal text inside words; if you need their results, you must implement the expansion yourself after parsing.
- gotcha Comments (`# ...`) are generally *not* included in the AST: they do not appear among the `parts` of the command nodes returned by `bashlex.parse()`. The top-level package exposes `parse`, `parsesingle`, and `split`, but no `tokenize()` function, so if your application needs to analyze or extract comments, plan to recover them from the raw source text yourself.
- gotcha The library supports both Python 2.7 and Python 3.5+. Because `str` holds bytes in Python 2 and text in Python 3, be careful when migrating legacy code or sharing input between the two. Pass text consistently, and in Python 3 decode `bytes` to `str` before parsing.
Install
pip install bashlex
Imports
- parse
from bashlex import parse
- split
from bashlex import split
Quickstart
import bashlex

command_string = "echo 'Hello World' && ls -l $HOME/#my-list.txt"

def describe(node, depth=0):
    # Show the node kind and, where present, its word, operator, or parameter name
    detail = getattr(node, 'word', getattr(node, 'op', getattr(node, 'value', '')))
    print(f"{'  ' * depth}Kind: {node.kind}, Value: {detail!r}")
    # Only some node kinds (list, command, word, ...) carry child parts
    for child in getattr(node, 'parts', []):
        describe(child, depth + 1)

try:
    # Parse the command string into a list of top-level AST nodes
    trees = bashlex.parse(command_string)
    print(f"Parsed command: {command_string}")
    print("--- AST Structure ---")
    for tree in trees:
        describe(tree)
    # Split the command into shell words and operators, similar to shlex.split
    print("\n--- Words and operators ---")
    for word in bashlex.split(command_string):
        print(f"Word: {word!r}")
except bashlex.errors.ParsingError as e:
    print(f"Error parsing command: {e}")