SMDA: Static Malware Disassembly Analysis Library
SMDA is a minimalist recursive disassembler library optimized for accurate Control Flow Graph (CFG) recovery, particularly from memory dumps. Built on Capstone, it currently supports x86/x64 Intel machine code. It takes an arbitrary memory dump (ideally with a known base address) and outputs a structured collection of functions, basic blocks, and instructions, together with the edges between them. The library is actively maintained; the current stable version is 2.5.3.
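As a rough sketch of the memory-dump workflow described above, the helper below hands a raw buffer and its base address to `disassembleBuffer` and tallies what comes back. The helper names `disassemble_dump` and `summarize` are hypothetical, and so is the optional `disassembler` parameter, which is only there so the code can be exercised without SMDA installed. `Disassembler`, `disassembleBuffer`, `getFunctions`, `getInstructions`, and `mnemonic` are the names used elsewhere in this document.

```python
from collections import Counter


def disassemble_dump(buffer, base_addr, disassembler=None):
    """Disassemble a raw memory dump mapped at base_addr (hypothetical helper)."""
    if disassembler is None:
        # Imported lazily so this module still loads where smda is not installed.
        from smda.Disassembler import Disassembler
        disassembler = Disassembler()
    return disassembler.disassembleBuffer(buffer, base_addr)


def summarize(report):
    """Count functions, instructions, and mnemonics in a report (hypothetical helper)."""
    mnemonics = Counter()
    num_functions = 0
    for fn in report.getFunctions():
        num_functions += 1
        for ins in fn.getInstructions():
            mnemonics[ins.mnemonic] += 1
    return {
        "functions": num_functions,
        "instructions": sum(mnemonics.values()),
        "mnemonics": dict(mnemonics),
    }
```

With SMDA installed, a call such as `summarize(disassemble_dump(dump_bytes, 0x400000))` would give a quick overview of a dump; the base address here is only an illustrative value.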
Warnings
- breaking Version 1.2.0 introduced significant API changes. The `config.py` module was restructured into `smda/SmdaConfig.py`, and the primary disassembly method now returns an `SmdaReport` object instead of raw JSON. Interact with results through the `SmdaReport` object; its `toDict()` method produces a dictionary suitable for JSON serialization.
- gotcha PyPI classifiers for older releases may list Python 2.7, but the GitHub README states that SMDA should be fully compatible with Python 3.8+. Use Python 3.8 or newer: Python 2.7 is end-of-life and may cause unexpected issues with recent `smda` versions.
- gotcha SMDA is sensitive to LIEF API changes, as past adjustments show (e.g., v1.10.0 was adapted to the LIEF 0.12.3 API). `setup.py` currently specifies `lief>=0.16.0`, but you should still confirm that your installed LIEF version matches what your `smda` version expects, since API mismatches can cause parsing errors or crashes.
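The two version gotchas above can be checked before running any analysis. The following is a minimal sketch using only the standard library (`sys` and `importlib.metadata`, the latter available from Python 3.8); the function names `installed_version` and `check_environment` are hypothetical and not part of SMDA. It reports which versions are installed but does not judge whether a particular `smda`/`lief` pairing is compatible, since that mapping is not given here.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment(min_python=(3, 8)):
    """Collect human-readable warnings about the current environment."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d or newer is recommended" % tuple(min_python[:2]))
    for package in ("smda", "lief"):
        if installed_version(package) is None:
            problems.append("%s is not installed" % package)
    return problems


if __name__ == "__main__":
    print("smda:", installed_version("smda"), "| lief:", installed_version("lief"))
    for problem in check_environment():
        print("warning:", problem)
```

Printing both versions side by side makes it easier to compare them against the `lief` requirement declared by the installed `smda` release.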
Install
pip install smda
Imports
- Disassembler
from smda.Disassembler import Disassembler
- SmdaReport
from smda.common.SmdaReport import SmdaReport
Quickstart
import os
from smda.Disassembler import Disassembler
from smda.common.SmdaReport import SmdaReport

# Create a dummy file for demonstration purposes
dummy_file_path = "dummy_binary.bin"
# A very simple x64 'ret' instruction (0xc3) as binary content
# In a real scenario, this would be a full executable or memory dump
dummy_binary_content = b"\xc3"

try:
    with open(dummy_file_path, "wb") as f:
        f.write(dummy_binary_content)

    # Initialize the disassembler
    disassembler = Disassembler()

    # Disassemble the dummy file
    # For a real binary, replace dummy_file_path with an actual path, e.g., "/bin/ls"
    # For a memory dump, use disassembleBuffer(buffer, base_address)
    report: SmdaReport = disassembler.disassembleFile(dummy_file_path)

    print(f"\nDisassembly Report for '{dummy_file_path}':")
    # Collect the functions once so they can be both counted and iterated
    functions = list(report.getFunctions())
    if functions:
        print(f"Detected {len(functions)} function(s).")
        for fn in functions:
            print(f"Function at 0x{fn.offset:08x}:")
            for ins in fn.getInstructions():
                print(f"  0x{ins.offset:08x}: {ins.mnemonic} {ins.operands}")
            print("-" * 20)
    else:
        print("No functions detected.")

    # The report can be converted to a dictionary for JSON serialization
    # json_report = report.toDict()
    # print(json_report)  # Uncomment to see the full dictionary representation
finally:
    # Clean up the dummy file
    if os.path.exists(dummy_file_path):
        os.remove(dummy_file_path)
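The `toDict()` step that the Quickstart leaves commented out can be wrapped in a small helper. This is a sketch under stated assumptions: `report_to_json` is a hypothetical name, the helper only relies on the report exposing `toDict()`, and `default=str` is a defensive assumption in case some value in the dictionary is not natively JSON-serializable.

```python
import json


def report_to_json(report, path=None, indent=2):
    """Serialize an object exposing toDict() to JSON text (hypothetical helper).

    If path is given, the text is also written to that file.
    """
    data = report.toDict()
    # default=str is a defensive assumption: anything json cannot encode
    # natively is stored as its string representation.
    text = json.dumps(data, indent=indent, default=str)
    if path is not None:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)
    return text
```

In the Quickstart, calling `report_to_json(report, "report.json")` after the disassembly step would persist the result. Note that `json.dumps` turns integer dictionary keys into strings, so any address-keyed mappings need converting back to integers after loading.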