Hachoir
Hachoir is a Python library designed to view and edit binary streams field by field. It represents a binary file as a hierarchical tree of Python objects, enabling detailed analysis and manipulation down to the bit level. The current version is 3.3.0, released on December 12, 2023. The project maintains an active, though not strictly frequent, release cadence, with previous major updates in 2022 and 2020.
Warnings
- breaking Hachoir 3.x is a significant rewrite, dropping Python 2 support entirely. Code written for Hachoir 2.x (Python 2) will not run on Hachoir 3.x. Additionally, earlier Hachoir sub-packages (e.g., `hachoir-core`, `hachoir-parser`) were merged into a single `hachoir` module.
- breaking Hachoir 3.x requires Python 3.6 or newer. Running on older Python 3 versions (e.g., 3.4, 3.5) will result in errors.
- gotcha Hachoir operates at the bit level, not byte level, for addresses and sizes. This means that functions expecting lengths often require values in bits (e.g., `8 * bytes`) rather than raw byte counts, which can lead to off-by-eight errors or unexpected behavior if not accounted for.
- gotcha Hachoir uses lazy loading; fields are not fully parsed, and their values are not read until they are explicitly accessed (e.g., `field.value`). This design makes opening large files very fast but can be surprising for users expecting an immediate, complete parse tree.
Install
-
pip install hachoir -
pip install hachoir[urwid] -
pip install hachoir[wx]
Imports
- createParser
from hachoir.parser import createParser
- extractMetadata
from hachoir.metadata import extractMetadata
- StringInputStream
from hachoir.stream import StringInputStream
- FileInputStream
from hachoir.stream import FileInputStream
- LITTLE_ENDIAN
from hachoir.core.endian import LITTLE_ENDIAN
from hachoir.stream import LITTLE_ENDIAN
Quickstart
import io
from hachoir.stream import StringInputStream, LITTLE_ENDIAN
from hachoir.field import Root, UInt8, UInt16, Bytes
from hachoir.parser import Parser
# Define a simple custom parser for demonstration
class SimpleBinaryParser(Parser):
PARSER_TAGS = {
"id": "simple_bin",
"category": "misc",
"description": "Simple binary format"
}
endian = LITTLE_ENDIAN # Specify endianness for the parser
def createFields(self):
# A 1-byte header identifier
yield UInt8(self, "header_byte", "Header identifier")
# A 2-byte unsigned integer for length (little endian)
yield UInt16(self, "length", "Length of data section")
# A data payload whose size is determined by the 'length' field
yield Bytes(self, "data", self["length"].value, "Data payload")
# Create a dummy binary string:
# - 0xAA (1 byte) for 'header_byte'
# - 0x05 0x00 (2 bytes, little endian representation of 5) for 'length'
# - "hello" (5 bytes) for 'data'
dummy_data = b"\xAA\x05\x00hello"
# Create a StringInputStream from the dummy data
stream = StringInputStream(dummy_data, "simple_data_stream")
# Instantiate the parser with the stream
parser = SimpleBinaryParser(stream)
# Access and print the parsed field values
print(f"Parsed Header Byte: {parser['header_byte'].value} (0x{parser['header_byte'].value:02X})")
print(f"Parsed Length: {parser['length'].value}")
print(f"Parsed Data: {parser['data'].value.decode('ascii')}")