Hachoir

3.3.0 · active · verified Wed Apr 15

Hachoir is a Python library for viewing and editing binary streams field by field. It represents a binary file as a hierarchical tree of Python objects, so data can be analyzed and manipulated down to the bit level. The current version is 3.3.0, released on December 12, 2023. Releases are infrequent but the project remains active, with previous major updates in 2022 and 2020.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to define a simple custom parser, create an in-memory `StringInputStream` from binary data, and then parse and access fields using Hachoir. It illustrates the basic workflow of defining field types and accessing their values.

from hachoir.stream import StringInputStream, LITTLE_ENDIAN
from hachoir.field import UInt8, UInt16, Bytes
from hachoir.parser import Parser

# Define a simple custom parser for demonstration
class SimpleBinaryParser(Parser):
    PARSER_TAGS = {
        "id": "simple_bin",
        "category": "misc",
        "description": "Simple binary format"
    }
    endian = LITTLE_ENDIAN # Specify endianness for the parser

    def createFields(self):
        # A 1-byte header identifier
        yield UInt8(self, "header_byte", "Header identifier")
        # A 2-byte unsigned integer for length (little endian)
        yield UInt16(self, "length", "Length of data section")
        # A data payload whose size is determined by the 'length' field
        yield Bytes(self, "data", self["length"].value, "Data payload")

# Create a dummy binary string:
# - 0xAA (1 byte) for 'header_byte'
# - 0x05 0x00 (2 bytes, little endian representation of 5) for 'length'
# - "hello" (5 bytes) for 'data'
dummy_data = b"\xAA\x05\x00hello"

# Create a StringInputStream from the dummy data
stream = StringInputStream(dummy_data, "simple_data_stream")

# Instantiate the parser with the stream
parser = SimpleBinaryParser(stream)

# Access and print the parsed field values
print(f"Parsed Header Byte: {parser['header_byte'].value} (0x{parser['header_byte'].value:02X})")
print(f"Parsed Length: {parser['length'].value}")
print(f"Parsed Data: {parser['data'].value.decode('ascii')}")
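
As a sanity check on the byte layout used in the quickstart, the same three fields can be decoded with the standard library's `struct` module. This is only a cross-check sketch, not part of Hachoir: the format string `"<BH"` and the variable names are assumptions chosen to mirror the `header_byte`, `length`, and `data` fields above.

```python
import struct

# Same bytes as the quickstart: header 0xAA, length 5 (little endian), payload "hello".
dummy_data = b"\xAA\x05\x00hello"

# "<BH": little endian, one unsigned byte followed by one unsigned 16-bit integer.
header_byte, length = struct.unpack_from("<BH", dummy_data, 0)

# The payload starts right after the fixed-size header (1 + 2 bytes).
offset = struct.calcsize("<BH")
data = dummy_data[offset:offset + length]

print(f"header_byte=0x{header_byte:02X} length={length} data={data!r}")
```

If the values printed here differ from what the Hachoir parser reports, the likely culprits are the parser's `endian` setting or the order of the fields yielded in `createFields`.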
