{"id":6663,"library":"hachoir","title":"Hachoir","description":"Hachoir is a Python library designed to view and edit binary streams field by field. It represents a binary file as a hierarchical tree of Python objects, enabling detailed analysis and manipulation down to the bit level. The current version is 3.3.0, released on December 12, 2023. The project maintains an active, though not strictly frequent, release cadence, with previous major updates in 2022 and 2020.","status":"active","version":"3.3.0","language":"en","source_language":"en","source_url":"https://github.com/vstinner/hachoir","tags":["binary parsing","file analysis","metadata extraction","reverse engineering","forensics","data structures"],"install":[{"cmd":"pip install hachoir","lang":"bash","label":"Basic Installation"},{"cmd":"pip install hachoir[urwid]","lang":"bash","label":"For hachoir-urwid CLI tool"},{"cmd":"pip install hachoir[wx]","lang":"bash","label":"For hachoir-wx GUI tool (Linux users may need to install wxPython via system package manager)"}],"dependencies":[{"reason":"Hachoir 3.x requires Python 3.6 or newer.","package":"python3","optional":false},{"reason":"Required for the 'hachoir-urwid' command-line interface tool, which provides a curses-based UI for binary file exploration.","package":"urwid","optional":true},{"reason":"Required for the 'hachoir-wx' graphical user interface tool, which provides a GUI for binary file exploration. Installation varies by OS.","package":"wxPython","optional":true}],"imports":[{"symbol":"createParser","correct":"from hachoir.parser import createParser"},{"symbol":"extractMetadata","correct":"from hachoir.metadata import extractMetadata"},{"symbol":"StringInputStream","correct":"from hachoir.stream import StringInputStream"},{"symbol":"FileInputStream","correct":"from hachoir.stream import FileInputStream"},{"note":"While `hachoir.core.endian` is the source, examples commonly import `LITTLE_ENDIAN` directly from `hachoir.stream`.","wrong":"from hachoir.core.endian import LITTLE_ENDIAN","symbol":"LITTLE_ENDIAN","correct":"from hachoir.stream import LITTLE_ENDIAN"}],"quickstart":{"code":"import io\nfrom hachoir.stream import StringInputStream, LITTLE_ENDIAN\nfrom hachoir.field import Root, UInt8, UInt16, Bytes\nfrom hachoir.parser import Parser\n\n# Define a simple custom parser for demonstration\nclass SimpleBinaryParser(Parser):\n    PARSER_TAGS = {\n        \"id\": \"simple_bin\",\n        \"category\": \"misc\",\n        \"description\": \"Simple binary format\"\n    }\n    endian = LITTLE_ENDIAN # Specify endianness for the parser\n\n    def createFields(self):\n        # A 1-byte header identifier\n        yield UInt8(self, \"header_byte\", \"Header identifier\")\n        # A 2-byte unsigned integer for length (little endian)\n        yield UInt16(self, \"length\", \"Length of data section\")\n        # A data payload whose size is determined by the 'length' field\n        yield Bytes(self, \"data\", self[\"length\"].value, \"Data payload\")\n\n# Create a dummy binary string:\n# - 0xAA (1 byte) for 'header_byte'\n# - 0x05 0x00 (2 bytes, little endian representation of 5) for 'length'\n# - \"hello\" (5 bytes) for 'data'\ndummy_data = b\"\\xAA\\x05\\x00hello\"\n\n# Create a StringInputStream from the dummy data\nstream = StringInputStream(dummy_data, \"simple_data_stream\")\n\n# Instantiate the parser with the stream\nparser = SimpleBinaryParser(stream)\n\n# Access and print the parsed field values\nprint(f\"Parsed Header Byte: {parser['header_byte'].value} (0x{parser['header_byte'].value:02X})\")\nprint(f\"Parsed Length: {parser['length'].value}\")\nprint(f\"Parsed Data: {parser['data'].value.decode('ascii')}\")","lang":"python","description":"This quickstart demonstrates how to define a simple custom parser, create an in-memory `StringInputStream` from binary data, and then parse and access fields using Hachoir. It illustrates the basic workflow of defining field types and accessing their values."},"warnings":[{"fix":"Migrate Python 2 code to Python 3. For imports, consolidate from individual sub-packages (e.g., `from hachoir_parser import createParser`) to the unified `hachoir` module (e.g., `from hachoir.parser import createParser`).","message":"Hachoir 3.x is a significant rewrite, dropping Python 2 support entirely. Code written for Hachoir 2.x (Python 2) will not run on Hachoir 3.x. Additionally, earlier Hachoir sub-packages (e.g., `hachoir-core`, `hachoir-parser`) were merged into a single `hachoir` module.","severity":"breaking","affected_versions":"3.0.0 and newer"},{"fix":"Ensure your environment uses Python 3.6 or a more recent version. Upgrade your Python interpreter if necessary.","message":"Hachoir 3.x requires Python 3.6 or newer. Running on older Python 3 versions (e.g., 3.4, 3.5) will result in errors.","severity":"breaking","affected_versions":"3.0a4 and newer"},{"fix":"Always remember that internal positions and sizes in Hachoir are measured in bits. Multiply byte counts by 8 when specifying sizes or offsets if a bit-level value is expected.","message":"Hachoir operates at the bit level, not byte level, for addresses and sizes. This means that functions expecting lengths often require values in bits (e.g., `8 * bytes`) rather than raw byte counts, which can lead to off-by-eight errors or unexpected behavior if not accounted for.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Be aware that accessing a field's `value` or `display` attribute triggers its parsing. If you need to ensure all relevant fields are parsed or available, explicitly iterate through them or access their properties.","message":"Hachoir uses lazy loading; fields are not fully parsed, and their values are not read until they are explicitly accessed (e.g., `field.value`). This design makes opening large files very fast but can be surprising for users expecting an immediate, complete parse tree.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z","problems":[]}