pybcj

1.0.7 · active · verified Thu Apr 09

pybcj is a Python library that provides the BCJ (Branch/Call/Jump) filter, often used as a pre-processing step for data compression algorithms like LZMA (e.g., in xz). It aims to improve compression ratios for executable code by converting relative branch/call/jump targets to absolute addresses. The current version is 1.0.7, and releases are generally infrequent, driven by bug fixes or minor enhancements.
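To make the relative-to-absolute idea concrete, here is a minimal pure-Python sketch of the transformation for the x86 `CALL rel32` opcode (`0xE8`) only. It is an illustration under simplifying assumptions, not pybcj's implementation: the helper names `e8_rel_to_abs` and `e8_abs_to_rel` are hypothetical, and a real x86 BCJ filter also handles `JMP` (`0xE9`) and applies heuristics to avoid rewriting bytes that are not actually instructions.

```python
import struct


def e8_rel_to_abs(code: bytes) -> bytes:
    """Toy encoder: replace the rel32 operand of every E8 (CALL) with its absolute target."""
    out = bytearray(code)
    i = 0
    while i + 5 <= len(out):
        if out[i] == 0xE8:
            (rel,) = struct.unpack_from("<i", out, i + 1)
            # Target = address of the next instruction + relative displacement.
            absolute = (i + 5 + rel) & 0xFFFFFFFF
            struct.pack_into("<I", out, i + 1, absolute)
            i += 5  # skip the operand we just rewrote
        else:
            i += 1
    return bytes(out)


def e8_abs_to_rel(code: bytes) -> bytes:
    """Toy decoder: exact inverse of e8_rel_to_abs."""
    out = bytearray(code)
    i = 0
    while i + 5 <= len(out):
        if out[i] == 0xE8:
            (absolute,) = struct.unpack_from("<I", out, i + 1)
            rel = (absolute - (i + 5)) & 0xFFFFFFFF
            struct.pack_into("<I", out, i + 1, rel)
            i += 5
        else:
            i += 1
    return bytes(out)


# Two calls to the same target (address 0x100) from different call sites,
# separated by three NOPs. Their relative operands differ (0xFB vs 0xF3).
code = bytes.fromhex("e8fb000000" "909090" "e8f3000000")
filtered = e8_rel_to_abs(code)
print(filtered.hex())  # e800010000909090e800010000
assert e8_abs_to_rel(filtered) == code
```

After filtering, both call instructions carry the same five bytes, which gives a dictionary coder such as LZMA a repeated sequence to match. The byte count is unchanged; the filter only makes the data more compressible.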

Warnings

Install

Imports

Quickstart

This quickstart demonstrates two ways to use pybcj: encoding a whole buffer in one call, and feeding the encoder in chunks as you would when reading from a file. The package is installed as `pybcj` but imported as `bcj`. `BCJEncoder` and `BCJDecoder` implement the x86 filter. `encode()` may hold back the last few bytes until `flush()` is called, and `BCJDecoder` takes the total data size as its argument so it knows where the stream ends. The filter does not compress anything itself; its output is the same length as its input.

import bcj  # the pybcj package installs a module named "bcj"

# Example data (simulating executable code; BCJ works best on actual binaries)
original_data = b"\xe8\x00\x00\x00\x00\x48\x83\xec\x28\xe9\x05\x00\x00\x00\x90\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
print(f"Original data length: {len(original_data)}")

# --- Method 1: Encode and decode a whole buffer at once ---
# Encode; flush() returns any bytes the encoder was still holding back
encoder = bcj.BCJEncoder()
encoded_data = encoder.encode(original_data) + encoder.flush()
print(f"Encoded (one-shot) data length: {len(encoded_data)}")

# Decode; the decoder is told the total size of the data up front
decoder = bcj.BCJDecoder(len(encoded_data))
decoded_data = decoder.decode(encoded_data)
print(f"Decoded (one-shot) data length: {len(decoded_data)}")

assert original_data == decoded_data
print("One-shot method: Original and decoded data match!")

# --- Method 2: Feed the encoder in chunks, as when reading from a file ---
CHUNK_SIZE = 8  # tiny on purpose; use a much larger block size for real files
encoder = bcj.BCJEncoder()
encoded_chunks = bytearray()
for start in range(0, len(original_data), CHUNK_SIZE):
    encoded_chunks += encoder.encode(original_data[start:start + CHUNK_SIZE])
encoded_chunks += encoder.flush()
print(f"Encoded (chunked) data length: {len(encoded_chunks)}")

decoder = bcj.BCJDecoder(len(encoded_chunks))
decoded_from_chunks = decoder.decode(bytes(encoded_chunks))
print(f"Decoded (chunked) data length: {len(decoded_from_chunks)}")

assert original_data == decoded_from_chunks
print("Chunked method: Original and decoded data match!")
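The description above says BCJ is usually a pre-processing step in front of LZMA, as in xz. As a rough sketch of that pipeline, the example below uses the standard library's `lzma` module, which exposes liblzma's own x86 BCJ filter as `lzma.FILTER_X86`. This is not pybcj; it only shows where a BCJ stage sits in a filter chain. The helper `xz_compress` and the sample bytes are assumptions made for illustration.

```python
import lzma


def xz_compress(data: bytes, use_bcj: bool) -> bytes:
    """Compress to .xz, optionally placing liblzma's x86 BCJ filter before LZMA2."""
    filters = []
    if use_bcj:
        # BCJ stage: rewrites branch targets, does not change the data size.
        filters.append({"id": lzma.FILTER_X86})
    # LZMA2 must be the last filter in the chain; it does the actual compression.
    filters.append({"id": lzma.FILTER_LZMA2, "preset": 6})
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)


# Illustrative bytes only; a real executable is needed to see a meaningful gain.
sample = bytes.fromhex("e8fb000000909090e8f3000000") * 64

plain = xz_compress(sample, use_bcj=False)
with_bcj = xz_compress(sample, use_bcj=True)
print(f"LZMA2 only: {len(plain)} bytes, BCJ + LZMA2: {len(with_bcj)} bytes")

# The .xz container records the filter chain, so decompression needs no extra arguments.
assert lzma.decompress(plain) == sample
assert lzma.decompress(with_bcj) == sample
```

Whether the BCJ stage helps depends on the input. On synthetic or non-executable data it can make no difference or slightly hurt, so it is worth measuring both variants on the actual binaries you intend to compress.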
