pybcj
pybcj is a Python library that provides the BCJ (Branch/Call/Jump) filter, often used as a pre-processing step for data compression algorithms like LZMA (e.g., in xz). It aims to improve compression ratios for executable code by converting relative branch/call/jump targets to absolute addresses. The current version is 1.0.7, and releases are generally infrequent, driven by bug fixes or minor enhancements.
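To make the relative-to-absolute idea concrete, here is a deliberately simplified pure-Python sketch. It is not pybcj's implementation: the real x86 filter applies extra heuristics before converting an operand, and the names `toy_bcj_encode` and `toy_bcj_decode` are hypothetical. It only shows why two calls to the same target become byte-identical after filtering.

```python
import struct

BRANCH_OPCODES = (0xE8, 0xE9)  # x86 CALL rel32 and JMP rel32


def _convert(data: bytes, encode: bool) -> bytes:
    """Toy filter: rewrite the 32-bit operand after every E8/E9 byte."""
    out = bytearray(data)
    i = 0
    while i + 5 <= len(out):
        if out[i] in BRANCH_OPCODES:
            (value,) = struct.unpack_from("<I", out, i + 1)
            next_ip = i + 5  # address of the instruction after the branch
            if encode:
                value = (value + next_ip) & 0xFFFFFFFF  # relative -> absolute
            else:
                value = (value - next_ip) & 0xFFFFFFFF  # absolute -> relative
            struct.pack_into("<I", out, i + 1, value)
            i += 5  # skip the operand so both directions scan identically
        else:
            i += 1
    return bytes(out)


def toy_bcj_encode(data: bytes) -> bytes:
    return _convert(data, encode=True)


def toy_bcj_decode(data: bytes) -> bytes:
    return _convert(data, encode=False)


# Two CALLs at offsets 0 and 8 that both target absolute address 0x100.
code = (
    b"\xe8" + struct.pack("<i", 0x100 - 5)
    + b"\x90" * 3
    + b"\xe8" + struct.pack("<i", 0x100 - 13)
)
encoded = toy_bcj_encode(code)
print(code.hex())
print(encoded.hex())
```

Before filtering, the two operands differ because each is relative to its own position. After filtering, both hold the same absolute address, which gives a downstream compressor such as LZMA a repeated byte sequence to match. The output has the same length as the input, since the filter transforms bytes without compressing them.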
Warnings
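- gotcha pybcj requires Python 3.10 or newer. On older interpreters pip skips this release because of its `Requires-Python` metadata, so it either falls back to an older pybcj release or fails with a "no matching distribution" error.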
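- gotcha The distribution is installed as `pybcj` but imported as `bcj`. The encoder and decoder are incremental objects rather than stream wrappers: `BCJEncoder.encode(data)` returns filtered bytes and may hold back a short tail until `flush()` is called, while `BCJDecoder` is constructed with the total size of the decoded data and exposes `decode(data)`. The filter only transforms bytes and does not compress, so the output has the same length as the input.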
- deprecated Versions prior to `1.0.0` (the `0.x` releases) may expose a different API and are no longer actively maintained.
Install
-
pip install pybcj
Imports
- BCJEncoder
from bcj import BCJEncoder
- BCJDecoder
from bcj import BCJDecoder
Quickstart
import bcj  # the pybcj distribution installs a module named `bcj`
# Example data (simulating x86 code; BCJ is most useful on real binaries)
original_data = b"\xe8\x00\x00\x00\x00\x48\x83\xec\x28\xe9\x05\x00\x00\x00\x90\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
print(f"Original data length: {len(original_data)}")
# Encode: encode() may hold back a short tail, so always append flush()
encoder = bcj.BCJEncoder()
encoded_data = encoder.encode(original_data) + encoder.flush()
print(f"Encoded data length: {len(encoded_data)}")
# Decode: the decoder is constructed with the total size of the decoded data
decoder = bcj.BCJDecoder(len(original_data))
decoded_data = decoder.decode(encoded_data)
print(f"Decoded data length: {len(decoded_data)}")
# The filter is a reversible transform, so the round trip restores the input
assert original_data == decoded_data
print("Original and decoded data match!")
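The overview describes BCJ as a pre-processing step for LZMA, as in xz. The sketch below illustrates that effect using only the standard-library `lzma` module, whose filter chains include an x86 BCJ filter (`lzma.FILTER_X86`). It does not use pybcj, and the synthetic input built by the hypothetical helper `build_sample` is an assumption chosen to make the effect visible. Real binaries will show smaller gains.

```python
import lzma
import struct


def build_sample(n_calls: int = 2000) -> bytes:
    """Synthetic x86-like code: many CALL rel32 to one shared target, with NOP padding."""
    target = 0x10000
    out = bytearray()
    for _ in range(n_calls):
        next_ip = len(out) + 5
        out += b"\xe8" + struct.pack("<i", target - next_ip)
        out += b"\x90" * 3
    return bytes(out)


sample = build_sample()

# Filter chain without BCJ: LZMA2 only.
plain_filters = [{"id": lzma.FILTER_LZMA2, "preset": 6}]
# Filter chain with BCJ: the x86 filter runs first, then LZMA2.
bcj_filters = [
    {"id": lzma.FILTER_X86},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]

plain = lzma.compress(sample, format=lzma.FORMAT_XZ, filters=plain_filters)
with_bcj = lzma.compress(sample, format=lzma.FORMAT_XZ, filters=bcj_filters)

print(f"raw: {len(sample)}  lzma2: {len(plain)}  x86+lzma2: {len(with_bcj)}")

# The .xz container records the filter chain, so decompression needs no extra arguments.
restored = lzma.decompress(with_bcj)
```

A standalone filter such as pybcj is useful when the container format applies BCJ as a separate stage and the filter cannot be delegated to `lzma` in this way.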