Reed-Solomon Encoder/Decoder
reedsolo is a Python library providing a universal errors-and-erasures Reed-Solomon codec for protecting data against errors and bitrot. It ships a pure-Python implementation and an optional speed-optimized Cython/C extension. The library primarily targets burst-type errors, which makes it well suited to data storage protection. The current stable version is 1.7.0; a major 2.x branch is in beta (2.1.1b1) and introduces significant changes.
Warnings
- breaking The 2.x beta branch (e.g., v2.1.1b1) introduces significant breaking changes. It requires Cython >= 3.0.0b2 for its speed-optimized C extension and enforces stricter type usage, primarily expecting `bytearray` or `cpython array` objects for data, whereas v1.x was more flexible with list objects.
- gotcha By default, `reedsolo` (v1.x) installs only the pure-Python implementation. To utilize the faster Cython/C extension, you must explicitly request its compilation during installation, requiring Cython and a C++ compiler.
- gotcha With the default Galois field GF(2^8), a single Reed-Solomon block (data + ECC symbols) is at most 255 bytes. `RSCodec.encode()` and `RSCodec.decode()` split longer input into blocks of that size automatically, but the low-level functions (e.g., `rs_encode_msg`) do not, so when calling them directly you must chunk the data into blocks yourself.
- gotcha If the errors and erasures in a block exceed the code's correction capability (2 * errors + erasures > number of ECC symbols, a limit that follows from the Singleton bound), the decoder may return a mathematically valid but incorrect message instead of raising an error, and `check()` may report that message as valid. This is a property of Reed-Solomon codes, not a bug in the library.
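The block-size and capacity limits in the last two warnings reduce to simple arithmetic. The following is a minimal sketch, not part of reedsolo: the helper names `chunk_layout`, `split_blocks`, and `within_capacity` are hypothetical, and it assumes GF(2^8) blocks of at most 255 symbols and the per-block bound 2 * errors + erasures <= nsym.

```python
import math

GF256_MAX_BLOCK = 255  # 2**8 - 1 symbols per block in GF(2^8)


def chunk_layout(data_len, nsym, nsize=GF256_MAX_BLOCK):
    """Return (data_bytes_per_block, block_count, total_encoded_len).

    Hypothetical helper: each block carries nsize - nsym data bytes
    plus nsym ECC bytes.
    """
    if not 0 < nsym < nsize <= GF256_MAX_BLOCK:
        raise ValueError("need 0 < nsym < nsize <= 255")
    per_block = nsize - nsym
    blocks = math.ceil(data_len / per_block)
    return per_block, blocks, data_len + blocks * nsym


def split_blocks(data, nsym, nsize=GF256_MAX_BLOCK):
    """Split data into pieces small enough to encode one block each."""
    per_block = nsize - nsym
    return [data[i:i + per_block] for i in range(0, len(data), per_block)]


def within_capacity(nsym, errors, erasures=0):
    """True if one block with this many errata is still guaranteed correctable."""
    return 2 * errors + erasures <= nsym


print(chunk_layout(1000, 10))      # (245, 5, 1050)
print(within_capacity(10, 5))      # True: 5 errors need 10 ECC symbols
print(within_capacity(10, 3, 5))   # False: 2*3 + 5 = 11 > 10
```

A check like `within_capacity` can only be applied when the number of errata is known or estimated in advance; the decoder itself cannot tell when the bound has been exceeded.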
Install
- pip install reedsolo
- pip install reedsolo --install-option="--cythonize" --verbose
- pip install --config-setting="--build-option=--cythonize" reedsolo
- pip install reedsolo --pre
Imports
- RSCodec
from reedsolo import RSCodec
- ReedSolomonError
from reedsolo import ReedSolomonError
- RSCodec (Cython extension)
from creedsolo import RSCodec
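Because the compiled `creedsolo` module is optional, code that wants the fast path often needs a fallback. The following is a minimal sketch of one way to choose between the two modules at run time, assuming both expose an `RSCodec` class as the imports above indicate. The helper names `pick_backend` and `load_rscodec` are hypothetical and not part of either module.

```python
import importlib
import importlib.util


def pick_backend(candidates=("creedsolo", "reedsolo")):
    """Return the first candidate module name that is importable, else None."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def load_rscodec(candidates=("creedsolo", "reedsolo")):
    """Import RSCodec from the preferred available backend."""
    name = pick_backend(candidates)
    if name is None:
        raise ImportError("neither creedsolo nor reedsolo is installed")
    return importlib.import_module(name).RSCodec
```

Calling `load_rscodec()` returns the class from `creedsolo` when the extension was built and from `reedsolo` otherwise, so the rest of the program can use a single `RSCodec` name.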
Quickstart
from reedsolo import RSCodec, ReedSolomonError

# Initialize RSCodec with the number of error correction (ECC) symbols.
# 10 ECC symbols allow correction of up to 5 byte-level errors (nsym / 2).
rsc = RSCodec(10)

original_message = b'hello world'
print(f"Original: {original_message}")

# Encode the message: the result is the 11 data bytes followed by 10 ECC bytes.
encoded_message = rsc.encode(original_message)
print(f"Encoded: {encoded_message}")

# Simulate errors by overwriting three bytes with 'X'.
tampered_message = bytearray(encoded_message)
tampered_message[4] = ord(b'X')   # 'o' in 'hello'
tampered_message[7] = ord(b'X')   # 'o' in 'world'
tampered_message[12] = ord(b'X')  # second ECC byte
print(f"Tampered: {tampered_message}")

# Decode and correct errors.
try:
    # decode returns (decoded_message, decoded_message_with_ecc, errata_positions)
    decoded_message, _, _ = rsc.decode(tampered_message)
    print(f"Decoded: {decoded_message}")
    if original_message == decoded_message:
        print("Decoding successful, message recovered.")
    else:
        print("Decoding returned a message that differs from the original.")
except ReedSolomonError as e:
    print(f"Decoding failed: {e}")