Multi-byte String Decoder

1.1.4 · active · verified Thu Apr 09

mbstrdecoder is a Python library for decoding multi-byte character strings whose encoding is unknown or possibly malformed. It aims to avoid `UnicodeDecodeError` exceptions by trying several decoding strategies in turn, often relying on the `chardet` library for encoding detection. The current version is 1.1.4; releases are mostly minor versions driven by Python version support and bug fixes.

Warnings

Install

Imports

Quickstart

Initialize `MultiByteStrDecoder` with a byte string. You can optionally pass `codec_candidates` to suggest codecs to try, or let the decoder pick one on its own. The `unicode_str` attribute holds the decoded string, and `codec` reports the codec that was used.

from mbstrdecoder import MultiByteStrDecoder

# Example 1: Decode a byte string, suggesting a codec to try
decoder1 = MultiByteStrDecoder(b"hello\xc2\xa3world", codec_candidates=["utf_8"])
print(f"Decoded (UTF-8 suggested): {decoder1.unicode_str}, Codec: {decoder1.codec}")

# Example 2: Decode a byte string with no hint (the decoder chooses a codec)
decoder2 = MultiByteStrDecoder(b"\xa3123.45")  # 0xa3 alone is not valid UTF-8, so another codec is needed
print(f"Decoded (auto-detect): {decoder2.unicode_str}, Codec: {decoder2.codec}")

# Example 3: Bytes that are not valid UTF-8 (an encoded surrogate);
# the result depends on which fallback codec accepts the bytes
decoder3 = MultiByteStrDecoder(b"\xed\xa0\x80some invalid bytes")
print(f"Decoded (fallback): {decoder3.unicode_str}, Codec: {decoder3.codec}")
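
The general idea of trying candidate codecs in turn can be sketched with the standard library alone. This is only an illustration of the technique, not mbstrdecoder's actual implementation: the helper name `decode_with_candidates`, the default candidate list, and the U+FFFD fallback are all assumptions made for this example.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical candidate list for illustration; not taken from mbstrdecoder.
DEFAULT_CANDIDATES = ("utf_8", "shift_jis", "latin_1")


def decode_with_candidates(
    data: bytes, candidates: Sequence[str] = DEFAULT_CANDIDATES
) -> Tuple[str, Optional[str]]:
    """Return (text, codec) for the first candidate codec that decodes cleanly."""
    for codec in candidates:
        try:
            return data.decode(codec), codec
        except UnicodeDecodeError:
            # This codec rejected the bytes; move on to the next candidate.
            continue
    # No candidate worked: substitute U+FFFD for bad bytes instead of raising.
    return data.decode("utf_8", errors="replace"), None


text, codec = decode_with_candidates("マルチバイト文字".encode("utf_8"))
print(text, codec)
```

Candidate order matters in a scheme like this. A permissive single-byte codec such as `latin_1` accepts any byte sequence, so it belongs at the end of the list, where it acts as a last resort.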
