ebcdic: Additional EBCDIC Codecs
The `ebcdic` package adds EBCDIC codecs to Python, primarily to support data exchange with legacy mainframe systems. EBCDIC (Extended Binary Coded Decimal Interchange Code) is a family of character encodings distinct from ASCII and Unicode. The package is intended for scenarios that require interoperability with EBCDIC-native systems. The current version is 2.0.1; each release line supports a specific range of Python versions (see Warnings).
Warnings
- breaking Version 2.0.0 and later of the `ebcdic` package require Python 3.9 or newer. Older Python versions (2.7, 3.4-3.8) require `ebcdic` version 1.1.1, and even older versions (2.6, 3.1-3.3) require `ebcdic` version 1.0.0.
- gotcha Some EBCDIC codecs, such as 'cp037', 'cp273', 'cp500', and 'cp1140', are already provided by Python's standard library, and the standard library versions take precedence over ('overrule') those shipped with the `ebcdic` package. Code that relies on the `ebcdic` package's specific mappings for these code pages may therefore see unexpected results.
- gotcha Using an incorrect EBCDIC code page (e.g., 'cp037' when the data is 'cp1047') will result in garbled or incorrect text during decoding. EBCDIC code pages are not interchangeable: they agree on some characters but assign different byte values to others.
- gotcha When performing file I/O with EBCDIC data, it is critical to explicitly specify the correct EBCDIC encoding (e.g., `encoding='cp500'`) in Python's `open()` function. Failing to do so will cause Python to attempt decoding with a default encoding (often UTF-8 or the system default), leading to `UnicodeDecodeError` or corrupted data.
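The two code page gotchas above can be illustrated with a minimal sketch. It assumes the standard library codecs 'cp500' and 'cp037' as stand-ins, so that it runs without the `ebcdic` package installed; the file name and sample text are hypothetical.

```python
import os
import tempfile

text = "PRICE[1] = 42!"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "record.ebcdic")  # hypothetical file name

    # Write the text as EBCDIC bytes using code page cp500.
    with open(path, "w", encoding="cp500") as handle:
        handle.write(text)

    # Correct: read with the same code page that produced the file.
    with open(path, encoding="cp500") as handle:
        right = handle.read()

    # Incorrect: cp037 agrees with cp500 on letters and digits but maps
    # brackets and some punctuation to different byte values.
    with open(path, encoding="cp037") as handle:
        wrong = handle.read()

print(right == text)  # the round trip with the matching code page succeeds
print(wrong == text)  # the mismatched code page silently changes characters
```

Note that the mismatched read raises no exception here, because every byte is valid in both code pages. This is why the code page has to be known from the data source rather than guessed.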
Install
pip install ebcdic
Imports
- ebcdic
import ebcdic
Quickstart
import ebcdic
# Encode a Unicode string to EBCDIC (e.g., cp1141 for Germany/Austria)
unicode_string = 'hello world'
ebcdic_bytes = unicode_string.encode('cp1141')
print(f"Encoded EBCDIC bytes: {ebcdic_bytes}")
# Decode EBCDIC bytes back to a Unicode string
decoded_string = ebcdic_bytes.decode('cp1141')
print(f"Decoded Unicode string: {decoded_string}")
# Example with a different codec (cp1047 for Open Systems)
# Note: cp1047 is not part of Python's standard library; it becomes available through `import ebcdic` above.
sample_bytes_cp1047 = b'\x88\x85\x93\x93\x96@\xa6\x96\x99\x93\x84'
decoded_cp1047 = sample_bytes_cp1047.decode('cp1047')
print(f"Decoded with cp1047: {decoded_cp1047}")
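To see which implementation actually serves a given code page, one option is to ask Python's codec registry. The sketch below is an illustration, not part of the `ebcdic` API: `codec_origin` is a hypothetical helper, and it assumes each codec exposes an incremental encoder class whose `__module__` names the defining module, as the standard library codecs do. The `try`/`except` around the import only lets the sketch run when the package is not installed.

```python
import codecs

try:
    import ebcdic  # noqa: F401  (imported only so its codecs become available)
except ImportError:
    ebcdic = None  # the sketch still runs, but only stdlib codecs resolve


def codec_origin(name):
    """Return the module that defines the codec, or None if it is unknown."""
    try:
        info = codecs.lookup(name)
    except LookupError:
        return None
    return info.incrementalencoder.__module__


# cp037 and cp500 ship with Python; cp1047 and cp1141 do not.
for name in ("cp037", "cp500", "cp1047", "cp1141"):
    print(name, "->", codec_origin(name))
```

For code pages that the standard library already provides, the reported module is expected to start with `encodings.`, which is a quick way to check whether a standard library codec is taking precedence.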