Morphys (Python library)
Morphys is a Python library (version 1.0) for converting between Unicode text (Python `str`) and `bytes`. It aims to simplify character-encoding handling by offering a consistent interface for these conversions. It is available on PyPI, but no development activity or release cadence is documented beyond its initial release in 2018.
Common errors
-
UnicodeEncodeError: 'charmap' codec can't encode character...
cause: Attempting to encode a Unicode string containing characters not representable in the target (often default system) encoding, or writing non-ASCII characters to a file opened without specifying a UTF-8 encoding.
fix: Explicitly specify the encoding, typically `utf-8`, when converting a string to bytes or writing to a file. For example: `to_bytes(my_string, encoding='utf-8')` or `open('file.txt', 'w', encoding='utf-8')`.
-
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x... in position ...: invalid start byte
cause: Attempting to decode a byte sequence using an incorrect encoding. For instance, trying to decode Latin-1 encoded bytes with a UTF-8 decoder.
fix: Determine the correct encoding of the incoming bytes and use it for decoding. If the encoding is unknown or mixed, consider using `errors='replace'` or `errors='ignore'` for lenient decoding, or a library like `chardet` for detection. Example: `to_unicode(my_bytes, encoding='latin-1')` or `to_unicode(my_bytes, encoding='utf-8', errors='replace')`.
-
TypeError: a bytes-like object is required, not 'str'
cause: Passing a Python `str` object to a function or operation that explicitly expects a `bytes` object (or vice versa), without performing the necessary conversion.
fix: Convert the `str` to `bytes` using `to_bytes()` before passing it, or convert `bytes` to `str` using `to_unicode()` if the function expects a string. Example: `func_expecting_bytes(to_bytes(my_string))` or `func_expecting_string(to_unicode(my_bytes))`.
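The three errors above can be reproduced and fixed with plain `str.encode` and `bytes.decode`. The following is a minimal sketch that uses only the standard library; it does not call morphys, and all variable names are illustrative.

```python
# Standard-library only; morphys is not imported here.
text = "café 👋"

# 1. UnicodeEncodeError: cp1252 (a "charmap" codec) cannot represent the emoji.
try:
    text.encode("cp1252")
except UnicodeEncodeError as exc:
    encode_error = exc
utf8_bytes = text.encode("utf-8")  # fix: pick an encoding that covers the text

# 2. UnicodeDecodeError: Latin-1 bytes are not valid UTF-8.
latin1_bytes = "café".encode("latin-1")  # b'caf\xe9'
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    decode_error = exc
decoded = latin1_bytes.decode("latin-1")                  # fix: use the real source encoding
lenient = latin1_bytes.decode("utf-8", errors="replace")  # or decode leniently

# 3. TypeError: a bytes method was given a str argument.
try:
    b"a,b".split(",")
except TypeError as exc:
    type_error = exc
parts = b"a,b".split(",".encode("ascii"))  # fix: convert the argument to bytes first
```

Lenient decoding never raises, but it substitutes U+FFFD for each undecodable sequence, so the original byte values are lost.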
Warnings
- gotcha The `morphys` library (version 1.0) appears to be minimally maintained since its 2018 release. While functional, it might not receive updates for newer Python versions or complex encoding scenarios.
- gotcha Misunderstanding string vs. bytes in Python is a common source of `TypeError` or `UnicodeDecodeError`/`UnicodeEncodeError`. Always be explicit about the type you're working with and the expected encoding.
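One way to stay explicit is to normalize types at the edges of a program: decode on input, work with `str` internally, and encode on output. The helpers below, `as_text` and `as_binary`, are hypothetical names written for this sketch with the standard library only; they are not part of the morphys API.

```python
from typing import Union

StrOrBytes = Union[str, bytes]


def as_text(value: StrOrBytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: return str, decoding bytes with an explicit encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, str):
        return value
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


def as_binary(value: StrOrBytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: return bytes, encoding str with an explicit encoding."""
    if isinstance(value, str):
        return value.encode(encoding)
    if isinstance(value, bytes):
        return value
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


incoming = b"caf\xc3\xa9"        # bytes arriving from a socket or file
name = as_text(incoming)         # work with str inside the program
outgoing = as_binary(name)       # encode again only when leaving the program
```

Rejecting anything that is neither `str` nor `bytes` keeps type mistakes visible at the boundary instead of deep inside the program.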
Install
-
pip install morphys
Imports
- to_unicode
from morphys import to_unicode
- to_bytes
from morphys import to_bytes
Quickstart
from morphys import to_unicode, to_bytes
# Example 1: Convert a Unicode string to bytes
unicode_string = "Hello, world! 👋"
encoded_bytes = to_bytes(unicode_string, encoding="utf-8")
print(f"Unicode string: {unicode_string}")
print(f"Encoded bytes: {encoded_bytes}")
# Example 2: Convert bytes back to a Unicode string
decoded_string = to_unicode(encoded_bytes, encoding="utf-8")
print(f"Decoded string: {decoded_string}")
# Example 3: Decoding bytes that are not valid UTF-8
# b'caf\xe9' is "café" encoded as Latin-1; the byte 0xE9 is not valid UTF-8 here,
# so strict UTF-8 decoding is expected to raise UnicodeDecodeError.
latin1_bytes = b'caf\xe9'
try:
    print(f"\nAttempting to decode latin1 bytes: {latin1_bytes}")
    decoded_from_latin1 = to_unicode(latin1_bytes, encoding="utf-8", errors="strict")
    print(f"Decoded with strict UTF-8: {decoded_from_latin1}")
except UnicodeDecodeError as e:
    print(f"Error decoding with strict UTF-8 (expected): {e}")
# Using 'replace' error handling substitutes a replacement character instead of raising
decoded_from_latin1_safe = to_unicode(latin1_bytes, encoding="utf-8", errors="replace")
print(f"Decoded with UTF-8 (replace errors): {decoded_from_latin1_safe}")
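When the source encoding is uncertain, an alternative to `errors="replace"` is to try a short list of candidate encodings in order. The following is a minimal standard-library sketch, assuming UTF-8 first and Latin-1 as a last resort; `decode_with_fallback` is a hypothetical helper, not a morphys function.

```python
from typing import Sequence, Tuple


def decode_with_fallback(
    data: bytes, encodings: Sequence[str] = ("utf-8", "latin-1")
) -> Tuple[str, str]:
    """Return (text, encoding_used), trying each candidate encoding in order."""
    for name in encodings:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue
    # No candidate succeeded: fall back to lossy UTF-8 decoding.
    return data.decode("utf-8", errors="replace"), "utf-8 (replace)"


text, used = decode_with_fallback(b"caf\xe9")
print(f"Decoded {text!r} using {used}")
```

Latin-1 maps every byte to a character, so it never fails and must come last in the list; if the data was actually in some other encoding, the result will be readable-looking but wrong.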