Morphys (Python library)

1.0 · active · verified Thu Apr 16

Morphys is a Python library (version 1.0) for common conversions between Unicode text (Python `str`) and `bytes`. It aims to simplify character-encoding handling by offering one consistent interface for these conversions. It is available on PyPI; beyond its initial release in 2018, no details about ongoing development or release cadence are published.
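
To make the idea of a consistent `str`/`bytes` interface concrete, here is a minimal standard-library sketch. The helpers `as_bytes` and `as_text` are hypothetical names invented for this illustration; they are not part of `morphys`, and the UTF-8 default is an assumption.

```python
# Hypothetical helpers built on the standard library only.
# They are NOT morphys functions; names and defaults are illustrative.

def as_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it only if it is a str."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


def as_text(value, encoding="utf-8"):
    """Return value as str, decoding it only if it is bytes."""
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        return value.decode(encoding)
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


greeting = "Hello, world! 👋"
raw = as_bytes(greeting)

# Input that is already in the target type passes through unchanged.
assert as_bytes(raw) is raw
assert as_text(greeting) is greeting
# A round trip through bytes restores the original text.
assert as_text(raw) == greeting
```

Accepting either type means callers do not need to track whether a value has already been converted, which is the kind of convenience the description above refers to.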

Quickstart

This quickstart demonstrates the core use of `morphys`: converting between Python's Unicode `str` type and `bytes`. It encodes a string to bytes with `utf-8`, then decodes the bytes back to a string. It then decodes bytes that are not valid UTF-8 to show two error-handling strategies. The `errors` argument is an assumption: the example presumes that `morphys` wraps the standard Python encoding and decoding functions and accepts the same `errors` values.

from morphys import to_unicode, to_bytes

# Example 1: Convert a Unicode string to bytes
unicode_string = "Hello, world! 👋"
encoded_bytes = to_bytes(unicode_string, encoding="utf-8")
print(f"Unicode string: {unicode_string}")
print(f"Encoded bytes: {encoded_bytes}")

# Example 2: Convert bytes back to a Unicode string
decoded_string = to_unicode(encoded_bytes, encoding="utf-8")
print(f"Decoded string: {decoded_string}")

# Example 3: Decoding bytes that are not valid UTF-8
# b'caf\xe9' is 'café' encoded as Latin-1; the trailing 0xE9 byte is invalid UTF-8.
# The errors= keyword is assumed to behave as it does in bytes.decode().
try:
    latin1_bytes = b'caf\xe9'
    print(f"\nAttempting to decode latin1 bytes: {latin1_bytes}")
    decoded_from_latin1 = to_unicode(latin1_bytes, encoding="utf-8", errors="strict")
    print(f"Decoded with strict UTF-8: {decoded_from_latin1}")
except UnicodeDecodeError as e:
    print(f"Error decoding with strict UTF-8 (expected): {e}")
    # Using 'replace' error handling for robustness
    decoded_from_latin1_safe = to_unicode(latin1_bytes, encoding="utf-8", errors="replace")
    print(f"Decoded with UTF-8 (replace errors): {decoded_from_latin1_safe}")
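
Because the quickstart assumes `morphys` follows the standard `errors` semantics, it may help to see what those semantics are in plain Python. The sketch below uses only `bytes.decode` from the standard library and makes no `morphys` calls; the variable names are illustrative.

```python
# Standard-library behaviour only; no morphys calls are made here.
latin1_bytes = b"caf\xe9"  # "café" encoded as Latin-1

# 'strict' (the default) raises on the invalid byte 0xE9.
try:
    latin1_bytes.decode("utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'replace' substitutes U+FFFD for the bad byte; 'ignore' drops it.
replaced = latin1_bytes.decode("utf-8", errors="replace")
ignored = latin1_bytes.decode("utf-8", errors="ignore")

# Decoding with the encoding that actually produced the bytes loses nothing.
correct = latin1_bytes.decode("latin-1")

print(strict_failed, repr(replaced), repr(ignored), repr(correct))
```

Both `replace` and `ignore` avoid the exception but lose the original character, so decoding with the correct source encoding is preferable whenever it is known.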
