{"id":4126,"library":"mutf8","title":"MUTF-8 Encoder & Decoder","description":"This package provides pure-Python and optional C implementations for encoding and decoding the MUTF-8 and CESU-8 character encodings. MUTF-8 is a variant of UTF-8 encountered primarily in Java Virtual Machine (JVM) contexts. The C extension is significantly faster than the pure-Python implementation, which the package falls back to if the extension cannot be built. The current version is 1.0.6, released in late 2021, and the project is in a maintenance phase.","status":"maintenance","version":"1.0.6","language":"en","source_language":"en","source_url":"https://github.com/TkTech/mutf8","tags":["encoding","decoding","mutf8","cesu8","java","jvm"],"install":[{"cmd":"pip install mutf8","lang":"bash","label":"Install stable version"}],"dependencies":[],"imports":[{"symbol":"encode_modified_utf8","correct":"from mutf8 import encode_modified_utf8"},{"symbol":"decode_modified_utf8","correct":"from mutf8 import decode_modified_utf8"}],"quickstart":{"code":"from mutf8 import encode_modified_utf8, decode_modified_utf8\n\n# A string with a null character, which MUTF-8 handles differently\noriginal_string = \"Hello, \\u0000 World!\"\n\n# Encode the string to MUTF-8 bytes\nmutf8_bytes = encode_modified_utf8(original_string)\nprint(f\"Encoded MUTF-8 bytes: {mutf8_bytes!r}\")\n\n# Decode the MUTF-8 bytes back to a Python unicode string\ndecoded_string = decode_modified_utf8(mutf8_bytes)\nprint(f\"Decoded string: {decoded_string!r}\")\n\n# Example with a supplementary character (encoded as surrogate pairs in MUTF-8)\nsup_char_string = \"\\U0001F600\"\nmutf8_sup_char_bytes = encode_modified_utf8(sup_char_string)\nprint(f\"Encoded supplementary char: {mutf8_sup_char_bytes!r}\")\ndecoded_sup_char_string = decode_modified_utf8(mutf8_sup_char_bytes)\nprint(f\"Decoded supplementary char: {decoded_sup_char_string!r}\")","lang":"python","description":"This quickstart demonstrates how to encode a Python string 
(including one with a null character) into MUTF-8 bytes and then decode it back using the `mutf8` library. MUTF-8 handles null characters and supplementary characters differently than standard UTF-8."},"warnings":[{"fix":"Always use `mutf8.encode_modified_utf8` and `mutf8.decode_modified_utf8` when working with MUTF-8 encoded data, especially when interfacing with Java systems.","message":"MUTF-8 is a variant of UTF-8 used primarily in Java environments. It differs from standard UTF-8 in two key ways: the null character (U+0000) is encoded as the two-byte sequence `0xC0 0x80` instead of the single byte `0x00`, and supplementary characters (code points above U+FFFF) are encoded as two three-byte sequences (one per UTF-16 surrogate) instead of a single four-byte sequence. Python's built-in `utf-8` codec rejects both forms when decoding in strict mode, raising `UnicodeDecodeError`, and never produces them when encoding, so it cannot be used to read or write MUTF-8 data reliably.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Install a C99-compatible compiler before installing `mutf8` so that the C extension can be built, and check the installation logs to confirm that it compiled successfully.","message":"The `mutf8` library includes a C extension that is 20x to 40x faster than its pure-Python implementation. If a C99-compatible compiler is not available during installation, the library silently falls back to the slower pure-Python version, which can cause unexpected performance bottlenecks.","severity":"gotcha","affected_versions":"All versions with C extension"},{"fix":"Upgrade to `mutf8` version `1.0.3` or newer for more descriptive `UnicodeDecodeError` messages and more accurate error locations.","message":"Versions of `mutf8` prior to `1.0.3` raised less precise and less descriptive `UnicodeDecodeError` exceptions. 
This made malformed MUTF-8 input harder to debug.","severity":"deprecated","affected_versions":"< 1.0.3"},{"fix":"Upgrade your Python environment to version 3.6 or newer to continue using `mutf8`.","message":"Support for Python 3.5 has been dropped in recent versions of `mutf8`. Attempting to install or use newer versions on Python 3.5 will likely fail.","severity":"breaking","affected_versions":"> unknown (post-Python 3.5 EOL)"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}