Web Encodings

0.5.1 · maintenance · verified Sat Mar 28

webencodings is a Python library that implements the WHATWG Encoding Standard. It maps encoding labels (aliases) to the encodings the standard assigns them for legacy web content, so that labels such as US-ASCII and ISO-8859-1 resolve to windows-1252, and it detects a byte order mark (BOM) when decoding. The current version is 0.5.1, and its release cadence is considered stalled.
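As a minimal sketch of the label mapping and BOM detection described above, the snippet below assumes the package is installed and uses only its `lookup` and `decode` functions. The sample labels and byte strings are illustrative choices, not taken from the project's documentation.

```python
from webencodings import decode, lookup

# Legacy labels resolve to the encoding the standard assigns them.
for label in ("US-ASCII", "ISO-8859-1", "latin1"):
    encoding = lookup(label)
    print(f"{label!r} -> {encoding.name}")

# An unknown label yields None rather than raising an exception.
print(lookup("no-such-encoding"))

# A UTF-8 BOM takes precedence over the fallback encoding
# and is stripped from the decoded text.
text, used = decode(b"\xef\xbb\xbfcaf\xc3\xa9", "windows-1252")
print(text, used.name)
```

Because the BOM wins over the caller-supplied fallback, `decode` returns the encoding it actually used alongside the text, so callers can check which one applied.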

Warnings

Install

Imports

Quickstart

This quickstart uses `webencodings.lookup` to retrieve an `Encoding` object and then passes it to the module-level `encode` and `decode` functions. `Encoding` objects expose `name` and `codec_info` but have no `encode` or `decode` methods of their own. `decode` uses 'replace' error handling by default and returns a `(text, encoding)` tuple; the example also shows how to request 'strict' handling explicitly.

from webencodings import decode, encode, lookup

# Look up an encoding by its label; lookup() returns None for unknown labels
utf8_encoding = lookup('utf-8')

if utf8_encoding is not None:
    # Encode a string (encode() uses errors='strict' by default)
    text_to_encode = "Hello, world!"
    encoded_bytes = encode(text_to_encode, utf8_encoding)
    print(f"Encoded bytes: {encoded_bytes}")

    # Decode bytes (default error handling is 'replace');
    # decode() returns a (text, encoding) tuple
    bytes_to_decode = b'Hello, world!\xed'
    decoded_text, used_encoding = decode(bytes_to_decode, utf8_encoding)
    print(f"Decoded text (with replace): {decoded_text}")

    # Decode bytes with strict error handling
    try:
        decoded_text_strict, _ = decode(bytes_to_decode, utf8_encoding, errors='strict')
        print(f"Decoded text (strict): {decoded_text_strict}")
    except UnicodeDecodeError as e:
        print(f"Strict decoding error: {e}")
else:
    print("UTF-8 encoding not found.")
