Web Encodings
webencodings is a Python library that implements the WHATWG Encoding Standard. It provides the standard's encoding labels (aliases) and its rules for handling legacy web content, such as treating the labels US-ASCII and ISO-8859-1 as windows-1252, along with byte order mark (BOM) detection. The current version is 0.5.1, and its release cadence is considered stalled.
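The two ideas in the description, label aliasing and BOM detection, can be illustrated with a small standard-library sketch. This is not the library's code: `LABEL_TO_CODEC`, `BOMS`, `sketch_lookup`, and `sketch_decode` are hypothetical names, and the label table is heavily abbreviated compared with the one the standard defines.

```python
import codecs

# Hypothetical, heavily abbreviated label table (assumption for illustration).
# Legacy labels such as "latin1" deliberately resolve to windows-1252 (cp1252).
LABEL_TO_CODEC = {
    "utf-8": "utf-8",
    "utf8": "utf-8",
    "us-ascii": "cp1252",
    "ascii": "cp1252",
    "iso-8859-1": "cp1252",
    "latin1": "cp1252",
    "windows-1252": "cp1252",
}

# BOMs checked by this sketch, paired with the Python codec each one selects.
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def sketch_lookup(label):
    """Return a Python codec name for a web label, or None if unknown."""
    # Labels are matched case-insensitively after stripping ASCII whitespace.
    return LABEL_TO_CODEC.get(label.strip("\t\n\f\r ").lower())


def sketch_decode(data, fallback_label, errors="replace"):
    """Decode bytes, letting a BOM take precedence over the fallback label."""
    for bom, codec_name in BOMS:
        if data.startswith(bom):
            # Strip the BOM and decode with the encoding it indicates.
            return data[len(bom):].decode(codec_name, errors), codec_name
    codec_name = sketch_lookup(fallback_label)
    if codec_name is None:
        raise LookupError(f"unknown encoding label: {fallback_label!r}")
    return data.decode(codec_name, errors), codec_name


# Bytes 0x93/0x94 are curly quotes in windows-1252, so the legacy label yields them.
print(sketch_decode(b"\x93hi\x94", "iso-8859-1"))
# A UTF-8 BOM overrides the "latin1" fallback.
print(sketch_decode(codecs.BOM_UTF8 + b"caf\xc3\xa9", "latin1"))
```

In the sketch, the fallback label is consulted only when no BOM is present, which is the precedence the description implies for BOM detection.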
Warnings
- gotcha The default error handling for `webencodings` decoding is 'replace', which substitutes the replacement character (U+FFFD) for invalid byte sequences instead of raising. This differs from Python's standard library `codecs` module, which defaults to 'strict' and raises a `UnicodeDecodeError`. Pass `errors='strict'` when invalid input should fail loudly.
- gotcha webencodings focuses on mapping web-specific encoding labels and on BOM detection; the actual encoding and decoding is delegated to Python's standard `codecs` machinery. Treat it as an alias and detection layer, not as a full independent encoding engine.
- deprecated The package's development status on PyPI is '4 - Beta' and its release cadence is 'Stalled', with the last release in April 2017. Although the library is widely used, this points to a lack of active development, so it may not receive updates for changes to the Encoding Standard or for new Python versions.
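The difference in error handling described in the first warning can be seen using only the standard library. The sketch below does not call webencodings; it contrasts the `codecs` default ('strict') with an explicit `errors='replace'`, which is the behavior the warning attributes to webencodings decoding. The variable names are illustrative.

```python
import codecs

# The trailing 0xED byte starts a multi-byte UTF-8 sequence that never completes.
data = b"Hello, world!\xed"

# Standard-library default (errors='strict'): invalid input raises.
try:
    codecs.decode(data, "utf-8")
    strict_result = "decoded without error"
except UnicodeDecodeError as exc:
    strict_result = f"UnicodeDecodeError: {exc.reason}"

# errors='replace': the invalid byte becomes U+FFFD and decoding continues.
replaced = codecs.decode(data, "utf-8", errors="replace")

print(strict_result)
print(repr(replaced))
```

Silent replacement is convenient for rendering untrusted web content, but it can hide data corruption in pipelines that expect clean input.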
Install
pip install webencodings
Imports
- lookup
from webencodings import lookup
- decode
from webencodings import decode
- encode
from webencodings import encode
Quickstart
from webencodings import lookup, encode, decode

# Look up an encoding by its label; lookup() returns None for an unknown label
utf8_encoding = lookup('utf-8')
if utf8_encoding:
    print(f"Canonical name: {utf8_encoding.name}")

    # Encode a string; encode() accepts an Encoding object or a label
    text_to_encode = "Hello, world!"
    encoded_bytes = encode(text_to_encode, utf8_encoding)
    print(f"Encoded bytes: {encoded_bytes}")

    # Decode bytes (default error handling is 'replace').
    # decode() returns a (text, encoding) tuple; a BOM in the input
    # takes precedence over the fallback encoding passed here.
    bytes_to_decode = b'Hello, world!\xed'
    decoded_text, used_encoding = decode(bytes_to_decode, utf8_encoding)
    print(f"Decoded text (with replace): {decoded_text}")

    # Decode bytes with strict error handling
    try:
        decoded_text_strict, _ = decode(bytes_to_decode, utf8_encoding, errors='strict')
        print(f"Decoded text (strict): {decoded_text_strict}")
    except UnicodeDecodeError as e:
        print(f"Strict decoding error: {e}")
else:
    print("UTF-8 encoding not found.")