py-multicodec
Multicodec is a self-describing multiformat that wraps other formats with a tiny bit of self-description. A multicodec identifier is an unsigned varint code that identifies the format of the data that follows it. This Python implementation provides functions for adding and removing prefixes, and for type-safe codec handling through `Code` objects. It is currently at version 1.0.0.
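To make the "varint code followed by data" layout concrete, here is a minimal standard-library sketch. It does not use `py-multicodec` and is not the library's implementation; the helper names `encode_varint` and `decode_varint` are hypothetical and exist only for this illustration. The code value 0x12 for sha2-256 comes from the canonical multicodec table.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("varint values must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes) -> Tuple[int, int]:
    """Return (value, bytes_consumed) for the varint at the start of data."""
    value = 0
    for index, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            return value, index + 1
    raise ValueError("truncated varint")


SHA2_256_CODE = 0x12  # sha2-256 in the canonical multicodec table

# Prefixing: varint-encoded code, then the raw payload.
prefixed = encode_varint(SHA2_256_CODE) + b"Some raw data"

# Reading back: decode the code, then slice off the prefix.
code, prefix_len = decode_varint(prefixed)
payload = prefixed[prefix_len:]
print(prefixed.hex(), hex(code), payload)
```

Codes below 0x80 fit in a single prefix byte; larger codes occupy two or more bytes, which is why the decoder reports how many bytes it consumed.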
Common errors
- TypeError: a bytes-like object is required, not 'str'
  cause: Attempting to pass a Python `str` object directly to `add_prefix` or other functions expecting raw bytes.
  fix: Always convert string data to bytes using `.encode('utf-8')` (or another appropriate encoding) before passing it to `py-multicodec` functions, e.g., `add_prefix('sha2-256', 'my string'.encode('utf-8'))`.
- NameError: name 'multicodec' is not defined
  cause: Trying to call functions like `multicodec.add_prefix` without explicitly importing the functions or the `multicodec` module itself.
  fix: Use direct imports for the functions you need, e.g., `from multicodec import add_prefix`, or import the module as `import multicodec` and then call `multicodec.add_prefix`.
- ValueError: Found code 0x00 when unwrapping data, expected code 0x12
  cause: This error occurs when using a specific `Code` object's `unwrap` method on data that was prefixed with a *different* codec (e.g., trying to unwrap 'identity' data with a 'sha2-256' Code object).
  fix: Ensure the `Code` object used for unwrapping (`my_code.unwrap()`) corresponds to the actual codec of the data. If you only want to remove the prefix without strict validation, use `multicodec.remove_prefix(prefixed_data)` instead.
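The first and third errors can be reproduced in spirit with a short standard-library sketch. The functions `to_bytes` and `unwrap_strict` are hypothetical stand-ins written for this illustration, not part of `py-multicodec`, and the sketch assumes single-byte codes only (0x00 for identity, 0x12 for sha2-256).

```python
def to_bytes(data, encoding="utf-8"):
    """Encode str input to bytes; pass bytes-like input through unchanged."""
    if isinstance(data, str):
        return data.encode(encoding)
    return bytes(data)


def unwrap_strict(expected_code: int, prefixed: bytes) -> bytes:
    """Hypothetical strict unwrap: verify the leading code, then strip it.

    Only single-byte codes (below 0x80) are handled to keep the sketch short.
    """
    if not prefixed:
        raise ValueError("empty input has no prefix")
    found = prefixed[0]
    if found >= 0x80:
        raise ValueError("multi-byte varint prefixes are outside this sketch")
    if found != expected_code:
        raise ValueError(
            f"Found code 0x{found:02x} when unwrapping data, "
            f"expected code 0x{expected_code:02x}"
        )
    return prefixed[1:]


IDENTITY, SHA2_256 = 0x00, 0x12

# Encode the string first, then prefix it with the identity code.
identity_data = bytes([IDENTITY]) + to_bytes("my string")

# Unwrapping with the wrong expected code fails loudly.
try:
    unwrap_strict(SHA2_256, identity_data)
except ValueError as exc:
    message = str(exc)
    print(message)

# Unwrapping with the matching code returns the payload.
payload = unwrap_strict(IDENTITY, identity_data)
print(payload)
```

Strict checking trades convenience for safety: a mismatch surfaces at the unwrap call instead of later, when the payload is misinterpreted.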
Warnings
- gotcha The library `py-multicodec` is distinct from `multicode` (a Unicode handling library) and from the multicodec sub-module of the broader `multiformats` library. Ensure you install and import from `py-multicodec` for standalone multicodec functionality.
- breaking The README does not explicitly document any breaking changes between 0.x and 1.x, and the jump to 1.0.0 signals a stable API going forward. Users upgrading from pre-1.0.0 versions should still review the changelog on GitHub for potential API alterations.
- gotcha When unwrapping data using an explicit `Code` object (e.g., `my_code.unwrap(prefixed_data)`), the library strictly enforces that the data's internal codec matches `my_code`. If they differ, a `ValueError` is raised.
- gotcha The library's internal codec lookup table is based on the canonical multicodec table. If this table is updated upstream, your local `py-multicodec` installation might become outdated and not recognize new codecs. The project provides a tool to update this table.
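The outdated-table gotcha can be handled defensively in calling code. The sketch below uses a hand-written excerpt of the canonical table as a deliberately stale local copy; `LOCAL_TABLE` and `describe_code` are hypothetical names for this illustration and do not reflect how `py-multicodec` stores or queries its table.

```python
# Small excerpt of the canonical multicodec table, standing in for a stale local copy.
LOCAL_TABLE = {
    0x00: "identity",
    0x12: "sha2-256",
    0x55: "raw",
    0x70: "dag-pb",
    0x71: "dag-cbor",
}


def describe_code(code: int) -> str:
    """Return the codec name, or a labelled hex placeholder for unrecognized codes."""
    name = LOCAL_TABLE.get(code)
    return name if name is not None else f"unknown (0x{code:02x})"


print(describe_code(0x12))  # present in the excerpt
print(describe_code(0x1E))  # absent from the excerpt, so reported as unknown
```

Reporting the raw code for an unrecognized entry keeps enough information to look it up in the upstream table later.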
Install
- pip install py-multicodec
Imports
- add_prefix
from multicodec import add_prefix
- remove_prefix
from multicodec import remove_prefix
- get_codec
from multicodec import get_codec
- Code
from multicodec import Code
- known_codes
from multicodec import known_codes
- SHA2_256
from multicodec.code_table import SHA2_256, DAG_CBOR
Quickstart
from multicodec import add_prefix, remove_prefix, get_codec, Code, known_codes
from multicodec.code_table import SHA2_256
# Basic prefix operations
prefixed_data = add_prefix('sha2-256', b'Some raw data')
print(f"Prefixed data: {prefixed_data.hex()}")
raw_data = remove_prefix(prefixed_data)
print(f"Raw data: {raw_data}")
codec_name = get_codec(prefixed_data)
print(f"Codec name: {codec_name}")
# Type-safe Code management
sha2_256_code = SHA2_256
print(f"SHA2-256 Code: {sha2_256_code} (int: {int(sha2_256_code)}, str: {str(sha2_256_code)})")
code_from_string = Code.from_string("sha2-256")
print(f"Code from string: {code_from_string}")
all_known_codes = known_codes()
print(f"Number of known codes: {len(all_known_codes)}")