msgpack-numpy-opentensor
msgpack-numpy-opentensor provides efficient serialization and deserialization routines for NumPy array and scalar data types using the MessagePack binary format. It is functionally derived from the `msgpack-numpy` library, offering compatibility with NumPy data structures. The current PyPI version is `0.5.0`. While its related GitHub repository shows more recent development, PyPI releases are infrequent, with the latest over a year old.
Common errors
-
TypeError: buffer is too small for requested array
cause: This error usually appears when deserializing multi-dimensional NumPy arrays. Likely triggers are a version mismatch between `msgpack-numpy` (or `msgpack-numpy-opentensor`), `msgpack`, and `numpy`, or a buffer-handling problem while the array is being reconstructed.
fix: Update `msgpack-numpy-opentensor`, `msgpack`, and `numpy` to their latest mutually compatible versions. If the problem persists, check which `numpy` versions `msgpack-numpy` supports, or report the issue with full version details.
-
KeyError: b'nd' (or similar during unpackb with m.patch())
cause: `m.patch()` installs the NumPy-aware `object_hook` for `msgpack` globally. If `msgpack.unpackb` then encounters a dictionary that is *not* a serialized NumPy array (e.g., a regular Python dictionary), the `msgpack_numpy_opentensor` decoder might still try to interpret it as an array and raise a `KeyError` because the expected NumPy metadata keys (such as `b'nd'`) are missing.
fix: Avoid `m.patch()` for general `msgpack` usage. Instead, pass `default=m.encode` to `msgpack.packb` and `object_hook=m.decode` to `msgpack.unpackb` explicitly, and only for payloads that are expected to contain NumPy arrays. This limits the NumPy decoding logic to the calls that need it.
-
TypeError: must be str, not bytes (or similar for datetime objects in arrays)
cause: `msgpack-numpy-opentensor` primarily targets numerical data types. Arrays that hold arbitrary Python objects such as `datetime` instances, especially arrays with `dtype='O'` (object), can trigger type errors because `msgpack-numpy`'s default handlers may not know how to convert those objects.
fix: Convert `numpy.datetime64` arrays or arrays of Python `datetime` objects to a simpler, serializable representation (e.g., integer timestamps or ISO 8601 strings) before packing. Alternatively, implement custom `default` encoder and `object_hook` decoder functions that handle `datetime` objects explicitly in your serialization pipeline.
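The timestamp conversion suggested in the last fix can be sketched as follows. This is a minimal illustration that assumes nanosecond resolution is acceptable; the helper names `datetimes_to_int64` and `int64_to_datetimes` are hypothetical and not part of the package.

```python
import numpy as np


def datetimes_to_int64(arr):
    """Hypothetical helper: datetime64 array -> int64 nanoseconds since the Unix epoch."""
    return arr.astype("datetime64[ns]").astype(np.int64)


def int64_to_datetimes(ints):
    """Hypothetical helper: reverse of datetimes_to_int64."""
    return np.asarray(ints, dtype=np.int64).astype("datetime64[ns]")


stamps = np.array(
    ["2024-01-01T00:00:00", "2024-01-02T12:30:00"], dtype="datetime64[s]"
)

# Plain int64 data; this is what would be handed to msgpack.packb(..., default=m.encode).
as_ints = datetimes_to_int64(stamps)

# After unpacking, turn the integers back into datetime64 values.
restored = int64_to_datetimes(as_ints)

print(as_ints.dtype, restored.dtype)
print("Round trip preserved values:", np.array_equal(restored, stamps))
```

The resolution is a design choice: nanoseconds cover most practical timestamps but overflow for dates far from the epoch, so a coarser unit such as `datetime64[s]` may suit archival data better.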
Warnings
- breaking The upstream `opentensor/msgpack-numpy` GitHub repository, linked as this package's source, released `v1.0.0` with a breaking change: it disables `pickle` by default. This will prevent deserialization of NumPy arrays with `dtype='O'` (object arrays) that were serialized with pickle enabled in older versions. While `msgpack-numpy-opentensor` on PyPI is currently `0.5.0`, this change may propagate to future versions.
- gotcha NumPy arrays with `dtype='O'` (object arrays) are serialized/deserialized using Python's `pickle` module as a fallback by `msgpack-numpy` (and, by extension, likely `msgpack-numpy-opentensor`). This introduces significant performance overhead and poses security risks when deserializing data from untrusted sources due to pickle's arbitrary code execution capabilities.
- gotcha NumPy arrays deserialized by `msgpack-numpy` (and thus, `msgpack-numpy-opentensor`) are read-only by default. Attempting to modify them in place raises a `ValueError` ("assignment destination is read-only"); call `.copy()` on the array first if you need a writable version.
- gotcha The MessagePack format used by the underlying `msgpack` library caps a single binary or string object at 2^32 - 1 bytes (about 4.29 GB), because the `bin 32` and `str 32` types store their length in an unsigned 32-bit field. Attempting to serialize a single NumPy array whose data exceeds this limit may result in serialization errors.
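The gotchas above can be guarded against with a few lines of plain NumPy. The sketch below is an assumption-laden illustration, not part of the package: `check_packable` and `MSGPACK_MAX_BIN_BYTES` are hypothetical names, and `np.frombuffer` over immutable bytes is used only to stand in for a read-only array returned by a decoder.

```python
import numpy as np

# Largest length a MessagePack bin 32 object can encode (unsigned 32-bit field).
MSGPACK_MAX_BIN_BYTES = 2**32 - 1


def check_packable(arr):
    """Hypothetical guard: fail early for arrays that the warnings flag as risky."""
    if arr.dtype.hasobject:
        raise TypeError("object-dtype array would fall back to pickle; convert it first")
    if arr.nbytes > MSGPACK_MAX_BIN_BYTES:
        raise ValueError("array data exceeds the bin 32 size limit; split it into chunks")
    return arr


# Stand-in for a deserialized array: a view over immutable bytes is read-only.
read_only = np.frombuffer(np.arange(4, dtype=np.float64).tobytes(), dtype=np.float64)

try:
    read_only[0] = 1.0
except ValueError as exc:
    print("Write rejected:", exc)

# A copy owns its own memory and is writable.
writable = read_only.copy()
writable[0] = 1.0
print("Writable copy:", writable)
```

Calling `check_packable(x)` before `msgpack.packb(x, default=m.encode)` turns a silent pickle fallback or a late size error into an immediate, explicit exception.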
Install
-
pip install msgpack-numpy-opentensor
Imports
- patch
from msgpack_numpy_opentensor import patch
import msgpack
import msgpack_numpy_opentensor as m
m.patch()
- encode
from msgpack_numpy_opentensor import encode
import msgpack
import msgpack_numpy_opentensor as m
msgpack.packb(data, default=m.encode)
- decode
from msgpack_numpy_opentensor import decode
import msgpack
import msgpack_numpy_opentensor as m
msgpack.unpackb(packed_data, object_hook=m.decode)
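An `object_hook` passed to `msgpack.unpackb` is called for every map in the payload, including ordinary dictionaries. If the `KeyError` described under Common errors shows up, one possible workaround is to wrap the decoder so it only sees dictionaries that carry an array marker. The sketch below is hypothetical: `make_guarded_hook` is not part of the package, the `b"nd"` marker is taken from the error message above, and `fake_decode` is a stub standing in for `m.decode` so the example runs without the library installed.

```python
def make_guarded_hook(decode, markers=(b"nd",)):
    """Hypothetical wrapper: call `decode` only for dicts that contain a marker key."""

    def hook(obj):
        if any(key in obj for key in markers):
            return decode(obj)
        # Ordinary dictionaries pass through untouched.
        return obj

    return hook


def fake_decode(obj):
    """Stub standing in for m.decode; it just tags what it was given."""
    return ("decoded", obj[b"data"])


hook = make_guarded_hook(fake_decode)

plain = hook({"name": "config", "size": 3})
tagged = hook({b"nd": True, b"data": b"\x00"})

print(plain)
print(tagged)
```

With the real library this would be used as `msgpack.unpackb(packed_data, object_hook=make_guarded_hook(m.decode))`. Any other marker keys the decoder relies on would need to be added to `markers`, otherwise those values are returned as raw dictionaries.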
Quickstart
import numpy as np
import msgpack
import msgpack_numpy_opentensor as m
# Create a NumPy array
x = np.random.rand(5, 5)
# Pack the NumPy array using msgpack-numpy-opentensor's encoder
# Optionally, you can call m.patch() to monkey-patch msgpack globally
# m.patch()
packed_x = msgpack.packb(x, default=m.encode)
# Unpack the bytes back into a NumPy array using the decoder
unpacked_x = msgpack.unpackb(packed_x, object_hook=m.decode, raw=False)
print("Original array:\n", x)
print("Unpacked array:\n", unpacked_x)
print("Arrays are equal:", np.array_equal(x, unpacked_x))
print("Unpacked array is read-only:", not unpacked_x.flags['WRITEABLE'])