Sparse Bytes Virtual Memory Library
The `bytesparse` library provides utilities for managing sparse bytes within a virtual memory space. It offers an interface similar to Python's built-in `bytearray` and allows non-contiguous data to be placed across a potentially infinite addressing space. Data chunks are stored internally as mutable `bytearray` objects. The library is currently at version 1.1.0 and is actively maintained, with several releases in the past year.
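To make block-based sparse storage concrete, here is a toy model. It is not the library's implementation. It assumes data lives in a sorted list of non-overlapping `[start, bytearray]` blocks and finds the byte at an address with a binary search over block start addresses. The class `ToyBlocks` and its method `byte_at` are hypothetical names used only for illustration.

```python
from bisect import bisect_right


class ToyBlocks:
    """Hypothetical toy model: sorted, non-overlapping [start, bytearray] blocks."""

    def __init__(self, blocks):
        # Each block owns a mutable bytearray; blocks are kept sorted by start address.
        self.blocks = sorted([start, bytearray(data)] for start, data in blocks)
        self.starts = [start for start, _ in self.blocks]

    def byte_at(self, address):
        """Return the byte value at `address`, or None if nothing is stored there."""
        # Binary search for the last block starting at or before `address`.
        index = bisect_right(self.starts, address) - 1
        if index < 0:
            return None  # address lies before the first block
        start, data = self.blocks[index]
        if address < start + len(data):
            return data[address - start]
        return None  # address falls in a gap between blocks


memory = ToyBlocks([(0, b'Hello'), (1000, b'remote')])
print(memory.byte_at(1))     # 101, i.e. ord('e')
print(memory.byte_at(500))   # None: the gap is not stored
print(memory.byte_at(1003))  # 111, i.e. ord('o')
```

Only 11 bytes are held in this toy model, even though the stored addresses stretch past 1000. The gap between the two blocks costs nothing.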
Common errors
-
TypeError: a bytes-like object is required, not 'str'
cause Passing a Python string (`str`) to `bytesparse` methods or constructors that expect a bytes-like object (`bytes` or `bytearray`).
fix Encode strings to bytes before passing them to `bytesparse` methods (e.g., `my_string.encode('utf-8')`) or use byte literals (e.g., `b'hello'`).
-
Unexpected output from `len()` or `bytes()` for sparse data (e.g., `len()` is very large, `bytes()` contains many nulls).
cause `len(bytesparse_obj)` returns the size of the whole virtual address range covered by the object, not the size of the physically stored data. `bytes(bytesparse_obj)` converts that *entire* range (from the lowest to the highest address containing data) into a `bytes` object, filling unallocated regions with null bytes (`\x00`).
fix To retrieve only the physically stored data blocks, use `bytesparse_obj.to_blocks()`. To get the start and end addresses of the covered range, read the `bytesparse_obj.span` property. Keep in mind that `len()` measures the virtual length across the addressable range, not the compact size.
-
Performance degradation when performing many small, highly fragmented writes or deletions at distant addresses.
cause Frequent modifications at non-contiguous addresses can increase the number of internal data blocks managed by `bytesparse`, which adds overhead to operations.
fix For better performance, coalesce writes into larger contiguous blocks when possible. If highly fragmented writes are unavoidable, periodically inspect the internal block structure (e.g., with `bytesparse_obj.to_blocks()`) to gauge the overhead. The library is optimized for sparse data, but extreme fragmentation is a general performance concern for structures of this kind.
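For the `TypeError` about `str` versus bytes-like objects, one defensive pattern is to normalize inputs once before they reach `bytesparse`. The helper `as_bytes` below is hypothetical. It encodes `str` explicitly (UTF-8 by default, an assumption you should match to your data), passes `bytes`, `bytearray`, and `memoryview` through unchanged, and rejects anything else.

```python
def as_bytes(value, encoding='utf-8'):
    """Hypothetical helper: return a bytes-like object suitable for bytesparse calls."""
    if isinstance(value, str):
        # Text must be encoded explicitly; the encoding is the caller's choice.
        return value.encode(encoding)
    if isinstance(value, (bytes, bytearray, memoryview)):
        return value  # already bytes-like, pass through unchanged
    raise TypeError(f'expected str or bytes-like object, got {type(value).__name__}')


print(as_bytes('Ciao'))       # b'Ciao'
print(as_bytes(b'\x00\x01'))  # b'\x00\x01'
```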
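The gap between virtual length and stored size, described in the `len()`/`bytes()` entry above, can be reproduced with plain Python. This sketch models the concept and is not the library's code. It takes a list of `(start, data)` blocks, similar in spirit to the listing `to_blocks()` returns, and computes four values: the covered span, the virtual length (end minus start), the number of bytes actually stored, and a flattened copy whose gaps are filled with `\x00` as that entry describes. The function name `describe_blocks` is hypothetical.

```python
def describe_blocks(blocks):
    """Hypothetical helper: summarize non-overlapping (start, data) blocks sorted by start."""
    if not blocks:
        return (0, 0), 0, 0, b''
    start = blocks[0][0]
    endex = blocks[-1][0] + len(blocks[-1][1])          # exclusive end address
    virtual_length = endex - start                      # the virtual length
    stored_size = sum(len(data) for _, data in blocks)  # bytes actually stored
    flat = bytearray(virtual_length)                    # zero-filled, so gaps stay b'\x00'
    for block_start, data in blocks:
        offset = block_start - start
        flat[offset:offset + len(data)] = data
    return (start, endex), virtual_length, stored_size, bytes(flat)


span, virtual_length, stored_size, flat = describe_blocks([(0, b'Hello'), (1000, b'remote data')])
print(span)            # (0, 1011)
print(virtual_length)  # 1011
print(stored_size)     # 16
print(flat.count(0))   # 995 null bytes fill the gap
```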
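To coalesce fragmented writes, as the performance entry above suggests, you can buffer `(address, data)` pairs, merge the address ranges that overlap or touch, and replay the writes into one buffer per merged range. The hypothetical helper `coalesce_writes` sketches this. Where writes overlap, the one issued later wins. That rule is an assumption chosen to mirror ordinary overwrite semantics.

```python
from bisect import bisect_right


def coalesce_writes(writes):
    """Hypothetical helper: merge (address, data) writes into contiguous (start, bytes) blocks."""
    # 1. Merge the address ranges covered by the writes (touching ranges merge too).
    spans = sorted((address, address + len(data)) for address, data in writes if data)
    merged = []
    for start, endex in spans:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], endex)
        else:
            merged.append([start, endex])

    # 2. Replay writes in their original order, so later writes overwrite earlier ones.
    starts = [start for start, _ in merged]
    buffers = [bytearray(endex - start) for start, endex in merged]
    for address, data in writes:
        if not data:
            continue
        index = bisect_right(starts, address) - 1  # merged range containing this write
        offset = address - starts[index]
        buffers[index][offset:offset + len(data)] = data
    return [(start, bytes(buffer)) for start, buffer in zip(starts, buffers)]


writes = [(0, b'AAAA'), (4, b'BB'), (100, b'far'), (2, b'xy')]
print(coalesce_writes(writes))  # [(0, b'AAxyBB'), (100, b'far')]
```

Each resulting `(start, data)` pair can then be stored with a single call instead of several small ones. Every byte of a merged range is covered by at least one write, so the zero-initialized buffers never leak placeholder bytes.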
Warnings
- gotcha The companion `cbytesparse` (Cython) package uses a bounded, unsigned address type (`uint_fast64_t`, at least 64 bits wide) and therefore does not support infinite or negative addresses, unlike the pure Python `bytesparse` implementation. Users switching between implementations should be aware of this difference.
- gotcha While the Cython implementation (`cbytesparse`) aims for speedup, it is labeled as 'experimental' and the documentation suggests that even faster 'ad-hoc' implementations for specific hardware might exist. Do not assume `cbytesparse` provides optimal performance for all scenarios without benchmarking.
- gotcha Both `Memory` and `bytesparse` classes inherit from `collections.abc.MutableSequence` and `collections.abc.MutableMapping` to provide familiar interfaces. However, their internal sparse storage mechanism means that certain operations might have different performance characteristics or behaviors compared to direct Python `list` or `dict` equivalents, especially for highly fragmented data.
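For the address-space gotcha about `cbytesparse`, code that may run against either implementation can validate addresses up front. This sketch assumes the Cython address type behaves as an unsigned 64-bit integer, so every touched byte must satisfy `0 <= address < 2**64`. The helper `check_address` and the exact 64-bit bound are assumptions and are not part of either package.

```python
ADDRESS_BITS = 64                # assumption: width of the Cython address type
ADDRESS_LIMIT = 1 << ADDRESS_BITS


def check_address(address, size=0):
    """Hypothetical guard: ensure [address, address + size) fits an unsigned 64-bit space."""
    if not isinstance(address, int):
        raise TypeError('address must be an int')
    if size < 0:
        raise ValueError('size must be non-negative')
    if address < 0:
        raise ValueError(f'negative address {address} is not supported by the Cython backend')
    last = address + max(size, 1) - 1  # highest byte address touched
    if last >= ADDRESS_LIMIT:
        raise OverflowError(f'address {last:#x} exceeds {ADDRESS_BITS}-bit addressing')
    return address


check_address(0x1000, 16)  # fits, returns 0x1000
for bad in (-1, 2**64):
    try:
        check_address(bad)
    except (ValueError, OverflowError) as error:
        print(type(error).__name__, error)
```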
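For the benchmarking advice in the `cbytesparse` performance gotcha, a small `timeit` harness is enough to compare backends on a workload that resembles real use. The sketch is generic. It takes a mapping of names to factory callables and a hypothetical `workload(buffer)` function, then reports the best time per run for each backend. Here `bytearray` stands in for both backends so the sketch runs anywhere. With the packages installed, the factories would construct `bytesparse` and `cbytesparse` objects instead, and how you wire them in is an assumption left to you.

```python
import timeit


def benchmark(factories, workload, number=100, repeat=5):
    """Return {name: best seconds per run}; object construction is included in the timing."""
    results = {}
    for name, factory in factories.items():
        # A fresh object per run, so earlier runs do not skew later ones.
        timer = timeit.Timer(lambda factory=factory: workload(factory()))
        results[name] = min(timer.repeat(repeat=repeat, number=number)) / number
    return results


def scattered_writes(buffer):
    # Hypothetical workload: many small writes at spread-out offsets.
    for offset in range(0, 4096, 64):
        buffer[offset:offset + 4] = b'\xAA\xBB\xCC\xDD'


factories = {
    'stand-in A (bytearray)': lambda: bytearray(4096),
    'stand-in B (bytearray)': lambda: bytearray(4096),
}
for name, seconds in benchmark(factories, scattered_writes).items():
    print(f'{name}: {seconds * 1e6:.1f} us per run')
```

Taking the minimum over repeats reduces noise from other processes. It is a common convention for micro-benchmarks but does not capture worst-case latency.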
Install
-
pip install bytesparse
Imports
- Memory
from bytesparse import Memory
- bytesparse
from bytesparse import bytesparse
Quickstart
from bytesparse import bytesparse, Memory
# Create a bytesparse object from existing bytes
m = bytesparse(b'Hello, World!')
print(f"Initial bytesparse: {m}")
print(f"Length: {len(m)}")
print(f"As bytes: {bytes(m)}")
# Modify the content in place (the replacement has the same length as the slice)
m[0:5] = b'Hallo'
print(f"After modification: {bytes(m)}")
# Store data at a sparse, non-contiguous address
# (write() stores a whole byte string; poke() sets a single byte)
m.write(1000, b'remote data')
print(f"After writing remote data: {m}")
print(f"Virtual length increased: {len(m)}")
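# Work with a Memory object directly
mem = Memory()
# Slice bounds match the data length (6 bytes), so nothing is shifted
mem[0x100:0x106] = b'DATA_A'
mem[0x200:0x206] = b'DATA_B'
print(f"Memory object blocks: {mem.to_blocks()}")
# peek() returns a single byte; read a contiguous range as bytes instead
print(f"Memory at 0x100: {mem.to_bytes(0x100, 0x106)}")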