Partd
Partd is a Python library that provides appendable key-value storage for raw bytes. It is built for shuffle operations, where many small pieces of data are appended to the value already stored under a key. The current version is 1.4.2, released in May 2024; releases are infrequent and the library is stable.
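To make the shuffle use case concrete, here is a minimal sketch that groups a stream of records by key through repeated appends. It assumes `partd.Dict`, the in-memory backend described in the partd README (it is not listed elsewhere on this page); the record contents and key names are invented for illustration.

```python
from partd import Dict

# Hypothetical input: (key, payload) pairs arriving in arbitrary order.
records = [
    ("apple", b"1,"),
    ("banana", b"2,"),
    ("apple", b"3,"),
    ("cherry", b"4,"),
    ("banana", b"5,"),
]

# Assumption: Dict is partd's in-memory store; values are raw bytes.
store = Dict()
for key, payload in records:
    # Each append extends whatever bytes are already stored under the key.
    store.append({key: payload})

# Passing a list of keys to get() returns the values in the same order.
keys = ["apple", "banana", "cherry"]
grouped = dict(zip(keys, store.get(keys)))
print(grouped)
```

Because every append only adds to the end of a value, the payloads for each key come back in the order they were written.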
Warnings
- gotcha Partd's core stores hold raw bytes: `append` takes a dict that maps each key to a `bytes` value. To store Python objects (lists, dictionaries, custom classes), wrap the store in an encoding layer (e.g., `partd.python.Python` for `pickle`/`msgpack` or `partd.numpy.Numpy` for arrays) or serialize the values yourself.
- gotcha For many small write operations, especially in parallel environments, the default file-based Partd implementations (`partd.file.File`) can be inefficient due to I/O overhead and locking. The documentation suggests that 'this is hard to do in parallel while also maintaining consistency' and recommends a centralized server solution for caching.
- gotcha For file-backed Partd instances (`partd.file.File`), it's crucial to explicitly call `.drop()` to clean up the created directories and files when they are no longer needed. Failing to do so can leave orphaned data on the filesystem.
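The encoding and cleanup warnings above can be sketched together as follows. This is an illustrative example, not the only way to do it: the directory name `shuffle` and the key `evens` are made up, and a temporary directory is used so nothing is left behind even if the script is interrupted.

```python
import os
import tempfile

from partd import File, Python

with tempfile.TemporaryDirectory() as tmp:
    # File-backed store holding raw bytes on disk (path is illustrative).
    backend = File(os.path.join(tmp, "shuffle"))
    # Python wraps the byte store so lists can be appended directly.
    store = Python(backend)
    try:
        store.append({"evens": [0, 2, 4]})
        store.append({"evens": [6, 8]})
        # list() guards against the wrapper returning a lazy sequence.
        evens = list(store.get("evens"))
    finally:
        # Explicitly discard the on-disk data once it is no longer needed.
        backend.drop()

print(evens)
```

Putting `drop()` in a `finally` block means the data files are cleaned up even when an append or read raises.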
Install
pip install partd
Imports
- File
from partd.file import File
- Buffer
from partd.buffer import Buffer
- Python
from partd.python import Python
- Numpy
from partd.numpy import Numpy
Quickstart
import partd
import numpy as np
# Create a Partd backed by a directory (use partd.Dict() for an in-memory store;
# partd.Buffer needs a fast and a slow store, e.g. partd.Buffer(partd.Dict(), partd.File(...)))
p = partd.File('my_partd_data')
# Append key-byte pairs
p.append({'x': b'Hello '})
p.append({'x': b'world!'})
p.append({'y': b'123'})
p.append({'y': b'456'})
# Get bytes associated to keys
print(f"Value for 'x': {p.get('x')}")
print(f"Value for 'y' and 'x': {p.get(['y', 'x'])}")
# Example with NumPy encoding
# Requires 'numpy' as an optional dependency
p_np = partd.numpy.Numpy(partd.File('my_numpy_data'))
p_np.append({'data': np.array([1, 2, 3])})
p_np.append({'data': np.array([4, 5, 6])})
print(f"NumPy array: {p_np.get('data')}")
# Clean up (for File-backed Partd)
p.drop()
p_np.drop()