pathos
pathos is a framework for heterogeneous computing that provides tools for parallel graph management and execution. It offers a consistent high-level interface for configuring and launching parallel computations across diverse resources, so that existing user code can be extended to parallel and distributed computing with minimal refactoring. The library is currently at version 0.3.5. New releases arrive every few months and typically add incremental features and dependency updates.
Warnings
- breaking The minimum required Python version for `pathos` has progressively increased. As of version 0.3.5, Python 3.9 or newer is required.
- gotcha When using `pathos.multiprocessing.ProcessingPool`, objects passed to worker processes are *copied* (serialized and deserialized) rather than shared in memory. Modifications made to these objects inside a worker process are therefore not reflected in the original object in the parent process or in other workers. To get changes back, return them from the worker function.
- gotcha `pathos`'s `map` methods (e.g., `ProcessingPool.map`) directly accept multiple iterables, one per parameter of the target function. This differs from the standard `multiprocessing.Pool.map` signature, which expects a single iterable of arguments. Users accustomed to `multiprocessing` might reach for workarounds such as `Pool.starmap` over zipped arguments or tuple unpacking in the target function. These are unnecessary with `pathos` and potentially less efficient.
- gotcha `pathos` uses `dill` for object serialization, which handles a much wider range of objects than Python's default `pickle` used by standard `multiprocessing`. This allows `pathos` to serialize and transfer complex objects, lambda functions, nested functions, and class methods to worker processes, all of which often cause `PicklingError` exceptions with plain `multiprocessing`.
Install
-
pip install pathos
Imports
- ProcessingPool
from pathos.multiprocessing import ProcessingPool as Pool
- ParallelPool
from pathos.pools import ParallelPool
Quickstart
from pathos.multiprocessing import ProcessingPool as Pool

def calculate_power(base, exponent):
    return base ** exponent

if __name__ == '__main__':
    bases = [1, 2, 3, 4, 5]
    exponents = [2, 3, 2, 4, 3]

    # Initialize a pool with a number of worker processes (e.g., 4)
    pool = Pool(nodes=4)

    # Use the map method to apply calculate_power in parallel
    # pathos's map directly accepts multiple iterables for multiple arguments
    results = pool.map(calculate_power, bases, exponents)

    print(f"Bases: {bases}")
    print(f"Exponents: {exponents}")
    print(f"Parallel Results: {results}")

    # Don't forget to close and join the pool when done
    pool.close()
    pool.join()