mapply

Version 0.2.0 · verified Mon Apr 27 · auth: none · language: Python

Sensible multi-core apply function for Pandas. The current version, 0.2.0, supports Pandas v3; the last version supporting Pandas v2 is 0.1.31. The project is under active development with frequent releases.

pip install mapply
error AttributeError: 'DataFrame' object has no attribute 'mapply'
cause mapply was not imported, was imported after the DataFrame was created, or mapply.init() was not called.
fix Add 'import mapply' at the top of your script, before any pandas DataFrame creation, and call mapply.init() before the first .mapply() call.
error ModuleNotFoundError: No module named 'mapply'
cause mapply is not installed.
fix Run: pip install mapply
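If mapply is optional in your environment, one way to sidestep both errors above is to fall back to the built-in .apply(). The sketch below is a hypothetical helper (HAS_MAPPLY and apply_rows are illustrative names, not part of mapply); it only assumes that a DataFrame exposes .mapply() once mapply has been imported and initialized.

```python
import importlib.util

# True when the mapply package is installed (checked without importing it).
HAS_MAPPLY = importlib.util.find_spec("mapply") is not None


def apply_rows(df, func):
    """Hypothetical helper: apply func row-wise, in parallel if .mapply() exists."""
    parallel = getattr(df, "mapply", None)
    if parallel is None:
        # mapply missing or not initialized: fall back to serial pandas .apply().
        return df.apply(func, axis=1)
    return parallel(func, axis=1)
```

In a script you would guard the setup with the flag, importing mapply and calling mapply.init(n_workers=-1) only when HAS_MAPPLY is true, and then call apply_rows(df, func) everywhere else.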
error ValueError: n_workers must be >= 1 or -1 for all CPUs
cause An invalid value was passed to mapply.init(n_workers=...).
fix Use n_workers=-1 for all CPUs, or an integer >= 1.
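When n_workers comes from user input or a config file, checking it up front gives a clearer failure point. The function below is a hypothetical pre-check that restates the rule from the error message; it is not mapply's own validation code.

```python
def check_n_workers(n_workers: int) -> int:
    """Hypothetical pre-check mirroring the n_workers rule in the error above."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(n_workers, bool) or not isinstance(n_workers, int):
        raise TypeError(f"n_workers must be an int, got {type(n_workers).__name__}")
    if n_workers == -1 or n_workers >= 1:
        return n_workers
    raise ValueError("n_workers must be >= 1 or -1 for all CPUs")
```

Usage: mapply.init(n_workers=check_n_workers(configured_value)), where configured_value stands for whatever your script reads.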
breaking mapply 0.2.0 drops support for Pandas v2. Pandas v3+ is required. Last compatible version is 0.1.31.
fix Pin mapply<0.2.0 if using Pandas v2, or upgrade to Pandas v3.
deprecated mapply 0.1.24+ declares pandas as a dependency. Older versions do not, so dependency resolvers can pair them with incompatible pandas versions.
fix Always pin mapply>=0.1.24.
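Taken together, the breaking and deprecation notes imply one pin per pandas major version. The helper below is a hypothetical sketch that encodes only those notes (mapply 0.2.0+ for Pandas v3, 0.1.24 up to but excluding 0.2.0 for Pandas v2); it deliberately refuses versions the notes do not cover.

```python
def mapply_requirement(pandas_version: str) -> str:
    """Hypothetical helper: pick a mapply pin for a given pandas version string."""
    major = int(pandas_version.split(".")[0])
    if major >= 3:
        # mapply 0.2.0+ requires Pandas v3.
        return "mapply>=0.2.0"
    if major == 2:
        # 0.1.31 is the last Pandas v2 release; 0.1.24+ declares pandas as a dependency.
        return "mapply>=0.1.24,<0.2.0"
    raise ValueError(f"pandas {pandas_version} is not covered by these notes")
```

For the installed pandas, pass importlib.metadata.version("pandas") as the argument.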
gotcha mapply must be imported before any pandas operations. Importing after creating DataFrames will not apply the monkey-patch.
fix Always import mapply at the top of your script, before any pandas calls.
gotcha By default mapply uses multiprocessing (Pool). For multithreading, set POOL_CLASS to ThreadPool via mapply.init(pool_class='ThreadPool').
fix Call mapply.init(n_workers=..., pool_class='ThreadPool') for I/O-bound tasks.
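The reason threads suit I/O-bound work is that the workers spend most of their time waiting, and that waiting can overlap. The sketch below is a standard-library analogue using multiprocessing.pool.ThreadPool, not mapply's API; fetch is a hypothetical stand-in for a network or disk call.

```python
import time
from multiprocessing.pool import ThreadPool


def fetch(item: int) -> int:
    """Hypothetical I/O-bound task: mostly waiting, little computation."""
    time.sleep(0.3)
    return item * 10


def run_threaded(items, n_workers=4):
    # Threads overlap the waits, so 4 tasks take about one sleep, not four.
    with ThreadPool(n_workers) as pool:
        return pool.map(fetch, items)


if __name__ == "__main__":
    start = time.perf_counter()
    results = run_threaded([1, 2, 3, 4])
    print(results, f"{time.perf_counter() - start:.2f}s")
```

For CPU-bound Python functions the GIL serializes the threads, which is why a process pool is the better default there.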

Initialize mapply to use all CPUs, then call .mapply() as you would .apply(); the work runs in parallel.

import pandas as pd
import mapply

mapply.init(n_workers=-1)  # use all CPUs

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
result = df.mapply(lambda row: row['a'] + row['b'], axis=1)
print(result)
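Conceptually, a parallel apply splits the rows into chunks, processes the chunks in a worker pool, and joins the results back in order. The sketch below illustrates that idea on plain Python lists with concurrent.futures; it is not mapply's implementation, and chunked_apply and its chunk_size parameter are illustrative. Threads keep the sketch portable; a CPU-bound version would use processes.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def chunked_apply(func, rows, n_workers=4, chunk_size=2):
    """Hypothetical helper: apply func to every row, one chunk per task."""
    # Split the input into consecutive chunks of at most chunk_size rows.
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]

    def run_chunk(chunk):
        return [func(row) for row in chunk]

    # executor.map yields results in input order, so row order is preserved.
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        return list(chain.from_iterable(executor.map(run_chunk, chunks)))


rows = [{"a": 1, "b": 4}, {"a": 2, "b": 5}, {"a": 3, "b": 6}]
print(chunked_apply(lambda row: row["a"] + row["b"], rows))  # [5, 7, 9]
```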