Swifter
Swifter is a Python package designed to significantly speed up `apply` operations on pandas DataFrames and Series. It achieves this by automatically determining the fastest available method for applying a function, leveraging vectorized pandas operations, Dask for parallel processing, or multi-threading/multi-processing. Swifter integrates directly into pandas objects, offering a seamless way to optimize user-defined functions without extensive code changes. The current version is 1.4.0, released in July 2023, and the project maintains an active development and release cadence.
Warnings
- gotcha: Avoid using swifter with functions that modify external variables. Swifter runs the function on a sample of the data ("sample applies") to choose the fastest method, so those variables are modified during sampling as well as during the final apply.
- gotcha: When `swifter` is called from a forked process, its progress bar may become confused. Disable the progress bar in such scenarios, for example with `df.swifter.progress_bar(False).apply(...)`.
- gotcha: For compatibility with Modin DataFrames, import `modin.pandas` *before* `swifter`, or call `swifter.register_modin()` explicitly after importing both.
- breaking: Swifter relies on recent features of the pandas extension API. Older versions of pandas (e.g., pre-1.0) may not be fully compatible or may cause unexpected behavior.
- gotcha: When Dask is used as the backend for large datasets, `df.swifter.apply()` supports only `axis=1` (row-wise application). With `axis=0` on a large DataFrame, swifter may not use Dask or may raise errors.
Install
- pip install swifter
- pip install -U pandas swifter[notebook]
- pip install -U swifter[groupby]
- conda install -c conda-forge swifter
Imports
- swifter
import swifter
- pandas
import pandas as pd
Quickstart
import time

import pandas as pd
import swifter  # importing registers the .swifter accessor on pandas objects

# Create a sample DataFrame
df = pd.DataFrame({
    'value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'category': ['A', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'A', 'C']
})

# Define a custom function
def complex_calculation(x):
    time.sleep(0.001)  # Simulate a time-consuming operation
    return x * x + 1

# Use swifter.apply() on a Series
df['squared_value'] = df['value'].swifter.apply(complex_calculation)

# Use swifter.apply() on a DataFrame (row-wise)
df['sum_squared'] = df.swifter.apply(lambda row: row['value']**2 + row['value'], axis=1)

print(df.head())