Submitit Job Submission Library
Submitit is a Python 3.8+ toolbox developed by Facebook Incubator for submitting jobs to Slurm clusters, as well as providing a local executor for testing. It simplifies the process of dispatching Python functions to compute nodes, managing job states, and retrieving results. The current version is 1.5.4, and it sees active maintenance with occasional releases.
Warnings
- breaking: Submitit 1.2.0 changed how preemption is distinguished from timeout for Slurm jobs. The change responded to a regression in Slurm (between versions 19.04 and 20.02) and may alter how long-running or preempted jobs behave or are reported compared to older `submitit` versions.
- breaking: Submitit 1.2.0 fixed the quoting of paths in several internal operations. If your code or Slurm configuration relied on the earlier (possibly incorrect) path handling, malformed paths that previously appeared to work may now fail explicitly or behave differently.
- gotcha: `submitit.AutoExecutor` chooses between `SlurmExecutor` and `LocalExecutor` based on the environment (for example, whether Slurm executables such as `srun` are available). Jobs can therefore run locally when a Slurm environment is assumed but not actually active, consuming local resources and missing HPC requirements. Passing `cluster="slurm"` or `cluster="local"` to `AutoExecutor` pins the choice.
- gotcha: Submitit uses `cloudpickle` to serialize submitted functions and their arguments across processes. Non-picklable resources (e.g., open file handles, database connections, locks) passed to or captured by a submitted function will typically raise serialization errors, and complex objects or closures capturing intricate state may do so as well.
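The serialization gotcha above can be checked before anything is submitted. The following is a minimal sketch, not part of submitit: `is_serializable` is a hypothetical helper, and the code falls back to the standard-library `pickle` when `cloudpickle` is not installed, so it only approximates what submitit does internally.

```python
import pickle
import threading

try:
    # cloudpickle is the serializer named in the warning above
    import cloudpickle as pickler
except ImportError:
    # Assumption for this sketch: stdlib pickle is a close enough stand-in
    pickler = pickle


def is_serializable(obj):
    """Hypothetical helper: True if obj can be dumped, False otherwise."""
    try:
        pickler.dumps(obj)
        return True
    except Exception:
        return False


# Plain data (strings, numbers, dicts) serializes without trouble
plain_args = {"path": "data.csv", "retries": 3}

# A lock is an OS-level resource and cannot be pickled
lock = threading.Lock()

print("plain_args serializable:", is_serializable(plain_args))
print("lock serializable:", is_serializable(lock))
```

A common workaround is to pass a description of the resource (a file path, a connection string) and open it inside the submitted function, so that only plain data crosses the process boundary.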
Install
pip install submitit
Imports
- AutoExecutor
from submitit import AutoExecutor
- SlurmExecutor
from submitit import SlurmExecutor
- LocalExecutor
from submitit import LocalExecutor
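Because `AutoExecutor` picks one of the executors listed above on its own, it can help to make the choice explicit. The sketch below assumes that finding `srun` on the `PATH` is a reasonable signal that Slurm is usable; `pick_cluster` and `make_executor` are hypothetical helpers, while `cluster` is the `AutoExecutor` argument that selects the backend by name.

```python
import shutil


def pick_cluster(require_slurm: bool = False) -> str:
    """Hypothetical helper: choose the `cluster` value to pass to AutoExecutor."""
    # Assumption: `srun` on PATH means a Slurm installation is reachable
    has_slurm = shutil.which("srun") is not None
    if require_slurm and not has_slurm:
        # Fail loudly instead of silently running on the local machine
        raise RuntimeError("Slurm was required, but `srun` was not found on PATH")
    return "slurm" if has_slurm else "local"


def make_executor(folder: str, require_slurm: bool = False):
    """Build an AutoExecutor with the backend pinned explicitly."""
    import submitit  # imported lazily so pick_cluster works without submitit

    return submitit.AutoExecutor(folder=folder, cluster=pick_cluster(require_slurm))


cluster = pick_cluster()
print(f"Would use cluster={cluster!r}")
```

Calling `make_executor("submitit_logs", require_slurm=True)` in a production script turns an accidental local fallback into an immediate error, while the default keeps local testing convenient.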
Quickstart
import os
import time

import submitit


def my_function(x):
    time.sleep(0.1)  # Simulate some work
    print(f"Hello from job! Input: {x}, PID: {os.getpid()}")
    return x * x


# Configure a log folder for submitit to store job information.
# The %j placeholder is replaced by the job ID.
log_folder = os.path.join(os.getcwd(), "submitit_logs", "%j")

# Use AutoExecutor, which selects SlurmExecutor if a Slurm environment
# is detected, otherwise falls back to LocalExecutor.
executor = submitit.AutoExecutor(folder=log_folder)

# timeout_min applies to every executor; parameters prefixed with
# slurm_ are only used by SlurmExecutor and ignored otherwise.
executor.update_parameters(timeout_min=5, slurm_array_parallelism=2)

# Submit jobs within a batch context (grouped into one job array on Slurm).
# The jobs are actually submitted when the context exits.
jobs = []
with executor.batch():
    for i in range(5):
        job = executor.submit(my_function, i)
        jobs.append(job)

print(f"Submitted {len(jobs)} jobs. Waiting for results...")

# Retrieve results (blocks until all jobs are complete)
outputs = [job.result() for job in jobs]
print(f"All jobs completed. Results: {outputs}")