jobflow
jobflow is a free, open-source Python library (v0.3.1) for writing and executing computational workflows. Workflows are defined with plain Python functions. They can be executed locally, or on remote resources through managers such as `jobflow-remote` or `FireWorks`. Key features include:
- dynamic workflows,
- easy composition and nesting of workflows,
- storage of workflow outputs in a variety of databases (MongoDB, S3, GridFS, etc.) via the `maggma` package.

The library is actively maintained with regular updates.
Common errors
- ModuleNotFoundError: No module named 'fireworks'
cause Importing `jobflow.managers.fireworks` (the FireWorks manager) without the optional FireWorks dependency installed.
fix Install jobflow with the FireWorks extra: `pip install jobflow[fireworks]`.
- TypeError: 'OutputReference' object is not callable
cause Calling a `job.output` object, or otherwise using it as a concrete value, before the job has run and its output has been resolved.
fix `job.output` is an `OutputReference` used for dependency tracking. Read the actual result from the `Response` objects returned by `run_locally`, e.g. `responses[my_job.uuid][1].output`. The inner key is the job index, which is 1 unless the job was replaced.
- KeyError: 'docs_store' when importing jobflow
cause `JOB_STORE` in `~/.jobflow.yaml` is a mapping without a `docs_store` entry, for example a FireWorks-style `_fw_name` spec. jobflow reads a mapping-valued `JOB_STORE` as a store specification in which `docs_store` (and optionally each entry of `additional_stores`) names a `maggma` store class under `type`. If `JOB_STORE` is omitted altogether, jobflow falls back to an in-memory `MemoryStore`, so nothing is persisted once the Python process exits.
fix Create or update `~/.jobflow.yaml` in your home directory with a valid `JOB_STORE` specification, for example:
```yaml
JOB_STORE:
  docs_store:
    type: JSONStore
    paths: ./my_job_data.json
    read_only: false
```
For a MongoDB backend, use `type: MongoStore` with `database`, `collection_name`, `host`, and `port` instead.
- ModuleNotFoundError when calling `flow.draw_graph()`
cause `Flow.draw_graph()` imports optional plotting libraries at call time, and they are not installed by default.
fix Install the visualization dependencies: `pip install jobflow[vis]`.
Warnings
- breaking As of `v0.1.3`, jobflow migrated its settings handling to `Pydantic`. The default `JOB_STORE` and other configurations are now managed through a `~/.jobflow.yaml` file. Existing setups using older configuration methods may break.
- gotcha Job outputs accessed via `.output` are `OutputReference` objects, not the direct computed values. These references are used by jobflow to build the workflow graph and automatically resolve dependencies during execution. Attempting to use `.output` as an immediate value before the job has run will result in errors.
- gotcha `Flow.draw_graph()` relies on optional plotting dependencies installed by the `jobflow[vis]` extra. Without them, the call fails with a `ModuleNotFoundError` instead of drawing the graph. Some layouts may also require a system installation of Graphviz.
- breaking `jobflow` requires Python >= 3.10. Its companion package `jobflow-remote` (used for remote execution) likewise dropped support for Python 3.9 as of its `v1.0` release. Problems have also been reported on Python 3.14.1 due to issues in `networkx`.
- gotcha Misconfiguration of the `JobStore` (the database where job inputs/outputs are stored) can lead to jobs not being persisted, not being found by managers, or data loss. This is especially critical for remote execution or long-running workflows.
Install
- pip install jobflow
- pip install jobflow[fireworks]
- pip install jobflow[vis]
Imports
- job
from jobflow import job
- Flow
from jobflow import Flow
- run_locally
from jobflow.managers.local import run_locally
- SETTINGS
from jobflow import SETTINGS
Quickstart
from jobflow import job, Flow
from jobflow.managers.local import run_locally

@job
def add(a, b):
    return a + b

@job
def multiply(a, b):
    return a * b

# Create Job objects; .output is a reference resolved at run time
job1 = add(1, 2)
job2 = multiply(job1.output, 3)
job3 = add(job2.output, 10)

# Create a Flow from the jobs
flow = Flow([job1, job2, job3], name="my_first_flow")

# Run the Flow locally
# Without a JOB_STORE in ~/.jobflow.yaml, outputs live in an in-memory store;
# configure JOB_STORE there (or pass store=... to run_locally) to persist them
responses = run_locally(flow)

# Access results: responses maps job UUID -> {job index: Response}
final_result = responses[job3.uuid][1].output
print(f"Final result: {final_result}")  # Final result: 19