dagstermill

Version 0.29.2, verified Mon Apr 27, auth: no, language: python

dagstermill integrates Jupyter notebooks into Dagster jobs, allowing a notebook to be executed as an op (formerly a solid) with input and output dependencies. The current version is 0.29.2, which requires Python 3.10-3.14. The release cadence matches dagster core (monthly).

pip install dagstermill
error ModuleNotFoundError: No module named 'dagstermill'
cause dagstermill is not installed, or is installed in a different Python environment from the one running Dagster.
fix
Run 'pip install dagstermill' in the same Python environment where Dagster runs.
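To confirm which environment is actually in use, a small check along these lines may help. It is only a sketch using the standard library; the helper name describe_install is made up for illustration and is not part of dagstermill.

```python
import importlib.util
import sys


def describe_install(package: str = "dagstermill") -> dict:
    """Report the running interpreter and whether `package` is importable from it."""
    spec = importlib.util.find_spec(package)
    return {
        "interpreter": sys.executable,
        "installed": spec is not None,
        # Path of the package's __init__.py (None when the package is not found).
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    info = describe_install()
    print(info)
    if not info["installed"]:
        # Install into this exact interpreter rather than whichever pip is first on PATH.
        print(f"Try: {info['interpreter']} -m pip install dagstermill")
```

Running this with the same interpreter that launches Dagster shows whether the package is visible there.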
error KeyError: 'DAGSTER_HOME'
cause Dagster requires the DAGSTER_HOME environment variable to be set for persistent run storage.
fix
Set DAGSTER_HOME to an existing directory (e.g., export DAGSTER_HOME=/path/to/dagster_home), or run without persistent storage.
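When the variable has to be set from Python (for example in a test or a launcher script), something like the following could be used before Dagster is loaded. This is a sketch: ensure_dagster_home is a hypothetical helper, and the temporary-directory fallback is an assumption suited to throwaway runs only.

```python
import os
import tempfile
from pathlib import Path


def ensure_dagster_home(path=None) -> Path:
    """Point DAGSTER_HOME at an existing absolute directory, creating it if needed."""
    existing = os.environ.get("DAGSTER_HOME")
    if path is not None:
        home = Path(path)
    elif existing:
        home = Path(existing)
    else:
        # Assumption: a fresh temp directory is acceptable when nothing is configured.
        home = Path(tempfile.mkdtemp(prefix="dagster_home_"))
    home = home.expanduser().resolve()
    home.mkdir(parents=True, exist_ok=True)
    os.environ["DAGSTER_HOME"] = str(home)
    return home


if __name__ == "__main__":
    print(ensure_dagster_home())
```

Resolving to an absolute path and creating the directory up front avoids a second failure when the variable is set but points at a relative or missing location.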
error papermill.exceptions.PapermillMissingParameterException: Notebook does not have a cell with tag 'parameters'
cause Input notebook lacks a tagged parameters cell.
fix
Add a cell with parameter defaults and tag it as 'parameters' in Jupyter Notebook.
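The tag can also be added without opening the notebook, since an .ipynb file is JSON and cell tags live under each cell's metadata. The sketch below uses only the standard library; tag_parameters_cell is a hypothetical helper, and it assumes the cell holding the parameter defaults is a code cell at the given index.

```python
import json
from pathlib import Path


def tag_parameters_cell(notebook_path, cell_index=0):
    """Add the 'parameters' tag to one code cell of an .ipynb file and save it."""
    path = Path(notebook_path)
    nb = json.loads(path.read_text(encoding="utf-8"))
    cell = nb["cells"][cell_index]
    if cell.get("cell_type") != "code":
        raise ValueError(f"cell {cell_index} is not a code cell")
    # Tags are stored as a list of strings in the cell's metadata.
    tags = cell.setdefault("metadata", {}).setdefault("tags", [])
    if "parameters" not in tags:
        tags.append("parameters")
    path.write_text(json.dumps(nb, indent=1), encoding="utf-8")
    return tags
```

Calling tag_parameters_cell('notebooks/my_notebook.ipynb') would tag the first cell; calling it again leaves the tag list unchanged.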
error TypeError: define_dagstermill_solid() got an unexpected keyword argument 'output_notebook_name'
cause Using an older version of dagstermill (pre-0.14), where the function signature differs.
fix
Upgrade dagstermill (pip install --upgrade dagstermill), or switch to define_dagstermill_op.
breaking In dagster 1.0+, solids are renamed to ops. Use define_dagstermill_op instead of define_dagstermill_solid.
fix Replace define_dagstermill_solid with define_dagstermill_op and use @op/@job decorators.
gotcha The notebook must have a cell tagged 'parameters' to accept inputs. Without it, inputs are silently ignored.
fix In Jupyter, add a cell with default values and tag it as 'parameters' (use View > Cell Toolbar > Tags).
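Because a missing tag may not surface as an error, a pre-flight check before wiring the notebook into a job can be useful. The following is a sketch using only the standard library; parameters_cell_indices is a hypothetical helper, not a dagstermill or papermill function.

```python
import json


def parameters_cell_indices(notebook_path):
    """Return the indices of all cells tagged 'parameters' in an .ipynb file."""
    with open(notebook_path, encoding="utf-8") as f:
        nb = json.load(f)
    return [
        index
        for index, cell in enumerate(nb.get("cells", []))
        if "parameters" in cell.get("metadata", {}).get("tags", [])
    ]
```

An empty list means no cell is tagged; more than one index usually indicates a copy-paste mistake worth cleaning up.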
deprecated Managed notebook execution (Engine/Resource) is deprecated in favor of the simple context.
fix Remove execution_engine arguments; use default execution.
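gotcha Output notebooks are stored in the run's output directory, which is not necessarily on the local machine. Use an io_manager to persist them.
fix Configure a filesystem or S3 io_manager to capture output notebooks, for example dagstermill's local_output_notebook_io_manager under the output_notebook_io_manager resource key.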
deprecated The legacy 'output_notebook' materialization is deprecated; expose the executed notebook as a regular op output instead.
fix Use the 'output_notebook_name' parameter in define_dagstermill_op and handle via op outputs.
pip install dagstermill[pandas]

Define a Dagster op that runs a Jupyter notebook. Note: define_dagstermill_op replaces the deprecated define_dagstermill_solid.

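from dagster import job
from dagstermill import define_dagstermill_op, local_output_notebook_io_manager

# Wrap the notebook as an op; the executed copy is exposed as an op output.
my_notebook_op = define_dagstermill_op(
    name='my_notebook_op',
    notebook_path='notebooks/my_notebook.ipynb',
    # This is a Dagster output name (letters, digits, underscores), not a filename.
    output_notebook_name='output_notebook',
)

# Setting output_notebook_name requires the output_notebook_io_manager resource.
@job(resource_defs={'output_notebook_io_manager': local_output_notebook_io_manager})
def my_job():
    my_notebook_op()

if __name__ == '__main__':
    # Runs the job in the current process and reports whether it succeeded.
    result = my_job.execute_in_process()
    print(result.success)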