Apache Airflow Papermill Provider

3.12.3 · active · verified Mon Apr 13

The Apache Airflow Papermill Provider integrates Papermill with Apache Airflow, letting you parameterize and execute Jupyter notebooks as tasks in your Airflow DAGs. This enables automated, reproducible, and scalable notebook execution within data pipelines. The current version is 3.12.3; the provider follows the Apache Airflow provider release cadence, with updates typically shipped alongside Airflow releases or as standalone fixes and features.

Warnings

Install
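The provider is published on PyPI as `apache-airflow-providers-papermill`. A minimal install into the same environment as Airflow looks like this (pin the version to match your Airflow constraints as needed):

```shell
# Installs the Papermill provider alongside an existing Airflow installation.
pip install apache-airflow-providers-papermill
```

Note that Papermill and a Jupyter kernel (e.g. `ipykernel`) must also be importable on the worker that actually executes the task.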

Imports
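The operator used in the quickstart below lives in the provider's `operators` module. With the provider installed, the import is:

```python
# Requires apache-airflow and apache-airflow-providers-papermill to be installed.
from airflow.providers.papermill.operators.papermill import PapermillOperator
```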

Quickstart

This quickstart defines a simple DAG that uses the `PapermillOperator` to execute a Jupyter notebook. The notebook must contain a cell tagged `parameters`, and the keys passed via the operator's `parameters` argument must match variable names defined in that cell. Remember to replace `/tmp/hello_world.ipynb` with the actual path to your notebook.

from __future__ import annotations

import pendulum

from airflow.models.dag import DAG
from airflow.providers.papermill.operators.papermill import PapermillOperator

# For a real-world scenario, ensure 'hello_world.ipynb' exists in your DAGs folder
# or a location accessible by Airflow, with a 'parameters' tagged cell.
# Example 'hello_world.ipynb':
# # In a cell, add tag 'parameters'
# msgs = "Default message"
# print(f"Hello, {msgs}!")

with DAG(
    dag_id="example_papermill_notebook",
    start_date=pendulum.datetime(2023, 1, 1, tz="UTC"),
    schedule=None,
    catchup=False,
    tags=["papermill", "example"],
) as dag:
    run_notebook = PapermillOperator(
        task_id="run_hello_world_notebook",
        input_nb="/tmp/hello_world.ipynb", # Replace with actual path or Airflow-accessible path
        output_nb="/tmp/out-{{ ds }}.ipynb",
        parameters={
            "msgs": "Ran from Airflow at {{ ds }}!"
        },
    )
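Before wiring the notebook into a DAG, you can sanity-check it by running Papermill directly from the command line. This is a sketch assuming `papermill` is installed and `/tmp/hello_world.ipynb` exists; the output path `/tmp/out-manual.ipynb` is a placeholder:

```shell
# -p NAME VALUE overrides the matching variable in the 'parameters' cell,
# mirroring what PapermillOperator's parameters argument does in the DAG.
papermill /tmp/hello_world.ipynb /tmp/out-manual.ipynb -p msgs "Ran manually"
```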
