{"id":5111,"library":"apache-airflow-providers-papermill","title":"Apache Airflow Papermill Provider","description":"The Apache Airflow Papermill Provider integrates Papermill with Apache Airflow, enabling users to parameterize and execute Jupyter notebooks as tasks in their Airflow DAGs. This makes notebook runs automated and reproducible within data pipelines: each run writes an executed copy of the notebook, including its injected parameters and outputs, to a separate output path. The current version is 3.12.3. Like other Airflow providers, it is versioned and released independently of Airflow core, following the regular Apache Airflow provider release cadence.","status":"active","version":"3.12.3","language":"en","source_language":"en","source_url":"https://github.com/apache/airflow/tree/main/airflow/providers/papermill","tags":["airflow-provider","papermill","jupyter","notebook","etl","orchestration"],"install":[{"cmd":"pip install apache-airflow-providers-papermill","lang":"bash","label":"Install Papermill Provider"}],"dependencies":[{"reason":"Core Airflow functionality","package":"apache-airflow","optional":false},{"reason":"Required for Jupyter notebook parameterization and execution","package":"papermill[all]","optional":false},{"reason":"Used for reading notebook outputs and scraps","package":"scrapbook[all]","optional":false},{"reason":"Provides the Python kernel used to execute notebooks in the Airflow worker environment","package":"ipykernel","optional":true}],"imports":[{"note":"This is the primary operator for executing Jupyter notebooks.","symbol":"PapermillOperator","correct":"from airflow.providers.papermill.operators.papermill import PapermillOperator"}],"quickstart":{"code":"from __future__ import annotations\n\nimport pendulum\n\nfrom airflow.models.dag import DAG\nfrom airflow.providers.papermill.operators.papermill import PapermillOperator\n\n# For a real-world scenario, ensure 'hello_world.ipynb' exists at the input_nb path below\n# (a location readable by the Airflow worker) and contains a cell tagged 'parameters'.\n# Example 'hello_world.ipynb':\n# # In a cell, add tag 
'parameters'\n# msg = \"Default message\"\n# # In a separate, later cell (Papermill injects parameters after the 'parameters' cell):\n# print(f\"Hello, {msg}!\")\n\nwith DAG(\n    dag_id=\"example_papermill_notebook\",\n    start_date=pendulum.datetime(2023, 1, 1, tz=\"UTC\"),\n    schedule=None,\n    catchup=False,\n    tags=[\"papermill\", \"example\"],\n) as dag:\n    run_notebook = PapermillOperator(\n        task_id=\"run_hello_world_notebook\",\n        input_nb=\"/tmp/hello_world.ipynb\",  # Replace with a path readable by the Airflow worker\n        output_nb=\"/tmp/out-{{ ds }}.ipynb\",  # Executed copy of the notebook for this run\n        parameters={\n            # Keys must match the variable names defined in the 'parameters' cell\n            \"msg\": \"Ran from Airflow at {{ ds }}!\"\n        },\n    )\n","lang":"python","description":"This quickstart defines a simple DAG that uses the `PapermillOperator` to execute a Jupyter notebook. The notebook should have a cell tagged `parameters` that defines default values; to override a default, the key in `parameters` must match the variable name defined in that cell (here, `msg`). The `input_nb`, `output_nb`, and `parameters` values accept Jinja templates such as `{{ ds }}`. Replace `/tmp/hello_world.ipynb` with the actual path to your notebook."},"warnings":[{"fix":"Upgrade Apache Airflow to at least 2.2.0 for provider 3.0.0, or to the higher minimum stated for later provider releases. Provider version 2.0.0 requires Airflow 2.1.0+.","message":"Provider version 3.0.0 and above requires Apache Airflow 2.2+; the earlier provider version 2.0.0 required Airflow 2.1.0+. Later provider releases raise this minimum further, so check that your Airflow installation meets the minimum version stated for the exact provider version you install.","severity":"breaking","affected_versions":">=3.0.0"},{"fix":"Upgrade your Python environment to 3.8 or newer. Newer provider releases have dropped further Python versions; the latest provider versions require Python >=3.10.","message":"Python 3.7 support was dropped in provider version 3.2.1. Ensure you are using a Python version supported by the provider release you install.","severity":"breaking","affected_versions":">=3.2.1"},{"fix":"Install `ipykernel` and any other Python packages your notebooks import in the Airflow worker environment, e.g., via your deployment's `requirements.txt` or equivalent.","message":"The `PapermillOperator` executes notebooks locally within the Airflow worker's environment. 
Ensure that the notebook's kernel (e.g., `ipykernel`) and every package your notebook code imports are installed in that environment.","severity":"gotcha","affected_versions":"All"},{"fix":"Add a cell to your notebook, tag it `parameters`, and define default values there for every variable you pass from Airflow. Papermill inserts the injected parameters in a new cell directly after it.","message":"Notebooks run with `PapermillOperator` should have a cell tagged `parameters` if you pass parameters from Airflow. If the tag is missing, Papermill injects the parameters at the top of the notebook, so any later cell that assigns the same variable overwrites the value passed from Airflow.","severity":"gotcha","affected_versions":"All"},{"fix":"As a workaround, create the missing cache directory in your Airflow environment before running notebooks. For Astro projects, add `RUN mkdir -p /home/astro/.cache/black/21.7b0/` to your project's Dockerfile.","message":"A known bug affecting some `papermill` versions can cause 'No such file or directory' errors when writing grammar tables to the `black` cache directory. It typically manifests as `Writing failed: [Errno 2] No such file or directory: '/home/astro/.cache/black/21.7b0/tmpzpsclowd'`.","severity":"gotcha","affected_versions":"Specific `papermill` versions (check GitHub issues for exact range)"}],"env_vars":null,"last_verified":"2026-04-13T00:00:00.000Z","next_check":"2026-07-12T00:00:00.000Z"}