{"id":1325,"library":"apache-airflow-providers-common-io","title":"Apache Airflow Common IO Provider","description":"The Apache Airflow Common IO Provider (apache-airflow-providers-common-io, current version 1.7.2) offers a unified interface for interacting with various file systems within Airflow tasks, abstracting away the underlying storage details. It aims to simplify DAG development by providing generic operators, such as `FileTransferOperator`, that work across different storage backends (e.g., local, S3, GCS, Azure Blob Storage), with specific implementations provided by other Airflow provider packages. This provider follows the regular Apache Airflow provider release cadence, which is independent of core Airflow releases.","status":"active","version":"1.7.2","language":"en","source_language":"en","source_url":"https://github.com/apache/airflow/tree/main/airflow/providers/common/io","tags":["airflow","provider","io","filesystem","storage"],"install":[{"cmd":"pip install apache-airflow-providers-common-io","lang":"bash","label":"Install latest version"}],"dependencies":[{"reason":"This is an Airflow provider and requires Apache Airflow to run. 
Specific cloud storage functionality (e.g., S3, GCS) requires the respective provider packages (e.g., apache-airflow-providers-amazon, apache-airflow-providers-google) to be installed in addition to common-io.","package":"apache-airflow"}],"imports":[{"note":"Operators are located in the `operators` submodule, not directly under the package root.","wrong":"from airflow.providers.common.io.file_transfer import FileTransferOperator","symbol":"FileTransferOperator","correct":"from airflow.providers.common.io.operators.file_transfer import FileTransferOperator"}],"quickstart":{"code":"from __future__ import annotations\n\nimport tempfile\nfrom pathlib import Path\n\nimport pendulum\n\nfrom airflow.models.dag import DAG\nfrom airflow.operators.python import PythonOperator\nfrom airflow.providers.common.io.operators.file_transfer import FileTransferOperator\n\n# Local paths used by this demo. Both tasks must run on the same machine\n# (e.g., with the LocalExecutor) so that they see the same temporary directory.\nDATA_DIR = Path(tempfile.gettempdir()) / \"common_io_test_data\"\nSRC_FILE = DATA_DIR / \"file_a.txt\"\nDST_FILE = DATA_DIR / \"file_a_copy.txt\"\n\n\ndef _create_source_file():\n    # Create a small local file for the transfer task to copy.\n    DATA_DIR.mkdir(parents=True, exist_ok=True)\n    SRC_FILE.write_text(\"Content A\")\n    print(f\"Created source file: {SRC_FILE}\")\n\n\nwith DAG(\n    dag_id=\"common_io_quickstart\",\n    start_date=pendulum.datetime(2023, 10, 26, tz=\"UTC\"),\n    catchup=False,\n    schedule=None,\n    tags=[\"common_io\", \"example\", \"quickstart\"],\n) as dag:\n    create_source = PythonOperator(\n        task_id=\"create_source_file\",\n        python_callable=_create_source_file,\n    )\n\n    # `src` and `dst` accept URI strings (or ObjectStoragePath objects).\n    # The `file://` scheme needs no connection; for remote backends such as\n    # `s3://` or `gs://`, also pass `source_conn_id` and/or `dest_conn_id`.\n    copy_file = FileTransferOperator(\n        task_id=\"copy_file_with_common_io\",\n        src=SRC_FILE.as_uri(),\n        dst=DST_FILE.as_uri(),\n        overwrite=True,\n    )\n\n    create_source >> copy_file","lang":"python","description":"This quickstart uses `FileTransferOperator` from the `common-io` provider to copy a file between two local paths. The `file://` scheme needs no Airflow connection, so the DAG runs without any connection setup in the Airflow UI. For remote backends (for example `s3://` or `gs://` URIs), pass `source_conn_id` and/or `dest_conn_id` (e.g., `aws_default` for S3), configure that connection in Airflow, and install the matching provider package. The example assumes both tasks run on the same machine so that they share the temporary directory."},"warnings":[{"fix":"Install the provider package for your target file system as well, e.g., `pip install apache-airflow-providers-amazon` for S3 or `pip install apache-airflow-providers-google` for GCS.","message":"The `common-io` provider offers a generic interface, but it does NOT provide the concrete implementations for specific cloud storage services (e.g., S3, GCS, Azure Blob Storage). To use `common-io` operators with these services, you must also install the respective provider packages (e.g., `apache-airflow-providers-amazon`, `apache-airflow-providers-google`). 
Without them, `common-io` cannot interact with those backends.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Configure Airflow Connections in the UI or via code. Ensure connection types (e.g., 'fs' for local filesystem, 'aws' for S3) match the intended backend and that connection parameters (e.g., base path, credentials) are correct.","message":"`FileTransferOperator` relies on Airflow Connection IDs (passed as `source_conn_id` and `dest_conn_id`) to reach remote backends. A missing connection, a wrong connection type, or incorrect credentials causes a runtime error when the task attempts to interact with the file system.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Upgrade your Python environment to 3.10 or newer. Ensure your Airflow environment also supports and is configured for Python >=3.10.","message":"The `apache-airflow-providers-common-io` package requires Python >=3.10. Users running Airflow on older Python versions will encounter installation or runtime errors due to this requirement.","severity":"gotcha","affected_versions":"1.7.0 and later"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}