Google Cloud Provider for Apache Airflow
The `apache-airflow-providers-google` package extends Apache Airflow with operators, hooks, sensors, and transfers for seamless integration with various Google services, including Google Cloud Platform (GCP), Google Ads, Google Firebase, and Google Workspace. Currently at version 21.0.0, this provider is actively maintained with releases often tied to new features, bug fixes, and updates to underlying Google Cloud client libraries, independent of the core Airflow release cycle.
Common errors
- `ModuleNotFoundError: No module named 'airflow.providers'`
  - Cause: the `apache-airflow-providers-google` package is not installed, or is installed into a different Python environment than the one Airflow runs in.
  - Fix: install it into that environment: `pip install apache-airflow-providers-google`.
- `ImportError: cannot import name 'BigQueryOperator' from 'airflow.providers.google.cloud.operators.bigquery'`
  - Cause: `BigQueryOperator` is the legacy Airflow 1.10 (`airflow.contrib`) name and is not provided under this path by current provider versions.
  - Fix: import `BigQueryInsertJobOperator` (or, on older providers, the since-deprecated `BigQueryExecuteQueryOperator`) instead, and keep the provider current: `pip install --upgrade apache-airflow-providers-google`.
- `ImportError: cannot import name 'SUPERVISOR_COMMS' from 'airflow.sdk.execution_time.task_runner'`
  - Cause: a known issue with the `dag.test()` function in Airflow triggers this import error.
  - Fix: the issue has been reported in the Airflow GitHub repository; check for a core or provider release that includes the fix.
- `ImportError: cannot import name 'DAG' from 'airflow' (unknown location)`
  - Cause: the `DAG` class cannot be imported from `airflow`, usually because the installation is broken or a local file named `airflow.py` shadows the real package.
  - Fix: verify the Airflow installation and rename any `airflow.py` in your project directory (and delete stale `__pycache__` entries) that could cause import conflicts.
- `ImportError: cannot import name '_check_google_client_version' from 'pandas_gbq.gbq'`
  - Cause: Airflow's `bigquery_hook` imports the private helper `_check_google_client_version` from `pandas_gbq`, which newer `pandas_gbq` releases no longer provide.
  - Fix: downgrade `pandas_gbq` to a release that still includes the helper (e.g. 0.14.1) to stay compatible with Airflow 1.10.9.
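Several of the errors above come down to the package not being importable from the interpreter Airflow actually uses. A quick, environment-agnostic check is to query package metadata directly; this is a minimal sketch using only the standard library:

```python
import importlib.metadata


def provider_version(dist: str = "apache-airflow-providers-google"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return importlib.metadata.version(dist)
    except importlib.metadata.PackageNotFoundError:
        return None


# Run this with the same interpreter your scheduler and workers use;
# None means the ModuleNotFoundError above is expected.
print(provider_version())
```

Running it inside the scheduler's virtualenv distinguishes "not installed" from "installed into the wrong environment".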
Warnings
- breaking Each version of `apache-airflow-providers-google` has a minimum required Apache Airflow core version. For example, provider version 21.0.0 requires `apache-airflow>=2.11.0`. Installing a provider version incompatible with your Airflow core can lead to unexpected behavior or dependency conflicts.
- breaking Frequent updates to the underlying `google-ads` client library (e.g., v5 to v8, v12 to v13) and changes in object types (from native protobuf to proto-plus) have historically caused breaking changes and dependency conflicts, especially when other Google client libraries are also used in the same environment. This can lead to `VersionConflict` errors.
- breaking The `delegate_to` parameter for service account impersonation has been deprecated and removed in favor of `impersonation_chain`. Using `delegate_to` in newer provider versions will cause errors.
- breaking Operators related to Google Cloud Data Catalog have been renamed and/or moved to the Dataplex provider. For example, `CloudDataCatalogCreateEntryOperator` has been replaced by `DataplexCatalogCreateEntryOperator`.
- gotcha Authentication to Google Cloud relies on Airflow connections. The default `google_cloud_default` connection typically uses Application Default Credentials (ADC). Misconfiguration of ADC (e.g., `GOOGLE_APPLICATION_CREDENTIALS` not set, or incorrect service account key in Airflow connection) is a common source of authorization errors.
- gotcha Historically, conflicts between `apache-airflow-providers-google` and `apache-airflow-providers-apache-beam` have arisen due to differing dependencies on `google-cloud-bigquery` client versions, especially when using `apache-beam[gcp]` extra. This can lead to unexpected behavior in BigQuery operators.
- gotcha Installing certain Python packages with compiled extensions (e.g., `scikit-learn`, `numpy`, `pandas`) can fail in minimal environments like `alpine` if build tools (like `gcc` and Python development headers) are not installed. The error `ERROR: Unknown compiler(s)` or similar during metadata preparation indicates missing compilers.
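When diagnosing the `VersionConflict` and dependency-clash warnings above, it helps to snapshot which Google-related client libraries are co-installed in the environment. A minimal sketch (the candidate list is illustrative, not exhaustive):

```python
import importlib.metadata


def google_client_report(candidates=None):
    """Map each installed Google-related distribution to its version string."""
    if candidates is None:
        candidates = [
            "google-ads",
            "google-cloud-bigquery",
            "apache-beam",
            "pandas-gbq",
            "apache-airflow-providers-google",
            "apache-airflow-providers-apache-beam",
        ]
    report = {}
    for dist in candidates:
        try:
            report[dist] = importlib.metadata.version(dist)
        except importlib.metadata.PackageNotFoundError:
            pass  # absent distributions are simply omitted
    return report


for dist, version in google_client_report().items():
    print(f"{dist}=={version}")
```

Comparing this output against the provider's pinned requirements usually points straight at the conflicting pair.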
Install
- `pip install apache-airflow-providers-google`
Imports
- BigQueryInsertJobOperator
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
- GCSUploadSessionCompleteSensor
from airflow.providers.google.cloud.sensors.gcs import GCSUploadSessionCompleteSensor
- GCSHook
from airflow.providers.google.cloud.hooks.gcs import GCSHook
- GoogleBaseHook
from airflow.providers.google.common.hooks.base_google import GoogleBaseHook
Quickstart
import os
from datetime import datetime
from airflow.models.dag import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
# Ensure you have a 'google_cloud_default' connection configured in Airflow.
# This connection typically uses Application Default Credentials (ADC).
# For local testing, ensure GOOGLE_APPLICATION_CREDENTIALS points to a service account key.
with DAG(
    dag_id='gcp_bigquery_quickstart',
    start_date=datetime(2023, 1, 1),
    schedule=None,  # `schedule_interval` is deprecated in favor of `schedule`
    catchup=False,
    tags=['gcp', 'bigquery', 'example'],
) as dag:
    insert_job = BigQueryInsertJobOperator(
        task_id='insert_row_to_bigquery',
        project_id=os.environ.get('GCP_PROJECT_ID', 'your-gcp-project-id'),
        configuration={
            'query': {
                # Note: a DML INSERT must not be combined with
                # 'destinationTable'; BigQuery rejects jobs that set both.
                'query': (
                    'INSERT INTO `your_dataset_id.your_table_id` '
                    "(column1, column2) VALUES ('value1', 'value2')"
                ),
                'useLegacySql': False,
            }
        },
        gcp_conn_id='google_cloud_default',
    )
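`BigQueryInsertJobOperator` passes the `configuration` dict straight to the BigQuery Jobs API, so malformed configurations only fail at runtime. A small pre-flight check like the following (a hypothetical helper, not part of the provider) can catch two common mistakes before the DAG ever runs: combining a DML statement with `destinationTable`, and running DML without disabling legacy SQL:

```python
def validate_query_config(configuration: dict) -> list:
    """Return a list of problems found in a BigQuery query-job configuration."""
    problems = []
    query_cfg = configuration.get("query")
    if not isinstance(query_cfg, dict) or "query" not in query_cfg:
        problems.append("configuration['query']['query'] (the SQL text) is required")
        return problems

    sql = query_cfg["query"].lstrip().upper()
    is_dml = sql.startswith(("INSERT", "UPDATE", "DELETE", "MERGE"))

    # DML writes to the table named in the statement itself.
    if is_dml and "destinationTable" in query_cfg:
        problems.append("DML statements cannot set 'destinationTable'")

    # The API defaults to legacy SQL, which does not support DML.
    if is_dml and query_cfg.get("useLegacySql", True):
        problems.append("DML requires 'useLegacySql': False (legacy SQL is the API default)")
    return problems


print(validate_query_config({
    "query": {"query": "INSERT INTO `d.t` (c) VALUES (1)", "useLegacySql": False},
}))
```

Calling this on the `configuration` dict before constructing the operator turns an opaque job failure into an actionable error message.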