Google Cloud Provider for Apache Airflow
The `apache-airflow-providers-google` package extends Apache Airflow with operators, hooks, sensors, and transfers for seamless integration with various Google services, including Google Cloud Platform (GCP), Google Ads, Google Firebase, and Google Workspace. Currently at version 21.0.0, this provider is actively maintained with releases often tied to new features, bug fixes, and updates to underlying Google Cloud client libraries, independent of the core Airflow release cycle.
Warnings
- breaking Each version of `apache-airflow-providers-google` has a minimum required Apache Airflow core version. For example, provider version 21.0.0 requires `apache-airflow>=2.11.0`. Installing a provider version incompatible with your Airflow core can lead to unexpected behavior or dependency conflicts.
- breaking Frequent updates to the underlying `google-ads` client library (e.g., v5 to v8, v12 to v13) and changes in object types (from native protobuf to proto-plus) have historically caused breaking changes and dependency conflicts, especially when other Google client libraries are also used in the same environment. This can lead to `VersionConflict` errors.
- breaking The `delegate_to` parameter for service account impersonation has been deprecated and removed in favor of `impersonation_chain`. Using `delegate_to` in newer provider versions will cause errors.
- breaking Operators related to Google Cloud Data Catalog have been renamed and/or moved to the Dataplex provider. For example, `CloudDataCatalogCreateEntryOperator` has been replaced by `DataplexCatalogCreateEntryOperator`.
- gotcha Authentication to Google Cloud relies on Airflow connections. The default `google_cloud_default` connection typically uses Application Default Credentials (ADC). Misconfiguration of ADC (e.g., `GOOGLE_APPLICATION_CREDENTIALS` not set, or incorrect service account key in Airflow connection) is a common source of authorization errors.
- gotcha Historically, conflicts between `apache-airflow-providers-google` and `apache-airflow-providers-apache-beam` have arisen due to differing dependencies on `google-cloud-bigquery` client versions, especially when using `apache-beam[gcp]` extra. This can lead to unexpected behavior in BigQuery operators.
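To migrate off the removed `delegate_to` parameter, pass `impersonation_chain` to the operator instead. A minimal sketch, with placeholder service-account names (the chain may be a single account, or an ordered list where each account grants the Service Account Token Creator role to the directly preceding identity):

```python
# Placeholder service accounts -- substitute your own.
IMPERSONATION_CHAIN = [
    "intermediate-sa@your-gcp-project-id.iam.gserviceaccount.com",
    "target-sa@your-gcp-project-id.iam.gserviceaccount.com",
]

# Usage sketch (requires the provider installed; shown as a comment here):
# BigQueryInsertJobOperator(
#     task_id="insert_row_to_bigquery",
#     impersonation_chain=IMPERSONATION_CHAIN,  # replaces the removed delegate_to
#     ...
# )
```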
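For local debugging of the ADC gotcha above, a quick stdlib-only sanity check of the key file that `GOOGLE_APPLICATION_CREDENTIALS` points at can rule out the most common misconfigurations (missing file, truncated or non-service-account JSON). The helper below is illustrative, not part of the provider:

```python
import json
import os


def check_service_account_key(path=None):
    """Return the key's client_email if the file looks like a valid
    service-account key, else None. Illustrative helper only."""
    # Fall back to the ADC environment variable when no path is given.
    path = path or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path or not os.path.exists(path):
        return None
    with open(path) as f:
        key = json.load(f)
    # A service-account key file carries at least these fields.
    required = {"type", "project_id", "private_key", "client_email"}
    if required - key.keys():
        return None
    return key["client_email"]
```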
Install
-
pip install apache-airflow-providers-google
Imports
- BigQueryInsertJobOperator
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
- GCSUploadSessionCompleteSensor
from airflow.providers.google.cloud.sensors.gcs import GCSUploadSessionCompleteSensor
- GCSHook
from airflow.providers.google.cloud.hooks.gcs import GCSHook
- GoogleBaseHook
from airflow.providers.google.common.hooks.base_google import GoogleBaseHook
Quickstart
import os
from datetime import datetime
from airflow.models.dag import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
# Ensure you have a 'google_cloud_default' connection configured in Airflow.
# This connection typically uses Application Default Credentials (ADC).
# For local testing, ensure GOOGLE_APPLICATION_CREDENTIALS points to a service account key.
with DAG(
    dag_id='gcp_bigquery_quickstart',
    start_date=datetime(2023, 1, 1),
    schedule=None,  # `schedule_interval` is deprecated in Airflow 2.4+
    catchup=False,
    tags=['gcp', 'bigquery', 'example'],
) as dag:
    insert_job = BigQueryInsertJobOperator(
        task_id='insert_row_to_bigquery',
        project_id=os.environ.get('GCP_PROJECT_ID', 'your-gcp-project-id'),
        configuration={
            'query': {
                # A DML INSERT targets its table directly; query jobs running
                # DML must not set a destinationTable.
                'query': (
                    'INSERT INTO `your_dataset_id.your_table_id` '
                    "(column1, column2) VALUES ('value1', 'value2')"
                ),
                'useLegacySql': False,
            }
        },
        gcp_conn_id='google_cloud_default',
    )