Google Cloud Provider for Apache Airflow

21.0.0 · active · verified Mon Apr 06

The `apache-airflow-providers-google` package extends Apache Airflow with operators, hooks, sensors, and transfer operators that integrate with Google services, including Google Cloud Platform (GCP), Google Ads, Google Firebase, and Google Workspace. Currently at version 21.0.0, the provider is actively maintained and released independently of the core Airflow release cycle; releases typically bundle new features, bug fixes, and updates to the underlying Google Cloud client libraries.

Install
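The provider ships separately from Airflow core and installs from PyPI; pinning the version shown above keeps environments reproducible:

```shell
# Install the Google provider (pulls in the Google Cloud client libraries it needs).
pip install "apache-airflow-providers-google==21.0.0"
```

Installing Airflow with the `google` extra (`pip install "apache-airflow[google]"`) also works and picks a provider version compatible with that Airflow release.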

Imports
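Import paths follow the pattern `airflow.providers.google.<area>.<kind>.<service>`. A few commonly used entry points, guarded with a `try`/`except` only so the snippet runs even where the provider is not installed:

```python
# Typical import paths in apache-airflow-providers-google.
try:
    from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
    from airflow.providers.google.cloud.hooks.gcs import GCSHook
    from airflow.providers.google.cloud.sensors.gcs import GCSObjectExistenceSensor
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
except ImportError:
    # apache-airflow-providers-google must be installed for these to resolve.
    pass
```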

Quickstart

This quickstart defines a minimal Airflow DAG that runs a `BigQueryInsertJobOperator` from the Google provider. It assumes a `google_cloud_default` Airflow connection is configured, typically via Application Default Credentials (ADC) or a service account key file. Set the `GCP_PROJECT_ID` environment variable or replace `your-gcp-project-id` with your actual GCP project ID, and substitute a real dataset and table ID in the query.

import os
from datetime import datetime

from airflow.models.dag import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

# Ensure you have a 'google_cloud_default' connection configured in Airflow.
# This connection typically uses Application Default Credentials (ADC).
# For local testing, ensure GOOGLE_APPLICATION_CREDENTIALS points to a service account key.

with DAG(
    dag_id='gcp_bigquery_quickstart',
    start_date=datetime(2023, 1, 1),
    schedule=None,  # `schedule` replaced `schedule_interval` in Airflow 2.4+
    catchup=False,
    tags=['gcp', 'bigquery', 'example'],
) as dag:
    insert_job = BigQueryInsertJobOperator(
        task_id='insert_row_to_bigquery',
        project_id=os.environ.get('GCP_PROJECT_ID', 'your-gcp-project-id'),
        configuration={
            'query': {
                # Replace your_dataset_id/your_table_id with an existing dataset
                # and table. A DML INSERT writes directly to the target table,
                # so no destinationTable is configured (BigQuery rejects a
                # destination table on DML jobs).
                'query': (
                    'INSERT INTO `your_dataset_id.your_table_id` '
                    '(column1, column2) VALUES ("value1", "value2")'
                ),
                'useLegacySql': False,
            }
        },
        gcp_conn_id='google_cloud_default',
    )
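Once the file is in the configured `dags` folder, the task can be exercised once from the CLI without involving the scheduler (the date is just the logical run date):

```shell
# Run a single task instance ad hoc; the guard only keeps the snippet
# runnable on machines where the Airflow CLI is not installed.
if command -v airflow >/dev/null 2>&1; then
    airflow tasks test gcp_bigquery_quickstart insert_row_to_bigquery 2023-01-01
else
    echo "airflow CLI not found; install Airflow first"
fi
```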
