Dagster GCP

0.29.0 · active · verified Sat Apr 11

dagster-gcp is a Python library that provides integrations for Google Cloud Platform (GCP) services within the Dagster data orchestration framework. It includes resources and I/O managers for services such as BigQuery, Google Cloud Storage (GCS), and Dataproc. The library is actively maintained and is usually released together with the core Dagster framework, which keeps the two compatible.

Warnings

Install

Imports

Quickstart

This quickstart defines a Dagster asset that queries Google BigQuery through `BigQueryResource`. It shows how to configure the resource and run a simple SQL query. Authentication relies on standard GCP mechanisms (such as the `GOOGLE_APPLICATION_CREDENTIALS` environment variable), or credentials can be set directly on the resource via `gcp_credentials`.

import os
from dagster import Definitions, asset, EnvVar
from dagster_gcp import BigQueryResource

# Ensure GOOGLE_APPLICATION_CREDENTIALS or similar env var is set for local execution
# For simplicity, project is hardcoded or read from an env var. In production, consider more robust auth.

@asset
def my_bq_table(bigquery: BigQueryResource):
    """An asset that queries a BigQuery table."""
    project_id = os.environ.get('GCP_PROJECT_ID', 'your-gcp-project')
    dataset_id = os.environ.get('BIGQUERY_DATASET', 'my_dataset')
    table_id = os.environ.get('BIGQUERY_TABLE', 'my_table')
    
    # Example: Execute a simple query
    query = f"SELECT COUNT(*) FROM `{project_id}.{dataset_id}.{table_id}`"
    
    with bigquery.get_client() as client:
        query_job = client.query(query)
        results = query_job.result()
        print(f"Query executed successfully. First row: {list(results)[0]}")

defs = Definitions(
    assets=[my_bq_table],
    resources={
        "bigquery": BigQueryResource(
            project=EnvVar("GCP_PROJECT_ID"),  # Resolved from the environment at run time
            # EnvVar does not accept a default value, so the fallback is applied with os.getenv.
            location=os.getenv("GCP_REGION", "us-central1"),
            # You can also pass gcp_credentials as a base64-encoded JSON string via EnvVar
        )
    },
)

# To run this locally:
# 1. Set environment variables, e.g., GOOGLE_APPLICATION_CREDENTIALS, GCP_PROJECT_ID, BIGQUERY_DATASET, BIGQUERY_TABLE
# 2. Run `dagster dev -f your_file.py`
# 3. Open the Dagster UI, find the 'my_bq_table' asset, and materialize it.
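
The quickstart mentions passing `gcp_credentials` as a base64-encoded JSON string. A minimal sketch of producing such a string with the standard library follows. The helper name `encode_service_account_key` and the environment variable name `GCP_CREDS_B64` are hypothetical, chosen only for illustration, and the key content is a placeholder rather than a real credential.

```python
import base64
import json


def encode_service_account_key(key_json: str) -> str:
    """Return the base64 encoding of a service-account key JSON document.

    Hypothetical helper for illustration; it is not part of dagster-gcp.
    """
    # Parse first so that a malformed key fails here rather than at query time.
    json.loads(key_json)
    return base64.b64encode(key_json.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    # Placeholder content, not a real key.
    fake_key = json.dumps(
        {"type": "service_account", "project_id": "your-gcp-project"}
    )
    encoded = encode_service_account_key(fake_key)
    # Assumption: the value is exported as GCP_CREDS_B64 and then referenced as
    # BigQueryResource(project=..., gcp_credentials=EnvVar("GCP_CREDS_B64")).
    print(encoded)
```

Keeping the encoded key in an environment variable or secret store, rather than in source code, keeps the credential out of version control.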
