Google Cloud Dataflow Client

0.11.0 · active · verified Sun Mar 29

The `google-cloud-dataflow-client` is the Python client library for interacting with the Google Cloud Dataflow API. It allows users to programmatically manage Dataflow jobs, such as listing existing jobs, getting job details, or submitting new ones based on templates. This library provides an interface to the Dataflow service API, distinct from the `apache-beam` SDK, which is used for defining Dataflow pipelines themselves. It is part of the broader `google-cloud-python` ecosystem and generally follows its release cadence, receiving regular updates for bug fixes and new features.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to initialize the Dataflow Jobs client and list existing Dataflow jobs in a specified Google Cloud project and region. Ensure your `GOOGLE_CLOUD_PROJECT` environment variable is set and you have authenticated via `gcloud auth application-default login`.

import os
from google.cloud.dataflow_v1beta3 import JobsClient
from google.cloud.dataflow_v1beta3.types import ListJobsRequest

# Set your Google Cloud Project ID (e.g., via GOOGLE_CLOUD_PROJECT environment variable)
project_id = os.environ.get("GOOGLE_CLOUD_PROJECT", "your-gcp-project-id")
region = "us-central1" # Dataflow jobs are regional, specify the region where your jobs run

if project_id == "your-gcp-project-id":
    print("WARNING: Please set the GOOGLE_CLOUD_PROJECT environment variable or replace 'your-gcp-project-id'.")
    exit()

try:
    # Initialize the client (will use Application Default Credentials by default)
    client = JobsClient()

    # Create a request to list jobs in a specific project and region
    request = ListJobsRequest(project_id=project_id, location=region)

    print(f"Listing Dataflow jobs for project '{project_id}' in region '{region}':")
    response = client.list_jobs(request=request)

    jobs_found = False
    for job in response.jobs:
        print(f"  Job ID: {job.id}, Name: {job.name}, Type: {job.type}, State: {job.current_state}")
        jobs_found = True

    if not jobs_found:
        print("  No Dataflow jobs found.")

except Exception as e:
    print(f"An error occurred: {e}")
    print("Ensure you have authenticated with `gcloud auth application-default login` and enabled the Dataflow API in your project.")

view raw JSON →