Google Cloud Dataplex

2.17.0 · active · verified Sun Mar 29

Google Cloud Dataplex is a unified data governance platform that provides an intelligent data fabric for centrally managing, monitoring, and governing data across data lakes, data warehouses, and data marts. It enables consistent controls and trusted data access, and it powers analytics at scale. The Python client library, google-cloud-dataplex, is currently at version 2.17.0 and is actively maintained with frequent releases.

Warnings

Install

Imports

Quickstart

This quickstart instantiates the Dataplex client and lists the existing lakes in a given Google Cloud project and location. The script reads the project ID from the GOOGLE_CLOUD_PROJECT environment variable and uses us-central1 as the location; change LOCATION to the region that holds your lakes.

import os
from google.cloud import dataplex_v1

def list_lakes(project_id: str, location: str):
    """Lists Dataplex lakes in a given project and location."""
    try:
        client = dataplex_v1.DataplexServiceClient()
        parent = f"projects/{project_id}/locations/{location}"

        print(f"Listing lakes in {parent}:")
        # list_lakes returns a pager; iterating it fetches further pages on demand
        for lake in client.list_lakes(parent=parent):
            print(f"- {lake.name} (State: {lake.state.name})")
        print("Lakes listed successfully.")
    except Exception as e:
        print(f"An error occurred: {e}")
        print("Ensure 'gcloud auth application-default login' has been run or GOOGLE_APPLICATION_CREDENTIALS is set.")
        print("Also, verify that the Dataplex API is enabled for your project and the service account has necessary permissions.")

if __name__ == "__main__":
    PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT", "your-gcp-project-id")
    LOCATION = "us-central1" # Or your desired region, e.g., "global" for some resources

    if PROJECT_ID == "your-gcp-project-id":
        print("Please set the 'GOOGLE_CLOUD_PROJECT' environment variable or replace 'your-gcp-project-id' with your actual GCP project ID.")
    else:
        list_lakes(PROJECT_ID, LOCATION)
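
The quickstart builds the parent path by hand and prints each lake's full resource name, which has the form projects/PROJECT/locations/LOCATION/lakes/LAKE_ID. A minimal sketch of building and splitting those names without any network call follows. The helper names build_parent and parse_lake_name are hypothetical and are not part of the client library.

```python
import re

# Pattern for a full lake resource name, as printed by the quickstart.
# Assumption: IDs contain no "/" characters.
_LAKE_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/lakes/(?P<lake>[^/]+)$"
)


def build_parent(project_id: str, location: str) -> str:
    """Return the parent path that list_lakes expects (hypothetical helper)."""
    return f"projects/{project_id}/locations/{location}"


def parse_lake_name(name: str) -> dict:
    """Split a lake resource name into its project, location, and lake ID."""
    match = _LAKE_NAME.match(name)
    if match is None:
        raise ValueError(f"Not a lake resource name: {name!r}")
    return match.groupdict()


if __name__ == "__main__":
    parent = build_parent("my-project", "us-central1")
    # "sales-lake" is a made-up lake ID used only for illustration.
    parts = parse_lake_name(f"{parent}/lakes/sales-lake")
    print(parent)
    print(parts["lake"])
```

Parsing the name this way can be handy when you only want to display the short lake ID, or when you need to confirm that a lake belongs to the location you queried.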
