Google Cloud Dataplex
Google Cloud Dataplex is a unified data governance platform that provides an intelligent data fabric for centrally managing, monitoring, and governing data across data lakes, data warehouses, and data marts. It provides consistent controls and trusted data access, and supports analytics at scale. The Python client library, google-cloud-dataplex, is at version 2.17.0 at the time of writing and is actively maintained with frequent releases.
Warnings
- breaking Some metadata stored in Dataplex Universal Catalog changed on January 12, 2026, to align with original source systems (e.g., Vertex AI, Bigtable, Spanner). Workloads that depend on the specific structure or content of this metadata will need to be adjusted to preserve continuity.
- gotcha Dataplex enforces strict location constraints for resources. A zone is either regional or multi-regional, and every asset attached to it (e.g., a GCS bucket or BigQuery dataset) must match the zone's location type. Attaching an asset that violates this constraint (e.g., an 'EU' multi-region BigQuery dataset to a 'europe-west1' regional zone) fails.
- deprecated Dataplex Explore was deprecated on July 22, 2024. Functionality provided by Dataplex Explore is now expected to be handled by BigQuery Studio.
- gotcha When programmatically querying Dataplex Catalog Entries using the Python client, you may by default get back only the *names* of custom Aspects, not their *values*.
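
The location constraint described in the warnings above can be checked before an asset is attached. The sketch below is a hypothetical pre-flight helper and is not part of the Dataplex client: the names `MULTI_REGIONS`, `location_type`, and `can_attach` are illustrative, the type strings are chosen for this example, and the multi-region set is an assumption that should be verified against the current BigQuery and Cloud Storage location documentation.

```python
# Hypothetical pre-flight check; none of these names come from google-cloud-dataplex.
# Assumption: these are the multi-region location names in use. Verify against
# the BigQuery and Cloud Storage location docs before relying on this set.
MULTI_REGIONS = {"us", "eu", "asia"}


def location_type(location: str) -> str:
    """Classify a location string as 'MULTI_REGION' or 'SINGLE_REGION'."""
    if location.strip().lower() in MULTI_REGIONS:
        return "MULTI_REGION"
    return "SINGLE_REGION"


def can_attach(zone_location_type: str, asset_location: str) -> bool:
    """Return True when the asset's location type matches the zone's."""
    return location_type(asset_location) == zone_location_type


if __name__ == "__main__":
    # The failing case from the warning: an 'EU' dataset and a regional zone.
    print(can_attach("SINGLE_REGION", "EU"))
    # A regional asset against a regional zone passes the type check.
    print(can_attach("SINGLE_REGION", "europe-west1"))
```

This compares location types only. It does not confirm that a regional asset sits in the same region as the zone, so a passing result here does not guarantee that the attachment will succeed.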
Install
pip install google-cloud-dataplex
Imports
- DataplexServiceClient
from google.cloud import dataplex_v1
Quickstart
import os

from google.cloud import dataplex_v1


def list_lakes(project_id: str, location: str):
    """Lists Dataplex lakes in a given project and location."""
    try:
        client = dataplex_v1.DataplexServiceClient()
        parent = f"projects/{project_id}/locations/{location}"
        print(f"Listing lakes in {parent}:")
        # API calls often return an iterable (pager) for list methods
        for lake in client.list_lakes(parent=parent):
            print(f"- {lake.name} (State: {lake.state.name})")
        print("Lakes listed successfully.")
    except Exception as e:
        print(f"An error occurred: {e}")
        print("Ensure 'gcloud auth application-default login' has been run or GOOGLE_APPLICATION_CREDENTIALS is set.")
        print("Also, verify that the Dataplex API is enabled for your project and the service account has necessary permissions.")


if __name__ == "__main__":
    PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT", "your-gcp-project-id")
    LOCATION = "us-central1"  # Or your desired region, e.g., "global" for some resources
    if PROJECT_ID == "your-gcp-project-id":
        print("Please set the 'GOOGLE_CLOUD_PROJECT' environment variable or replace 'your-gcp-project-id' with your actual GCP project ID.")
    else:
        list_lakes(PROJECT_ID, LOCATION)
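
For the aspect-values warning above, one approach is to request a broader entry view when fetching a catalog entry. The sketch below assumes that `GetEntryRequest` accepts a `view` field and that `EntryView.ALL` returns aspect data rather than only aspect keys; confirm both against the client reference for your installed version. The helper `entry_name`, the function `get_entry_with_aspects`, and all resource IDs are hypothetical and not part of the library.

```python
def entry_name(project_id: str, location: str, entry_group: str, entry_id: str) -> str:
    """Build the resource name of a catalog entry (hypothetical helper)."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/entryGroups/{entry_group}/entries/{entry_id}"
    )


def get_entry_with_aspects(name: str):
    """Fetch an entry and ask for aspect values, not just aspect keys."""
    # Imported here so the name helper above works without the library installed.
    from google.cloud import dataplex_v1

    client = dataplex_v1.CatalogServiceClient()
    request = dataplex_v1.GetEntryRequest(
        name=name,
        # Assumption: ALL returns aspect data; narrower views may omit values.
        view=dataplex_v1.EntryView.ALL,
    )
    return client.get_entry(request=request)
```

A caller would pass the result of `entry_name(...)` to `get_entry_with_aspects` and then iterate over `entry.aspects.items()` to inspect each aspect's `data`. Requesting every aspect may return more data than needed, so a narrower view could be preferable for entries with many aspects.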