Google Cloud Datacatalog Lineage
The `google-cloud-datacatalog-lineage` client library lets Python developers call the Google Cloud Data Lineage API, which records where data comes from and how it is transformed as it moves through Google Cloud, giving visibility into data pipelines. The current version is 0.6.0. Like other Google Cloud client libraries, it is released frequently, typically to track changes in the underlying API or to ship bug fixes.
Common errors
- ModuleNotFoundError: No module named 'google.cloud.datacatalog_lineage'
  cause: The package is usually not installed in the Python environment that is running the code (for example, it was installed into a different virtualenv or interpreter).
  fix: Install it with the same interpreter that runs your code (`python -m pip install google-cloud-datacatalog-lineage`). If the package is installed and the error persists, import the versioned module instead: `from google.cloud import datacatalog_lineage_v1`.
- google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials.
  cause: The application could not find valid Google Cloud credentials in the environment.
  fix: Set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the path of your service account key file, run `gcloud auth application-default login` for local development, or run the code in a Google Cloud environment with default credentials configured.
- google.api_core.exceptions.NotFound: 404 Not Found: Requested entity was not found
  cause: The specified project, location, or resource path (e.g., the `parent` argument) does not exist or is malformed, or the authenticated principal lacks permission to access it.
  fix: Double-check `project_id`, `location`, and any resource names. Verify that the service account or user has a Data Lineage role on the project, such as Data Lineage Viewer (`roles/datalineage.viewer`) or Data Lineage Editor (`roles/datalineage.editor`).
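Two of the errors above, missing credentials and a malformed `parent`, can often be caught before any API call is made. The sketch below is a hypothetical pre-flight helper; the names `check_credentials_env`, `build_parent`, and `PLACEHOLDER_PROJECT` are illustrative and not part of the library. It only inspects the `GOOGLE_APPLICATION_CREDENTIALS` variable, so it cannot see credentials supplied by `gcloud auth application-default login` or a Google Cloud runtime, and it checks the shape of the resource name, not whether the project or location actually exists.

```python
import os
from typing import Mapping, Optional

# The quickstart's default project value (assumed placeholder, see Quickstart).
PLACEHOLDER_PROJECT = "your-gcp-project-id"


def check_credentials_env(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a description of a likely credentials problem, or None if none is detected.

    Only GOOGLE_APPLICATION_CREDENTIALS is inspected; Application Default
    Credentials may also come from gcloud or the metadata server.
    """
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if path is None:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set (fine if ADC comes from gcloud or a GCP runtime)"
    if not os.path.isfile(path):
        return f"GOOGLE_APPLICATION_CREDENTIALS points to a missing file: {path}"
    return None


def build_parent(project_id: str, location: str) -> str:
    """Build 'projects/{project_id}/locations/{location}', rejecting obviously bad input."""
    for label, value in (("project_id", project_id), ("location", location)):
        # Empty values, embedded slashes, or stray whitespace yield a wrong resource path.
        if not value or "/" in value or value != value.strip():
            raise ValueError(f"invalid {label}: {value!r}")
    if project_id == PLACEHOLDER_PROJECT:
        raise ValueError("project_id is still the quickstart placeholder")
    return f"projects/{project_id}/locations/{location}"
```

In the quickstart, `build_parent(project_id, location)` could replace the f-string that builds `parent`, and a non-None result from `check_credentials_env()` could be printed as a hint before constructing `LineageClient`.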
Warnings
- breaking: The library is still at version 0.x, so its API surface is not guaranteed to be stable; breaking changes can ship in a minor release (for example, from 0.5 to 0.6) without a major version bump. Pin the version you have tested (e.g., `google-cloud-datacatalog-lineage==0.6.0`) and review the release notes before upgrading.
- gotcha: Incorrect or missing Google Cloud credentials lead to `DefaultCredentialsError` (no credentials found) or `PermissionDenied` (credentials found but not authorized) when making API calls.
- gotcha: List methods (e.g., `list_processes`, `list_runs`) return pager objects rather than lists. Iterating a pager fetches additional pages lazily as you go; wrap it in `list(...)` only if you need every result in memory at once.
- gotcha: The library requires Python 3.9 or newer. On older Python versions, installation fails or the code raises errors at runtime.
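The pager and Python-version gotchas above can be illustrated without network access. This is a sketch under assumptions: `fake_pager` is a hypothetical stand-in that mimics how a `list_*` pager fetches one page at a time, and `first_n` and `require_python` are illustrative helper names, not library functions. The point it shows is that taking a prefix with `itertools.islice` stops fetching once enough items have arrived, whereas `list(...)` drains every page.

```python
import itertools
import sys
from typing import Iterable, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def require_python(minimum: Tuple[int, int] = (3, 9)) -> None:
    """Raise early if the interpreter is older than the documented minimum."""
    if sys.version_info < minimum:
        found = ".".join(map(str, sys.version_info[:3]))
        raise RuntimeError(f"Python {minimum[0]}.{minimum[1]}+ required, found {found}")


def first_n(results: Iterable[T], n: int) -> List[T]:
    """Return at most n items, pulling only as many pages as needed to get them."""
    return list(itertools.islice(results, n))


def fake_pager(pages: Sequence[Sequence[T]]) -> Tuple[Iterator[T], List[int]]:
    """Hypothetical stand-in for a list_* pager.

    Returns an iterator over all items plus a list recording which page
    indices have been "fetched" so far (a real pager makes an API call here).
    """
    fetched: List[int] = []

    def items() -> Iterator[T]:
        for index, page in enumerate(pages):
            fetched.append(index)  # simulate requesting the next page
            yield from page

    return items(), fetched
```

With a real client, `first_n(client.list_processes(parent=parent), 10)` would preview the first ten processes without listing the whole location.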
Install
- pip install google-cloud-datacatalog-lineage
Imports
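- LineageClient, importable from either the unversioned module or the versioned `_v1` module (the quickstart uses the versioned one):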
from google.cloud import datacatalog_lineage
from google.cloud import datacatalog_lineage_v1
Quickstart
import os

from google.api_core import exceptions as api_exceptions
from google.auth import exceptions as auth_exceptions
from google.cloud import datacatalog_lineage_v1

# Credentials: run inside a Google Cloud environment with default credentials, or,
# for local development, point Application Default Credentials at a key file:
#   export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/keyfile.json"

# Replace with your actual GCP project ID and desired location.
project_id = os.environ.get("GCP_PROJECT_ID", "your-gcp-project-id")
location = "us-central1"  # e.g., "us-central1", "europe-west1"
parent = f"projects/{project_id}/locations/{location}"

try:
    client = datacatalog_lineage_v1.LineageClient()
    print(f"Listing processes in {parent}...")
    # list_processes returns a pager; iterating it fetches further pages on demand.
    for process in client.list_processes(parent=parent):
        print(f"Found process: {process.name}")
    print("Finished listing processes (there may be none yet).")
except auth_exceptions.DefaultCredentialsError as e:
    print(f"No credentials found: {e}")
except api_exceptions.GoogleAPICallError as e:
    print(f"API call failed: {e}")
    print("Ensure GCP_PROJECT_ID is set (or the placeholder replaced) and that the caller has access.")
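The quickstart stops at processes, but the same client exposes the rest of the lineage hierarchy: each process owns runs, and each run owns lineage events. The sketch below walks all three levels with `list_processes`, `list_runs`, and `list_lineage_events`, calling each with the resource name of the level above as `parent`. `walk_lineage` is a hypothetical helper; it accepts any object with those three methods so it can be exercised with a fake client, and it issues one list call per process and per run, which may be slow in projects with many runs.

```python
from typing import Iterator, Tuple


def walk_lineage(client, parent: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (process_name, run_name, event_name) for every lineage event under parent.

    `client` is assumed to behave like datacatalog_lineage_v1.LineageClient:
    each list_* method takes the parent resource name and returns an iterable
    (a lazily paged pager on the real client) of objects with a `.name`.
    """
    for process in client.list_processes(parent=parent):
        # Runs are children of a process: .../processes/{process}
        for run in client.list_runs(parent=process.name):
            # Lineage events are children of a run: .../runs/{run}
            for event in client.list_lineage_events(parent=run.name):
                yield process.name, run.name, event.name
```

With a real client this would be called as `walk_lineage(datacatalog_lineage_v1.LineageClient(), parent)`. Processes without runs, and runs without events, produce no rows, so list them separately if you need a complete inventory.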