Google Cloud Data Catalog

3.30.0 verified Tue May 12 auth: no python install: verified

Google Cloud Data Catalog is a fully managed, highly scalable data discovery and metadata management service. It allows users to discover, manage, and understand data assets across Google Cloud, supporting technical and business metadata. The Python client library, currently at version 3.30.0, provides programmatic access to the Data Catalog API and follows a regular release cadence as part of the broader `google-cloud-python` client libraries.

pip install google-cloud-datacatalog
error ModuleNotFoundError: No module named 'google.cloud.datacatalog'
cause The `google-cloud-datacatalog` library is not installed in the Python environment, or there's a conflict with other `google-cloud` libraries.
fix
Install the library using pip: pip install google-cloud-datacatalog or, if in a specific environment, python -m pip install google-cloud-datacatalog. Ensure your virtual environment is activated if you are using one.
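Before reinstalling, it can help to confirm which interpreter is active and whether the distribution is visible to it. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of the client library.

```python
import sys
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # sys.executable shows which interpreter (and so which environment) is running.
    print("interpreter:", sys.executable)
    print("google-cloud-datacatalog:", installed_version("google-cloud-datacatalog"))
```

If the version prints as None while pip reports the package as installed, pip and the running interpreter are probably pointing at different environments.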
error AttributeError: module 'google.cloud.datacatalog_v1' has no attribute 'GetIamPolicyRequest'
cause This error typically occurs when the client library version used is incompatible with the API methods or types being called, often due to outdated sample code or an older library version lacking the specific attribute.
fix
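Upgrade the google-cloud-datacatalog library to the latest version: pip install --upgrade google-cloud-datacatalog. Then cross-reference your code with the official client library documentation for your installed version. For IAM request types in particular, the generated samples import them from google.iam.v1.iam_policy_pb2 (for example, iam_policy_pb2.GetIamPolicyRequest) rather than from datacatalog_v1, so check the import path as well as the version.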
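To see where a type actually lives in the installed environment, one option is to probe a list of candidate modules. This is only a sketch: `find_symbol` is a hypothetical helper, and the candidate list in the usage section is an assumption based on the generated samples, which import IAM request types from `google.iam.v1.iam_policy_pb2`.

```python
import importlib


def find_symbol(name, module_names):
    """Return (module_name, obj) for the first importable module exposing `name`."""
    for module_name in module_names:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # Module is not installed here; try the next candidate.
        if hasattr(module, name):
            return module_name, getattr(module, name)
    return None


if __name__ == "__main__":
    # Candidate modules are an assumption; adjust them to the type you are looking for.
    candidates = ["google.cloud.datacatalog_v1", "google.iam.v1.iam_policy_pb2"]
    print(find_symbol("GetIamPolicyRequest", candidates))
```

A result of None means none of the candidates is importable or exposes the name, which points back to a missing or outdated installation.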
error Error: 5 NOT_FOUND: Project "your-project-id" does not exist
cause This error often indicates that the specified Google Cloud project ID is incorrect, or the Data Catalog API is not enabled for that project, or the service account lacks permission to view the project or its resources.
fix
Verify that the project_id used in your code is correct. Ensure the Data Catalog API is enabled for your project in the Google Cloud Console (APIs & Services > Library). Also, confirm that the service account or user running the code has the necessary IAM role (e.g., roles/datacatalog.viewer or roles/datacatalog.admin) on the specified project.
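A quick local format check can catch a placeholder, a display name, or a typo before any request is sent. The sketch below assumes the standard project ID rules (6 to 30 characters; lowercase letters, digits, and hyphens; starts with a letter; does not end with a hyphen). The helper names are hypothetical, legacy domain-scoped IDs are not handled, and passing the check does not prove the project exists.

```python
import re

# Assumed format rule: 6-30 chars, lowercase letters/digits/hyphens,
# starts with a letter, does not end with a hyphen.
PROJECT_ID_PATTERN = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")


def looks_like_project_id(value):
    """Return True if `value` matches the assumed project ID format."""
    return bool(PROJECT_ID_PATTERN.match(value))


def location_parent(project_id, location):
    """Build a 'projects/<id>/locations/<location>' parent, rejecting malformed IDs."""
    if not looks_like_project_id(project_id):
        raise ValueError(f"Malformed project ID: {project_id!r}")
    return f"projects/{project_id}/locations/{location}"


if __name__ == "__main__":
    print(location_parent("your-gcp-project-id", "us-central1"))
```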
error google.api_core.exceptions.PermissionDenied: 403 The caller does not have permission
cause The service account or user credentials used to authenticate the Data Catalog client lack the necessary Identity and Access Management (IAM) permissions to perform the requested operation on the specified resource.
fix
Grant the appropriate IAM role(s) to the service account or user. For common operations, roles like Data Catalog Viewer (roles/datacatalog.viewer), Data Catalog Editor (roles/datacatalog.editor), or more granular custom roles may be needed. Assign these roles to the principal accessing Data Catalog from the IAM page in the Cloud Console or with gcloud projects add-iam-policy-binding.
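The error entries above map fairly directly to HTTP-style status codes, so a small lookup can turn a caught exception into a next step. This sketch relies on the `code` attribute that `google.api_core.exceptions` classes carry (for example 403 for PermissionDenied and 404 for NotFound) and reads it by duck typing so the block runs without the library installed. `HINTS`, `DEFAULT_HINT`, and `hint_for` are hypothetical names.

```python
# Hints keyed by the HTTP-style status code carried on the exception.
HINTS = {
    401: "Authentication failed: check Application Default Credentials.",
    403: "Permission denied: grant an IAM role such as roles/datacatalog.viewer.",
    404: "Not found: check the project ID and that the Data Catalog API is enabled.",
}
DEFAULT_HINT = "Unrecognized error: inspect the exception message."


def hint_for(exc):
    """Map an exception's `code` attribute (if any) to a troubleshooting hint."""
    code = getattr(exc, "code", None)
    return HINTS.get(code, DEFAULT_HINT)


if __name__ == "__main__":
    class FakePermissionDenied(Exception):
        """Stand-in for an API error; a real one would come from google.api_core."""
        code = 403

    print(hint_for(FakePermissionDenied("The caller does not have permission")))
```

In real code, `hint_for(e)` would sit in an `except` clause around the client call.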
breaking Google Cloud Data Catalog is deprecated in favor of Dataplex Universal Catalog. While the Data Catalog API and client library still function, new development should leverage Dataplex's comprehensive data management capabilities.
fix Migrate to Google Cloud Dataplex Universal Catalog for metadata management. Review the Dataplex documentation for equivalent functionality and migration guides. The Dataplex Python client is installed with `pip install google-cloud-dataplex`.
gotcha The `google-cloud-datacatalog` client library logs RPC events using Python's standard logging, but logs may contain sensitive information and are not propagated to the root logger by default. You must configure logging explicitly.
fix To enable logging without code changes, set `GOOGLE_SDK_PYTHON_LOGGING_SCOPE=google.cloud.datacatalog` (or a broader scope like `google`) in your environment. For code-based configuration, ensure `logging.getLogger("google").propagate = True` if you need events to reach the root logger.
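For the code-based route, a minimal sketch might look like the following. It uses only the standard `logging` module; the function name `route_google_logs_to_root` is hypothetical, and DEBUG is just an example level.

```python
import logging


def route_google_logs_to_root(level=logging.DEBUG):
    """Let events from the "google" logger reach the root logger's handlers."""
    # basicConfig installs a root handler only if the root logger has none yet.
    logging.basicConfig(level=level)
    google_logger = logging.getLogger("google")
    google_logger.setLevel(level)
    google_logger.propagate = True
    return google_logger


if __name__ == "__main__":
    logger = route_google_logs_to_root()
    logger.debug("google logger is now routed to the root handlers")
```

Because these logs may contain sensitive information, it is worth keeping the level conservative and checking where the root handlers send their output before enabling this outside development.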
gotcha Python 3.9 and older versions are past their end of life or no longer fully supported. While client libraries may still function, Google will not post further updates supporting Python 3.9, and critical bug fixes will be on a best-effort basis. It is recommended to upgrade to Python 3.10 or higher for full support and features.
fix Upgrade your Python environment to version 3.10 or higher. Re-install the library within a compatible virtual environment to ensure full support and receive future updates.
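A startup check can surface an unsupported interpreter early. The sketch below takes the 3.10 minimum from the recommendation above; `MIN_SUPPORTED` and `is_supported` are hypothetical names.

```python
import sys

# Minimum version taken from the recommendation in this entry.
MIN_SUPPORTED = (3, 10)


def is_supported(version_info=None):
    """Return True if the given (or running) Python version meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_SUPPORTED


if __name__ == "__main__":
    if not is_supported():
        print(f"Python {sys.version_info.major}.{sys.version_info.minor} is below 3.10")
```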
gotcha The client fails to authenticate when Application Default Credentials (ADC) cannot be found, and API calls fail when the Data Catalog API is not enabled for the project. Both must be in place before the first request.
fix Configure Application Default Credentials (ADC) by following instructions at https://cloud.google.com/docs/authentication/external/set-up-adc. Additionally, ensure the Data Catalog API is enabled for your Google Cloud project via the Cloud Console or by running `gcloud services enable datacatalog.googleapis.com`.
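To check whether a file-based credential is discoverable, a sketch like the one below looks in the two usual places: the path in `GOOGLE_APPLICATION_CREDENTIALS` and the gcloud well-known file. The helper `find_adc_file` is hypothetical and is not how the library resolves credentials internally. It also does not cover credentials served by a metadata server (for example on Compute Engine), where no file exists.

```python
import os
from pathlib import Path


def find_adc_file(environ=None, home=None):
    """Return the path of a file-based ADC credential, or None if none is found."""
    environ = os.environ if environ is None else environ

    # 1. An explicit key file named by the environment variable.
    explicit = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit and Path(explicit).is_file():
        return Path(explicit)

    # 2. The well-known file written by `gcloud auth application-default login`.
    if os.name == "nt":
        base = Path(environ.get("APPDATA", "")) / "gcloud"
    else:
        base = (Path(home) if home else Path.home()) / ".config" / "gcloud"
    well_known = base / "application_default_credentials.json"
    return well_known if well_known.is_file() else None


if __name__ == "__main__":
    print("ADC file:", find_adc_file())
```

A None result on a workstation usually means `gcloud auth application-default login` has not been run and no key file is configured.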
python  os / libc      wheel  install  import  disk   mem    side effects
3.10    alpine (musl)  wheel  -        2.03s   72.3M  28.9M  noisy
3.10    alpine (musl)  -      -        2.00s   71.1M  28.6M  -
3.10    slim (glibc)   wheel  6.1s     1.12s   70M    22.6M  noisy
3.10    slim (glibc)   -      -        1.06s   69M    22.2M  -
3.11    alpine (musl)  wheel  -        2.74s   77.3M  30.7M  clean
3.11    alpine (musl)  -      -        2.89s   76.1M  30.4M  -
3.11    slim (glibc)   wheel  5.2s     1.68s   75M    24.8M  clean
3.11    slim (glibc)   -      -        1.73s   74M    24.5M  -
3.12    alpine (musl)  wheel  -        2.63s   68.7M  30.5M  clean
3.12    alpine (musl)  -      -        2.80s   67.5M  30.1M  -
3.12    slim (glibc)   wheel  4.4s     1.94s   66M    24.7M  clean
3.12    slim (glibc)   -      -        2.15s   65M    23.8M  -
3.13    alpine (musl)  wheel  -        2.43s   68.3M  30.9M  clean
3.13    alpine (musl)  -      -        2.60s   67.0M  30.6M  -
3.13    slim (glibc)   wheel  4.5s     1.87s   66M    25.0M  clean
3.13    slim (glibc)   -      -        2.02s   65M    24.7M  -
3.9     alpine (musl)  wheel  -        1.82s   72.4M  28.9M  noisy
3.9     alpine (musl)  -      -        1.93s   71.2M  28.6M  -
3.9     slim (glibc)   wheel  6.9s     1.27s   70M    22.6M  noisy
3.9     slim (glibc)   -      -        1.25s   69M    22.3M  -

Initializes the DataCatalogClient and attempts to list existing entry groups within a specified Google Cloud project and location. This example assumes Application Default Credentials are available (e.g., a service account key referenced by the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, or user credentials created with `gcloud auth application-default login`).

import os
from google.cloud.datacatalog_v1 import DataCatalogClient

# Read the project ID from the GOOGLE_CLOUD_PROJECT environment variable,
# falling back to a placeholder that you should replace with your own ID.
project_id = os.environ.get('GOOGLE_CLOUD_PROJECT', 'your-gcp-project-id')

# Create a client
try:
    client = DataCatalogClient()
    print(f"Data Catalog client created successfully for project: {project_id}")

    # Example: List entry groups (pagination handled automatically)
    parent = f"projects/{project_id}/locations/us-central1"
    print(f"Listing entry groups in {parent}...")
    for entry_group in client.list_entry_groups(parent=parent):
        print(f"  Entry Group: {entry_group.name}")

    print("Quickstart finished. Note: Data Catalog is migrating to Dataplex Universal Catalog.")
except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure the Data Catalog API is enabled and authentication is set up.")
    print("e.g., export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json")