Google Cloud DLP
The `google-cloud-dlp` Python client library provides programmatic access to the Google Cloud Data Loss Prevention (DLP) API, now part of Sensitive Data Protection. It scans, discovers, classifies, and redacts privacy-sensitive data (such as PII) in text, images, and Google Cloud storage repositories such as BigQuery and Cloud Storage. The library is currently at version 3.34.0 and, like other Google Cloud client libraries, is released frequently.
Common errors
-
google.api_core.exceptions.PermissionDenied: 403 Permission denied. The DLP API has not been used in project [PROJECT_ID] before or it is disabled.
cause The Google Cloud Data Loss Prevention (DLP) API is not enabled for the specified Google Cloud project, or the service account lacks the permissions needed to call the API.
fix 1. Enable the DLP API for your project: `gcloud services enable dlp.googleapis.com`. 2. Ensure your service account has the `roles/dlp.user` (DLP User) role or a custom role that includes the `dlp.jobs.create` and `serviceusage.services.use` permissions.
-
ModuleNotFoundError: No module named 'google.cloud.dlp_v2'
cause The `google-cloud-dlp` library is not installed, the installed package is not visible to the Python environment running the code, or an older version of the library with a different module structure is installed.
fix Install the library in your active Python environment: `pip install google-cloud-dlp`. If you use a virtual environment, activate it first. If the error persists, confirm that `pip` and `python` refer to the same interpreter (for example, run `python -m pip install google-cloud-dlp`) and check for conflicting package versions.
-
AttributeError: 'DlpServiceClient' object has no attribute 'project_path'
cause This error usually means the code and the installed `google-cloud-dlp` version do not match: methods and attributes differ between library versions, so a sample written for one release can call a helper that the installed release does not provide.
fix Align the code with the installed version. Either build the parent string directly (`parent = f"projects/{project_id}"`) instead of calling a path helper, or install the version the sample was written for: `pip install --upgrade google-cloud-dlp` or `pip install google-cloud-dlp==<compatible_version>`.
-
google.api_core.exceptions.ResourceExhausted: 429 Quota exceeded for quota group 'DlpRequestsPerMinutePerProject' and limit 'Dlp requests per minute per project'.
cause The number of requests to the DLP API has exceeded the quota allocated to your project for the given time window.
fix Reduce the rate of your API requests, implement exponential backoff and retry logic, or request a quota increase through the Google Cloud Console's 'IAM & Admin' -> 'Quotas' page if your usage justifies it.
-
The request concerns location 'us-central1' but was sent to location 'global'. Regional APIs must be called with a regional endpoint.
cause The resources involved (e.g., Cloud Storage buckets, BigQuery datasets, KMS keys) or the operation itself require a specific region, but the client was initialized with, or the request was sent to, the global endpoint.
fix Specify the regional endpoint when creating the `DlpServiceClient`. For example, for `us-central1`, import `ClientOptions` from `google.api_core.client_options` and pass `client_options=ClientOptions(api_endpoint='us-central1-dlp.googleapis.com')`. Ensure all related resources (like KMS keys) are in the same region.
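The exponential backoff suggested for the 429 quota error above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not part of `google-cloud-dlp`: the helper name `call_with_backoff` and its parameters (`retry_on`, `max_attempts`, `base_delay`, `max_delay`, `sleep`) are hypothetical, and the delay values are arbitrary starting points rather than recommended settings.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 32.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying with exponential backoff on the given errors.

    Hypothetical helper for illustration; all names here are assumptions.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return func()
        except retry_on:
            if attempt >= max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # The delay doubles on each attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Up to 10% random jitter keeps parallel workers from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

With the DLP client, one plausible use is to pass `retry_on=(google.api_core.exceptions.ResourceExhausted,)` and wrap the call in a lambda, for example `call_with_backoff(lambda: client.inspect_content(request=request), retry_on=(ResourceExhausted,))`. The `sleep` parameter exists only so the waiting can be replaced in tests.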
Warnings
- gotcha Authentication requires enabling the DLP API and typically using Application Default Credentials (ADC) with a service account. Set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the path of your service account key file, and ensure the service account has the `roles/dlp.user` role. API keys are supported for some methods but not for `deidentify` or `reidentify` requests that use Cloud Key Management Service (KMS) wrapped keys.
- breaking Python 3.6 and older versions are no longer supported. The last version compatible with Python 2.7 was `google-cloud-dlp==1.1.0`.
- gotcha The library's logging functionality can emit RPC events that may contain sensitive information. Access to these logs should be restricted, and users should not depend on the immutability of log message content or levels.
- gotcha When operating behind a TLS-intercepting proxy, gRPC-based Google Cloud client libraries (including DLP) may encounter SSL certificate trust issues.
- gotcha Many DLP API operations, such as `inspect_content` and `deidentify_content`, require a `parent` argument in the format `projects/{project_id}`. Forgetting this or providing an incorrect project ID will result in errors.
- gotcha Explicitly enable the Cloud Data Loss Prevention API in your Google Cloud project, via the Google Cloud console or `gcloud services enable dlp.googleapis.com`, before using the client library. Otherwise, calls fail with a `403 PermissionDenied` error reporting that the API has not been used in the project or is disabled.
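The `parent` gotcha above is easy to guard against with a small helper that builds the string in one place. The sketch below is an illustration only: `build_parent` is a hypothetical name, not a library function, and its checks are deliberately light (it rejects empty values and full resource paths, but does not verify that the project exists). The optional `location` argument assumes the `projects/{project_id}/locations/{location}` form used for regional requests.

```python
from typing import Optional


def build_parent(project_id: str, location: Optional[str] = None) -> str:
    """Return a DLP `parent` string for a bare project ID.

    Hypothetical helper for illustration; not part of google-cloud-dlp.
    """
    project_id = (project_id or "").strip()
    if not project_id:
        raise ValueError("project_id must be a non-empty string")
    if "/" in project_id:
        # Catches callers who pass "projects/my-project" instead of "my-project".
        raise ValueError(f"expected a bare project ID, got a path: {project_id!r}")
    if location:
        return f"projects/{project_id}/locations/{location}"
    return f"projects/{project_id}"
```

Failing early with a `ValueError` keeps a missing or malformed project ID from surfacing later as a less obvious API error.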
Install
-
pip install google-cloud-dlp
Imports
- DlpServiceClient
from google.cloud.dlp import DlpServiceClient
from google.cloud import dlp_v2
client = dlp_v2.DlpServiceClient()
Quickstart
import os

from google.cloud import dlp_v2
from google.cloud.dlp_v2 import types


def inspect_text(project_id: str, text_content: str):
    """
    Inspects a string of text for sensitive data using Google Cloud DLP.

    Args:
        project_id: The Google Cloud project ID.
        text_content: The string to inspect.
    """
    if not project_id:
        print("GOOGLE_CLOUD_PROJECT environment variable or project_id not set.")
        return

    client = dlp_v2.DlpServiceClient()

    # Construct the item to inspect
    item = {"value": text_content}

    # The info types to search for in the content.
    # See https://cloud.google.com/sensitive-data-protection/docs/infotypes-reference
    info_types = [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}]

    # The minimum likelihood to constitute a match.
    min_likelihood = types.Likelihood.POSSIBLE

    # Configuration for the inspection request
    inspect_config = {
        "info_types": info_types,
        "min_likelihood": min_likelihood,
        "limits": {"max_findings_per_request": 0},  # 0 for no limit
    }

    # Construct the parent path
    parent = f"projects/{project_id}"

    # Call the API
    try:
        response = client.inspect_content(
            request={
                "parent": parent,
                "inspect_config": inspect_config,
                "item": item,
            }
        )
        if response.result.findings:
            print("Findings:")
            for finding in response.result.findings:
                if finding.quote:
                    print(f"  Quote: {finding.quote}")
                print(f"  Info type: {finding.info_type.name}")
                print(f"  Likelihood: {types.Likelihood(finding.likelihood).name}")
        else:
            print("No findings.")
    except Exception as e:
        print(f"Error during DLP inspection: {e}")


if __name__ == "__main__":
    # Read the project ID from the GOOGLE_CLOUD_PROJECT environment variable.
    project = os.environ.get("GOOGLE_CLOUD_PROJECT", "")
    if not project:
        raise ValueError("Please set the GOOGLE_CLOUD_PROJECT environment variable or provide a project_id.")
    sensitive_text = "My email is test@example.com and my phone number is (123) 456-7890."
    inspect_text(project, sensitive_text)
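The quickstart only inspects text. Redaction, which the introduction also mentions, goes through `deidentify_content`. The sketch below shows one plausible shape for such a request, replacing each finding with the name of its info type. The function names `build_deidentify_request` and `deidentify_text` are hypothetical; the request is split out as a plain dictionary so it can be examined without credentials, and the client library is imported only when the API is actually called.

```python
from typing import Any, Dict, List


def build_deidentify_request(
    project_id: str, text: str, info_type_names: List[str]
) -> Dict[str, Any]:
    """Build a deidentify_content request dict (hypothetical helper)."""
    return {
        "parent": f"projects/{project_id}",
        # Which info types to look for, e.g. ["EMAIL_ADDRESS", "PHONE_NUMBER"].
        "inspect_config": {
            "info_types": [{"name": name} for name in info_type_names]
        },
        # Replace every finding with the name of its info type.
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {"primitive_transformation": {"replace_with_info_type_config": {}}}
                ]
            }
        },
        "item": {"value": text},
    }


def deidentify_text(project_id: str, text: str, info_type_names: List[str]) -> str:
    """Send the request and return the de-identified text."""
    # Imported here so the builder above works without the library installed.
    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()
    response = client.deidentify_content(
        request=build_deidentify_request(project_id, text, info_type_names)
    )
    return response.item.value
```

Calling `deidentify_text(project, sensitive_text, ["EMAIL_ADDRESS", "PHONE_NUMBER"])` needs the same credentials and enabled API as the quickstart. Other transformations (masking, tokenization with KMS-wrapped keys) use different `primitive_transformation` fields and are not covered by this sketch.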