Google Cloud DLP
The `google-cloud-dlp` Python client library provides programmatic access to the Google Cloud Data Loss Prevention (DLP) API, now part of Sensitive Data Protection. It can scan, discover, classify, and redact privacy-sensitive data (such as PII) in text, in images, and in Google Cloud storage repositories such as BigQuery and Cloud Storage. The library is currently at version 3.34.0 and, like other Google Cloud client libraries, ships new releases frequently.
Warnings
- gotcha The client authenticates with Application Default Credentials (ADC), typically backed by a service account. Set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the path of the service account's key file, and grant that service account the `roles/dlp.user` role. API keys work for some methods, but not for `deidentify` or `reidentify` requests that use keys wrapped by Cloud Key Management Service (KMS).
- breaking Python 3.6 and older versions are no longer supported. The last version compatible with Python 2.7 was `google-cloud-dlp==1.1.0`.
- gotcha The library's logging can emit RPC events that may contain sensitive information. Restrict access to these logs, and do not rely on log message content or log levels staying the same across releases.
- gotcha Behind a TLS-intercepting proxy, gRPC-based Google Cloud client libraries (including DLP) can fail with SSL certificate verification errors, because the proxy's re-signed certificates chain to a CA that the client does not trust by default.
- gotcha Many DLP API operations, such as `inspect_content` and `deidentify_content`, require a `parent` argument of the form `projects/{project_id}` (or `projects/{project_id}/locations/{location}` for regional processing). Omitting it or passing a wrong project ID makes the request fail.
- gotcha Enable the Cloud Data Loss Prevention API in your Google Cloud project, in the Google Cloud console or with `gcloud services enable dlp.googleapis.com`, before using the client library. Otherwise requests fail with a `PermissionDenied` (HTTP 403) error stating that the API has not been used in the project or is disabled.
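The authentication warning can be made concrete with a small helper. This is a minimal sketch, not an official pattern: `resolve_key_file` and `make_dlp_client` are hypothetical names, and the code assumes that a key file, if one is used, is referenced through `GOOGLE_APPLICATION_CREDENTIALS`. It fails fast when the variable points at a missing file and otherwise lets ADC find credentials; the library imports sit inside `make_dlp_client` so the path check works even where `google-cloud-dlp` is not installed.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_key_file(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the key file named by GOOGLE_APPLICATION_CREDENTIALS, or None if unset.

    Raises FileNotFoundError when the variable is set but the file is missing,
    which would otherwise only surface later as a credentials error.
    """
    value = env.get("GOOGLE_APPLICATION_CREDENTIALS", "").strip()
    if not value:
        # No key file: ADC falls back to gcloud user credentials or the metadata server.
        return None
    path = Path(value).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"GOOGLE_APPLICATION_CREDENTIALS points to a missing file: {path}")
    return path


def make_dlp_client(key_file: Optional[Path] = None):
    """Build a DlpServiceClient from an explicit key file, or from ADC when key_file is None."""
    # Imported lazily so resolve_key_file() works without the client library installed.
    from google.cloud import dlp_v2
    from google.oauth2 import service_account

    if key_file is None:
        return dlp_v2.DlpServiceClient()  # Application Default Credentials
    credentials = service_account.Credentials.from_service_account_file(
        str(key_file),
        scopes=["https://www.googleapis.com/auth/cloud-platform"],
    )
    return dlp_v2.DlpServiceClient(credentials=credentials)
```

Calling `make_dlp_client(resolve_key_file())` behaves like the plain constructor when the variable is unset, but reports a mistyped key path immediately.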
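For the logging warning, one way to keep RPC events contained is to configure the standard-library logger that the client's modules log under. The sketch below assumes those loggers live in the `google` namespace (the usual `logging.getLogger(__name__)` convention) and that RPC events are emitted at `DEBUG` level; `restrict_google_logging` is a hypothetical helper, and the file-permission step only has an effect on POSIX systems.

```python
import logging
import os
from typing import Optional


def restrict_google_logging(debug_log_path: Optional[str] = None) -> logging.Logger:
    """Keep google.* RPC debug events out of shared logs.

    With no path, google.* loggers that inherit this level drop records below WARNING.
    With a path, DEBUG records go only to an owner-only file instead of the root handlers.
    """
    logger = logging.getLogger("google")
    if debug_log_path is None:
        logger.setLevel(logging.WARNING)
        return logger
    # Create the file with mode 0o600 before logging opens it (umask can only tighten this).
    fd = os.open(debug_log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    os.close(fd)
    handler = logging.FileHandler(debug_log_path)
    handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    # Do not also forward these records to root handlers (console, log shippers, ...).
    logger.propagate = False
    return logger
```

The file-only mode is meant for short debugging sessions; the default call is the safer setting for shared environments.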
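For the TLS-interception warning, the usual remedy is to make the client trust the proxy's root CA. The sketch below assumes you have a PEM bundle containing that CA (typically appended to a standard public bundle) and relies on two environment variables defined outside this library: `GRPC_DEFAULT_SSL_ROOTS_FILE_PATH`, read by gRPC when it loads its default root certificates, and `REQUESTS_CA_BUNDLE`, honored by `requests` (used by the REST transport and by `google-auth` token refresh). `trust_proxy_ca` is a hypothetical helper; call it before the first client is created.

```python
import os
from pathlib import Path


def trust_proxy_ca(ca_bundle_path: str) -> Path:
    """Point gRPC and requests at a PEM bundle that includes the intercepting proxy's root CA."""
    path = Path(ca_bundle_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"CA bundle not found: {path}")
    # gRPC reads this when it first loads its default roots, so set it before any channel exists.
    os.environ["GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"] = str(path)
    # requests (REST transport, OAuth token refresh) verifies server certificates against this bundle.
    os.environ["REQUESTS_CA_BUNDLE"] = str(path)
    return path
```

Setting these variables in the process environment (or the shell that launches it) avoids disabling certificate verification, which would defeat TLS altogether.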
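The `parent` warning is easy to guard against by building the path in one place. `dlp_parent` below is a hypothetical helper that rejects the most common mistakes (an empty ID, or an ID that already includes the `projects/` prefix) and optionally appends a location. The generated client also exposes static helpers, `DlpServiceClient.common_project_path(project)` and `DlpServiceClient.common_location_path(project, location)`, which format the same strings without any validation.

```python
from typing import Optional


def dlp_parent(project_id: str, location: Optional[str] = None) -> str:
    """Return 'projects/{project_id}' or 'projects/{project_id}/locations/{location}'."""
    project_id = project_id.strip()
    if not project_id:
        raise ValueError("project_id is empty; set GOOGLE_CLOUD_PROJECT or pass it explicitly")
    if "/" in project_id:
        # Catches 'projects/my-project' being passed where only the bare ID is expected.
        raise ValueError(f"expected a bare project ID, got {project_id!r}")
    if location is None:
        return f"projects/{project_id}"
    return f"projects/{project_id}/locations/{location}"
```

Passing `dlp_parent(project_id)` as `parent` yields the same string the quickstart builds with its f-string, but fails early on malformed input.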
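The enablement step can also be scripted. This sketch shells out to the `gcloud` CLI and assumes it is installed and authenticated for the target project; `dlp_api_enabled` and `ensure_dlp_api` are hypothetical helpers, and `--format=value(config.name)` assumes the field layout that `gcloud services list` prints today. The `run` parameter exists only so the logic can be exercised without invoking `gcloud`.

```python
import subprocess
from typing import Callable

DLP_SERVICE = "dlp.googleapis.com"
Runner = Callable[..., subprocess.CompletedProcess]


def dlp_api_enabled(project_id: str, run: Runner = subprocess.run) -> bool:
    """Return True if dlp.googleapis.com is among the project's enabled services."""
    result = run(
        ["gcloud", "services", "list", "--enabled",
         f"--project={project_id}", "--format=value(config.name)"],
        capture_output=True, text=True, check=True,
    )
    return DLP_SERVICE in result.stdout.split()


def ensure_dlp_api(project_id: str, run: Runner = subprocess.run) -> bool:
    """Enable the DLP API if needed; return True if this call enabled it."""
    if dlp_api_enabled(project_id, run):
        return False
    run(["gcloud", "services", "enable", DLP_SERVICE, f"--project={project_id}"], check=True)
    return True
```

Checking first means the enable call, which needs permission to manage services, only runs when the API is actually off.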
Install
pip install google-cloud-dlp
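To confirm which release `pip` actually installed (the version cited above will drift), you can read the distribution metadata from Python. This is a minimal sketch using the standard-library `importlib.metadata` (Python 3.8 or later); `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "google-cloud-dlp") -> Optional[str]:
    """Return the installed version string of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found or "google-cloud-dlp is not installed; run: pip install google-cloud-dlp")
```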
Imports
- DlpServiceClient
from google.cloud import dlp_v2
client = dlp_v2.DlpServiceClient()
Quickstart
import os
from google.api_core import exceptions as google_exceptions
from google.cloud import dlp_v2
from google.cloud.dlp_v2 import types

def inspect_text(project_id: str, text_content: str) -> None:
    """
    Inspects a string of text for sensitive data using Google Cloud DLP.
    Args:
        project_id: The Google Cloud project ID.
        text_content: The string to inspect.
    """
    if not project_id:
        print("GOOGLE_CLOUD_PROJECT environment variable or project_id not set.")
        return
    client = dlp_v2.DlpServiceClient()
    # Construct the item to inspect
    item = {"value": text_content}
    # The info types to search for in the content.
    # See https://cloud.google.com/sensitive-data-protection/docs/infotypes-reference
    info_types = [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}]
    # The minimum likelihood to constitute a match.
    min_likelihood = types.Likelihood.POSSIBLE
    # Configuration for the inspection request
    inspect_config = {
        "info_types": info_types,
        "min_likelihood": min_likelihood,
        # Without include_quote, finding.quote is always empty.
        "include_quote": True,
        # 0 means no explicit cap; the service still enforces its own maximum.
        "limits": {"max_findings_per_request": 0},
    }
    # Construct the parent path
    parent = f"projects/{project_id}"
    # Call the API
    try:
        response = client.inspect_content(
            request={
                "parent": parent,
                "inspect_config": inspect_config,
                "item": item,
            }
        )
    except google_exceptions.GoogleAPICallError as e:
        # API errors such as PermissionDenied (API disabled, missing role) or InvalidArgument.
        print(f"Error during DLP inspection: {e}")
        return
    if response.result.findings:
        print("Findings:")
        for finding in response.result.findings:
            if finding.quote:
                print(f"  Quote: {finding.quote}")
            print(f"  Info type: {finding.info_type.name}")
            print(f"  Likelihood: {types.Likelihood(finding.likelihood).name}")
    else:
        print("No findings.")

if __name__ == "__main__":
    # Read the Google Cloud project ID from the GOOGLE_CLOUD_PROJECT environment variable.
    project = os.environ.get("GOOGLE_CLOUD_PROJECT", "")
    if not project:
        raise ValueError("Please set the GOOGLE_CLOUD_PROJECT environment variable or provide a project_id.")
    sensitive_text = "My email is test@example.com and my phone number is (123) 456-7890."
    inspect_text(project, sensitive_text)
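The quickstart only reports findings. For the `deidentify_content` call mentioned in the warnings, a minimal sketch that replaces each match with its infoType name could look like the following. It follows the request shape used in the public DLP Python samples, but the helper names `build_deidentify_request` and `redact_text` are hypothetical; the request builder is plain Python so it can be checked without calling the API.

```python
from typing import Any, Dict, List


def build_deidentify_request(project_id: str, text: str,
                             info_types: List[str]) -> Dict[str, Any]:
    """Build a deidentify_content request that replaces each finding with its infoType name."""
    return {
        "parent": f"projects/{project_id}",
        # Which infoTypes to look for (same shape as in the quickstart).
        "inspect_config": {"info_types": [{"name": name} for name in info_types]},
        # Apply one transformation to every finding, whatever its infoType.
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {"primitive_transformation": {"replace_with_info_type_config": {}}}
                ]
            }
        },
        "item": {"value": text},
    }


def redact_text(project_id: str, text: str) -> str:
    """Call the DLP API and return the de-identified text (requires credentials)."""
    from google.cloud import dlp_v2  # imported lazily; only needed for the live call

    client = dlp_v2.DlpServiceClient()
    request = build_deidentify_request(project_id, text, ["EMAIL_ADDRESS", "PHONE_NUMBER"])
    response = client.deidentify_content(request=request)
    return response.item.value
```

With the quickstart's sample sentence, the service should return the text with the email address and phone number replaced by bracketed infoType names such as `[EMAIL_ADDRESS]`.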