Google Cloud DLP

3.34.0 · active · verified Sat Mar 28

The `google-cloud-dlp` Python client library provides programmatic access to the Google Cloud Data Loss Prevention (DLP) API, now part of Sensitive Data Protection. It scans, discovers, classifies, and redacts privacy-sensitive data (such as PII) in text, images, and Google Cloud storage repositories such as BigQuery and Cloud Storage. The current version is 3.34.0. Like other Google Cloud client libraries, it is released continuously rather than on a fixed schedule, so new versions appear frequently.

Warnings

Install

Imports

Quickstart

This quickstart initializes the DLP client and inspects a string of text for email addresses and phone numbers. Before running it, enable the DLP API for your project, make Application Default Credentials available (for example with `gcloud auth application-default login`), and set the `GOOGLE_CLOUD_PROJECT` environment variable to your project ID. The environment variable only tells the script which project to send the request to; it does not authenticate you.

import os
from google.cloud import dlp_v2
from google.cloud.dlp_v2 import types

def inspect_text(project_id: str, text_content: str):
    """
    Inspects a string of text for sensitive data using Google Cloud DLP.

    Args:
        project_id: The Google Cloud project ID.
        text_content: The string to inspect.
    """
    if not project_id:
        print("GOOGLE_CLOUD_PROJECT environment variable or project_id not set.")
        return

    client = dlp_v2.DlpServiceClient()

    # Construct the item to inspect
    item = {"value": text_content}

    # The info types to search for in the content.
    # See https://cloud.google.com/sensitive-data-protection/docs/infotypes-reference
    info_types = [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}]

    # The minimum likelihood to constitute a match.
    min_likelihood = types.Likelihood.POSSIBLE

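    # Configuration for the inspection request
    inspect_config = {
        "info_types": info_types,
        "min_likelihood": min_likelihood,
        # Ask the service to return the matched text; otherwise finding.quote is empty.
        "include_quote": True,
        # 0 sets no limit in the request; the service may still cap the findings it returns.
        "limits": {"max_findings_per_request": 0},
    }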

    # Construct the parent path
    parent = f"projects/{project_id}"

    # Call the API
    try:
        response = client.inspect_content(
            request={
                "parent": parent,
                "inspect_config": inspect_config,
                "item": item,
            }
        )

        if response.result.findings:
            print("Findings:")
            for finding in response.result.findings:
                if finding.quote:
                    print(f"  Quote: {finding.quote}")
                print(f"  Info type: {finding.info_type.name}")
                print(f"  Likelihood: {types.Likelihood(finding.likelihood).name}")
        else:
            print("No findings.")

    except Exception as e:
        print(f"Error during DLP inspection: {e}")

if __name__ == "__main__":
    # Read the project ID from the GOOGLE_CLOUD_PROJECT environment variable.
    project = os.environ.get("GOOGLE_CLOUD_PROJECT", "")
    if not project:
        raise ValueError("Please set the GOOGLE_CLOUD_PROJECT environment variable.")

    sensitive_text = "My email is test@example.com and my phone number is (123) 456-7890."
    inspect_text(project, sensitive_text)
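
The library covers redaction as well as inspection. The following is a minimal sketch, not an official sample, of replacing each match with its info type name through `deidentify_content`. The helper names `build_deidentify_request` and `deidentify_text` are hypothetical and chosen for illustration. The sketch assumes the same project setup and credentials as the quickstart.

```python
from typing import Any, Dict, List, Optional


def build_deidentify_request(
    project_id: str,
    text_content: str,
    info_type_names: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Builds a request dict for DlpServiceClient.deidentify_content.

    Hypothetical helper: kept free of library imports so the request
    shape can be checked without credentials or network access.
    """
    names = info_type_names or ["EMAIL_ADDRESS", "PHONE_NUMBER"]
    return {
        "parent": f"projects/{project_id}",
        # Which info types to look for, as in the inspection quickstart.
        "inspect_config": {"info_types": [{"name": n} for n in names]},
        # Replace every match with the name of its info type.
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {"primitive_transformation": {"replace_with_info_type_config": {}}}
                ]
            }
        },
        "item": {"value": text_content},
    }


def deidentify_text(project_id: str, text_content: str) -> str:
    """Returns text_content with detected info types replaced (hypothetical helper)."""
    # Imported lazily so build_deidentify_request works without the library installed.
    from google.cloud import dlp_v2

    client = dlp_v2.DlpServiceClient()
    response = client.deidentify_content(
        request=build_deidentify_request(project_id, text_content)
    )
    return response.item.value
```

Calling `deidentify_text(project, sensitive_text)` with the quickstart's values should return the sentence with the email address and phone number replaced by their info type names. Building the request separately from the client call is a design choice for this sketch: it lets you inspect or unit-test the request without contacting the API.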
