Scrubadub: PII Redaction Library

2.0.1 · active · verified Wed Apr 15

Scrubadub is a Python library that removes personally identifiable information (PII) from unstructured text. It detects sensitive data such as names, email addresses, and phone numbers, and replaces each match with a configurable placeholder. The library is actively maintained and currently at version 2.0.1; major releases have introduced new detectors and architectural changes.

Warnings

Install

Imports

Quickstart

This quickstart shows basic PII redaction with `scrubadub.clean()`. It also shows how to use the `Scrubber` class to add detectors manually, which gives finer control over scrubbing and is useful for optional or external detectors that are not loaded by default.

import scrubadub
from scrubadub import Scrubber
from scrubadub.detectors import TextBlobNameDetector  # Example of an optional detector

text = "My cat can be contacted on example@example.com, or 1800 555-5555. His name is John Doe."

# Simplest case: clean the text with the detectors that are loaded by default
cleaned_text = scrubadub.clean(text)
print(cleaned_text)

# For more control, build a Scrubber and configure it yourself
scrubber = Scrubber()
# Add a detector that is not enabled by default, or one with custom configuration
scrubber.add_detector(TextBlobNameDetector())
controlled_cleaned_text = scrubber.clean(text)
print(controlled_cleaned_text)
