{"id":6874,"library":"scrubadub","title":"Scrubadub: PII Redaction Library","description":"Scrubadub is a Python library that removes personally identifiable information (PII) from unstructured text. It detects sensitive data such as names, email addresses, and phone numbers and replaces each match with a configurable placeholder. The current version is 2.0.1; the 2.0.0 major release split optional detectors into separate packages and changed which detectors load by default.","status":"active","version":"2.0.1","language":"en","source_language":"en","source_url":"https://github.com/LeapBeyond/scrubadub","tags":["PII","redaction","anonymization","privacy","text processing","NLP"],"install":[{"cmd":"pip install scrubadub","lang":"bash","label":"Core library"},{"cmd":"pip install scrubadub scrubadub-spacy scrubadub-stanford scrubadub-address","lang":"bash","label":"With optional detectors"}],"dependencies":[{"reason":"Required for core functionality.","package":"catalogue","optional":false},{"reason":"Required for core functionality.","package":"dateparser","optional":false},{"reason":"Required for core functionality.","package":"faker","optional":false},{"reason":"Required for core functionality (phone number detection).","package":"phonenumbers","optional":false},{"reason":"Required for core functionality.","package":"python-stdnum","optional":false},{"reason":"Required for core functionality.","package":"scikit-learn","optional":false},{"reason":"Required for core functionality (name detection).","package":"textblob","optional":false},{"reason":"Required for core functionality.","package":"typing-extensions","optional":false},{"reason":"Enables spaCy-based detectors for enhanced name and entity recognition.","package":"scrubadub-spacy","optional":true},{"reason":"Enables Stanford NER detectors for advanced entity recognition.","package":"scrubadub-stanford","optional":true},{"reason":"Enables address and 
postal code detection.","package":"scrubadub-address","optional":true}],"imports":[{"note":"The primary convenience function for quick cleaning.","symbol":"clean","correct":"import scrubadub\ncleaned_text = scrubadub.clean(text)"},{"note":"Use the Scrubber class for more fine-grained control, adding/removing specific detectors, or configuring post-processors.","symbol":"Scrubber","correct":"from scrubadub import Scrubber\nscrubber = Scrubber()\ncleaned_text = scrubber.clean(text)"},{"note":"Import specific detector classes to add them to a Scrubber, especially for optional or external detectors.","symbol":"Detector","correct":"from scrubadub import Scrubber\nfrom scrubadub.detectors import EmailDetector\nscrubber = Scrubber()\n# EmailDetector is loaded by default, so a second instance needs a unique name\nscrubber.add_detector(EmailDetector(name='work_email_detector'))"}],"quickstart":{"code":"import scrubadub\n\ntext = \"My cat can be contacted on example@example.com, or 1800 555-5555. His name is John Doe.\"\ncleaned_text = scrubadub.clean(text)\nprint(cleaned_text)\n\n# For more control, use the Scrubber class\nfrom scrubadub import Scrubber\nfrom scrubadub.detectors import TextBlobNameDetector # Example of an optional detector\n\nscrubber = Scrubber()\n# Add a detector if it's not enabled by default or for custom configuration\nscrubber.add_detector(TextBlobNameDetector())\ncontrolled_cleaned_text = scrubber.clean(text)\nprint(controlled_cleaned_text)","lang":"python","description":"This quickstart demonstrates the basic usage of `scrubadub.clean()` for straightforward PII redaction. It also illustrates how to use the `Scrubber` class to manually add detectors for more customized control over the scrubbing process, especially useful for optional or external detectors not loaded by default."},"warnings":[{"fix":"Review the changelog for v2.0.0. 
For detectors that were previously loaded implicitly, install the relevant sub-package (e.g., `pip install scrubadub-spacy`) and add the detector to your `Scrubber` instance using `scrubber.add_detector(DetectorClass())`.","message":"Version 2.0.0 split the library into smaller sub-packages and stopped loading every detector by default; only a default set is now loaded. Code that relied on previously auto-loaded detectors (e.g., spaCy, Stanford NER) must install the relevant optional package (`scrubadub_spacy`, `scrubadub_stanford`, `scrubadub_address`) and call `add_detector()` explicitly.","severity":"breaking","affected_versions":">=2.0.0"},{"fix":"Upgrade to Python 3.6+ or pin `scrubadub` to `==1.2.2` for older Python versions.","message":"Python 2.7 and 3.5 support was dropped starting from version 2.0.0. If you require these Python versions, you must use `scrubadub` version 1.2.2 or earlier.","severity":"breaking","affected_versions":">=2.0.0"},{"fix":"Install the required optional packages (e.g., `pip install scrubadub-spacy`) and then use `scrubber.add_detector(DetectorClass())` to enable them.","message":"Since version 2.0.0, only a default set of detectors is loaded when initializing a `Scrubber` or calling `scrubadub.clean()`. To use optional or external detectors (e.g., `SpacyNameDetector`, `AddressDetector`), you must explicitly install their packages and add them to your `Scrubber` instance.","severity":"gotcha","affected_versions":">=2.0.0"},{"fix":"Ensure each detector you add has a unique name. 
If adding multiple instances of the same detector class with different configurations, assign a unique `name` parameter during instantiation (e.g., `EmailDetector(name='work_email_detector')`).","message":"Attempting to add two detectors with the same name to a `Scrubber` instance will result in a `KeyError`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure you are using `scrubadub==2.0.1` or later to avoid this specific dependency naming issue.","message":"Version 2.0.1 fixed an issue where the `scikit-learn` dependency was incorrectly named. Users might have encountered installation problems with `scrubadub==2.0.0` due to this.","severity":"gotcha","affected_versions":"2.0.0"}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z","problems":[]}