Presidio Anonymizer
Presidio Anonymizer is a Python-based module designed for anonymizing detected Personally Identifiable Information (PII) entities in text. It offers a range of built-in operators (e.g., replace, mask, redact, hash, encrypt) and supports custom anonymization logic. It also includes deanonymization capabilities for reversible operations like decryption. The library is actively maintained by Microsoft, with frequent releases, and is currently at version 2.2.362.
Warnings
- breaking: The default behavior of the 'hash' operator changed in version 2.2.361. It now uses a random salt by default for enhanced security, so the same PII value yields different hashes across calls or entities. This breaks referential integrity unless a salt is explicitly provided.
- gotcha: When `presidio-analyzer` and `presidio-anonymizer` are used together in a typed Python environment, `mypy` might report type errors because `RecognizerResult` exists in both packages with incompatible types. The `AnonymizerEngine` expects `RecognizerResult` from `presidio_anonymizer.entities`.
- gotcha: Starting from version 2.2.359, many country-specific recognizers in `presidio-analyzer` (e.g., SgFinRecognizer, AuAbnRecognizer) are disabled by default to prevent false positives when they are not explicitly needed. Users expecting them to work out of the box might find them inactive.
- gotcha: In versions prior to 2.2.362, `AnonymizerEngine` could fail to anonymize all instances of an entity if multiple identical entities were separated only by spaces, potentially leading to PII leakage (e.g., 'email1@example.com email2@example.com'). A fix was implemented in 2.2.362.
- gotcha: `presidio-anonymizer` is a standalone package, but a complete PII detection and anonymization pipeline typically also requires `presidio-analyzer` for detection and an underlying NLP engine (like spaCy with a language model such as `en_core_web_lg`). Without these dependencies, the full workflow will not function.
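The effect of the salt change on referential integrity can be illustrated with the standard library alone. The sketch below is not Presidio's implementation: the `salted_sha256` helper, the way salt and value are combined, and the fixed salt value are all assumptions made for illustration. It only shows why a fresh random salt per call makes equal inputs unlinkable, while a reused salt keeps them linkable.

```python
import hashlib
import secrets


def salted_sha256(value: str, salt: bytes) -> str:
    """Return the hex SHA-256 digest of salt + value (illustrative helper)."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


pii = "john.doe@example.com"

# A fresh random salt per call: the same input yields unrelated digests,
# so two occurrences of one value can no longer be matched to each other.
random_a = salted_sha256(pii, secrets.token_bytes(16))
random_b = salted_sha256(pii, secrets.token_bytes(16))

# One salt reused across calls: digests match, so occurrences stay linkable.
fixed_salt = b"example-fixed-salt"  # hypothetical value; keep a real salt secret
fixed_a = salted_sha256(pii, fixed_salt)
fixed_b = salted_sha256(pii, fixed_salt)

print(random_a == random_b)  # False, barring an astronomically unlikely collision
print(fixed_a == fixed_b)    # True
```

If hashed values must be joinable across documents or calls, the same reasoning applies to the 'hash' operator: supply one salt explicitly and reuse it, checking the operator's parameter name in the documentation for the installed version.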
Install
- pip install presidio-anonymizer
- pip install presidio-analyzer "spacy[en]"
- python -m spacy download en_core_web_lg
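A quick way to confirm that the install steps took effect is to check whether each package is importable before building a pipeline. This is a minimal sketch using only the standard library; the `missing_modules` helper is hypothetical, and the module names listed are the import names these packages are assumed to expose.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names assumed for the packages installed above.
required = ["presidio_anonymizer", "presidio_analyzer", "spacy", "en_core_web_lg"]

missing = missing_modules(required)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All pipeline dependencies are importable.")
```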
Imports
- AnonymizerEngine
from presidio_anonymizer import AnonymizerEngine
- RecognizerResult
from presidio_anonymizer.entities import RecognizerResult
- OperatorConfig
from presidio_anonymizer.entities import OperatorConfig
Quickstart
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import RecognizerResult, OperatorConfig

# Sample text and mock analyzer results (typically from presidio-analyzer).
# start/end are character offsets into text; end is exclusive.
text = "My name is John Doe and my phone number is 123-456-7890."
analyzer_results = [
    RecognizerResult(entity_type="PERSON", start=11, end=19, score=0.9),
    RecognizerResult(entity_type="PHONE_NUMBER", start=43, end=55, score=0.8),
]

# Initialize the anonymizer engine
anonymizer = AnonymizerEngine()

# Define anonymization operators
# Here, PERSON will be replaced with "<PERSON>", and PHONE_NUMBER will be masked
operators = {
    "PERSON": OperatorConfig("replace", {"new_value": "<PERSON>"}),
    "PHONE_NUMBER": OperatorConfig("mask", {
        "masking_char": "*",
        "chars_to_mask": 10,
        "from_end": True,
    }),
}

# Perform anonymization
anonymized_result = anonymizer.anonymize(
    text=text,
    analyzer_results=analyzer_results,
    operators=operators,
)

print(f"Original text: {text}")
print(f"Anonymized text: {anonymized_result.text}")
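When analyzer results are mocked by hand, the start and end offsets have to match the text exactly, or the operators will act on the wrong characters. The sketch below computes the spans instead of counting them manually. The `find_span` helper is hypothetical and not part of Presidio; it assumes each value occurs in the text and returns only the first occurrence.

```python
def find_span(text: str, value: str) -> tuple[int, int]:
    """Return (start, end) of the first occurrence of value; end is exclusive."""
    start = text.index(value)  # raises ValueError if value is absent
    return start, start + len(value)


text = "My name is John Doe and my phone number is 123-456-7890."

person_span = find_span(text, "John Doe")
phone_span = find_span(text, "123-456-7890")

print(person_span)  # (11, 19)
print(phone_span)   # (43, 55)
```

The resulting pairs can be passed as the `start` and `end` arguments of `RecognizerResult` when building test fixtures.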