Presidio Analyzer

2.2.362 · active · verified Thu Apr 09

Presidio Analyzer is a Python library and service for detecting Personally Identifiable Information (PII) entities in text. It combines predefined recognizers, regular expressions, and Named Entity Recognition (NER) models to identify sensitive data. The library is actively maintained: the current version is 2.2.362, and frequent releases add features, fix bugs, and improve detection.
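
To make the regular-expression part of that combination concrete, here is a minimal, standard-library-only sketch of pattern-based detection. It is not Presidio's implementation: the `Match` class, the `find_phone_numbers` function, the regex, and the fixed 0.5 score are all hypothetical stand-ins chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Match:
    """Hypothetical stand-in for a detection result (not a Presidio class)."""
    entity_type: str
    start: int
    end: int
    score: float


# Illustrative pattern for numbers written like "(123) 456-7890".
# Assumption: real recognizers use more robust patterns and extra validation.
PHONE_PATTERN = re.compile(r"\(\d{3}\) \d{3}-\d{4}")


def find_phone_numbers(text: str, score: float = 0.5) -> List[Match]:
    """Return one Match per regex hit, each with the same illustrative score."""
    return [
        Match("PHONE_NUMBER", m.start(), m.end(), score)
        for m in PHONE_PATTERN.finditer(text)
    ]


sample = "My name is John Doe and my phone number is (123) 456-7890."
matches = find_phone_numbers(sample)
for m in matches:
    # Slice the original text with the offsets to recover the matched string.
    print(m.entity_type, sample[m.start:m.end], m.score)
```

A regex alone cannot find names such as "John Doe", which is why pattern matching is paired with NER models for entities that have no fixed shape.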

Warnings

Install

Imports

Quickstart

This quickstart initializes an `AnalyzerEngine` and detects PII entities in a sample string. The engine uses the `en_core_web_lg` spaCy model by default for its NLP capabilities, so make sure that model has been downloaded as described in the installation steps. The loop at the end prints each detected entity's type, the matched text (sliced from the input using the result's start and end offsets), and its confidence score.

from presidio_analyzer import AnalyzerEngine

# Initialize the AnalyzerEngine
# This will load the default spaCy NLP model (en_core_web_lg if downloaded)
analyzer = AnalyzerEngine()

text = "My name is John Doe and my phone number is (123) 456-7890."

# Analyze the text for PII entities
# Restrict detection to specific entity types, or omit `entities` to search for all supported types
results = analyzer.analyze(text=text, entities=["PERSON", "PHONE_NUMBER"], language="en")

for result in results:
    print(f"Entity: {result.entity_type}, Text: {text[result.start:result.end]}, Score: {result.score:.2f}")
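
Because each result carries `entity_type`, `start`, `end`, and `score`, results can be post-processed with plain Python. The sketch below filters by a score threshold and masks the matched spans. `Span`, `mask_spans`, the 0.5 threshold, and the sample offsets and scores are hypothetical and hand-written for illustration; they are not output from the analyzer.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Span:
    """Hypothetical stand-in exposing the attributes used in the quickstart."""
    entity_type: str
    start: int
    end: int
    score: float


def mask_spans(text: str, spans: Iterable[Span], min_score: float = 0.5) -> str:
    """Replace each sufficiently confident span with <ENTITY_TYPE>.

    Assumes the spans do not overlap.
    """
    kept = [s for s in spans if s.score >= min_score]
    # Work right to left so earlier offsets stay valid after each replacement.
    for s in sorted(kept, key=lambda s: s.start, reverse=True):
        text = text[:s.start] + f"<{s.entity_type}>" + text[s.end:]
    return text


text = "My name is John Doe and my phone number is (123) 456-7890."
# Hand-written offsets and scores, for illustration only.
spans = [Span("PERSON", 11, 19, 0.85), Span("PHONE_NUMBER", 43, 57, 0.4)]
masked = mask_spans(text, spans)
print(masked)
```

With these made-up scores, only the name is masked because the phone span falls below the threshold. Per the quickstart, the objects returned by `analyzer.analyze` expose the same four attributes, so they should be usable in place of `Span`.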
