{"id":2197,"library":"presidio-analyzer","title":"Presidio Analyzer","description":"Presidio Analyzer is a Python library and service for detecting Personally Identifiable Information (PII) entities in text. It combines predefined recognizers, regular expressions, and Named Entity Recognition (NER) models to identify sensitive data. The library is actively maintained (current version 2.2.362), with frequent releases that add features, fix bugs, and improve detection capabilities.","status":"active","version":"2.2.362","language":"en","source_language":"en","source_url":"https://github.com/Microsoft/presidio","tags":["PII","NLP","privacy","data-masking","security"],"install":[{"cmd":"pip install presidio-analyzer","lang":"bash","label":"Install core library"},{"cmd":"python -m spacy download en_core_web_lg","lang":"bash","label":"Download default spaCy NLP model"}],"dependencies":[{"reason":"Required for the default NLP engine used by AnalyzerEngine for advanced PII detection.","package":"spacy","optional":false},{"reason":"Optional extra for using Hugging Face Transformers models as an NLP engine.","package":"transformers","optional":true},{"reason":"Optional extra for using Stanza NLP models as an NLP engine.","package":"stanza","optional":true}],"imports":[{"symbol":"AnalyzerEngine","correct":"from presidio_analyzer import AnalyzerEngine"},{"symbol":"RecognizerRegistry","correct":"from presidio_analyzer.recognizer_registry import RecognizerRegistry"},{"symbol":"NlpEngineProvider","correct":"from presidio_analyzer.nlp_engine import NlpEngineProvider"}],"quickstart":{"code":"from presidio_analyzer import AnalyzerEngine\n\n# Initialize the AnalyzerEngine\n# This will load the default spaCy NLP model (en_core_web_lg if downloaded)\nanalyzer = AnalyzerEngine()\n\ntext = \"My name is John Doe and my phone number is (123) 456-7890.\"\n\n# Analyze the text for PII entities\n# Specify entities to look for, or leave empty for all supported 
entities\nresults = analyzer.analyze(text=text, entities=[\"PERSON\", \"PHONE_NUMBER\"], language=\"en\")\n\nfor result in results:\n    print(f\"Entity: {result.entity_type}, Text: {text[result.start:result.end]}, Score: {result.score:.2f}\")","lang":"python","description":"This quickstart shows how to initialize the AnalyzerEngine and detect PII entities in a given text. Make sure you have downloaded the `en_core_web_lg` spaCy model as described in the installation steps, because the default NLP engine uses it. The output lists each detected entity type, the matched text, and a confidence score."},"warnings":[{"fix":"Refer to the 'Changes from V1 to V2' documentation on the Presidio website. Rewrite API calls and update data structures to conform to the new HTTP/JSON format.","message":"Presidio underwent a significant revamp from V1 to V2 (starting around 2.0.0). This involved a migration from gRPC to HTTP-based APIs, changes in JSON payload formats (structured objects to flattened JSON, camelCase to snake_case), and deprecation of some services. Code written for V1 is not compatible with V2 APIs.","severity":"breaking","affected_versions":"All versions >= 2.0.0 (breaking from V1)"},{"fix":"Review the documentation for 'Recognizer registry from file' or programmatic `RecognizerRegistry` customization to enable specific country-specific recognizers.","message":"Many country-specific PII recognizers (e.g., for Singapore, Australia, Germany, Sweden) are disabled by default to prevent false positives when they are not explicitly needed. 
If you require detection for specific regional PII, you must explicitly enable these recognizers either via a YAML configuration file or programmatically by adding them to the RecognizerRegistry.","severity":"gotcha","affected_versions":"2.2.359 and later"},{"fix":"Always run `python -m spacy download <model_name>` for the languages you intend to analyze, as part of your environment setup.","message":"The AnalyzerEngine relies on NLP models (like spaCy) for many detections. While `spacy` itself is a dependency, the language models (e.g., `en_core_web_lg`) must be downloaded separately using `python -m spacy download <model_name>`. Failure to do so will result in errors or limited detection capabilities.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always check the `presidio-analyzer` release notes and `pyproject.toml` for explicit dependency version constraints and ensure your environment adheres to them.","message":"Specific versions of underlying NLP libraries, particularly spaCy, might be explicitly restricted in certain `presidio-analyzer` releases. For example, `spacy.cli` was restricted for version 3.7.0 in release 2.2.356. Using an incompatible spaCy version can lead to unexpected behavior or errors.","severity":"gotcha","affected_versions":"Specific patch versions (e.g., 2.2.356 for spaCy 3.7.0)"},{"fix":"Monitor GitHub issues for type-related fixes. 
Depending on the error, temporary workarounds include `# type: ignore` comments or `typing.cast` until official fixes are released.","message":"Users running a static type checker (e.g., mypy) may encounter type errors in versions after 2.2.33, specifically when initializing `AnonymizerEngine` and when passing `RecognizerResult` objects between `presidio-analyzer` and `presidio-anonymizer`, because each package defines its own `RecognizerResult` class.","severity":"gotcha","affected_versions":"2.2.354 and later (potentially from 2.2.33)"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}