Agent Untrusted-Input Quarantine
Isolate and inspect raw user or external data before the agent processes it for decision making.
Steps
- Wrap all incoming external data in 'UNTRUSTED' metadata tags.
- Route raw input through a secondary 'Security Monitor' model for intent analysis.
- Strip executable code snippets or markdown formatting from raw text inputs.
- Identify and block known 'Jailbreak' strings or system-override keywords.
- Enforce strict character limits on any single un-inspected input block.
- Escalate flagged inputs to a 'Quarantine Log' for human security review.