Agent Untrusted-Input Quarantine

Security · updated Thu Feb 26

Isolate and inspect raw user or external data before the agent processes it for decision making.

Steps

Wrap all incoming external data in 'UNTRUSTED' metadata tags.
Route raw input through a secondary 'Security Monitor' model for intent analysis.
Strip executable code snippets or markdown formatting from raw text inputs.
Identify and block known 'Jailbreak' strings or system-override keywords.
Enforce strict character limits on any single un-inspected input block.
Escalate flagged inputs to a 'Quarantine Log' for human security review.