Prompt Leakage: Protecting Internal Instructions
Preventing users from extracting your 'secret' system prompt via prompt injection.
Steps
- Add a rule not to repeat internal instructions.
- Scan outputs for leaked prompt phrases.
- Keep secrets out of system prompts.
- Validate outputs with a guardrail model.
- Test defenses with adversarial prompts.