Prompt Leakage: Protecting Internal Instructions

Security · updated Mon Feb 23

Preventing users from extracting your 'secret' system prompt via prompt injection.

Steps

  1. Add a rule not to repeat internal instructions.
  2. Scan outputs for leaked prompt phrases.
  3. Keep secrets out of system prompts.
  4. Validate outputs with a guardrail model.
  5. Test defenses with adversarial prompts.

view raw JSON →