Prompt Leakage: Protecting Internal Instructions

Security · updated Mon Feb 23

Preventing users from extracting your 'secret' system prompt via prompt injection.

Steps

Add a rule not to repeat internal instructions.
Scan outputs for leaked prompt phrases.
Keep secrets out of system prompts.
Validate outputs with a guardrail model.
Test defenses with adversarial prompts.

view raw JSON →