Context Poisoning: RAG Injection Guardrails

Security · updated Mon Feb 23

Preventing 'Indirect Prompt Injection' via retrieved documents.

Steps

  1. Sanitize retrieved chunks for 'Ignore previous instructions' patterns.
  2. Isolate system instructions from RAG context using delimiters.
  3. Implement a 'Pre-Ingestion' LLM filter to flag instruction-like text.
  4. Use a 'Read-Only' persona for agents processing untrusted RAG data.
  5. Sign and verify the origin of all documents in the vector store.

view raw JSON →