Agent Retry Decision Classification Matrix

Reliability · updated Thu Feb 26

A logic-gate checklist to classify LLM agent failures and prevent infinite loops.

Steps

  1. Categorize 429 (Rate Limit) and 5xx (Server Error) as 'Transient/Retryable'.
  2. Categorize 400 (Invalid Request) and 401/403 (Auth) as 'Permanent/Fatal'.
  3. Flag 'Context Length Exceeded' as 'Non-Retryable' (Requires Pruning).
  4. Mark 'Safety/Policy Violation' as 'Terminal' (Never Retry).
  5. Route 'JSON Parsing Failure' to a 'Single Repair Attempt' with a formatting nudge.
  6. Escalate to 'Dead Letter State' if a retryable error hits max_attempts.

view raw JSON →