Agent Healthcheck & Liveness Probes
Defining operational 'Vital Signs' to detect and restart hung or drifting agents.
Steps
- Implement a `/health` endpoint that reports whether the LLM API is reachable (a heartbeat call) and whether enough memory is available.
- Set up a `Liveness Probe` to detect 'Infinite Loop' hangs, where the agent has produced no response for more than 60 seconds.
- Configure a `Readiness Probe` so that traffic reaches the agent only once its vector databases are reachable and indexed.
- Monitor 'Token Velocity' for loop detection: restart the agent if it consumes more than 10,000 tokens within a 10-second window.
- Define a 'Graceful Shutdown' period that lets agents finish in-flight tool commits before the process is stopped.
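A minimal sketch of the `/health` endpoint step, using only the Python standard library. The names `run_health_checks()`, `memory_available()` and `make_handler()` are hypothetical. The LLM heartbeat is left as an injected callable because the right ping depends on your provider. The memory check reads `MemAvailable` from `/proc/meminfo`, so it is Linux-only. Inside a container that file usually reflects the host rather than the container's cgroup limit, so a production check may need to read the cgroup files instead.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple

# A check is any zero-argument callable returning True when healthy, e.g. a
# hypothetical ping_llm() that makes a cheap, short-timeout API call.
Check = Callable[[], bool]


def run_health_checks(checks: Dict[str, Check]) -> Tuple[int, dict]:
    """Run each named check and map the results to an HTTP status and JSON body."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises (timeout, auth error, missing file) counts as
            # failed instead of crashing the endpoint.
            results[name] = False
    healthy = all(results.values())
    body = {"status": "ok" if healthy else "unhealthy", "checks": results}
    return (200 if healthy else 503), body


def memory_available(min_bytes: int) -> Check:
    """Build a check that passes if MemAvailable >= min_bytes (Linux /proc/meminfo)."""
    def check() -> bool:
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemAvailable:"):
                    return int(line.split()[1]) * 1024 >= min_bytes  # value is in kB
        return False
    return check


def make_handler(checks: Dict[str, Check]):
    """Create a request handler class that answers GET /health."""
    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/health":
                self.send_error(404)
                return
            status, body = run_health_checks(checks)
            payload = json.dumps(body).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return HealthHandler
```

To serve it, one might run `HTTPServer(("0.0.0.0", 8080), make_handler({"llm_api": ping_llm, "memory": memory_available(512 * 1024 * 1024)})).serve_forever()` on a background thread, with `ping_llm` being the hypothetical heartbeat. Returning 503 on failure matters because an HTTP probe typically judges success purely by status code.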
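The liveness step can be sketched as a progress watchdog. This assumes the agent loop can call a hook after every unit of progress, such as an LLM reply or a tool result. `LivenessWatchdog`, `touch()` and `is_alive()` are hypothetical names, and the 60-second default is the threshold from the step above. The clock is injectable so the timeout logic can be tested without sleeping.

```python
import threading
import time
from typing import Callable


class LivenessWatchdog:
    """Flags the agent as hung when it has made no progress for `timeout_s` seconds.

    The agent loop calls touch() after every unit of progress; the liveness
    endpoint calls is_alive(). The two may run on different threads.
    """

    def __init__(self, timeout_s: float = 60.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self._clock = clock  # injectable so tests need not sleep
        self._lock = threading.Lock()
        self._last_progress = clock()

    def touch(self) -> None:
        """Record that the agent just produced a response or tool result."""
        with self._lock:
            self._last_progress = self._clock()

    def seconds_since_progress(self) -> float:
        with self._lock:
            return self._clock() - self._last_progress

    def is_alive(self) -> bool:
        # Alive while the gap since the last progress is at most timeout_s.
        return self.seconds_since_progress() <= self.timeout_s
```

Keeping this check independent of external services is a deliberate choice. If liveness also required the LLM API, a provider outage would make the orchestrator restart every otherwise healthy agent. Note that a loop which keeps producing output, and therefore keeps calling `touch()`, will not trip this watchdog; that case is what the token-velocity step targets.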
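The readiness step can be sketched as a check that fails until the vector store both answers and reports a populated index. `VectorStore`, `ping()`, `indexed_count()` and `min_vectors` are assumptions standing in for whatever your vector database client actually exposes; real clients name these operations differently.

```python
from typing import Protocol, Tuple


class VectorStore(Protocol):
    """Hypothetical minimal interface; adapt it to your vector DB client's real calls."""

    def ping(self) -> bool: ...

    def indexed_count(self, collection: str) -> int: ...


def check_readiness(store: VectorStore, collection: str, min_vectors: int = 1) -> Tuple[bool, str]:
    """Return (ready, reason): ready only if the store answers and the collection is indexed."""
    try:
        if not store.ping():
            return False, "vector store unreachable"
        count = store.indexed_count(collection)
    except Exception as exc:
        # Connection refused, auth failure, missing collection, and so on.
        return False, f"vector store error: {exc}"
    if count < min_vectors:
        return False, f"collection {collection!r} has {count} indexed vectors, need {min_vectors}"
    return True, "ready"
```

Unlike a failed liveness check, a failed readiness check only stops traffic from being routed to the agent and does not restart it. That is the appropriate response to a database that is still building its index.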
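A sliding-window counter is one way to implement the token-velocity rule. This sketch assumes the agent can read a token count for each LLM call (many APIs return usage metadata, though field names vary). `TokenVelocityMonitor` and `record()` are hypothetical names, and the defaults of 10,000 tokens and 10 seconds are the thresholds from the step above.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class TokenVelocityMonitor:
    """Sliding-window counter that flags runaway token usage (a possible loop)."""

    def __init__(
        self,
        max_tokens: int = 10_000,
        window_s: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_tokens = max_tokens
        self.window_s = window_s
        self._clock = clock
        self._events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)
        self._total = 0  # tokens currently inside the window

    def record(self, tokens: int) -> bool:
        """Add one LLM call's token count; return True if the limit is exceeded."""
        now = self._clock()
        self._events.append((now, tokens))
        self._total += tokens
        # Drop samples older than the window so _total covers only the last window_s seconds.
        while self._events and now - self._events[0][0] > self.window_s:
            _, old_tokens = self._events.popleft()
            self._total -= old_tokens
        return self._total > self.max_tokens
```

When `record()` returns True, the agent can fail its liveness check or exit with a non-zero status so that its supervisor restarts it. The deque keeps each `record()` call amortized O(1). The thresholds should be tuned against the model's normal throughput, since legitimate long generations can also produce bursts of tokens.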
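The graceful-shutdown step can be sketched as a small coordinator: after SIGTERM it refuses new tool commits and waits, up to a grace period, for in-flight ones to finish. Treating a "tool commit" as any side-effecting tool call wrapped in `commit()` is an assumption about how the agent structures its work. `GracefulShutdown`, `commit()`, `drain()` and `grace_period_s` are hypothetical names.

```python
import signal
import threading
from contextlib import contextmanager
from typing import Iterator


class GracefulShutdown:
    """Refuses new tool commits after SIGTERM and waits for in-flight ones.

    Assumption: a "tool commit" is a side-effecting tool call (a DB write, an
    outbound API call) that should not be cut off halfway.
    """

    def __init__(self, grace_period_s: float = 25.0) -> None:
        self.grace_period_s = grace_period_s
        self.stopping = False
        self._in_flight = 0
        self._cond = threading.Condition()

    def request_stop(self, signum=None, frame=None) -> None:
        # Only flips a flag, so the signal handler never has to acquire a lock.
        self.stopping = True

    def install(self) -> None:
        # signal.signal() must be called from the main thread.
        signal.signal(signal.SIGTERM, self.request_stop)

    @contextmanager
    def commit(self) -> Iterator[None]:
        """Wrap one tool commit; raises RuntimeError once shutdown has begun."""
        with self._cond:
            if self.stopping:
                raise RuntimeError("shutting down: new tool commits refused")
            # A commit that passes this check just before stop is requested is
            # still counted, so drain() will wait for it.
            self._in_flight += 1
        try:
            yield
        finally:
            with self._cond:
                self._in_flight -= 1
                self._cond.notify_all()

    def drain(self) -> bool:
        """Wait for in-flight commits; True if all finished within the grace period."""
        with self._cond:
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout=self.grace_period_s)
```

Kubernetes sends SIGTERM, waits `terminationGracePeriodSeconds` (30 seconds by default), then sends SIGKILL, so `grace_period_s` should leave some headroom below that value. A main loop would typically call `install()` at startup, stop picking up new tasks once `stopping` is true, and call `drain()` before exiting.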