Model Collapse: Preventing Recursive Training Rot
Ensuring agents don't ingest synthetic hallucinations as ground truth.
Steps
- Tag all agent-generated data with a 'Synthetic' metadata flag.
- Implement a 'Human-Verified' anchor for training data subsets.
- Audit the vector store for high-density 'Echo' clusters: tight groups of near-duplicate embeddings that suggest agent output is being re-ingested.
- Use 'Statistical Diversification' to filter out low-entropy (repetitive, low-variety) text.
- Rotate training sources to include high-authority external docs.
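
A minimal Python sketch of how the tagging, anchoring, entropy-filter, and echo-audit steps above might fit together. Everything named here is hypothetical and chosen for illustration: the `Record` fields, the `min_entropy` and `threshold` defaults, the whitespace tokenizer, and the policy that unverified synthetic records are excluded. The echo check is a brute-force cosine comparison intended only for small samples; a real vector store would normally be audited through its own nearest-neighbor query.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Record:
    text: str
    synthetic: bool = False       # hypothetical flag: True for agent-generated data
    human_verified: bool = False  # hypothetical flag: True once a person has reviewed it


def token_entropy(text: str) -> float:
    """Shannon entropy, in bits per token, of the whitespace-token distribution."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    total = len(tokens)
    counts = Counter(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_training_records(records, min_entropy=2.0):
    """Assumed policy: drop synthetic records unless human-verified,
    then drop anything whose token entropy is below min_entropy."""
    kept = []
    for r in records:
        if r.synthetic and not r.human_verified:
            continue  # unverified agent output never becomes ground truth
        if token_entropy(r.text) < min_entropy:
            continue  # repetitive, low-variety text
        kept.append(r)
    return kept


def echo_density(vectors, threshold=0.95):
    """For each vector, count the other vectors whose cosine similarity
    is at least threshold. O(n^2), so only suitable for small samples."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    return [
        sum(1 for j, w in enumerate(vectors) if j != i and cosine(v, w) >= threshold)
        for i, v in enumerate(vectors)
    ]
```

Records with a high `echo_density` count are candidates for manual review rather than automatic deletion, since a dense cluster can also reflect legitimately repeated source material. The thresholds would need tuning against the actual corpus.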