@tank/adversarial-agent-coach
0.1.0
Description
Improve AI agents with adversarial coaching, contradiction checks, evidence demands, and verification loops. Covers skeptical review, ReAct and Reflexion retries, multi-agent critique, and eval-first prompt optimization.
tank install -g @tank/adversarial-agent-coach
Adversarial Agent Coach
Push the model to earn confidence through evidence and contradiction handling. Treat "gaslighting AI" as shorthand for adversarial coaching that improves outputs, never as deception.
Core Philosophy
- Attack the draft, not the user. The goal is a stronger answer, not theatrical aggression.
- Pressure without deception. Never invent evidence, fake tool results, or force certainty.
- Make claims pay rent. Important claims need support, caveats, or a verification step.
- Critique is only useful if it changes the next draft. Convert objections into targeted revisions.
- Stop when risk drops, not when tone feels tough enough. Reliability beats performative harshness.
Quick-Start: Pick the Right Pressure Level
| Situation | Default move |
|---|---|
| Draft is vague or padded | Run contradiction and specificity checks |
| Draft sounds confident but thin | Demand evidence tier and confidence label |
| Task uses tools or data | Switch to act-observe-verify loop |
| Task keeps failing the same way | Run Reflexion-style failure review |
| High-stakes answer | Split roles into builder, skeptic, verifier |
The Operating Loop
Phase 1: Diagnose the Failure Mode
Classify the current output before changing anything:
| Failure mode | Signal | First response |
|---|---|---|
| Overconfidence | Strong claims, weak support | Ask for claim-evidence-confidence |
| Sycophancy | Agrees too easily | Force disagreement and counterexamples |
| Hallucination risk | Missing source or unverifiable detail | Require proof or explicit uncertainty |
| Incomplete reasoning | Jumps to answer | Break into subclaims and verify each |
| Tool blindness | Assumes actions worked | Inspect observations before next step |
Load references/adversarial-review-playbook.md when the failure mode is unclear.
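The diagnosis table can be read as a simple lookup. The sketch below is one hypothetical way to encode it in Python. The `FailureMode` enum, the `FIRST_RESPONSE` mapping, and the `first_response` helper are illustrative names, not part of the skill; the response strings are copied from the table. Classifying a draft into a failure mode is still left to the reviewer.

```python
from enum import Enum


class FailureMode(Enum):
    """Failure modes from the Phase 1 table (names are illustrative)."""
    OVERCONFIDENCE = "overconfidence"
    SYCOPHANCY = "sycophancy"
    HALLUCINATION_RISK = "hallucination risk"
    INCOMPLETE_REASONING = "incomplete reasoning"
    TOOL_BLINDNESS = "tool blindness"


# First responses, copied from the table.
FIRST_RESPONSE = {
    FailureMode.OVERCONFIDENCE: "Ask for claim-evidence-confidence",
    FailureMode.SYCOPHANCY: "Force disagreement and counterexamples",
    FailureMode.HALLUCINATION_RISK: "Require proof or explicit uncertainty",
    FailureMode.INCOMPLETE_REASONING: "Break into subclaims and verify each",
    FailureMode.TOOL_BLINDNESS: "Inspect observations before next step",
}


def first_response(mode: FailureMode) -> str:
    """Return the first response for an already-classified failure mode."""
    return FIRST_RESPONSE[mode]
```

For example, `first_response(FailureMode.SYCOPHANCY)` returns the "Force disagreement and counterexamples" instruction.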
Phase 2: Apply Pressure
Use the lightest intervention that exposes the weakness:
- Ask what would make the answer false.
- Demand the strongest missing evidence.
- Require one concrete counterexample or edge case.
- Make the model state confidence and why it is limited.
- If tools exist, verify instead of debating.
Load references/verification-and-evidence.md for evidence tiers.
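One way to picture "lightest intervention first" is as an escalation ladder. The sketch below is a minimal illustration under two assumptions that the text does not state: that the prompts are ordered from lightest to heaviest in the order listed, and that tool verification always takes priority when tools exist. `PRESSURE_LADDER`, `VERIFY_WITH_TOOLS`, and `next_intervention` are hypothetical names.

```python
from typing import Optional

# Prompts paraphrased from the list above. Their order is an assumption.
PRESSURE_LADDER = [
    "What would make this answer false?",
    "What is the strongest evidence still missing?",
    "Give one concrete counterexample or edge case.",
    "State your confidence and why it is limited.",
]

VERIFY_WITH_TOOLS = "Verify with tools instead of debating."


def next_intervention(attempts_so_far: int, tools_available: bool) -> Optional[str]:
    """Pick the next intervention, or None when the ladder is exhausted."""
    if tools_available:
        # The text prefers verification over further debate when tools exist.
        return VERIFY_WITH_TOOLS
    if attempts_so_far < len(PRESSURE_LADDER):
        return PRESSURE_LADDER[attempts_so_far]
    return None
```

Returning `None` once every prompt has been tried gives the caller a natural point to change method instead of repeating the same pressure.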
Phase 3: Revise, Do Not Rant
After critique, revise the risky parts first:
- Remove unsupported claims.
- Add source, test, or tool-backed support.
- Tighten caveats where certainty is unjustified.
- Re-run the top two contradiction checks.
Load references/react-reflexion-loops.md for revision loops.
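The "patch the risky parts" idea can be sketched as a pass over a list of claims. This is a hypothetical illustration, not the skill's implementation: the `Claim` dataclass and the `find_evidence` callback (standing in for a source lookup, test run, or tool call) are assumptions introduced here. Supported claims are left untouched, unsupported claims gain evidence when it can be found, and the rest are removed.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Claim:
    text: str
    evidence: Optional[str] = None  # source, test, or tool output; None if unsupported


def revise(claims: List[Claim],
           find_evidence: Callable[[str], Optional[str]]) -> List[Claim]:
    """Patch only unsupported claims; leave supported ones untouched."""
    revised = []
    for claim in claims:
        if claim.evidence is not None:
            revised.append(claim)  # already supported, keep as-is
            continue
        evidence = find_evidence(claim.text)
        if evidence is not None:
            revised.append(replace(claim, evidence=evidence))  # add support
        # Otherwise the claim stays unsupported and is removed.
    return revised
```

A gentler variant could keep an unsupported claim with an explicit caveat instead of dropping it, which matches the "tighten caveats" step.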
Phase 4: Verify Release Readiness
Before accepting the result, confirm:
| Check | Pass condition |
|---|---|
| Evidence | Important claims have support or explicit unknowns |
| Contradictions | No unresolved counterexample breaks the answer |
| Reproducibility | Steps, commands, or criteria are inspectable |
| Scope | Answer matches the actual task, not a nearby one |
Load references/evals-and-metrics.md when the task needs a repeatable quality bar.
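The release checklist lends itself to a small gate function. The sketch below assumes each of the four checks has already been judged and recorded as a boolean; the key names and the `failing_checks` and `release_ready` helpers are illustrative, not defined by the skill.

```python
from typing import Dict, List

# Check names taken from the table above, lowercased.
CHECKS = ("evidence", "contradictions", "reproducibility", "scope")


def failing_checks(results: Dict[str, bool]) -> List[str]:
    """Return the checks that did not pass; a missing check counts as failed."""
    return [name for name in CHECKS if not results.get(name, False)]


def release_ready(results: Dict[str, bool]) -> bool:
    """True only when all four checks pass."""
    return not failing_checks(results)
```

Treating a missing entry as a failure keeps the gate conservative: an unexamined check cannot pass by omission.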
Decision Trees
When to Use Single-Agent vs Multi-Agent Review
| Signal | Recommendation |
|---|---|
| Quick draft cleanup | Single-agent contradiction pass |
| Complex reasoning | Builder + skeptic roles |
| High-stakes or repeated failures | Builder + skeptic + verifier |
| Retrieval-heavy workflow | Add retrieval verifier and citation check |
Load references/multi-agent-review.md for role splits.
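The role-selection table could be expressed as a small function. The sketch below is hypothetical, and the precedence among signals (high stakes or repeated failures outrank complex reasoning, and retrieval checks are added on top of any role set) is an assumption, since the table does not say how to combine rows.

```python
from typing import List


def review_roles(complex_reasoning: bool = False,
                 high_stakes: bool = False,
                 repeated_failures: bool = False,
                 retrieval_heavy: bool = False) -> List[str]:
    """Choose review roles from the signals in the table above."""
    if high_stakes or repeated_failures:
        roles = ["builder", "skeptic", "verifier"]
    elif complex_reasoning:
        roles = ["builder", "skeptic"]
    else:
        roles = ["single-agent contradiction pass"]
    if retrieval_heavy:
        # Retrieval-heavy workflows add checks rather than replace roles.
        roles += ["retrieval verifier", "citation check"]
    return roles
```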
When to Stop Iterating
| Signal | Action |
|---|---|
| Top risks resolved | Stop |
| New critique only finds style nits | Stop |
| Same failure repeats twice | Change method, do not repeat prompt |
| Missing evidence cannot be obtained | Mark uncertainty clearly and stop |
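The stopping rules can be sketched as a single decision function. This is an illustration under assumptions: the order in which the signals are checked is not given in the table, and `same_failure_repeats` is taken to mean the number of times the same failure has recurred. `next_action` and its parameter names are hypothetical.

```python
def next_action(top_risks_resolved: bool,
                only_style_nits: bool,
                same_failure_repeats: int,
                evidence_obtainable: bool) -> str:
    """Map the stopping signals from the table above to an action."""
    if top_risks_resolved or only_style_nits:
        return "stop"
    if not evidence_obtainable:
        return "mark uncertainty and stop"
    if same_failure_repeats >= 2:
        return "change method"
    return "iterate"
```

Checking the stop conditions first means a loop never keeps running, or switches method, once the top risks are already resolved.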
Pressure Patterns That Work
Use short, concrete prompts like:
- "What would make this answer false?"
- "List the two weakest claims and either prove or remove them."
- "Show the observation that justifies the next step."
- "State confidence for each key claim and why it is not higher."
- "Give one serious counterexample and resolve it before finalizing."
Anti-Patterns
| Do not do this | Why it fails | Better move |
|---|---|---|
| Fake confidence pressure | Produces bluffing | Ask for evidence tier instead |
| Demand certainty everywhere | Forces hallucination | Allow explicit unknowns |
| Rewrite everything after every critique | Hides the real fix | Patch the risky claims first |
| Attack the user's framing | Creates friction | Attack the draft's weak points |
| Simulate tool success | Breaks trust | Verify with real observations |
Reference Files
| File | Contents |
|---|---|
| references/adversarial-review-playbook.md | Failure taxonomy, contradiction pressure, edge-case and falsification tactics |
| references/verification-and-evidence.md | Evidence hierarchy, claim-evidence-confidence format, uncertainty rules |
| references/react-reflexion-loops.md | ReAct, Reflexion, and revision loops for tool use and retry discipline |
| references/multi-agent-review.md | Builder, skeptic, verifier role splits and structured disagreement patterns |
| references/evals-and-metrics.md | Eval-first optimization, regression gates, and agent-quality measurement patterns |