Guardian Watchdog

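The Guardian is Ryvos's runtime watchdog. It monitors every agent run for doom loops, stalls, and budget overruns, and intervenes with corrective hints before they derail the run. It does not block individual tool calls; it guides the agent back on track and ends a run only as a last resort, such as when a hard budget limit is reached.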

What the Guardian Monitors

┌──────────────────────────────────────────────┐
│               Guardian Watchdog              │
│                                              │
│  ┌──────────────────┐  ┌──────────────────┐  │
│  │  Doom Loop       │  │  Stall           │  │
│  │  Detection       │  │  Detection       │  │
│  │                  │  │                  │  │
│  │  Identical tool  │  │  No progress     │  │
│  │  calls repeated  │  │  for N seconds   │  │
│  └──────────────────┘  └──────────────────┘  │
│                                              │
│  ┌──────────────────┐  ┌──────────────────┐  │
│  │  Token Budget    │  │  Dollar Budget   │  │
│  │  Monitoring      │  │  Monitoring      │  │
│  │                  │  │                  │  │
│  │  Soft warn at    │  │  Monthly and     │  │
│  │  80%, hard stop  │  │  per-run cost    │  │
│  │  at 95%          │  │  limits          │  │
│  └──────────────────┘  └──────────────────┘  │
└──────────────────────────────────────────────┘

Doom Loop Detection

A doom loop occurs when the agent calls the same tool with identical (or near-identical) arguments multiple times in a row. This usually means the agent is stuck trying the same failing approach repeatedly.

How It Works

The Guardian fingerprints each tool call using:

  • Tool name
  • Hash of the input JSON

If the last N calls match the same fingerprint (where N = doom_loop_threshold), the Guardian intervenes:

Turn 5: bash("npm test") → failed
Turn 6: bash("npm test") → failed
Turn 7: bash("npm test") → failed  ← doom loop detected!

Guardian hint injected:
"You've called 'bash' with the same command 3 consecutive times.
 Each attempt failed. Try a different approach: check the error
 output, fix the underlying issue, or use a different tool."
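
The fingerprinting step above can be sketched in a few lines of Python. This is a minimal illustration, not Ryvos's actual implementation: the `fingerprint` function, the `DoomLoopDetector` class, and the choice of SHA-256 over key-sorted JSON are all assumptions made for the example. It only catches exact repeats; near-identical arguments would need a looser comparison.

```python
import hashlib
import json
from collections import deque


def fingerprint(tool_name: str, tool_input: dict) -> str:
    """Combine the tool name with a hash of the canonicalized input JSON."""
    # Sorting keys makes the hash independent of argument order.
    canonical = json.dumps(tool_input, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{tool_name}:{digest}"


class DoomLoopDetector:
    """Hypothetical detector: flags `threshold` consecutive identical calls."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        # Only the last `threshold` fingerprints are needed.
        self.recent = deque(maxlen=threshold)

    def record(self, tool_name: str, tool_input: dict) -> bool:
        """Record a call; return True when the last `threshold` calls match."""
        self.recent.append(fingerprint(tool_name, tool_input))
        return len(self.recent) == self.threshold and len(set(self.recent)) == 1


detector = DoomLoopDetector(threshold=3)
results = [detector.record("bash", {"command": "npm test"}) for _ in range(3)]
print(results)  # [False, False, True]
```

The third identical call trips the detector, matching the Turn 5 to Turn 7 trace above.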

Configuration

[agent.guardian]
doom_loop_threshold = 3             # Consecutive identical calls before intervention

Events

Event             Meaning
GuardianDoomLoop  Doom loop detected, hint injected
GuardianHint      Corrective hint injected into conversation

Stall Detection

A stall occurs when the agent makes no meaningful progress for an extended period. This can happen when:

  • The LLM is generating very long responses without tool calls
  • A tool call takes unexpectedly long
  • The agent is deliberating without acting

How It Works

The Guardian tracks the timestamp of the last meaningful action (tool call, message completion). If the interval exceeds stall_timeout_secs, it injects a hint:

Guardian hint:
"No progress detected for 120 seconds. Consider taking action
 or asking the user for clarification if you're stuck."
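
The timestamp check described above might look like the following Python sketch. The `StallDetector` class, its method names, and the injectable `clock` parameter are hypothetical and exist only to make the idea concrete; the real Guardian may track progress differently.

```python
import time
from typing import Callable, Optional


class StallDetector:
    """Hypothetical tracker for the time since the last meaningful action."""

    def __init__(self, timeout_secs: float = 120.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_secs = timeout_secs
        self.clock = clock
        self.last_action = clock()

    def mark_progress(self) -> None:
        """Call on every tool call or message completion."""
        self.last_action = self.clock()

    def check(self) -> Optional[str]:
        """Return a hint once the idle interval exceeds the timeout, else None."""
        idle = self.clock() - self.last_action
        if idle > self.timeout_secs:
            return (f"No progress detected for {int(idle)} seconds. Consider taking "
                    "action or asking the user for clarification if you're stuck.")
        return None


# A fake clock makes the behavior visible without actually waiting.
now = [0.0]
detector = StallDetector(timeout_secs=120, clock=lambda: now[0])
now[0] = 90.0
print(detector.check())  # None
now[0] = 121.0
print(detector.check())  # prints the stall hint
```

A monotonic clock is used by default so that system clock adjustments cannot produce a false stall.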

Configuration

[agent.guardian]
stall_timeout_secs = 120            # Seconds of inactivity before stall alert

Events

Event          Meaning
GuardianStall  No progress detected, hint injected

Token Budget Monitoring

The Guardian tracks context token usage and warns before the limit is reached.

Soft Warning

At 80% of max_context_tokens, the Guardian emits a warning:

[BudgetWarning] Token usage at 82% (26,240/32,000).
Context compaction will begin soon.

This triggers the memory flush process — the agent extracts important facts to persistent memory before old messages are pruned.

Hard Stop

At 95% of max_context_tokens, the Guardian forces a stop:

[BudgetExceeded] Token budget exhausted (31,200/32,000).
Completing current turn and stopping.

Configuration

[agent.guardian]
token_budget_soft = 80              # Percentage: emit warning
token_budget_hard = 95              # Percentage: force stop
 
[agent]
max_context_tokens = 32000          # The total budget
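
As a rough illustration of how the two percentages map onto token counts, the sketch below classifies usage against the configured thresholds. The function name `token_budget_status` and its return values are assumptions for this example, as is treating each threshold as inclusive.

```python
def token_budget_status(used: int, max_context_tokens: int = 32000,
                        soft_pct: int = 80, hard_pct: int = 95) -> str:
    """Classify usage as 'ok', 'warning' (soft) or 'exceeded' (hard)."""
    # Integer arithmetic avoids floating-point surprises at the boundaries.
    if used * 100 >= hard_pct * max_context_tokens:
        return "exceeded"
    if used * 100 >= soft_pct * max_context_tokens:
        return "warning"
    return "ok"


print(token_budget_status(25_599))  # ok        (just under 80% = 25,600)
print(token_budget_status(26_240))  # warning   (82%)
print(token_budget_status(31_200))  # exceeded  (97.5%, above 95% = 30,400)
```

With the defaults shown, the soft warning starts at 25,600 tokens and the hard stop at 30,400.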

Dollar Budget Monitoring

The Guardian tracks spending against configured budget limits.

Per-Run Budget

[budget]
per_run_limit_cents = 100           # $1.00 per run maximum

When the per-run cost approaches the limit:

  • At 80%: BudgetWarning event
  • At 100%: BudgetExceeded event, run stopped

Monthly Budget

[budget]
monthly_limit_cents = 5000          # $50.00 per month
warning_threshold_cents = 4000      # Warn at $40.00

Monthly costs are tracked in the CostStore (SQLite). When the monthly total approaches the limit:

  • At warning_threshold_cents: BudgetWarning event
  • At monthly_limit_cents: BudgetExceeded event, all runs blocked until next month
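
The per-run and monthly rules above can be summarized in a short Python sketch. The function names are hypothetical, and the real Guardian reads monthly totals from the CostStore rather than taking them as arguments; here they are passed in directly to keep the example self-contained.

```python
from typing import Optional


def per_run_event(spent_cents: int, per_run_limit_cents: int = 100) -> Optional[str]:
    """Warn at 80% of the per-run limit, stop at 100%."""
    if spent_cents >= per_run_limit_cents:
        return "BudgetExceeded"
    if spent_cents * 100 >= 80 * per_run_limit_cents:
        return "BudgetWarning"
    return None


def monthly_event(month_total_cents: int, monthly_limit_cents: int = 5000,
                  warning_threshold_cents: int = 4000) -> Optional[str]:
    """Warn at the configured threshold, block at the monthly limit."""
    if month_total_cents >= monthly_limit_cents:
        return "BudgetExceeded"
    if month_total_cents >= warning_threshold_cents:
        return "BudgetWarning"
    return None


print(per_run_event(79))    # None
print(per_run_event(80))    # BudgetWarning
print(monthly_event(4200))  # BudgetWarning
print(monthly_event(5000))  # BudgetExceeded
```

Note the asymmetry: the per-run warning is a fixed 80% of the limit, while the monthly warning uses its own configurable threshold.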

Events

Event           Meaning
BudgetWarning   Approaching a budget limit
BudgetExceeded  Budget limit reached, execution stopped

Intervention Strategy

The Guardian follows a non-blocking intervention strategy:

  1. Detect — Identify the problem (doom loop, stall, budget)
  2. Hint — Inject a corrective message into the conversation
  3. Observe — Check if the agent adjusts its behavior
  4. Escalate — If the problem persists after the hint, apply harder interventions:
    • Additional hints with stronger guidance
    • Force-completing the current run with a summary
    • Reporting the issue to the user via channels

The Guardian never silently blocks a tool call. It provides guidance and lets the agent make decisions.
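
One way to picture the detect, hint, observe, escalate cycle is as a ladder that the Guardian climbs while a problem persists and resets once the agent recovers. The sketch below is purely illustrative: the `Intervention` enum, the `Escalator` class, and in particular the ordering of the harder interventions are assumptions, since the list above does not rank them.

```python
from enum import Enum
from typing import Optional


class Intervention(Enum):
    HINT = "inject a corrective hint"
    STRONGER_HINT = "inject a hint with stronger guidance"
    FORCE_COMPLETE = "force-complete the current run with a summary"
    NOTIFY_USER = "report the issue to the user via channels"


# ASSUMPTION: this ordering is illustrative; the docs list the options unordered.
LADDER = [
    Intervention.HINT,
    Intervention.STRONGER_HINT,
    Intervention.FORCE_COMPLETE,
    Intervention.NOTIFY_USER,
]


class Escalator:
    """Hypothetical helper: picks a harder step each time a problem persists."""

    def __init__(self) -> None:
        self.level = 0

    def next_step(self, problem_detected: bool) -> Optional[Intervention]:
        if not problem_detected:
            self.level = 0  # the agent adjusted its behavior, so start over
            return None
        # Stay on the last rung once the ladder is exhausted.
        step = LADDER[min(self.level, len(LADDER) - 1)]
        self.level += 1
        return step


escalator = Escalator()
steps = [escalator.next_step(seen) for seen in (True, True, False, True)]
print([step.name if step else None for step in steps])
# ['HINT', 'STRONGER_HINT', None, 'HINT']
```

No rung in this sketch blocks a tool call; every step either adds guidance or ends the run visibly.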

Guardian Events Summary

Event                Severity  Trigger
GuardianHint         Info      Any corrective hint injected
GuardianDoomLoop     Warning   Consecutive identical tool calls
GuardianStall        Warning   No progress for stall_timeout_secs
GuardianBudgetAlert  Warning   Approaching budget limit
BudgetWarning        Warning   Budget soft threshold reached
BudgetExceeded       Error     Budget hard limit reached

All events are published on the EventBus and visible in:

  • The Web UI activity feed
  • Run logs (L2 and L3)
  • Channel notifications (if configured)

Full Configuration

[agent.guardian]
enabled = true                      # Enable the guardian watchdog
doom_loop_threshold = 3             # Identical calls before intervention
stall_timeout_secs = 120            # Seconds before stall detection
 
token_budget_soft = 80              # Percentage: warning threshold
token_budget_hard = 95              # Percentage: hard stop
 
[budget]
monthly_limit_cents = 5000          # Monthly spending cap
warning_threshold_cents = 4000      # Monthly warning threshold
per_run_limit_cents = 100           # Per-run spending cap
 
[budget.pricing_overrides]          # Custom pricing per model
"claude-sonnet-4-20250514" = { input_per_1m = 300, output_per_1m = 1500 }
"gpt-4o" = { input_per_1m = 250, output_per_1m = 1000 }

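To see how the pricing_overrides entries could feed the dollar budget, the following Python sketch estimates the cost of a run from its token counts. It assumes that input_per_1m and output_per_1m are expressed in cents per 1,000,000 tokens, which is consistent with the other `_cents` settings but is not stated in the configuration itself. The `run_cost_cents` function is hypothetical.

```python
# Values copied from the pricing_overrides table above.
# ASSUMPTION: the numbers are cents per 1,000,000 tokens.
PRICING = {
    "claude-sonnet-4-20250514": {"input_per_1m": 300, "output_per_1m": 1500},
    "gpt-4o": {"input_per_1m": 250, "output_per_1m": 1000},
}


def run_cost_cents(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a run in cents from its token counts."""
    price = PRICING[model]
    return (input_tokens * price["input_per_1m"]
            + output_tokens * price["output_per_1m"]) / 1_000_000


cost = run_cost_cents("claude-sonnet-4-20250514", 200_000, 20_000)
print(cost)  # 90.0
```

Under that assumption, a run costing 90 cents is past 80% of the 100-cent per_run_limit_cents, so it would raise a BudgetWarning without being stopped.
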
Next Steps