Self-Learning Safety

Ryvos takes a fundamentally different approach to AI agent safety. Instead of blocking tools or gating actions behind rigid approval flows, Ryvos uses self-learning safety — the agent understands why actions are dangerous and improves its judgment over time.

Core Philosophy

Traditional AI agent security relies on deny-lists, regex patterns, and tier-based blocking. These approaches have critical flaws:

  • Regex cannot enumerate all dangerous commands
  • Blocking is trivially bypassed (encoding, aliasing, indirection)
  • Legitimate use of powerful tools is prevented
  • The system never gets smarter — the same rigid rules apply forever

Ryvos replaces this with a system inspired by how experienced engineers develop safety intuition:

  1. Understanding over prohibition — The agent knows why rm -rf / is dangerous, not just that it matches a pattern
  2. Learning from experience — When something goes wrong, the agent remembers and avoids it next time
  3. Post-hoc accountability — Every action is logged and analyzed, not blocked before execution
  4. Continuous improvement — Safety gets better with use, not worse

:::note
This does not mean Ryvos has no safety controls. The agent has constitutional principles, a safety memory, and an audit trail. The difference is that safety comes from the agent's understanding, not from external blocking rules.
:::

Constitutional AI (7 Principles)

Every agent run includes positively-framed constitutional principles in the system prompt. These principles guide the agent's reasoning about every action:

1. Preservation

"Ensure that your actions preserve existing systems, data, and configurations. Before modifying or removing anything, understand its current state and purpose."

2. Intent Matching

"Ensure your actions match the user's stated intent. If the intent is ambiguous, clarify before acting. Do not extrapolate beyond what was asked."

3. Proportionality

"Use the minimum level of intervention needed. Prefer targeted changes over broad ones. Prefer reading over writing, editing over replacing, moving over deleting."

4. Transparency

"Explain your reasoning before taking significant actions. Share what you plan to do and why, especially for actions that are difficult to reverse."

5. Boundaries

"Respect system boundaries. Stay within the workspace unless explicitly directed elsewhere. Do not access resources, networks, or services beyond what the task requires."

6. Secrets

"Never expose, log, or transmit secrets, API keys, passwords, or private data. If you encounter secrets in files, treat them as sensitive and do not include them in responses."

7. Learning

"When an action has an unexpected or negative outcome, reflect on what happened and why. Store the lesson for future reference. Actively improve your judgment over time."

These principles are positively framed (research shows positive framing is 27% more effective than negative framing for AI safety). They guide the agent to reason about safety rather than match patterns.
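A minimal sketch of how these principles might be injected into the system prompt. The principle texts come from the section above; the names `CONSTITUTIONAL_PRINCIPLES` and `build_system_prompt` are illustrative, not the Ryvos API:

```python
# Hypothetical sketch: prepending the seven principles to every run's system prompt.
CONSTITUTIONAL_PRINCIPLES = [
    "Preservation: Ensure that your actions preserve existing systems, data, and configurations.",
    "Intent Matching: Ensure your actions match the user's stated intent.",
    "Proportionality: Use the minimum level of intervention needed.",
    "Transparency: Explain your reasoning before taking significant actions.",
    "Boundaries: Respect system boundaries; stay within the workspace.",
    "Secrets: Never expose, log, or transmit secrets or private data.",
    "Learning: Reflect on unexpected outcomes and store the lesson.",
]

def build_system_prompt(base_prompt: str) -> str:
    """Append the numbered principles so they frame the agent's reasoning."""
    principles = "\n".join(
        f"{i + 1}. {p}" for i, p in enumerate(CONSTITUTIONAL_PRINCIPLES)
    )
    return f"{base_prompt}\n\nConstitutional principles:\n{principles}"
```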

Safety Memory

The SafetyMemory module provides experience-based learning. It stores safety lessons in SQLite (and optionally Viking) with these fields:

| Field | Description |
| --- | --- |
| action | What the agent did (tool name + key parameters) |
| outcome | What happened (harmless, near-miss, incident, user-corrected) |
| reflection | Why the outcome occurred |
| corrective_rule | What to do differently next time |
| confidence | How confident the agent is in this lesson (0.0-1.0) |
| timestamp | When the lesson was learned |
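The schema above can be sketched as a record type plus its SQLite backing table. The field names follow the table; the table name, types, and everything else here are assumptions, not the actual Ryvos schema:

```python
import sqlite3
from dataclasses import dataclass

# Illustrative lesson record; field names mirror the documented schema.
@dataclass
class SafetyLesson:
    action: str            # tool name + key parameters
    outcome: str           # harmless | near-miss | incident | user-corrected
    reflection: str        # why the outcome occurred
    corrective_rule: str   # what to do differently next time
    confidence: float      # 0.0-1.0
    timestamp: float       # when the lesson was learned (epoch seconds)

# Hypothetical SQLite backing table for the lessons.
SCHEMA = """
CREATE TABLE IF NOT EXISTS safety_lessons (
    action          TEXT NOT NULL,
    outcome         TEXT NOT NULL,
    reflection      TEXT NOT NULL,
    corrective_rule TEXT NOT NULL,
    confidence      REAL NOT NULL CHECK (confidence BETWEEN 0.0 AND 1.0),
    timestamp       REAL NOT NULL
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
```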

How Lessons Are Created

Lessons are generated through the post-action learning loop:

```text
Tool Execution
    |
    v
Outcome Assessment
    |
    +-- Harmless       --> Reinforce positive patterns
    +-- Near-miss      --> Generate reflection + corrective rule
    +-- Incident       --> Generate reflection + corrective rule (high priority)
    +-- User-corrected --> Extract lesson from user's correction
    |
    v
Store in SafetyMemory (if confidence > threshold)
```
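The loop above can be sketched as a single gate function. The outcome labels come from the diagram; the threshold value, the in-memory list, and the function name are illustrative assumptions:

```python
import time

# Assumed threshold below which a lesson is discarded (the real value is configurable).
CONFIDENCE_THRESHOLD = 0.5

def assess_and_store(memory: list, action: str, outcome: str,
                     reflection: str, rule: str, confidence: float) -> bool:
    """Store a lesson only for noteworthy outcomes above the confidence threshold."""
    if outcome == "harmless":
        return False  # positive patterns are reinforced elsewhere; nothing to store
    if confidence <= CONFIDENCE_THRESHOLD:
        return False  # low-confidence lessons are discarded, not stored
    memory.append({
        "action": action,
        "outcome": outcome,
        "reflection": reflection,
        "corrective_rule": rule,
        "confidence": confidence,
        "timestamp": time.time(),
    })
    return True
```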

Lesson Curation

Not all lessons are kept. Low-quality lessons create error loops (research shows memory quality matters more than quantity). Ryvos curates strictly:

  • High-confidence lessons (>0.8) are kept permanently
  • Medium-confidence lessons (0.5-0.8) are kept but may be pruned
  • Low-confidence lessons (below 0.5) are discarded
  • Contradictory lessons trigger re-evaluation
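The curation policy above can be sketched as a pruning pass. The confidence cutoffs match the text; the size budget and the oldest-first pruning heuristic are assumptions:

```python
def curate(lessons: list[dict], max_lessons: int = 100) -> list[dict]:
    """Keep high-confidence lessons permanently, prune medium ones, drop low ones."""
    kept = [l for l in lessons if l["confidence"] >= 0.5]    # discard low-confidence
    permanent = [l for l in kept if l["confidence"] > 0.8]   # always retained
    prunable = [l for l in kept if l["confidence"] <= 0.8]   # may be pruned
    # Assumed heuristic: when over budget, keep the most recent medium-confidence lessons.
    prunable.sort(key=lambda l: l["timestamp"], reverse=True)
    budget = max(0, max_lessons - len(permanent))
    return permanent + prunable[:budget]
```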

Loading Relevant Lessons

Before each run, relevant safety lessons are loaded into the context:

```text
User message: "delete the old log files"
    |
    v
SafetyMemory search: "delete files"
    |
    v
Relevant lessons loaded:
  - "When deleting files, always confirm the exact path first.
     A previous run accidentally deleted config files in a
     similarly-named directory." (confidence: 0.92)
```

The agent sees these lessons alongside the constitutional principles, giving it both general principles and specific experience.
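As a purely illustrative stand-in for the real search (the document does not specify whether SQLite full-text search or Viking vectors handle retrieval), a naive keyword-overlap ranker looks like this:

```python
def relevant_lessons(lessons: list[dict], query: str, top_k: int = 3) -> list[dict]:
    """Rank lessons by keyword overlap with the query, then by confidence."""
    terms = set(query.lower().split())
    scored = []
    for lesson in lessons:
        overlap = len(terms & set(lesson["action"].lower().split()))
        if overlap:
            scored.append((overlap, lesson["confidence"], lesson))
    scored.sort(key=lambda t: (t[0], t[1]), reverse=True)
    return [lesson for _, _, lesson in scored[:top_k]]
```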

Research Backing

This architecture is grounded in published research:

| Finding | Source | Relevance |
| --- | --- | --- |
| Safety and capability improve together (15% to 70% safety, 75% to 95% task completion) | Agent Safety Alignment via RL, 2025 | Safety does not require sacrificing capability |
| Constitutional prompting works without fine-tuning | DeepSeek-R1, Gemma-2, Llama, Qwen studies | Works on any model, no training needed |
| Reflexion: 91% vs. 80% without, using verbal RL | Reflexion paper, GPT-4 | Experience-based learning with frozen weights |
| Positive framing 27% more effective than negative | C3AI, 2025 | "Ensure preservation" works better than "don't delete" |
| Strict memory curation yields 10% improvement | Memory quality studies | Bad lessons create error loops |

Tiered Safety (Optional)

Ryvos retains a tiered system as an optional baseline layer:

| Tier | Level | Examples |
| --- | --- | --- |
| T0 | Safe | read, glob, grep, memory_search |
| T1 | Low | web_fetch, web_search |
| T2 | Medium | write, edit, apply_patch, MCP tools |
| T3 | High | bash, spawn_agent |
| T4 | Critical | Unparseable bash commands (fail-safe) |

```toml
[security]
auto_approve_up_to = "T1"   # Auto-approve safe and low-risk tools
deny_above = "T3"           # Require approval for high-risk tools
approval_timeout_secs = 60
```
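The baseline check might be sketched as a tier lookup against the config above. The tool-to-tier assignments follow the table; the function name, mapping, and return labels are assumptions, not the Ryvos API:

```python
# Hypothetical tool-to-tier mapping, taken from the tier table.
TOOL_TIERS = {
    "read": 0, "glob": 0, "grep": 0, "memory_search": 0,
    "web_fetch": 1, "web_search": 1,
    "write": 2, "edit": 2, "apply_patch": 2,
    "bash": 3, "spawn_agent": 3,
}

def decision(tool: str, auto_approve_up_to: int = 1, deny_above: int = 3) -> str:
    """Map a tool to auto-approve / ask-user / deny under the tier baseline."""
    tier = TOOL_TIERS.get(tool, 4)  # unknown or unparseable tools fail safe to T4
    if tier <= auto_approve_up_to:
        return "auto-approve"
    if tier > deny_above:
        return "deny"
    return "ask-user"
```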

:::tip
The tier system is a configurable baseline, not the primary safety mechanism. Constitutional AI and safety memory provide the real protection. Many users set auto_approve_up_to = "T3" and rely on the self-learning system.
:::

Optional User Checkpoints

Users can opt into soft pauses for specific tools:

```toml
[security]
pause_before = ["file_delete", "git_push"]
```

When the agent wants to use a paused tool, it explains its reasoning and waits for confirmation. This is the user's choice — the agent is never silently blocked.
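A minimal sketch of that checkpoint flow, assuming the config above; the callback-based shape (`execute`, `explain`, `confirm`) is illustrative, not the real interface:

```python
# Tools listed in pause_before trigger a soft pause, mirroring the TOML config.
PAUSE_BEFORE = {"file_delete", "git_push"}

def run_tool(tool: str, execute, explain, confirm) -> bool:
    """Explain and wait for confirmation on paused tools; never silently block."""
    if tool in PAUSE_BEFORE:
        explain(f"About to run {tool}; here is my reasoning...")
        if not confirm():
            return False  # user declined; the agent pauses rather than being blocked
    execute()
    return True
```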

Dangerous Pattern Detection

Ryvos includes 9 built-in patterns for bash commands that are almost always unintentional:

  • rm -rf / — Root filesystem deletion
  • git push --force — Force push (data loss risk)
  • DROP TABLE — Database table deletion
  • chmod 777 — World-writable permissions
  • mkfs — Filesystem formatting
  • dd if= — Raw disk writes
  • > /dev/ — Writing to device files
  • curl | bash — Remote code execution
  • wget | bash — Remote code execution

These patterns do not block execution. They trigger the agent's constitutional reasoning: "This matches a dangerous pattern. Let me verify this is exactly what the user intended and explain the risks."
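The nine patterns could be expressed as regexes that flag rather than block. The pattern list follows the bullets above; the exact expressions, names, and warning strings are assumptions, not what Ryvos ships:

```python
import re

# Illustrative regexes for the nine built-in dangerous patterns.
DANGEROUS_PATTERNS = [
    (r"\brm\s+-[a-z]*r[a-z]*f[a-z]*\s+/(\s|$)", "root filesystem deletion"),
    (r"\bgit\s+push\s+.*--force\b",             "force push (data loss risk)"),
    (r"\bDROP\s+TABLE\b",                       "database table deletion"),
    (r"\bchmod\s+777\b",                        "world-writable permissions"),
    (r"\bmkfs\b",                               "filesystem formatting"),
    (r"\bdd\s+if=",                             "raw disk writes"),
    (r">\s*/dev/",                              "writing to device files"),
    (r"\bcurl\b.*\|\s*(ba)?sh\b",               "remote code execution"),
    (r"\bwget\b.*\|\s*(ba)?sh\b",               "remote code execution"),
]

def flag_command(cmd: str) -> list[str]:
    """Return warnings for matched patterns; flagged commands are NOT blocked."""
    return [why for pattern, why in DANGEROUS_PATTERNS
            if re.search(pattern, cmd, re.IGNORECASE)]
```

A match feeds the agent's constitutional reasoning instead of halting execution, which is why `flag_command` returns explanations rather than a verdict.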

Next Steps