Ryvos takes a fundamentally different approach to AI agent safety. Instead of blocking tools or gating actions behind rigid approval flows, Ryvos uses self-learning safety — the agent understands why actions are dangerous and improves its judgment over time.
Core Philosophy
Traditional AI agent security relies on deny-lists, regex patterns, and tier-based blocking. These approaches have critical flaws:
- Regex cannot enumerate all dangerous commands
- Blocking is trivially bypassed (encoding, aliasing, indirection)
- Legitimate use of powerful tools is prevented
- The system never gets smarter — the same rigid rules apply forever
Ryvos replaces this with a system inspired by how experienced engineers develop safety intuition:
- Understanding over prohibition — The agent knows why `rm -rf /` is dangerous, not just that it matches a pattern
- Learning from experience — When something goes wrong, the agent remembers and avoids it next time
- Post-hoc accountability — Every action is logged and analyzed, not blocked before execution
- Continuous improvement — Safety gets better with use, not worse
:::note
This does not mean Ryvos has no safety controls. The agent has constitutional principles, a safety memory, and an audit trail. The difference is that safety comes from the agent's understanding, not from external blocking rules.
:::
Constitutional AI (7 Principles)
Every agent run includes positively-framed constitutional principles in the system prompt. These principles guide the agent's reasoning about every action:
1. Preservation
"Ensure that your actions preserve existing systems, data, and configurations. Before modifying or removing anything, understand its current state and purpose."
2. Intent Matching
"Ensure your actions match the user's stated intent. If the intent is ambiguous, clarify before acting. Do not extrapolate beyond what was asked."
3. Proportionality
"Use the minimum level of intervention needed. Prefer targeted changes over broad ones. Prefer reading over writing, editing over replacing, moving over deleting."
4. Transparency
"Explain your reasoning before taking significant actions. Share what you plan to do and why, especially for actions that are difficult to reverse."
5. Boundaries
"Respect system boundaries. Stay within the workspace unless explicitly directed elsewhere. Do not access resources, networks, or services beyond what the task requires."
6. Secrets
"Never expose, log, or transmit secrets, API keys, passwords, or private data. If you encounter secrets in files, treat them as sensitive and do not include them in responses."
7. Learning
"When an action has an unexpected or negative outcome, reflect on what happened and why. Store the lesson for future reference. Actively improve your judgment over time."
These principles are positively framed (research shows positive framing is 27% more effective than negative framing for AI safety). They guide the agent to reason about safety rather than matching patterns.
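As a concrete illustration, the principles above could be assembled into the system prompt along these lines. This is a hypothetical sketch: the `PRINCIPLES` dictionary abbreviates the full principle texts, and `build_system_prompt` is an assumed helper name, not Ryvos's actual API.

```python
# Hypothetical sketch: injecting the constitutional principles into the
# system prompt. Texts are abbreviated; the assembly format is an assumption.
PRINCIPLES = {
    "Preservation": "Ensure that your actions preserve existing systems, data, and configurations.",
    "Intent Matching": "Ensure your actions match the user's stated intent.",
    "Proportionality": "Use the minimum level of intervention needed.",
    "Transparency": "Explain your reasoning before taking significant actions.",
    "Boundaries": "Respect system boundaries.",
    "Secrets": "Never expose, log, or transmit secrets, API keys, passwords, or private data.",
    "Learning": "When an action has an unexpected outcome, reflect and store the lesson.",
}

def build_system_prompt(base: str) -> str:
    """Append the numbered principles after the base system prompt."""
    lines = [base, "", "Constitutional principles:"]
    for i, (name, text) in enumerate(PRINCIPLES.items(), start=1):
        lines.append(f"{i}. {name}: {text}")
    return "\n".join(lines)
```

Because the principles travel with every run's system prompt, no fine-tuning is required; the same text works on any instruction-following model.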
Safety Memory
The SafetyMemory module provides experience-based learning. It stores safety lessons in SQLite (and optionally Viking) with these fields:
| Field | Description |
|---|---|
| `action` | What the agent did (tool name + key parameters) |
| `outcome` | What happened (harmless, near-miss, incident, user-corrected) |
| `reflection` | Why the outcome occurred |
| `corrective_rule` | What to do differently next time |
| `confidence` | How confident the agent is in this lesson (0.0-1.0) |
| `timestamp` | When the lesson was learned |
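A minimal SQLite schema matching these fields might look as follows. The table and column names mirror the fields above, but the exact schema Ryvos uses is an assumption.

```python
import sqlite3

# Hypothetical SafetyMemory lesson table; column names mirror the documented
# fields, while types and constraints are illustrative assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS safety_lessons (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    action          TEXT NOT NULL,  -- tool name + key parameters
    outcome         TEXT NOT NULL,  -- harmless | near-miss | incident | user-corrected
    reflection      TEXT NOT NULL,  -- why the outcome occurred
    corrective_rule TEXT NOT NULL,  -- what to do differently next time
    confidence      REAL NOT NULL CHECK (confidence BETWEEN 0.0 AND 1.0),
    timestamp       TEXT NOT NULL DEFAULT (datetime('now'))
);
"""

def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the safety-lesson store."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn
```

The `CHECK` constraint keeps confidence values inside the documented 0.0-1.0 range at the storage layer.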
How Lessons Are Created
Lessons are generated through the post-action learning loop:
```text
Tool Execution
      |
      v
Outcome Assessment
      |
      +-- Harmless       --> Reinforce positive patterns
      +-- Near-miss      --> Generate reflection + corrective rule
      +-- Incident       --> Generate reflection + corrective rule (high priority)
      +-- User-corrected --> Extract lesson from user's correction
      |
      v
Store in SafetyMemory (if confidence > threshold)
```
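The loop above can be sketched as a single dispatch function. This is illustrative: `process_outcome`, the injected `reflect` callback, and the 0.5 storage threshold are assumptions, not Ryvos's actual interfaces.

```python
# Hypothetical sketch of the post-action learning loop. `reflect` stands in
# for the model call that produces (reflection, corrective_rule, confidence).
def process_outcome(memory: list, action: str, outcome: str,
                    reflect, threshold: float = 0.5) -> None:
    """Route a tool outcome through the learning loop."""
    if outcome == "harmless":
        return  # reinforce positive patterns; nothing new to store
    # near-miss, incident, and user-corrected outcomes all yield a lesson
    reflection, rule, confidence = reflect(action, outcome)
    if outcome == "incident":
        confidence = max(confidence, 0.8)  # incidents are stored high-priority
    if confidence > threshold:
        memory.append({"action": action, "outcome": outcome,
                       "reflection": reflection, "corrective_rule": rule,
                       "confidence": confidence})
```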
Lesson Curation
Not all lessons are kept. Low-quality lessons create error loops (research shows memory quality matters more than quantity). Ryvos curates strictly:
- High-confidence lessons (>0.8) are kept permanently
- Medium-confidence lessons (0.5-0.8) are kept but may be pruned
- Low-confidence lessons (below 0.5) are discarded
- Contradictory lessons trigger re-evaluation
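The first three rules reduce to a simple partition by confidence. A minimal sketch (the `curate` name is an assumption; contradiction handling is omitted):

```python
# Hypothetical curation pass following the thresholds above: keep >0.8
# permanently, keep 0.5-0.8 as prunable, discard everything below 0.5.
def curate(lessons):
    permanent, prunable = [], []
    for lesson in lessons:
        c = lesson["confidence"]
        if c > 0.8:
            permanent.append(lesson)   # kept permanently
        elif c >= 0.5:
            prunable.append(lesson)    # kept, but eligible for pruning
        # below 0.5: discarded entirely
    return permanent, prunable
```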
Loading Relevant Lessons
Before each run, relevant safety lessons are loaded into the context:
```text
User message: "delete the old log files"
      |
      v
SafetyMemory search: "delete files"
      |
      v
Relevant lessons loaded:
  - "When deleting files, always confirm the exact path first.
     A previous run accidentally deleted config files in a
     similarly-named directory." (confidence: 0.92)
```
The agent sees these lessons alongside the constitutional principles, giving it both general principles and specific experience.
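The retrieval step can be approximated with a keyword-overlap search. This is a stand-in sketch: SafetyMemory's actual search (and any Viking-backed semantic retrieval) is not specified here, so `relevant_lessons` and its scoring are assumptions.

```python
# Hypothetical pre-run lesson retrieval: score stored lessons by keyword
# overlap with the user message, break ties by confidence, return the top k.
def relevant_lessons(lessons, query: str, top_k: int = 3):
    words = set(query.lower().split())
    scored = []
    for lesson in lessons:
        overlap = len(words & set(lesson["action"].lower().split()))
        if overlap:
            scored.append((overlap, lesson["confidence"], lesson))
    scored.sort(key=lambda t: (t[0], t[1]), reverse=True)
    return [lesson for _, _, lesson in scored[:top_k]]
```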
Research Backing
This architecture is grounded in published research:
| Finding | Source | Relevance |
|---|---|---|
| Safety and capability improve together (15% to 70% safety, 75% to 95% task completion) | Agent Safety Alignment via RL, 2025 | Safety does not require sacrificing capability |
| Constitutional prompting works without fine-tuning | DeepSeek-R1, Gemma-2, Llama, Qwen studies | Works on any model, no training needed |
| Reflexion: 91% vs 80% without, using verbal RL | Reflexion paper, GPT-4 | Experience-based learning with frozen weights |
| Positive framing 27% more effective than negative | C3AI, 2025 | "Ensure preservation" works better than "don't delete" |
| Strict memory curation yields 10% improvement | Memory quality studies | Bad lessons create error loops |
Tiered Safety (Optional)
Ryvos retains a tiered system as an optional baseline layer:
| Tier | Level | Examples |
|---|---|---|
| T0 | Safe | read, glob, grep, memory_search |
| T1 | Low | web_fetch, web_search |
| T2 | Medium | write, edit, apply_patch, MCP tools |
| T3 | High | bash, spawn_agent |
| T4 | Critical | Unparseable bash commands (fail-safe) |
```toml
[security]
auto_approve_up_to = "T1"   # Auto-approve safe and low-risk tools
deny_above = "T3"           # Require approval for high-risk tools
approval_timeout_secs = 60
```

:::tip
The tier system is a configurable baseline, not the primary safety mechanism. Constitutional AI and safety memory provide the real protection. Many users set `auto_approve_up_to = "T3"` and rely on the self-learning system.
:::
Optional User Checkpoints
Users can opt into soft pauses for specific tools:
```toml
[security]
pause_before = ["file_delete", "git_push"]
```

When the agent wants to use a paused tool, it explains its reasoning and waits for confirmation. This is the user's choice — the agent is never silently blocked.
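The checkpoint flow can be sketched as a small wrapper around tool execution. All names here (`run_tool`, the `execute`/`confirm`/`explain` callbacks) are hypothetical illustrations of the behavior described above, not Ryvos's real API.

```python
# Mirrors the pause_before config above; set membership stands in for the
# parsed TOML value.
PAUSE_BEFORE = {"file_delete", "git_push"}

def run_tool(tool: str, execute, confirm, explain):
    """Pause for user confirmation only when the tool is checkpointed."""
    if tool in PAUSE_BEFORE:
        explain(f"About to run {tool}; here is my reasoning. Proceed?")
        if not confirm():
            return None  # user declined; the action is skipped, not silently blocked
    return execute()
```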
Dangerous Pattern Detection
Ryvos includes 9 built-in patterns for bash commands that are almost always unintentional:
- `rm -rf /` — Root filesystem deletion
- `git push --force` — Force push (data loss risk)
- `DROP TABLE` — Database table deletion
- `chmod 777` — World-writable permissions
- `mkfs` — Filesystem formatting
- `dd if=` — Raw disk writes
- `> /dev/` — Writing to device files
- `curl | bash` — Remote code execution
- `wget | bash` — Remote code execution
These patterns do not block execution. They trigger the agent's constitutional reasoning: "This matches a dangerous pattern. Let me verify this is exactly what the user intended and explain the risks."
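A flag-only detector for these patterns could look like this. The regexes are illustrative approximations of the nine patterns listed above; the exact expressions Ryvos ships are an assumption.

```python
import re

# Illustrative regexes for the nine built-in patterns. Matches are surfaced
# to the agent's constitutional reasoning; they never block execution.
DANGEROUS_PATTERNS = [
    r"rm\s+-rf\s+/(\s|$)",     # root filesystem deletion
    r"git\s+push\s+--force",   # force push (data loss risk)
    r"\bDROP\s+TABLE\b",       # database table deletion
    r"chmod\s+777",            # world-writable permissions
    r"\bmkfs\b",               # filesystem formatting
    r"\bdd\s+if=",             # raw disk writes
    r">\s*/dev/",              # writing to device files
    r"curl\b.*\|\s*bash",      # remote code execution
    r"wget\b.*\|\s*bash",      # remote code execution
]

def flag_dangerous(command: str):
    """Return every pattern the command matches; the caller decides what to do."""
    return [p for p in DANGEROUS_PATTERNS
            if re.search(p, command, re.IGNORECASE)]
```

Returning the matched patterns (rather than a boolean) lets the agent name the specific risk when it explains itself to the user.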
Next Steps
- Audit Trail — Post-hoc accountability and logging
- Failure Journal — How the agent learns from mistakes
- Guardian Watchdog — Runtime monitoring and intervention