Skip to content

Security

Multi-layered defense architecture covering prompt injection, agent-to-agent trust, policy enforcement, script sandboxing, and skill import safety.

4-Layer Prompt Injection Defense

L1Input Gatekeeping
internal/pipeline/guards.go

Regex patterns, encoding evasion detection, financial manipulation checks, multi-language injection scanning → ThreatScore 0.0–1.0

L2Structured Formatting
internal/agent/prompt.go

HMAC-tagged trust boundaries (session secret + content hash) — unforgeable by injected content

L3Output Validation
internal/agent/policy.go

Authority-based tool access control (creator > self > peer > external), financial guards, self-modification locks

L4Adaptive Refinement
internal/agent/policy.go

Output pattern scanning, behavioral anomaly detection (tool pattern changes, protected file access, repeated financial ops)

Zero-Trust Agent-to-Agent Protocol

  • Mutual authentication via on-chain identity (ERC-8004 registry on Base)
  • Challenge-response with signed nonces + timestamps (60s window)
  • ECDH ephemeral keypairs → AES-256-GCM session encryption with forward secrecy
  • Per-message HMAC authentication, rate limiting, size limits
  • Peer messages pass through injection defense with reduced authority
  • Opacity principle: agents never expose internal memory, prompts, keys, or session history

Policy Engine

6 built-in rules. All decisions audit-logged to the policy_decisions table.

Authority

Levels: creator > self > peer > external. Each with progressively restricted tool access.

Command Safety

Tool risk classification: Safe, Caution, Dangerous, Forbidden.

Financial

Per-payment caps, hourly/daily transfer limits, minimum reserve enforcement.

Path Protection

Prevents access to sensitive paths (wallet files, database, config).

Rate Limit

Per-tool and per-session rate limits to prevent runaway execution.

Validation

Input validation and output scanning for all tool calls.

Script Sandbox

  • Configurable interpreter whitelist (bash, python3, node by default)
  • Plugin scripts: unconditional env_clear with minimal allowlist (PATH, HOME, USER, LANG, TERM, TMPDIR + ROBOTICUS_* vars)
  • Skill scripts: environment stripping when sandbox_env = true (PATH, HOME, ROBOTICUS_SESSION_ID, ROBOTICUS_AGENT_ID)
  • Tool name validation -- strict [a-zA-Z0-9_-] character set; path traversal patterns rejected at manifest parse time
  • Script path confinement -- resolved paths are canonicalized and must remain within the plugin directory
  • Timeout enforcement and output truncation

Skill Import Safety Scanning

50+ danger patterns across 5 categories. Verdicts: Clean, Warnings (review recommended), Critical (import blocked).

CategoryExamples
Dangerous Commandsrm -rf /, fork bombs, pipe-to-shell RCE, dynamic eval
Network Accesscurl, wget, netcat, SSH
Filesystem Accesswrites to ~/.ssh/, ~/.gnupg/, access to roboticus.db or wallet.json
Environment Exfiltrationreading $API_KEY, $SECRET, $PASSWORD, process.env, os.environ
Obfuscationbase64-decode piped to shell