Security

Multi-layered defense architecture covering prompt injection, agent-to-agent trust, policy enforcement, script sandboxing, and skill import safety.

4-Layer Prompt Injection Defense

L1: Input Gatekeeping
ironclad-agent/injection.rs

Regex patterns, encoding evasion detection, financial manipulation checks, multi-language injection scanning → ThreatScore 0.0–1.0
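Here is a minimal sketch of how L1 might fold several detectors into one ThreatScore. The pattern lists, the weights, the 40-character base64 threshold, and the `threat_score` function are all hypothetical and are not taken from injection.rs. To keep the example on the Rust standard library alone, plain lowercase substring matching stands in for the regex engine.

```rust
// Hypothetical pattern lists; the shipped patterns and weights live in injection.rs.
const INSTRUCTION_PATTERNS: &[&str] = &[
    "ignore previous instructions",
    "ignore all prior instructions",
    "you are now",
    "system prompt",
];
const FINANCIAL_PATTERNS: &[&str] = &["transfer all", "send all funds", "drain the wallet"];
// Two non-English phrasings as a stand-in for multi-language scanning.
const MULTILINGUAL_PATTERNS: &[&str] = &["ignora las instrucciones", "ignoriere alle anweisungen"];

/// Crude encoding-evasion check: true if the text contains a run of at least
/// `min_len` consecutive base64-alphabet characters.
fn has_long_base64_run(text: &str, min_len: usize) -> bool {
    let mut run = 0;
    for c in text.chars() {
        if c.is_ascii_alphanumeric() || c == '+' || c == '/' || c == '=' {
            run += 1;
            if run >= min_len {
                return true;
            }
        } else {
            run = 0;
        }
    }
    false
}

fn any_match(haystack: &str, patterns: &[&str]) -> bool {
    patterns.iter().any(|p| haystack.contains(*p))
}

/// Adds a weight per detector that fires and clamps the total to 0.0..=1.0.
pub fn threat_score(input: &str) -> f64 {
    let lower = input.to_lowercase();
    let mut score = 0.0;
    if any_match(&lower, INSTRUCTION_PATTERNS) {
        score += 0.5;
    }
    if any_match(&lower, FINANCIAL_PATTERNS) {
        score += 0.4;
    }
    if any_match(&lower, MULTILINGUAL_PATTERNS) {
        score += 0.5;
    }
    if has_long_base64_run(input, 40) {
        score += 0.3;
    }
    f64::min(score, 1.0)
}
```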

L2: Structured Formatting
ironclad-agent/prompt.rs

HMAC-tagged trust boundaries (session secret + content hash) that injected content cannot forge
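One way to build HMAC-tagged boundaries is sketched below. It assumes the tag is HMAC-SHA256, keyed by the session secret, computed over the SHA-256 hash of the content. That reading of "session secret + content hash", the `<<trusted tag=HEX>>` delimiter syntax, and every function name are illustrative assumptions. SHA-256 and HMAC are written out by hand only so the sketch has no dependencies.

```rust
// Illustrative only: hand-rolled SHA-256/HMAC so the sketch needs no crates.
// Production code should use a vetted implementation (e.g. the `hmac` and `sha2` crates).
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

fn sha256(data: &[u8]) -> [u8; 32] {
    let mut h: [u32; 8] = [
        0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
    ];
    // Pad: 0x80, zeros up to 56 mod 64, then the message length in bits (big-endian).
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&((data.len() as u64) * 8).to_be_bytes());
    for chunk in msg.chunks(64) {
        let mut w = [0u32; 64];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([chunk[4 * i], chunk[4 * i + 1], chunk[4 * i + 2], chunk[4 * i + 3]]);
        }
        for i in 16..64 {
            let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
            let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
            w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
        }
        let mut v = h; // working variables a..h
        for i in 0..64 {
            let s1 = v[4].rotate_right(6) ^ v[4].rotate_right(11) ^ v[4].rotate_right(25);
            let ch = (v[4] & v[5]) ^ (!v[4] & v[6]);
            let t1 = v[7].wrapping_add(s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
            let s0 = v[0].rotate_right(2) ^ v[0].rotate_right(13) ^ v[0].rotate_right(22);
            let maj = (v[0] & v[1]) ^ (v[0] & v[2]) ^ (v[1] & v[2]);
            v = [t1.wrapping_add(s0.wrapping_add(maj)), v[0], v[1], v[2], v[3].wrapping_add(t1), v[4], v[5], v[6]];
        }
        for j in 0..8 {
            h[j] = h[j].wrapping_add(v[j]);
        }
    }
    let mut out = [0u8; 32];
    for j in 0..8 {
        out[4 * j..4 * j + 4].copy_from_slice(&h[j].to_be_bytes());
    }
    out
}

/// HMAC-SHA256 per RFC 2104 (block size 64 bytes).
fn hmac_sha256(key: &[u8], msg: &[u8]) -> [u8; 32] {
    let mut k = [0u8; 64];
    if key.len() > 64 {
        k[..32].copy_from_slice(&sha256(key));
    } else {
        k[..key.len()].copy_from_slice(key);
    }
    let mut inner: Vec<u8> = k.iter().map(|b| b ^ 0x36).collect();
    inner.extend_from_slice(msg);
    let mut outer: Vec<u8> = k.iter().map(|b| b ^ 0x5c).collect();
    outer.extend_from_slice(&sha256(&inner));
    sha256(&outer)
}

fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Hypothetical tag: HMAC(session_secret, SHA-256(content)), hex-encoded.
fn boundary_tag(session_secret: &[u8], content: &str) -> String {
    to_hex(&hmac_sha256(session_secret, &sha256(content.as_bytes())))
}

/// Wraps trusted content in tagged delimiters (the delimiter syntax is made up).
fn wrap_trusted(session_secret: &[u8], content: &str) -> String {
    format!("<<trusted tag={}>>\n{}\n<</trusted>>", boundary_tag(session_secret, content), content)
}

/// Accepts a block only if its tag matches the enclosed content, comparing in constant time.
fn verify_trusted(session_secret: &[u8], tag: &str, content: &str) -> bool {
    let expected = boundary_tag(session_secret, content);
    expected.len() == tag.len()
        && expected.bytes().zip(tag.bytes()).fold(0u8, |acc, (a, b)| acc | (a ^ b)) == 0
}
```

Injected text never sees the session secret, so it cannot produce a valid tag for content it tries to pass off as trusted. Changing even one byte inside an existing block also invalidates that block's tag.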

L3: Output Validation
ironclad-agent/policy.rs

Authority-based tool access control (creator > self > peer > external), financial guards, self-modification locks
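The authority ordering maps naturally onto a derived `Ord`. The sketch below assumes a per-tool minimum-authority table plus a separate self-modification lock. The tool names, the table, and `check_tool` are hypothetical. The `self` level is spelled `SelfAgent` because `Self` is reserved in Rust.

```rust
/// Declared lowest to highest, so the derived Ord gives External < Peer < SelfAgent < Creator.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Authority {
    External,
    Peer,
    SelfAgent,
    Creator,
}

#[derive(Debug, PartialEq)]
pub enum Decision {
    Allow,
    Deny(&'static str),
}

/// Hypothetical minimum authority per tool; unknown tools fail closed to Creator-only.
fn required_authority(tool: &str) -> Authority {
    match tool {
        "read_file" | "web_search" => Authority::External,
        "send_message" => Authority::Peer,
        "run_script" | "transfer_funds" => Authority::SelfAgent,
        _ => Authority::Creator,
    }
}

/// Hypothetical tools that would change the agent's own config, prompt, or skills.
const SELF_MODIFYING: &[&str] = &["edit_config", "update_prompt", "install_skill"];

pub fn check_tool(caller: Authority, tool: &str) -> Decision {
    // Explicit lock, checked first so it holds even if the table above changes.
    if SELF_MODIFYING.iter().any(|t| *t == tool) && caller != Authority::Creator {
        return Decision::Deny("self-modification is locked to the creator");
    }
    if caller >= required_authority(tool) {
        Decision::Allow
    } else {
        Decision::Deny("insufficient authority for tool")
    }
}
```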

L4: Adaptive Refinement
ironclad-agent/policy.rs

Output pattern scanning, behavioral anomaly detection (tool pattern changes, protected file access, repeated financial ops)
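Two of the L4 signals, protected file access and repeated financial operations, can be sketched as a small stateful detector. The window size, the threshold, and the protected file names (borrowed from the skill-scanning table below) are illustrative assumptions. Tool-pattern drift detection is not shown.

```rust
use std::collections::VecDeque;

/// Hypothetical protected file names; a real list would come from policy config.
const PROTECTED_FILES: &[&str] = &["wallet.json", "ironclad.db"];

#[derive(Debug, PartialEq)]
pub enum Anomaly {
    ProtectedFileAccess(String),
    RepeatedFinancialOps(usize),
}

/// Remembers whether each of the last `window` tool calls was financial.
pub struct AnomalyDetector {
    recent_financial: VecDeque<bool>,
    window: usize,
    max_financial: usize,
}

impl AnomalyDetector {
    pub fn new(window: usize, max_financial: usize) -> Self {
        Self { recent_financial: VecDeque::with_capacity(window), window, max_financial }
    }

    /// Records one tool call; flags when more than `max_financial` of the
    /// last `window` calls were financial.
    pub fn observe_call(&mut self, is_financial: bool) -> Option<Anomaly> {
        self.recent_financial.push_back(is_financial);
        if self.recent_financial.len() > self.window {
            self.recent_financial.pop_front();
        }
        let count = self.recent_financial.iter().filter(|&&f| f).count();
        if count > self.max_financial {
            Some(Anomaly::RepeatedFinancialOps(count))
        } else {
            None
        }
    }

    /// Flags access whose final path component is a protected file name.
    pub fn observe_file_access(&self, path: &str) -> Option<Anomaly> {
        let name = path.rsplit('/').next().unwrap_or(path);
        if PROTECTED_FILES.iter().any(|p| *p == name) {
            Some(Anomaly::ProtectedFileAccess(path.to_string()))
        } else {
            None
        }
    }
}
```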

Zero-Trust Agent-to-Agent Protocol

  • Mutual authentication via on-chain identity (ERC-8004 registry on Base)
  • Challenge-response with signed nonces + timestamps (60s window)
  • ECDH ephemeral keypairs → AES-256-GCM session encryption with forward secrecy
  • Per-message HMAC authentication, rate limiting, size limits
  • Peer messages pass through injection defense with reduced authority
  • Opacity principle: agents never expose internal memory, prompts, keys, or session history
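The 60-second freshness check and nonce bookkeeping from the challenge-response step might look like the sketch below. It assumes several things: the signature over (nonce, timestamp) has already been verified against the peer's registered identity, nonces come from a CSPRNG, and timestamps are Unix seconds. `ChallengeTracker` and its methods are hypothetical names.

```rust
use std::collections::HashSet;

/// Freshness window from the protocol description.
const MAX_SKEW_SECS: u64 = 60;

#[derive(Debug, PartialEq)]
pub enum ChallengeError {
    UnknownNonce,
    Replayed,
    Expired,
    FromFuture,
}

/// Tracks nonces this agent issued and nonces that were already answered.
#[derive(Default)]
pub struct ChallengeTracker {
    issued: HashSet<u64>,
    used: HashSet<u64>,
}

impl ChallengeTracker {
    /// Registers a nonce sent to a peer (the nonce itself should come from a CSPRNG).
    pub fn issue(&mut self, nonce: u64) {
        self.issued.insert(nonce);
    }

    /// Validates a response whose signature has ALREADY been checked elsewhere.
    pub fn accept(&mut self, nonce: u64, timestamp: u64, now: u64) -> Result<(), ChallengeError> {
        if self.used.contains(&nonce) {
            return Err(ChallengeError::Replayed);
        }
        if !self.issued.contains(&nonce) {
            return Err(ChallengeError::UnknownNonce);
        }
        if timestamp > now.saturating_add(MAX_SKEW_SECS) {
            return Err(ChallengeError::FromFuture);
        }
        if now.saturating_sub(timestamp) > MAX_SKEW_SECS {
            return Err(ChallengeError::Expired);
        }
        // Each nonce is single-use: move it from issued to used.
        self.issued.remove(&nonce);
        self.used.insert(nonce);
        Ok(())
    }
}
```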

Policy Engine

6 built-in rules. All decisions audit-logged to the policy_decisions table.

Authority

Levels: creator > self > peer > external, each with progressively more restricted tool access.

Command Safety

Tool risk classification: Safe, Caution, Dangerous, Forbidden.
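To illustrate the four risk levels, here is a hypothetical classifier for shell commands passed to a script tool. The rules are a small made-up sample, not the shipped list. The most severe level is checked first, so a command that matches several rules receives the highest risk.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Risk {
    Safe,
    Caution,
    Dangerous,
    Forbidden,
}

/// Hypothetical classifier for a single shell command string.
pub fn classify_command(cmd: &str) -> Risk {
    let lower = cmd.to_lowercase();
    let words: Vec<&str> = lower.split_whitespace().collect();
    let has = |w: &str| words.iter().any(|x| *x == w);

    // Recursive deletion of the filesystem root, or a classic fork bomb: never allowed.
    if (has("rm") && has("-rf") && (has("/") || has("/*"))) || lower.contains(":(){") {
        return Risk::Forbidden;
    }
    // Piping into a shell (e.g. curl ... | sh) or any other forced recursive delete.
    if lower.contains("| sh") || lower.contains("| bash") || (has("rm") && has("-rf")) {
        return Risk::Dangerous;
    }
    // Commands that mutate files or permissions, or touch the network.
    if ["rm", "mv", "chmod", "curl", "wget"].iter().any(|&w| has(w)) {
        return Risk::Caution;
    }
    Risk::Safe
}
```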

Financial

Per-payment caps, hourly/daily transfer limits, minimum reserve enforcement.
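A sketch of the financial checks, under several assumptions: amounts are integer base units (which avoids floating-point rounding), hourly and daily limits are rolling windows over a list of past (timestamp, amount) transfers, and `check_payment` and `FinancialLimits` are hypothetical names.

```rust
/// Hypothetical limits, all in integer base units of the payment token.
pub struct FinancialLimits {
    pub per_payment_cap: u64,
    pub hourly_limit: u64,
    pub daily_limit: u64,
    pub min_reserve: u64,
}

#[derive(Debug, PartialEq)]
pub enum PaymentDenied {
    OverPerPaymentCap,
    OverHourlyLimit,
    OverDailyLimit,
    BelowReserve,
}

/// `history` holds (unix_secs, amount) for past transfers.
pub fn check_payment(
    limits: &FinancialLimits,
    history: &[(u64, u64)],
    balance: u64,
    amount: u64,
    now: u64,
) -> Result<(), PaymentDenied> {
    if amount > limits.per_payment_cap {
        return Err(PaymentDenied::OverPerPaymentCap);
    }
    // Sum of transfers made within the last `secs` seconds.
    let spent_since = |secs: u64| -> u64 {
        history
            .iter()
            .filter(|&&(t, _)| now.saturating_sub(t) < secs)
            .map(|&(_, a)| a)
            .sum()
    };
    if spent_since(3_600) + amount > limits.hourly_limit {
        return Err(PaymentDenied::OverHourlyLimit);
    }
    if spent_since(86_400) + amount > limits.daily_limit {
        return Err(PaymentDenied::OverDailyLimit);
    }
    // The transfer must leave at least `min_reserve` in the wallet.
    if balance < amount || balance - amount < limits.min_reserve {
        return Err(PaymentDenied::BelowReserve);
    }
    Ok(())
}
```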

Path Protection

Prevents access to sensitive paths (wallet files, database, config).
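Path protection has to resist `.` and `..` tricks, so the sketch below normalizes a path before comparing its file name with a protected list. The normalization is purely lexical and does not resolve symlinks, which a production guard would likely also need to handle. The protected names are just the two files mentioned elsewhere on this page; `normalize` and `is_protected` are hypothetical helpers.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical protected file names (config files would be added the same way).
const PROTECTED_NAMES: &[&str] = &["wallet.json", "ironclad.db"];

/// Lexically drops `.` and resolves `..` without touching the filesystem.
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for comp in path.components() {
        match comp {
            Component::CurDir => {}
            Component::ParentDir => {
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// True if the normalized path names a protected file.
pub fn is_protected(path: &str) -> bool {
    normalize(Path::new(path))
        .file_name()
        .and_then(|n| n.to_str())
        .map_or(false, |n| PROTECTED_NAMES.iter().any(|p| *p == n))
}
```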

Rate Limit

Per-tool and per-session rate limits to prevent runaway execution.
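A sliding-window limiter is one common way to enforce per-tool and per-session limits. In this hypothetical sketch, a call succeeds only if it fits both the session budget and its tool's budget. Timestamps are caller-supplied seconds so the logic is easy to test.

```rust
use std::collections::{HashMap, VecDeque};

/// Allows at most `max_calls` within any `window_secs` span.
pub struct SlidingWindow {
    max_calls: usize,
    window_secs: u64,
    hits: VecDeque<u64>,
}

impl SlidingWindow {
    pub fn new(max_calls: usize, window_secs: u64) -> Self {
        Self { max_calls, window_secs, hits: VecDeque::new() }
    }

    /// Drops expired timestamps, then reports whether one more call would fit.
    fn has_room(&mut self, now: u64) -> bool {
        while let Some(&t) = self.hits.front() {
            if now.saturating_sub(t) >= self.window_secs {
                self.hits.pop_front();
            } else {
                break;
            }
        }
        self.hits.len() < self.max_calls
    }
}

/// Hypothetical combined limiter: one session window plus one window per tool.
pub struct RateLimiter {
    session: SlidingWindow,
    per_tool: HashMap<String, SlidingWindow>,
    tool_max: usize,
    window_secs: u64,
}

impl RateLimiter {
    pub fn new(session_max: usize, tool_max: usize, window_secs: u64) -> Self {
        Self {
            session: SlidingWindow::new(session_max, window_secs),
            per_tool: HashMap::new(),
            tool_max,
            window_secs,
        }
    }

    pub fn try_call(&mut self, tool: &str, now: u64) -> bool {
        let (tool_max, window) = (self.tool_max, self.window_secs);
        let tool_window = self
            .per_tool
            .entry(tool.to_string())
            .or_insert_with(|| SlidingWindow::new(tool_max, window));
        // Check both budgets before recording, so a rejected call consumes neither.
        if !(self.session.has_room(now) && tool_window.has_room(now)) {
            return false;
        }
        self.session.hits.push_back(now);
        tool_window.hits.push_back(now);
        true
    }
}
```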

Validation

Input validation and output scanning for all tool calls.

Script Sandbox

  • Configurable interpreter whitelist (bash, python3, node by default)
  • Environment stripping in sandbox mode (only PATH, HOME, IRONCLAD_SESSION_ID, IRONCLAD_AGENT_ID)
  • Timeout enforcement and output truncation
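The allowlist and environment-stripping steps map directly onto `std::process::Command::env_clear` followed by `envs`. The helper names below are hypothetical, and the command is only built, not run. Timeout enforcement is omitted because the standard library's `Child` has no built-in wait-with-timeout. `truncate_output` shows output truncation that respects UTF-8 character boundaries.

```rust
use std::collections::HashMap;
use std::process::{Command, Stdio};

/// Default interpreter allowlist from the docs.
const DEFAULT_INTERPRETERS: &[&str] = &["bash", "python3", "node"];
/// Variables passed through in sandbox mode; everything else is stripped.
const SANDBOX_ENV: &[&str] = &["PATH", "HOME", "IRONCLAD_SESSION_ID", "IRONCLAD_AGENT_ID"];

/// Keeps only the allowlisted variables from `env`.
pub fn sandbox_env(env: &HashMap<String, String>) -> HashMap<String, String> {
    env.iter()
        .filter(|(k, _)| SANDBOX_ENV.iter().any(|a| *a == k.as_str()))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}

/// Builds (but does not spawn) a sandboxed command, rejecting unlisted interpreters.
pub fn build_command(interpreter: &str, script: &str, env: &HashMap<String, String>) -> Result<Command, String> {
    if !DEFAULT_INTERPRETERS.iter().any(|i| *i == interpreter) {
        return Err(format!("interpreter not allowed: {}", interpreter));
    }
    let mut cmd = Command::new(interpreter);
    cmd.arg(script)
        .env_clear() // start from an empty environment
        .envs(sandbox_env(env)) // then re-add only the allowlisted variables
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped());
    Ok(cmd)
}

/// Truncates to at most `max_bytes`, backing off to the previous char boundary.
pub fn truncate_output(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```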

Skill Import Safety Scanning

50+ danger patterns across 5 categories. Verdicts: Clean, Warnings (review recommended), Critical (import blocked).

Category | Examples
--- | ---
Dangerous Commands | rm -rf /, fork bombs, pipe-to-shell RCE, dynamic eval
Network Access | curl, wget, netcat, SSH
Filesystem Access | writes to ~/.ssh/ or ~/.gnupg/; access to ironclad.db or wallet.json
Environment Exfiltration | reading $API_KEY, $SECRET, $PASSWORD, process.env, os.environ
Obfuscation | base64-decode piped to shell
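A toy version of the scanner maps each finding to a category and a blocking flag, then derives the verdict from those findings. The ten patterns are a hypothetical sample drawn from the table (the real scanner has 50+). The rule that any blocking match yields Critical is an assumption about how verdicts are assigned.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Category {
    DangerousCommand,
    NetworkAccess,
    FilesystemAccess,
    EnvExfiltration,
    Obfuscation,
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Clean,
    Warnings(Vec<Category>),
    Critical(Vec<Category>),
}

/// Hypothetical (lowercase pattern, category, blocks_import) triples.
const PATTERNS: &[(&str, Category, bool)] = &[
    ("rm -rf /", Category::DangerousCommand, true),
    ("| sh", Category::DangerousCommand, true),
    ("eval(", Category::DangerousCommand, false),
    ("curl ", Category::NetworkAccess, false),
    ("wget ", Category::NetworkAccess, false),
    ("~/.ssh/", Category::FilesystemAccess, true),
    ("wallet.json", Category::FilesystemAccess, true),
    ("$api_key", Category::EnvExfiltration, true),
    ("process.env", Category::EnvExfiltration, false),
    ("base64 -d | sh", Category::Obfuscation, true),
];

pub fn scan_skill(source: &str) -> Verdict {
    let lower = source.to_lowercase();
    let mut warnings = Vec::new();
    let mut critical = Vec::new();
    for &(pattern, category, blocks) in PATTERNS {
        if lower.contains(pattern) {
            // Record each category once, in the bucket matching the pattern's severity.
            let bucket = if blocks { &mut critical } else { &mut warnings };
            if !bucket.contains(&category) {
                bucket.push(category);
            }
        }
    }
    if !critical.is_empty() {
        Verdict::Critical(critical)
    } else if !warnings.is_empty() {
        Verdict::Warnings(warnings)
    } else {
        Verdict::Clean
    }
}
```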