Security

Multi-layered defense architecture covering prompt injection, agent-to-agent trust, policy enforcement, script sandboxing, and skill import safety.

4-Layer Prompt Injection Defense

L1 Input Gatekeeping

ironclad-agent/injection.rs

Regex patterns, encoding-evasion detection, financial-manipulation checks, and multi-language injection scanning, combined into a ThreatScore from 0.0 to 1.0.
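
As a rough illustration of how pattern hits could be folded into a score between 0.0 and 1.0, here is a minimal sketch. It assumes plain substring matching instead of regexes, and the pattern list, weights, and the `threat_score` name are all hypothetical; the actual logic in injection.rs may differ.

```rust
/// Hypothetical patterns with weights; not Ironclad's real pattern set.
const INJECTION_PATTERNS: &[(&str, f64)] = &[
    ("ignore previous instructions", 0.6),
    ("reveal your system prompt", 0.5),
    ("transfer all funds", 0.7),
];

/// Adds the weight of every matched pattern and clamps the sum to 0.0..=1.0.
pub fn threat_score(input: &str) -> f64 {
    // Lowercase once so matching is case-insensitive.
    let lowered = input.to_lowercase();
    let mut total = 0.0;
    for (needle, weight) in INJECTION_PATTERNS {
        if lowered.contains(*needle) {
            total += *weight;
        }
    }
    total.clamp(0.0, 1.0)
}
```

Clamping keeps the score in range even when several patterns fire on the same input.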

L2 Structured Formatting

ironclad-agent/prompt.rs

HMAC-tagged trust boundaries (session secret + content hash) that injected content cannot forge.

L3 Output Validation

ironclad-agent/policy.rs

Authority-based tool access control (creator > self > peer > external), financial guards, self-modification locks

L4 Adaptive Refinement

ironclad-agent/policy.rs

Output pattern scanning, behavioral anomaly detection (tool pattern changes, protected file access, repeated financial ops)

Agent-to-Agent Trust

▸ Mutual authentication via on-chain identity (ERC-8004 registry on Base)
▸ Challenge-response with signed nonces + timestamps (60s window)
▸ ECDH ephemeral keypairs → AES-256-GCM session encryption with forward secrecy
▸ Per-message HMAC authentication, rate limiting, size limits
▸ Peer messages pass through injection defense with reduced authority
▸ Opacity principle: agents never expose internal memory, prompts, keys, or session history
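
The 60-second challenge window can be sketched as a freshness check combined with nonce replay tracking. This is a minimal sketch under stated assumptions: the `ChallengeGuard` type, the 16-byte nonce size, and the Unix-seconds timestamps are hypothetical, and signature verification is assumed to have succeeded before `accept` is called.

```rust
use std::collections::HashSet;

/// Freshness window from the text above: 60 seconds.
const WINDOW_SECS: u64 = 60;

/// Remembers nonces already seen so a captured challenge cannot be replayed.
#[derive(Default)]
pub struct ChallengeGuard {
    seen: HashSet<[u8; 16]>,
}

impl ChallengeGuard {
    /// Accepts a challenge only if its timestamp is within the window
    /// (in either direction, to tolerate clock skew) and its nonce is new.
    pub fn accept(&mut self, nonce: [u8; 16], sent_at: u64, now: u64) -> bool {
        if now.abs_diff(sent_at) > WINDOW_SECS {
            return false;
        }
        // `insert` returns false when the nonce was already present.
        self.seen.insert(nonce)
    }
}
```

A longer-lived version would also evict nonces older than the window so the set does not grow without bound.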

Policy Enforcement

Six built-in rules. Every decision is audit-logged to the policy_decisions table.

Authority

Levels: creator > self > peer > external, each with progressively more restricted tool access.
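
One way to model this ordering is an enum whose derived `Ord` follows declaration order. The sketch below is illustrative only: the `Authority` type, its variant names, and `may_call` are assumptions, not the types used in policy.rs.

```rust
/// Variants are declared from least to most trusted, so the derived `Ord`
/// gives External < Peer < SelfAgent < Creator.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Authority {
    External,
    Peer,
    SelfAgent, // named SelfAgent because `Self` is a reserved word in Rust
    Creator,
}

/// A tool is callable when the caller's authority meets the tool's minimum.
pub fn may_call(caller: Authority, minimum: Authority) -> bool {
    caller >= minimum
}
```

Expressing the hierarchy as an ordering means each tool needs only a single minimum level rather than a per-level allow list.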

Command Safety

Tool risk classification: Safe, Caution, Dangerous, Forbidden.
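
The four risk classes map naturally onto an enum. In the sketch below, the tool names, the `Decision` type, and the mapping from risk to decision are all hypothetical; the source only states the four class names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Risk {
    Safe,
    Caution,
    Dangerous,
    Forbidden,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Allow,
    RequireConfirmation,
    Deny,
}

/// Hypothetical tool names; unknown tools are deliberately not treated as Safe.
pub fn classify(tool: &str) -> Risk {
    match tool {
        "read_file" | "list_dir" => Risk::Safe,
        "write_file" | "http_get" => Risk::Caution,
        "run_script" | "send_payment" => Risk::Dangerous,
        "delete_wallet" => Risk::Forbidden,
        _ => Risk::Caution,
    }
}

/// Assumed policy: Forbidden is always denied, Dangerous needs confirmation.
pub fn decide(risk: Risk) -> Decision {
    match risk {
        Risk::Safe | Risk::Caution => Decision::Allow,
        Risk::Dangerous => Decision::RequireConfirmation,
        Risk::Forbidden => Decision::Deny,
    }
}
```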

Financial

Per-payment caps, hourly/daily transfer limits, minimum reserve enforcement.
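
These three guards can be checked in sequence before a payment is authorized. The following is a sketch under assumptions: the `FinancialLimits` and `Rejection` names, the field names, and the use of integer amounts in the smallest currency unit are all hypothetical.

```rust
/// All amounts are in the smallest currency unit (an assumption).
pub struct FinancialLimits {
    pub per_payment_cap: u64,
    pub hourly_limit: u64,
    pub daily_limit: u64,
    pub minimum_reserve: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Rejection {
    OverPaymentCap,
    OverHourlyLimit,
    OverDailyLimit,
    BelowReserve,
}

impl FinancialLimits {
    /// Checks one proposed payment against the caps, limits, and reserve.
    pub fn check(
        &self,
        amount: u64,
        spent_last_hour: u64,
        spent_last_day: u64,
        balance: u64,
    ) -> Result<(), Rejection> {
        if amount > self.per_payment_cap {
            return Err(Rejection::OverPaymentCap);
        }
        // saturating_add avoids overflow on very large inputs.
        if spent_last_hour.saturating_add(amount) > self.hourly_limit {
            return Err(Rejection::OverHourlyLimit);
        }
        if spent_last_day.saturating_add(amount) > self.daily_limit {
            return Err(Rejection::OverDailyLimit);
        }
        // The balance left after paying must stay at or above the reserve.
        match balance.checked_sub(amount) {
            Some(rest) if rest >= self.minimum_reserve => Ok(()),
            _ => Err(Rejection::BelowReserve),
        }
    }
}
```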

Path Protection

Prevents access to sensitive paths (wallet files, database, config).
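
A path check of this kind might normalize the path lexically and then compare each component against a protected list. In this sketch, `wallet.json` and `ironclad.db` are taken from elsewhere in this document, while `config.toml` and the function names are hypothetical. Symlinks are not resolved here; a real check would likely also canonicalize the path.

```rust
use std::path::{Component, Path, PathBuf};

/// Protected file names; `config.toml` is a hypothetical placeholder.
const PROTECTED_NAMES: &[&str] = &["wallet.json", "ironclad.db", "config.toml"];

/// Resolves `.` and `..` lexically, without touching the filesystem.
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for part in path.components() {
        match part {
            Component::CurDir => {}
            Component::ParentDir => {
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// Returns true if any component of the normalized path is a protected name.
pub fn is_protected(path: &Path) -> bool {
    normalize(path)
        .components()
        .any(|c| PROTECTED_NAMES.iter().any(|name| c.as_os_str().to_str() == Some(*name)))
}
```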

Rate Limit

Per-tool and per-session rate limits to prevent runaway execution.
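
A sliding-window limiter is one common way to implement such limits. The sketch below keys the window on an arbitrary string, which could be a tool name or a session ID; the `RateLimiter` type and its parameters are assumptions rather than the actual implementation.

```rust
use std::collections::{HashMap, VecDeque};

/// Allows at most `max_calls` per key within any `window_secs` span.
pub struct RateLimiter {
    max_calls: usize,
    window_secs: u64,
    calls: HashMap<String, VecDeque<u64>>,
}

impl RateLimiter {
    pub fn new(max_calls: usize, window_secs: u64) -> Self {
        Self { max_calls, window_secs, calls: HashMap::new() }
    }

    /// Records a call at time `now` (in seconds) and returns true,
    /// or returns false without recording if the key is over its limit.
    pub fn try_acquire(&mut self, key: &str, now: u64) -> bool {
        let history = self.calls.entry(key.to_string()).or_default();
        // Drop timestamps that have aged out of the window.
        while let Some(&oldest) = history.front() {
            if now.saturating_sub(oldest) >= self.window_secs {
                history.pop_front();
            } else {
                break;
            }
        }
        if history.len() >= self.max_calls {
            return false;
        }
        history.push_back(now);
        true
    }
}
```

Passing `now` in explicitly, instead of reading the clock inside the method, keeps the limiter deterministic and easy to test.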

Validation

Input validation and output scanning for all tool calls.

Script Sandboxing

▸ Configurable interpreter whitelist (bash, python3, node by default)
▸ Environment stripping in sandbox mode (only PATH, HOME, IRONCLAD_SESSION_ID, IRONCLAD_AGENT_ID)
▸ Timeout enforcement and output truncation
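
The whitelist and environment stripping can be sketched with `std::process::Command`, whose `env_clear` and `envs` methods reset and repopulate a child's environment. The function names here are hypothetical, and timeout enforcement and output truncation are left out to keep the sketch short.

```rust
use std::collections::HashMap;
use std::process::Command;

/// Default interpreter whitelist from the list above.
const ALLOWED_INTERPRETERS: &[&str] = &["bash", "python3", "node"];

/// The only variables passed through in sandbox mode.
const KEPT_ENV: &[&str] = &["PATH", "HOME", "IRONCLAD_SESSION_ID", "IRONCLAD_AGENT_ID"];

/// Keeps only the whitelisted variables from the parent environment.
pub fn stripped_env(parent: &HashMap<String, String>) -> HashMap<String, String> {
    parent
        .iter()
        .filter(|(key, _)| KEPT_ENV.contains(&key.as_str()))
        .map(|(key, value)| (key.clone(), value.clone()))
        .collect()
}

/// Builds, but does not spawn, a command with a cleared environment.
/// Returns None when the interpreter is not on the whitelist.
pub fn sandboxed_command(
    interpreter: &str,
    script: &str,
    parent: &HashMap<String, String>,
) -> Option<Command> {
    if !ALLOWED_INTERPRETERS.contains(&interpreter) {
        return None;
    }
    let mut cmd = Command::new(interpreter);
    cmd.arg(script).env_clear().envs(stripped_env(parent));
    Some(cmd)
}
```

Clearing the environment first and then adding back a short allow list is safer than removing known-sensitive variables one by one, since new secrets are excluded by default.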

Skill Import Safety

50+ danger patterns across 5 categories. Verdicts: Clean, Warnings (review recommended), Critical (import blocked).

Category	Examples
Dangerous Commands	rm -rf /, fork bombs, pipe-to-shell RCE, dynamic eval
Network Access	curl, wget, netcat, SSH
Filesystem Access	writes to ~/.ssh/, ~/.gnupg/, access to ironclad.db or wallet.json
Environment Exfiltration	reading $API_KEY, $SECRET, $PASSWORD, process.env, os.environ
Obfuscation	base64-decode piped to shell
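
A scanner producing the three verdicts could take the most severe match across all patterns. The sketch below uses a small sample of substring patterns drawn from the table; which severity each pattern receives is an assumption, as are the `Verdict` and `scan` names, and a real scanner would likely use regexes.

```rust
/// Declared in increasing severity so the derived `Ord` supports `max`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Verdict {
    Clean,
    Warnings,
    Critical,
}

/// Illustrative sample only; the severity assignments are hypothetical.
const DANGER_PATTERNS: &[(&str, Verdict)] = &[
    ("rm -rf /", Verdict::Critical),
    ("| sh", Verdict::Critical),
    ("| bash", Verdict::Critical),
    ("wallet.json", Verdict::Critical),
    ("curl ", Verdict::Warnings),
    ("wget ", Verdict::Warnings),
    ("os.environ", Verdict::Warnings),
    ("process.env", Verdict::Warnings),
];

/// Returns the most severe verdict among matched patterns, or Clean if none match.
pub fn scan(source: &str) -> Verdict {
    DANGER_PATTERNS
        .iter()
        .filter(|(needle, _)| source.contains(*needle))
        .map(|(_, verdict)| *verdict)
        .max()
        .unwrap_or(Verdict::Clean)
}
```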