Features

Deep-dive into every subsystem of the Ironclad autonomous agent runtime.

LLM Client Pipeline

  • Model-agnostic proxy -- provider config fully externalized in TOML
  • Format translation -- typed Rust enums with From<T> for 4 API formats (12+ translation pairs)
  • Circuit breaker per provider (Closed/Open/HalfOpen, exponential backoff)
  • In-flight deduplication -- SHA-256 fingerprinting prevents duplicate concurrent requests
  • Tier-based prompt adaptation -- T1 (condensed), T2 (preamble + reorder), T3/T4 (passthrough + cache_control)
  • Heuristic model router -- complexity classification + rule-based fallback chain
  • 3-level semantic cache -- L1 exact hash, L2 embedding cosine similarity, L3 deterministic tool TTL
  • Persistent connection pool -- single reqwest::Client with HTTP/2 multiplexing per provider
  • x402 payment protocol -- automatic payment-gated inference (402 -> sign EIP-3009 -> retry)
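The per-provider circuit breaker above can be sketched as a small state machine. This is an illustrative model only -- the type and method names do not mirror the actual ironclad-llm internals, and the threshold/backoff values are made up for the example:

```rust
use std::time::{Duration, Instant};

// Illustrative circuit breaker: Closed -> Open (after repeated failures,
// with exponential backoff) -> HalfOpen (one probe) -> Closed on success.
#[derive(Debug, PartialEq)]
enum BreakerState {
    Closed,
    Open { until: Instant },
    HalfOpen,
}

struct CircuitBreaker {
    state: BreakerState,
    consecutive_failures: u32,
    failure_threshold: u32,
    base_backoff: Duration,
}

impl CircuitBreaker {
    fn new(failure_threshold: u32, base_backoff: Duration) -> Self {
        Self {
            state: BreakerState::Closed,
            consecutive_failures: 0,
            failure_threshold,
            base_backoff,
        }
    }

    /// May a request be sent to this provider right now?
    fn allow_request(&mut self) -> bool {
        match self.state {
            BreakerState::Closed | BreakerState::HalfOpen => true,
            BreakerState::Open { until } => {
                if Instant::now() >= until {
                    // Cool-down elapsed: let one probe request through.
                    self.state = BreakerState::HalfOpen;
                    true
                } else {
                    false
                }
            }
        }
    }

    fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.state = BreakerState::Closed;
    }

    fn record_failure(&mut self) {
        self.consecutive_failures += 1;
        if self.consecutive_failures >= self.failure_threshold {
            // Exponential backoff: base * 2^(extra failures), capped.
            let exp = (self.consecutive_failures - self.failure_threshold).min(6);
            let backoff = self.base_backoff * 2u32.pow(exp);
            self.state = BreakerState::Open { until: Instant::now() + backoff };
        }
    }
}
```

Keeping one breaker per provider lets a flapping upstream be isolated without pausing traffic to healthy providers.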

Agent Core

  • ReAct state machine -- Think -> Act -> Observe -> Persist cycle with idle/loop detection
  • Tool system -- trait-based plugin architecture with 10 tool categories
  • Policy engine -- 6 built-in rules (authority, command safety, financial, path protection, rate limit, validation)
  • 4-layer prompt injection defense (regex + HMAC boundaries + output validation + behavioral anomaly detection)
  • Progressive context loading -- 4 complexity levels (L0 ~2K, L1 ~4K, L2 ~8K, L3 ~16K tokens)
  • Subagent framework -- spawn child agents with isolated tool registries and policy overrides
  • Human-in-the-loop approvals -- configurable approval gates for high-risk tool calls
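The Think -> Act -> Observe -> Persist cycle and its loop detection can be sketched roughly as follows. These are hypothetical types, not the actual ironclad-agent definitions, and the loop heuristic here (same tool repeated N times) is a simplified stand-in:

```rust
// Illustrative ReAct step types; not the real ironclad-agent definitions.
#[derive(Debug, Clone, PartialEq)]
enum Step {
    Think(String),        // model reasoning
    Act { tool: String }, // tool invocation
    Observe(String),      // tool result fed back into the next Think
    Persist,              // write the turn into the memory tiers
}

/// Trivial loop detector: the last `window` tool calls were all the same
/// tool -- a simplified stand-in for the runtime's idle/loop detection.
fn is_looping(history: &[Step], window: usize) -> bool {
    let acts: Vec<&String> = history
        .iter()
        .filter_map(|s| match s {
            Step::Act { tool } => Some(tool),
            _ => None,
        })
        .collect();
    acts.len() >= window
        && acts[acts.len() - window..].windows(2).all(|w| w[0] == w[1])
}
```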

Memory System

  • 5-tier unified memory: Working, Episodic, Semantic, Procedural, Relationship
  • Full-text search via SQLite FTS5
  • Memory budget manager -- configurable per-tier token allocation with unused rollover
  • Background pruning via heartbeat task

Scheduling

  • Durable scheduler -- cron expressions, interval, one-time timestamps; all state in SQLite
  • Lease-based execution -- prevents double-execution across instances
  • Heartbeat daemon -- configurable tick interval, builds TickContext (balance, survival tier) per tick
  • 7 built-in tasks: SurvivalCheck, UsdcMonitor, YieldTask, MemoryPrune, CacheEvict, MetricSnapshot, AgentCardRefresh
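The lease mechanism that prevents double-execution can be modeled in a few lines. The real scheduler persists leases in SQLite so they survive restarts and span instances; this in-memory sketch (with invented names) only shows the claim semantics:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// In-memory model of lease acquisition. The real implementation keeps
// this state in SQLite so leases hold across processes and restarts.
struct LeaseTable {
    leases: HashMap<String, (String, Instant)>, // task -> (holder, expires_at)
}

impl LeaseTable {
    fn new() -> Self {
        Self { leases: HashMap::new() }
    }

    /// Claim `task` for `holder`. Fails if another instance holds an
    /// unexpired lease -- this is what prevents double-execution.
    fn try_acquire(&mut self, task: &str, holder: &str, ttl: Duration) -> bool {
        let now = Instant::now();
        match self.leases.get(task) {
            Some((owner, expires)) if *expires > now && owner != holder => false,
            _ => {
                self.leases
                    .insert(task.to_string(), (holder.to_string(), now + ttl));
                true
            }
        }
    }
}
```

A holder can renew its own lease before expiry; an expired lease is claimable by anyone, so a crashed instance does not block the task forever.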

Financial

  • Ethereum wallet -- keypair generation/loading via alloy-rs, EIP-191 signing
  • x402 payment protocol -- EIP-3009 TransferWithAuthorization for automated LLM payments
  • Treasury policy -- per-payment, hourly, daily, and minimum reserve limits
  • Yield engine -- deposits idle USDC into Aave/Compound on Base, auto-withdraws below threshold
  • Survival tier system -- high/normal/low_compute/critical/dead states drive model downgrading
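The survival-tier mapping might look something like the sketch below. The tier names come from the docs, but the USDC balance thresholds here are invented for illustration -- the real cutoffs live in configuration:

```rust
// Survival tiers as documented; the balance thresholds are hypothetical.
#[derive(Debug, PartialEq)]
enum SurvivalTier {
    High,
    Normal,
    LowCompute,
    Critical,
    Dead,
}

/// Map a treasury balance to a survival tier. Lower tiers drive model
/// downgrading (cheaper models, smaller context) in the router.
fn tier_for_balance(usdc: f64) -> SurvivalTier {
    match usdc {
        b if b >= 100.0 => SurvivalTier::High,
        b if b >= 20.0 => SurvivalTier::Normal,
        b if b >= 5.0 => SurvivalTier::LowCompute,
        b if b > 0.0 => SurvivalTier::Critical,
        _ => SurvivalTier::Dead,
    }
}
```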

Channels

  • Telegram -- long-poll + webhook, Markdown V2 formatting, 4096-char chunking
  • WhatsApp -- Cloud API webhook, signature verification
  • Discord -- gateway WebSocket, slash commands, embed formatting
  • WebSocket -- direct browser/client connections with ping/pong keepalive
  • A2A (Agent-to-Agent) -- zero-trust protocol with ECDSA mutual auth, ECDH key exchange, AES-256-GCM encryption
  • Delivery queue -- persistent message delivery with retry logic

Plugin SDK

  • Plugin trait -- name(), version(), tools(), execute()
  • 6 script languages: .gosh, .go, .sh, .py, .rb, .js
  • Sandboxed execution -- configurable timeout, output size cap, interpreter whitelist
  • Plugin manifest (plugin.toml) -- declarative tool registration with risk levels
  • Auto-discovery -- scans plugin directories, registers tools at boot
  • Hot-reload -- detects content hash changes and re-registers
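A minimal implementation of the plugin contract listed above might look like this. The method names match the docs, but the signatures are simplified -- the real SDK presumably uses richer argument and result types than plain strings:

```rust
// Simplified sketch of the Plugin trait; real signatures likely differ.
trait Plugin {
    fn name(&self) -> &str;
    fn version(&self) -> &str;
    fn tools(&self) -> Vec<String>;
    fn execute(&self, tool: &str, args: &str) -> Result<String, String>;
}

// A toy plugin exposing a single tool that echoes its arguments back.
struct EchoPlugin;

impl Plugin for EchoPlugin {
    fn name(&self) -> &str { "echo" }
    fn version(&self) -> &str { "0.1.0" }
    fn tools(&self) -> Vec<String> { vec!["echo.say".to_string()] }
    fn execute(&self, tool: &str, args: &str) -> Result<String, String> {
        match tool {
            "echo.say" => Ok(args.to_string()),
            other => Err(format!("unknown tool: {other}")),
        }
    }
}
```

At boot, auto-discovery would call `tools()` on each discovered plugin and register the returned names with the agent's tool registry.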

Browser Automation

  • Chrome DevTools Protocol via WebSocket
  • Action types: navigate, click, type, screenshot, evaluate, wait, scroll, extract
  • Session management -- start/stop headless Chrome instances
  • REST API integration -- /api/browser/* endpoints for remote control

Skill System

  • Structured skills (.toml) -- programmatic tool chains with parameter templates, script paths, and policy overrides
  • Instruction skills (.md) -- YAML frontmatter (triggers, priority) + markdown body injected into system prompt
  • Trigger matching -- keyword, tool name, and regex patterns
  • Safety scanning on import -- 50+ danger patterns across 5 categories
  • SHA-256 change detection, hot-reload support

Dashboard

  • SPA embedded in the binary (zero external dependencies)
  • 9 pages: Overview, Sessions, Memory, Skills, Scheduler, Metrics, Wallet, Settings, Workspace
  • 4 themes: AI Black & Purple, CRT Orange, CRT Green, Psychedelic Freakout
  • Live sparkline charts and stacked area charts for cost breakdown
  • Retro CRT aesthetic with scanline effects and monospace typography

RAG & Embeddings

Ironclad implements a multi-layer retrieval-augmented generation pipeline spread across three crates. Memories are ingested, indexed for both keyword and vector search, and retrieved into the context window at query time.

1. Five-Tier Memory System

All conversational data is routed into five specialized memory tiers, each backed by its own SQLite table.

ironclad-db/src/memory.rs

  Tier         | Purpose                                          | Key Fields
  Working      | Active session context (goals, recent summaries) | session-scoped, importance-ranked
  Episodic     | Significant events (tool use, financial ops)     | classified, timestamped
  Semantic     | Factual knowledge (key-value with confidence)    | upsert on (category, key)
  Procedural   | Tool success/failure tracking                    | success/failure counters
  Relationship | Entity trust scores, interaction history         | per-entity trust + count

The MemoryBudgetManager in ironclad-agent/src/memory.rs allocates a configurable percentage of the total token budget to each tier (default: 30/25/20/15/10).
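The allocation-with-rollover idea can be sketched as follows. This is not the MemoryBudgetManager's actual code, just a minimal model of the documented behavior (percentage split, with a tier's unused tokens rolling over to the next tier in order):

```rust
/// Allocate `total` tokens across tiers by percentage; any tokens a tier
/// leaves unused (per `needed`) roll over to the next tier in order.
fn allocate_with_rollover(total: usize, pct: &[usize], needed: &[usize]) -> Vec<usize> {
    let mut granted = Vec::with_capacity(pct.len());
    let mut carry = 0usize;
    for (p, n) in pct.iter().zip(needed) {
        let base = total * p / 100 + carry; // this tier's share plus rollover
        let take = base.min(*n);            // never grant more than needed
        carry = base - take;                // pass the remainder along
        granted.push(take);
    }
    granted
}
```

With a 1000-token budget and the default 30/25/20/15/10 split, a Working tier that needs only 100 tokens frees 200 tokens for the Episodic tier, and so on down the chain.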

2. Full-Text Search

Working, episodic, and semantic tiers all feed into an FTS5 virtual table (memory_fts). The fts_search() function queries across all three tiers with a sanitized MATCH query, plus a LIKE fallback for procedural and relationship tables. This is the keyword-based leg of the retrieval pipeline.

ironclad-db/src/memory.rs

3. Embedding Store & Vector Search

Embeddings are stored as JSON-serialized Vec<f32> in an embeddings table. The search_similar() function does a brute-force scan computing cosine similarity against every stored embedding, filtering by a min_similarity threshold and returning the top-k results.

ironclad-db/src/embeddings.rs
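The scoring logic of that brute-force scan can be shown in pure Rust. The real function reads JSON-serialized `Vec<f32>` rows out of SQLite; this sketch skips the storage layer and just mirrors the score -> filter -> top-k shape:

```rust
/// Cosine similarity between two vectors; 0.0 if either is all-zero.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Brute-force scan in the shape of search_similar(): score every stored
/// vector, drop those below min_sim, return the best k.
fn search_similar(
    query: &[f32],
    store: &[(u64, Vec<f32>)],
    min_sim: f32,
    k: usize,
) -> Vec<(u64, f32)> {
    let mut hits: Vec<(u64, f32)> = store
        .iter()
        .map(|(id, v)| (*id, cosine(query, v)))
        .filter(|(_, s)| *s >= min_sim)
        .collect();
    hits.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    hits.truncate(k);
    hits
}
```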

4. Hybrid Search — The RAG Retrieval Path

hybrid_search() combines both legs:

  • FTS5 keyword match — scores are positional (rank-decayed) and weighted by (1 - hybrid_weight)
  • Vector cosine similarity — scores are weighted by hybrid_weight

Results from both are merged, re-sorted by combined score, and truncated to the limit. The hybrid_weight parameter (default 0.5, configurable in MemoryConfig) controls the balance.

ironclad-db/src/embeddings.rs
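The merge step of the two legs can be sketched like this, assuming both legs yield (id, score) pairs already normalized to comparable ranges -- an assumption for this example, not a statement about the real code:

```rust
use std::collections::HashMap;

/// Merge keyword and vector hits as described for hybrid_search():
/// FTS scores weighted by (1 - w), cosine scores by w, summed per id,
/// then re-sorted and truncated to the limit.
fn hybrid_merge(
    fts: &[(u64, f32)],
    vector: &[(u64, f32)],
    w: f32,
    limit: usize,
) -> Vec<(u64, f32)> {
    let mut combined: HashMap<u64, f32> = HashMap::new();
    for (id, s) in fts {
        *combined.entry(*id).or_insert(0.0) += s * (1.0 - w);
    }
    for (id, s) in vector {
        *combined.entry(*id).or_insert(0.0) += s * w;
    }
    let mut out: Vec<(u64, f32)> = combined.into_iter().collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    out.truncate(limit);
    out
}
```

With w = 0.5 a memory that matches on both legs naturally outranks one that matches on only one, which is the point of the hybrid approach.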

5. Semantic Cache

The SemanticCache operates at the LLM request layer with three lookup levels:

  • L1 exact hash -- SHA-256 of the prompt text, instant match
  • L2 semantic similarity -- character n-gram embeddings + cosine similarity (threshold 0.85)
  • L3 tool-aware TTL -- shorter TTL for tool-involving responses (1/4 of normal)

This avoids redundant LLM calls for semantically equivalent prompts.

ironclad-llm/src/cache.rs
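The stopgap character n-gram embedding used by L2 can be approximated as a hashed trigram vector. The bucket count, n-gram width, and hash function here are guesses for illustration, not what ironclad-llm actually uses:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hashed character-trigram embedding: bucket each trigram into a
/// fixed-size vector, then L2-normalize so cosine similarity works.
/// (Parameters are illustrative, not the real cache's.)
fn ngram_embed(text: &str, dims: usize) -> Vec<f32> {
    let mut v = vec![0.0f32; dims];
    let chars: Vec<char> = text.to_lowercase().chars().collect();
    for w in chars.windows(3) {
        let mut h = DefaultHasher::new();
        w.hash(&mut h);
        v[(h.finish() as usize) % dims] += 1.0;
    }
    let norm: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
    v
}
```

Prompts sharing most of their trigrams land close in this space, which is enough for the 0.85-similarity cache check without calling a real embedding model.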

6. Context Assembly

The build_context() function packs the final prompt within a token budget determined by query complexity (L0=2k, L1=4k, L2=8k, L3=16k tokens). It fills the context window in priority order: system prompt, then retrieved memories (the RAG output), then conversation history (newest first, truncated when budget exhausts). When context exceeds 80% capacity, soft_trim evicts oldest non-system messages and build_compaction_prompt can generate a summary for insertion.

ironclad-agent/src/context.rs
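The newest-first history packing can be sketched as a simple budget walk. This is a model of the documented behavior, not build_context() itself, and it assumes per-message token counts are precomputed:

```rust
/// Pack messages into a token budget, newest first, as described above:
/// walk newest -> oldest, keep whatever still fits, then restore
/// chronological order for the prompt.
fn pack_history(history: &[(String, usize)], budget: usize) -> Vec<String> {
    let mut remaining = budget;
    let mut kept = Vec::new();
    for (msg, tokens) in history.iter().rev() {
        if *tokens > remaining {
            break; // budget exhausted: older messages are dropped
        }
        remaining -= tokens;
        kept.push(msg.clone());
    }
    kept.reverse();
    kept
}
```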

7. Post-Turn Ingestion

After each turn, ingest_turn() classifies the exchange (tool use, financial, social, creative, reasoning) and routes content into the appropriate memory tiers automatically, so future RAG queries have fresh material to retrieve.

ironclad-agent/src/memory.rs

Current Limitations

Embedding generation itself is currently a placeholder -- the system stores and searches vectors, but there is no active embedding model integration yet (embedding_provider and embedding_model in the config are Option<String> and default to None). The semantic cache uses a lightweight character n-gram embedding as a stopgap. A real deployment would need to wire up an embedding provider (local, like nomic-embed-text on Ollama, or remote, like OpenAI text-embedding-3-small) to generate real vectors for the store_embedding / hybrid_search pipeline.

The brute-force scan in search_similar is also fine for small-to-medium memory stores but would need an index (HNSW or similar) if the embedding count grew into the tens of thousands.

Full Comparison: Ironclad vs OpenClaw

  Dimension         | OpenClaw                                               | Ironclad
  Architecture      | 3 separate processes (Node.js, Python, TypeScript)     | Single Rust binary
  Languages         | Node.js + Python + TypeScript + Go                     | Rust (one language, one toolchain)
  Memory usage      | ~500 MB (3 processes)                                  | ~50 MB (1 process)
  Proxy latency     | ~50ms (Python aiohttp)                                 | ~2ms (in-process, persistent pool)
  Cold start        | ~3s (Node.js) + ~2s (Python)                           | ~50ms
  Binary size       | ~200 MB (node_modules + pip)                           | ~15 MB static binary
  Supply chain      | 500+ npm + pip packages                                | ~50 auditable crates
  Database          | 5 storage layers (JSONL, PostgreSQL, SQLite, JSON, MD) | 1 unified SQLite (28 tables, WAL)
  Model routing     | Rule-based fallback only                               | Heuristic complexity routing + rule-based fallback
  Semantic cache    | None                                                   | 3-level (exact, embedding, tool TTL)
  Injection defense | 8 regex checks                                         | 4-layer defense (regex + HMAC + output + behavioral)
  Agent-to-agent    | No mutual auth                                         | Zero-trust (ECDSA, ECDH, AES-256-GCM)
  Financial         | x402 topup only; USDC idle                             | x402 + yield engine (4-8% APY)
  Dashboard         | Next.js (separate process, read-only)                  | Embedded SPA (read + write, 41 routes)
  Plugin system     | Markdown skills only                                   | Dual-format skills + plugin SDK (6 languages)