Features
Deep-dive into every subsystem of the Roboticus autonomous agent runtime.
LLM Client Pipeline
- ▸Model-agnostic proxy -- provider config fully externalized in TOML
- ▸Format translation -- typed Go structs for 5 API formats (OpenAI, Anthropic, Google, Ollama, sglang)
- ▸Circuit breaker per provider (Closed/Open/HalfOpen, exponential backoff)
- ▸In-flight deduplication -- SHA-256 fingerprinting prevents duplicate concurrent requests
- ▸Tier-based prompt adaptation -- T1 (condensed), T2 (preamble + reorder), T3/T4 (passthrough + cache_control)
- ▸Heuristic model router -- complexity classification + rule-based fallback chain
- ▸3-level semantic cache -- L1 exact hash, L2 embedding cosine similarity, L3 deterministic tool TTL
- ▸Persistent connection pool -- shared http.Client with HTTP/2 multiplexing per provider
- ▸x402 payment protocol -- automatic payment-gated inference (402 -> sign EIP-3009 -> retry)
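The in-flight deduplication above amounts to a request fingerprint plus a singleflight-style coalescer. A minimal sketch, not the runtime's actual code: the `fingerprint`, `inflight`, and `Do` names are hypothetical, and a real fingerprint would hash every request parameter, not just model and prompt.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// fingerprint derives a stable SHA-256 key from the request's
// distinguishing fields (model + prompt here, for illustration).
func fingerprint(model, prompt string) string {
	h := sha256.Sum256([]byte(model + "\x00" + prompt))
	return hex.EncodeToString(h[:])
}

// inflight coalesces concurrent calls that share a fingerprint: the
// first caller runs fn, later callers wait and reuse its result.
type inflight struct {
	mu    sync.Mutex
	calls map[string]*call
}

type call struct {
	done chan struct{}
	val  string
}

func newInflight() *inflight { return &inflight{calls: map[string]*call{}} }

func (g *inflight) Do(key string, fn func() string) string {
	g.mu.Lock()
	if c, ok := g.calls[key]; ok {
		g.mu.Unlock()
		<-c.done // another goroutine is already running fn
		return c.val
	}
	c := &call{done: make(chan struct{})}
	g.calls[key] = c
	g.mu.Unlock()

	c.val = fn()
	close(c.done)

	g.mu.Lock()
	delete(g.calls, key) // future identical requests may run again
	g.mu.Unlock()
	return c.val
}

func main() {
	g := newInflight()
	key := fingerprint("some-model", "same prompt")
	fmt.Println(g.Do(key, func() string { return "shared response" }))
}
```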
Agent Core
- ▸ReAct state machine -- Think -> Act -> Observe -> Persist cycle with idle/loop detection
- ▸Tool system -- interface-based plugin architecture with 27 built-in tools + MCP client for external tool servers
- ▸Policy engine -- 6 built-in rules (authority, command safety, financial, path protection, rate limit, validation)
- ▸4-layer prompt injection defense (regex + HMAC boundaries + output validation + behavioral anomaly detection)
- ▸Progressive context loading -- 4 complexity levels (L0 ~2K, L1 ~4K, L2 ~8K, L3 ~16K tokens)
- ▸Subagent framework -- spawn child agents with isolated tool registries and policy overrides
- ▸Human-in-the-loop approvals -- configurable approval gates for high-risk tool calls with dashboard integration
- ▸Browser tool adapter -- wraps the 12-action browser package as LLM-callable tools
- ▸Response transform pipeline -- ReasoningExtractor, FormatNormalizer, ContentGuard stages
- ▸Addressability filter -- composable filter chain for group chat mention/reply detection
Memory System
- ▸5-tier unified memory: Working, Episodic, Semantic, Procedural, Relationship
- ▸Full-text search via SQLite FTS5
- ▸Memory budget manager -- configurable per-tier token allocation with unused rollover
- ▸Background pruning via heartbeat task
Scheduling
- ▸Durable scheduler -- cron expressions, interval, one-time timestamps; all state in SQLite
- ▸Lease-based execution -- prevents double-execution across instances
- ▸Heartbeat daemon -- configurable tick interval, builds TickContext (balance, survival tier) per tick
- ▸7 built-in tasks: SurvivalCheck, UsdcMonitor, YieldTask, MemoryPrune, CacheEvict, MetricSnapshot, AgentCardRefresh
Financial
- ▸Ethereum wallet -- secp256k1 ECDSA keypair with AES-256-GCM encrypted storage (Argon2id KDF)
- ▸x402 payment protocol -- EIP-3009 TransferWithAuthorization for automated LLM payments
- ▸Treasury policy -- per-payment, hourly, daily, and minimum reserve limits
- ▸Yield engine -- deposits idle USDC into Aave/Compound on Base, auto-withdraws below threshold
- ▸Survival tier system -- high/normal/low_compute/critical/dead states drive model downgrading
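The survival tier mechanism can be illustrated as a threshold ladder that feeds model selection. The balance thresholds and model names below are placeholders, not the runtime's actual values; only the five tier names come from the list above.

```go
package main

import "fmt"

// SurvivalTier maps the agent's USDC balance to a compute tier.
type SurvivalTier string

const (
	TierHigh       SurvivalTier = "high"
	TierNormal     SurvivalTier = "normal"
	TierLowCompute SurvivalTier = "low_compute"
	TierCritical   SurvivalTier = "critical"
	TierDead       SurvivalTier = "dead"
)

// tierFor classifies a balance; thresholds here are illustrative.
func tierFor(balanceUSDC float64) SurvivalTier {
	switch {
	case balanceUSDC >= 100:
		return TierHigh
	case balanceUSDC >= 20:
		return TierNormal
	case balanceUSDC >= 5:
		return TierLowCompute
	case balanceUSDC > 0:
		return TierCritical
	default:
		return TierDead
	}
}

// modelFor downgrades the model as the tier falls; names are placeholders.
func modelFor(t SurvivalTier) string {
	switch t {
	case TierHigh:
		return "frontier-model"
	case TierNormal:
		return "mid-tier-model"
	case TierLowCompute, TierCritical:
		return "cheap-local-model"
	default:
		return "" // dead: no inference at all
	}
}

func main() {
	for _, b := range []float64{250, 42, 7, 0.5, 0} {
		t := tierFor(b)
		fmt.Printf("balance=%.2f tier=%s model=%s\n", b, t, modelFor(t))
	}
}
```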
Channels
- ▸Telegram -- long-poll + webhook, MarkdownV2 formatting, 4096-char chunking
- ▸WhatsApp -- Cloud API v21.0 webhook, E.164 validation, read receipts
- ▸Discord -- webhooks, 2000-char chunking, embed formatting
- ▸Signal -- JSON-RPC 2.0, end-to-end encrypted messaging
- ▸Email -- IMAP/SMTP with threading (In-Reply-To/Message-Id), 1MB body limit, 30s poll
- ▸Matrix -- Client v3, optional E2E (Olm/Megolm), UUID transaction IDs
- ▸Voice -- STT/TTS via OpenAI API with local options (Piper, Coqui), configurable models
- ▸WebSocket -- direct browser/client connections with ping/pong keepalive
- ▸A2A (Agent-to-Agent) -- zero-trust protocol with X25519 ECDH, AES-256-GCM encryption, 256 session cap
- ▸Delivery queue -- binary heap with exponential backoff (0s→15m+), dead-letter support, 9 permanent error patterns
Plugin SDK
- ▸Plugin interface -- name(), version(), tools(), init(), execute_tool(), shutdown()
- ▸6 script languages: .gosh, .go, .sh, .py, .rb, .js
- ▸Sandboxed execution -- env_clear with minimal allowlist (PATH, HOME, USER, LANG, TERM, TMPDIR), configurable timeout, output size cap
- ▸Tool name validation -- strict [a-zA-Z0-9_-] allowlist; rejects path separators, null bytes, .., whitespace
- ▸Script path confinement -- canonicalize + starts_with check prevents symlink and .. traversal out of plugin directory
- ▸Dangerous tool flag -- dangerous = true in manifest, queryable via is_tool_dangerous() for policy decisions
- ▸Plugin manifest (plugin.toml) -- declarative tool registration with risk levels
- ▸Auto-discovery -- scans plugin directories, registers tools at boot
- ▸Graceful shutdown -- shutdown_all() tears down every plugin during server shutdown
- ▸Hot-reload -- detects content hash changes and re-registers
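The name allowlist and path confinement checks above might look like the following sketch. `checkToolName` and `confine` are illustrative names, and a production version would also resolve symlinks (canonicalize) before the prefix check, as the bullet describes.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
	"strings"
)

// validToolName enforces the strict character allowlist: anything
// outside [a-zA-Z0-9_-] (path separators, "..", null bytes, spaces)
// is rejected before a tool can be registered.
var validToolName = regexp.MustCompile(`^[a-zA-Z0-9_-]+$`)

func checkToolName(name string) bool {
	return name != "" && validToolName.MatchString(name)
}

// confine resolves a script path and verifies it stays inside the
// plugin directory, preventing ".." traversal out of it. (Abs+Clean
// here; a real implementation would also resolve symlinks.)
func confine(pluginDir, script string) (string, error) {
	root, err := filepath.Abs(pluginDir)
	if err != nil {
		return "", err
	}
	p, err := filepath.Abs(filepath.Join(root, script))
	if err != nil {
		return "", err
	}
	if p != root && !strings.HasPrefix(p, root+string(filepath.Separator)) {
		return "", fmt.Errorf("script %q escapes plugin dir", script)
	}
	return p, nil
}

func main() {
	fmt.Println(checkToolName("fetch_url-v2")) // true
	fmt.Println(checkToolName("../evil"))      // false
	if _, err := confine("/plugins/demo", "../../etc/passwd"); err != nil {
		fmt.Println("rejected:", err)
	}
}
```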
Browser Automation
- ▸Chrome DevTools Protocol via WebSocket
- ▸Action types: navigate, click, type, screenshot, evaluate, wait, scroll, extract
- ▸Session management -- start/stop headless Chrome instances
- ▸REST API integration -- /api/browser/* endpoints for remote control
Skill System
- ▸Structured skills (.toml) -- programmatic tool chains with parameter templates, script paths, and policy overrides
- ▸Instruction skills (.md) -- YAML frontmatter (triggers, priority) + markdown body injected into system prompt
- ▸Trigger matching -- keyword, tool name, and regex patterns
- ▸Safety scanning on import -- 50+ danger patterns across 5 categories
- ▸SHA-256 change detection, hot-reload support
Dashboard
- ▸SPA embedded in the binary via go:embed (zero external dependencies)
- ▸12 pages: Overview, Sessions, Context, Memory, Skills, Agents, Scheduler, Metrics, Efficiency, Wallet, Workspace, Settings
- ▸Context Explorer -- per-turn token breakdown, memory tier allocation, complexity level, model used
- ▸Efficiency dashboard -- model comparison cards, cost time series, auto-generated optimization tips
- ▸Approval panel -- real-time pending/approved/denied status via WebSocket push
- ▸Streaming responses -- incremental token rendering with typing indicator
- ▸4 themes: AI Black & Purple, CRT Orange, CRT Green, Psychedelic Freakout
- ▸Live sparkline charts and stacked area charts for cost breakdown
- ▸Retro CRT aesthetic with scanline effects and monospace typography
Streaming Responses
- ▸Token-by-token streaming via Server-Sent Events (SSE) on POST /api/agent/message/stream
- ▸WebSocket push for real-time streaming to connected clients
- ▸StreamAccumulator for buffering and reassembling partial responses
- ▸Per-provider SSE parsing for OpenAI, Anthropic, Google, and Ollama stream formats
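Per-provider parsing all starts from the same SSE framing: `data:` lines accumulated until a blank line, with OpenAI's `[DONE]` sentinel closing the stream. A minimal parser sketch; `parseSSE` is an illustrative name, and real payloads would then be JSON-decoded per provider format.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseSSE extracts the data payloads from a raw Server-Sent Events
// stream: each event's "data:" lines are collected until a blank line
// ends the event, and a "[DONE]" sentinel ends the stream.
func parseSSE(stream string) []string {
	var events []string
	var data []string
	flush := func() {
		if len(data) > 0 {
			events = append(events, strings.Join(data, "\n"))
			data = nil
		}
	}
	sc := bufio.NewScanner(strings.NewReader(stream))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case line == "": // blank line terminates an event
			flush()
		case strings.HasPrefix(line, "data:"):
			payload := strings.TrimPrefix(strings.TrimPrefix(line, "data:"), " ")
			if payload == "[DONE]" {
				flush()
				return events
			}
			data = append(data, payload)
		}
	}
	flush()
	return events
}

func main() {
	raw := "data: {\"delta\":\"Hel\"}\n\ndata: {\"delta\":\"lo\"}\n\ndata: [DONE]\n"
	for _, ev := range parseSSE(raw) {
		fmt.Println(ev)
	}
}
```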
Approval Workflow
- ▸Tool gating with three safety tiers: Safe (auto-approve), Gated (requires human approval), Blocked (always denied)
- ▸ApprovalManager with request lifecycle: pending → approved/denied/expired
- ▸HTTP endpoints at /api/approvals with approve/deny actions
- ▸Configurable timeout expiry for pending approval requests
Addressability Filter
- ▸Composable FilterChain with MentionFilter, ReplyFilter, and ConversationFilter
- ▸DM bypass -- always responds in direct messages
- ▸Case-insensitive name matching with configurable aliases
- ▸default_addressability_chain() factory for zero-config setup
Response Transform Pipeline
- ▸Three-stage output processing via ResponsePipeline with pluggable ResponseTransform trait
- ▸ReasoningExtractor -- strips <think> tags and internal chain-of-thought from responses
- ▸FormatNormalizer -- standardizes markdown, code blocks, and whitespace across providers
- ▸ContentGuard -- detects injection markers and security anomalies in output
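The ReasoningExtractor stage amounts to stripping `<think>` spans before the response leaves the pipeline. A minimal sketch; the regex-based approach is an assumption about the implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// thinkBlock matches <think>...</think> spans, including multi-line
// bodies; (?s) lets . cross newlines, and .*? keeps the match lazy so
// multiple blocks are removed independently.
var thinkBlock = regexp.MustCompile(`(?s)<think>.*?</think>`)

// stripReasoning removes internal chain-of-thought from a response,
// which is the job the ReasoningExtractor stage performs.
func stripReasoning(s string) string {
	return strings.TrimSpace(thinkBlock.ReplaceAllString(s, ""))
}

func main() {
	out := stripReasoning("<think>\nuser wants a summary...\n</think>\nHere is the summary.")
	fmt.Println(out) // Here is the summary.
}
```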
Context Observatory
- ▸Full turn inspector with per-turn token allocation and memory tier breakdown
- ▸Efficiency metrics: tokens-per-turn, cache hit rate, model utilization per session
- ▸Outcome grading: 1-5 quality scores on individual turns with session-aggregate feedback
- ▸Heuristic analysis tips and LLM-powered deep analysis on turns and sessions
- ▸Behavioral recommendations engine with prioritized improvement suggestions
Flexible Network Binding
- ▸Interface-based binding -- bind to loopback only (127.0.0.1), all interfaces (0.0.0.0), or a specific interface address
- ▸Optional TLS with configurable certificate and key paths
- ▸Advertise URL for reverse proxy and NAT traversal scenarios
- ▸Decoupled from any specific VPN or tunnel solution
Obsidian Integration
- ▸Bidirectional knowledge store -- reads vault content via KnowledgeSource trait, writes via Tool implementations
- ▸Full Obsidian support -- YAML frontmatter, case-insensitive wikilink resolution, backlink index, inline #tag extraction
- ▸Three agent tools -- obsidian_read (Safe), obsidian_write (Caution), obsidian_search (Safe)
- ▸Preferred destination -- system prompt directive steers document output to the vault when enabled
- ▸Template engine -- {{variable}} substitution with built-in {{date}} and {{time}} variables
- ▸obsidian:// URI generation -- clickable links to open notes directly in Obsidian
- ▸Auto-detect -- opt-in scanning of specified paths for .obsidian directories
- ▸File watching (optional) -- re-indexes vault on filesystem changes with 500ms debounce
Runtime Management
- ▸Runtime surfaces API -- enumerate active interaction surfaces (dashboard, CLI, channels, etc.)
- ▸Device pairing flow -- pair, inspect, and verify runtime-linked devices
- ▸Peer discovery flow -- discover nearby/known agents and perform explicit verification
- ▸MCP runtime control -- inspect MCP client status, discover remote tool catalogs, disconnect clients
- ▸Operational visibility endpoints -- expose runtime topology for automation and troubleshooting
Onboarding Interview
- ▸Three-phase setup flow -- start, turn-by-turn responses, finish/apply
- ▸Interactive configuration capture -- gather deployment, model, and policy preferences
- ▸Deterministic finalize step -- convert interview answers into persistent runtime configuration
- ▸API-driven onboarding -- supports headless provisioning flows in addition to UI-driven setup
Compatibility Proxy Layer
- ▸OpenAI-compatible endpoints (/v1/chat/completions, /v1/models)
- ▸Anthropic-compatible model listing endpoint
- ▸Format-normalized provider abstraction behind compatibility APIs
- ▸Migration bridge -- lets existing OpenAI/Anthropic clients route through Roboticus with minimal changes
Operations, Audit & Delivery
- ▸Inbound channel webhooks -- Telegram/WhatsApp receivers with verification paths
- ▸Delivery reliability -- dead-letter queue inspection and replay controls
- ▸Channel health surfaces -- adapter status and operational diagnostics
- ▸Turn-level audit APIs -- policy decisions and tool traces for security review
- ▸Approval workflow API -- list/approve/deny sensitive actions with expiry semantics
RAG & Embeddings
Roboticus implements a multi-layer retrieval-augmented generation pipeline spread across three packages. Memories are ingested, indexed for both keyword and vector search, and retrieved into the context window at query time.
1. Five-Tier Memory System
All conversational data is routed into five specialized memory tiers, each backed by its own SQLite table. internal/db/memory.go
| Tier | Purpose | Key Fields |
|---|---|---|
| Working | Active session context (goals, recent summaries) | session-scoped, importance-ranked |
| Episodic | Significant events (tool use, financial ops) | classified, timestamped |
| Semantic | Factual knowledge (key-value with confidence) | upsert on (category, key) |
| Procedural | Tool success/failure tracking | success/failure counters |
| Relationship | Entity trust scores, interaction history | per-entity trust + count |
The MemoryBudgetManager in internal/agent/memory.go allocates a configurable percentage of the total token budget to each tier (default: 30/25/20/15/10).
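A sketch of how such a budget split with rollover could work. The 30/25/20/15/10 percentages come from the text above, but the rollover-to-next-tier ordering and the `allocate` signature are illustrative assumptions, not the MemoryBudgetManager's actual logic.

```go
package main

import "fmt"

// tierShare is the default per-tier percentage split (30/25/20/15/10).
var tierShare = map[string]int{
	"working": 30, "episodic": 25, "semantic": 20, "procedural": 15, "relationship": 10,
}

// tierOrder fixes the order in which unused tokens roll forward.
var tierOrder = []string{"working", "episodic", "semantic", "procedural", "relationship"}

// allocate splits a total token budget across tiers; when a tier uses
// less than its share, the surplus rolls into the next tier's budget.
func allocate(total int, used map[string]int) map[string]int {
	out := map[string]int{}
	carry := 0
	for _, tier := range tierOrder {
		allotted := total*tierShare[tier]/100 + carry
		carry = 0
		if u, ok := used[tier]; ok && u < allotted {
			carry = allotted - u // unused tokens roll over
			allotted = u
		}
		out[tier] = allotted
	}
	return out
}

func main() {
	// 8000-token budget; the working tier needs only 1000 of its 2400.
	got := allocate(8000, map[string]int{"working": 1000})
	fmt.Println(got["working"], got["episodic"])
}
```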
2. Full-Text Search
Working, episodic, and semantic tiers all feed into an FTS5 virtual table (memory_fts). The fts_search() function queries across all three tiers with a sanitized MATCH query, plus a LIKE fallback for procedural and relationship tables. This is the keyword-based leg of the retrieval pipeline.
internal/db/memory.go
3. Embedding Store & Vector Search
Embeddings are stored as JSON-serialized Vec<f32> in an embeddings table. The search_similar() function does a brute-force scan computing cosine similarity against every stored embedding, filtering by a min_similarity threshold and returning the top-k results.
internal/db/embeddings.go
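The brute-force scan described above is straightforward to sketch: compute cosine similarity against every stored vector, filter by the threshold, keep the top k. The `cosine` and `searchSimilar` names are illustrative.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosine computes similarity between two vectors; returns 0 for
// zero-norm or mismatched inputs.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

type hit struct {
	ID    string
	Score float64
}

// searchSimilar is the brute-force scan: score every stored embedding,
// drop those below minSimilarity, and return the top k by score.
func searchSimilar(query []float32, store map[string][]float32, minSimilarity float64, k int) []hit {
	var hits []hit
	for id, v := range store {
		if s := cosine(query, v); s >= minSimilarity {
			hits = append(hits, hit{id, s})
		}
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i].Score > hits[j].Score })
	if len(hits) > k {
		hits = hits[:k]
	}
	return hits
}

func main() {
	store := map[string][]float32{
		"m1": {1, 0}, "m2": {0.9, 0.1}, "m3": {0, 1},
	}
	for _, h := range searchSimilar([]float32{1, 0}, store, 0.5, 2) {
		fmt.Printf("%s %.3f\n", h.ID, h.Score)
	}
}
```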
4. Hybrid Search — The RAG Retrieval Path
hybrid_search() combines both legs:
- ▸FTS5 keyword match — scores are positional (rank-decayed) and weighted by (1 - hybrid_weight)
- ▸Vector cosine similarity — scores are weighted by hybrid_weight
Results from both are merged, re-sorted by combined score, and truncated to the limit. The hybrid_weight parameter (default 0.5, configurable in MemoryConfig) controls the balance.
internal/db/embeddings.go
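The merge step can be sketched as follows. The `(1 - w)` / `w` weighting matches the description above, but the `1/(rank+1)` rank decay for FTS5 hits is an assumption, as is the `hybridMerge` signature.

```go
package main

import (
	"fmt"
	"sort"
)

// hybridMerge combines the two retrieval legs: keyword hits contribute
// a rank-decayed score weighted by (1 - w), vector hits contribute
// their cosine similarity weighted by w; results are re-sorted and
// truncated to the limit.
func hybridMerge(keywordIDs []string, vector map[string]float64, w float64, limit int) []string {
	scores := map[string]float64{}
	for rank, id := range keywordIDs {
		scores[id] += (1 - w) * (1.0 / float64(rank+1))
	}
	for id, sim := range vector {
		scores[id] += w * sim
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return scores[ids[i]] > scores[ids[j]] })
	if len(ids) > limit {
		ids = ids[:limit]
	}
	return ids
}

func main() {
	keyword := []string{"a", "b", "c"}                // FTS5 results in rank order
	vector := map[string]float64{"b": 0.92, "d": 0.8} // cosine hits
	fmt.Println(hybridMerge(keyword, vector, 0.5, 3))
}
```

A memory that appears in both legs ("b" above) accumulates both contributions, which is why hybrid search tends to surface it ahead of single-leg matches.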
5. Semantic Cache
The SemanticCache operates at the LLM request layer with three lookup levels:
- ▸L1 exact hash — SHA-256 of the prompt text, instant match
- ▸L2 semantic similarity — character n-gram embeddings + cosine similarity (threshold 0.85)
- ▸L3 tool-aware TTL — shorter TTL for tool-involving responses (1/4 of normal)
This avoids redundant LLM calls for semantically equivalent prompts.
internal/llm/cache.go
6. Context Assembly
The build_context() function packs the final prompt within a token budget determined by query complexity (L0=2k, L1=4k, L2=8k, L3=16k tokens). It fills the context window in priority order: system prompt, then retrieved memories (the RAG output), then conversation history (newest first, truncated when budget exhausts). When context exceeds 80% capacity, soft_trim evicts oldest non-system messages and build_compaction_prompt can generate a summary for insertion.
internal/agent/context.go
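The priority-ordered packing can be sketched as below. The `packContext` name and message shape are illustrative, and the soft_trim / compaction steps are omitted; only the fill order (system, then memories, then newest history first) follows the description above.

```go
package main

import "fmt"

type msg struct {
	role   string
	tokens int
	text   string
}

// packContext fills the window in priority order: system prompt, then
// retrieved memories (the RAG output), then conversation history
// newest-first, stopping once the token budget is exhausted.
func packContext(budget int, system msg, memories, history []msg) []msg {
	out := []msg{system}
	remaining := budget - system.tokens
	for _, m := range memories {
		if m.tokens > remaining {
			break
		}
		out = append(out, m)
		remaining -= m.tokens
	}
	// history arrives oldest-first; walk backwards so the newest
	// turns survive when the budget runs out.
	var kept []msg
	for i := len(history) - 1; i >= 0; i-- {
		if history[i].tokens > remaining {
			break
		}
		kept = append([]msg{history[i]}, kept...)
		remaining -= history[i].tokens
	}
	return append(out, kept...)
}

func main() {
	packed := packContext(2000, // an L0 (~2K token) budget
		msg{"system", 300, "You are Roboticus..."},
		[]msg{{"memory", 400, "user prefers brevity"}},
		[]msg{{"user", 900, "old turn"}, {"user", 800, "newest turn"}},
	)
	fmt.Println(len(packed)) // system + memory + newest turn
}
```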
7. Post-Turn Ingestion
After each turn, ingest_turn() classifies the exchange (tool use, financial, social, creative, reasoning) and routes content into the appropriate memory tiers automatically, so future RAG queries have fresh material to retrieve.
internal/agent/memory.go
Current Limitations
Embedding generation itself is not yet wired up: the system stores and searches vectors, but there is no active embedding model integration (embedding_provider and embedding_model in the config are Option<String> and default to None). The semantic cache uses a lightweight character n-gram embedding as a stopgap. A real deployment would need to connect an embedding provider (local, like nomic-embed-text on Ollama, or remote, like OpenAI text-embedding-3-small) to generate real vectors for the store_embedding / hybrid_search pipeline.
The brute-force scan in search_similar is also fine for small-to-medium memory stores but would need an index (HNSW or similar) if the embedding count grew into the tens of thousands.
Full Comparison: Roboticus vs OpenClaw
| Dimension | OpenClaw | Roboticus |
|---|---|---|
| Architecture | Node-based gateway control plane + optional platform clients/apps | Single Go binary |
| Languages | Primarily TypeScript/JavaScript; plus Swift/Kotlin for native apps | Go (one language, one toolchain) |
| Memory usage | Varies by enabled channels, models, and companion apps | ~50 MB (1 process) |
| Proxy latency | No official vanilla latency benchmark published | ~2ms (in-process, persistent pool) |
| Cold start | Depends on Node runtime, onboarding state, and enabled services | ~50ms |
| Binary size | npm/pnpm package + Node runtime (not a single static binary) | ~15 MB static binary |
| Supply chain | Large npm dependency graph (plus optional native/platform deps) | 18 auditable Go packages |
| Database | State in ~/.openclaw (JSON/JSONL + SQLite-backed components) | 1 unified SQLite (35 tables, WAL) |
| Model routing | Primary model + fallback model chain | Heuristic complexity routing + rule-based fallback |
| Semantic cache | No documented 3-level semantic cache in vanilla setup | 3-level (exact, embedding, tool TTL) |
| Injection defense | Documented gateway security controls (no published 4-layer pipeline equivalent) | 4-layer defense (regex + HMAC + output + behavioral) |
| Agent-to-agent | Gateway pairing + token/password auth model | Zero-trust (ECDSA, ECDH, AES-256-GCM) |
| Financial | x402 topup only; USDC idle | x402 + yield engine (4-8% APY) |
| Dashboard | Gateway-served web control UI | Embedded SPA (read + write, 74 routes) |
| Plugin system | Skills + extensions + plugin SDK | Dual-format skills + plugin SDK (6 languages) |