
Features

Deep-dive into every subsystem of the Roboticus autonomous agent runtime.

LLM Client Pipeline

  • Model-agnostic proxy -- provider config fully externalized in TOML
  • Format translation -- typed Go structs for 5 API formats (OpenAI, Anthropic, Google, Ollama, sglang)
  • Circuit breaker per provider (Closed/Open/HalfOpen, exponential backoff)
  • In-flight deduplication -- SHA-256 fingerprinting prevents duplicate concurrent requests
  • Tier-based prompt adaptation -- T1 (condensed), T2 (preamble + reorder), T3/T4 (passthrough + cache_control)
  • Heuristic model router -- complexity classification + rule-based fallback chain
  • 3-level semantic cache -- L1 exact hash, L2 embedding cosine similarity, L3 deterministic tool TTL
  • Persistent connection pool -- shared http.Client with HTTP/2 multiplexing per provider
  • x402 payment protocol -- automatic payment-gated inference (402 -> sign EIP-3009 -> retry)

Agent Core

  • ReAct state machine -- Think -> Act -> Observe -> Persist cycle with idle/loop detection
  • Tool system -- interface-based plugin architecture with 27 built-in tools + MCP client for external tool servers
  • Policy engine -- 6 built-in rules (authority, command safety, financial, path protection, rate limit, validation)
  • 4-layer prompt injection defense (regex + HMAC boundaries + output validation + behavioral anomaly detection)
  • Progressive context loading -- 4 complexity levels (L0 ~2K, L1 ~4K, L2 ~8K, L3 ~16K tokens)
  • Subagent framework -- spawn child agents with isolated tool registries and policy overrides
  • Human-in-the-loop approvals -- configurable approval gates for high-risk tool calls with dashboard integration
  • Browser tool adapter -- wraps the 12-action browser package as LLM-callable tools
  • Response transform pipeline -- ReasoningExtractor, FormatNormalizer, ContentGuard stages
  • Addressability filter -- composable filter chain for group chat mention/reply detection
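The ReAct cycle and its loop detection can be sketched roughly as follows. This is an illustration under assumptions, not the Roboticus state machine. A call is fingerprinted as tool name plus raw arguments, and a "loop" means the identical call repeated a configurable number of times in a row. Phase, LoopDetector and Record are hypothetical names.

```go
package main

import "fmt"

// Phase is one step of the Think -> Act -> Observe -> Persist cycle.
type Phase int

const (
	Think Phase = iota
	Act
	Observe
	Persist
)

func (p Phase) String() string { return [...]string{"Think", "Act", "Observe", "Persist"}[p] }

// Next advances the cycle; Persist wraps back to Think for the next turn.
func (p Phase) Next() Phase { return (p + 1) % 4 }

// LoopDetector flags an agent that keeps issuing the identical tool call.
// The fingerprint (tool name + raw arguments) and Limit are assumptions.
type LoopDetector struct {
	Limit int
	last  string
	count int
}

// Record notes a tool call and reports true once the same call has been
// seen Limit times in a row.
func (d *LoopDetector) Record(tool, args string) bool {
	key := tool + "\x00" + args
	if key == d.last {
		d.count++
	} else {
		d.last, d.count = key, 1
	}
	return d.count >= d.Limit
}

func main() {
	p := Think
	for i := 0; i < 5; i++ {
		fmt.Print(p, " ")
		p = p.Next()
	}
	fmt.Println() // Think Act Observe Persist Think

	d := &LoopDetector{Limit: 3}
	for i := 0; i < 3; i++ {
		fmt.Println(d.Record("web_search", `{"q":"usdc price"}`)) // false, false, then true
	}
}
```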

Memory System

  • 5-tier unified memory: Working, Episodic, Semantic, Procedural, Relationship
  • Full-text search via SQLite FTS5
  • Memory budget manager -- configurable per-tier token allocation with unused rollover
  • Background pruning via heartbeat task

Scheduling

  • Durable scheduler -- cron expressions, interval, one-time timestamps; all state in SQLite
  • Lease-based execution -- prevents double-execution across instances
  • Heartbeat daemon -- configurable tick interval, builds TickContext (balance, survival tier) per tick
  • 7 built-in tasks: SurvivalCheck, UsdcMonitor, YieldTask, MemoryPrune, CacheEvict, MetricSnapshot, AgentCardRefresh

Financial

  • Ethereum wallet -- secp256k1 ECDSA keypair with AES-256-GCM encrypted storage (Argon2id KDF)
  • x402 payment protocol -- EIP-3009 TransferWithAuthorization for automated LLM payments
  • Treasury policy -- per-payment, hourly, daily, and minimum reserve limits
  • Yield engine -- deposits idle USDC into Aave/Compound on Base, auto-withdraws below threshold
  • Survival tier system -- high/normal/low_compute/critical/dead states drive model downgrading

Channels

  • Telegram -- long-poll + webhook, MarkdownV2 formatting, 4096-char chunking
  • WhatsApp -- Cloud API v21.0 webhook, E.164 validation, read receipts
  • Discord -- webhooks, 2000-char chunking, embed formatting
  • Signal -- JSON-RPC 2.0, end-to-end encrypted messaging
  • Email -- IMAP/SMTP with threading (In-Reply-To/Message-Id), 1MB body limit, 30s poll
  • Matrix -- Client v3, optional E2E (Olm/Megolm), UUID transaction IDs
  • Voice -- STT/TTS via OpenAI API with local options (Piper, Coqui), configurable models
  • WebSocket -- direct browser/client connections with ping/pong keepalive
  • A2A (Agent-to-Agent) -- zero-trust protocol with X25519 ECDH, AES-256-GCM encryption, 256 session cap
  • Delivery queue -- binary heap with exponential backoff (0s→15m+), dead-letter support, 9 permanent error patterns

Plugin SDK

  • Plugin interface -- name(), version(), tools(), init(), execute_tool(), shutdown()
  • 6 script languages: .gosh, .go, .sh, .py, .rb, .js
  • Sandboxed execution -- cleared environment with a minimal allowlist (PATH, HOME, USER, LANG, TERM, TMPDIR), configurable timeout, output size cap
  • Tool name validation -- strict [a-zA-Z0-9_-] allowlist; rejects path separators, null bytes, .., whitespace
  • Script path confinement -- path canonicalization plus a prefix check prevents symlink and .. traversal out of the plugin directory
  • Dangerous tool flag -- dangerous = true in manifest, queryable via is_tool_dangerous() for policy decisions
  • Plugin manifest (plugin.toml) -- declarative tool registration with risk levels
  • Auto-discovery -- scans plugin directories, registers tools at boot
  • Graceful shutdown -- shutdown_all() tears down every plugin during server shutdown
  • Hot-reload -- detects content hash changes and re-registers

Browser Automation

  • Chrome DevTools Protocol via WebSocket
  • Action types: navigate, click, type, screenshot, evaluate, wait, scroll, extract
  • Session management -- start/stop headless Chrome instances
  • REST API integration -- /api/browser/* endpoints for remote control

Skill System

  • Structured skills (.toml) -- programmatic tool chains with parameter templates, script paths, and policy overrides
  • Instruction skills (.md) -- YAML frontmatter (triggers, priority) + markdown body injected into system prompt
  • Trigger matching -- keyword, tool name, and regex patterns
  • Safety scanning on import -- 50+ danger patterns across 5 categories
  • SHA-256 change detection, hot-reload support

Dashboard

  • SPA embedded in the binary via go:embed (zero external dependencies)
  • 12 pages: Overview, Sessions, Context, Memory, Skills, Agents, Scheduler, Metrics, Efficiency, Wallet, Workspace, Settings
  • Context Explorer -- per-turn token breakdown, memory tier allocation, complexity level, model used
  • Efficiency dashboard -- model comparison cards, cost time series, auto-generated optimization tips
  • Approval panel -- real-time pending/approved/denied status via WebSocket push
  • Streaming responses -- incremental token rendering with typing indicator
  • 4 themes: AI Black & Purple, CRT Orange, CRT Green, Psychedelic Freakout
  • Live sparkline charts and stacked area charts for cost breakdown
  • Retro CRT aesthetic with scanline effects and monospace typography

Streaming Responses

  • Token-by-token streaming via Server-Sent Events (SSE) on POST /api/agent/message/stream
  • WebSocket push for real-time streaming to connected clients
  • StreamAccumulator for buffering and reassembling partial responses
  • Per-provider SSE parsing for OpenAI, Anthropic, Google, and Ollama stream formats

Approval Workflow

  • Tool gating with three safety tiers: Safe (auto-approve), Gated (requires human approval), Blocked (always denied)
  • ApprovalManager with request lifecycle: pending → approved/denied/expired
  • HTTP endpoints at /api/approvals with approve/deny actions
  • Configurable timeout expiry for pending approval requests

Addressability Filter

  • Composable FilterChain with MentionFilter, ReplyFilter, and ConversationFilter
  • DM bypass -- always responds in direct messages
  • Case-insensitive name matching with configurable aliases
  • default_addressability_chain() factory for zero-config setup
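A composable filter chain with DM bypass and case-insensitive name and alias matching could be sketched as follows. The Message shape is hypothetical, and so is the rule that the chain fires when any one filter matches. Mentions use plain substring matching, so a short alias can also match inside longer words. ConversationFilter is omitted because its criteria are not described here.

```go
package main

import (
	"fmt"
	"strings"
)

// Message carries the fields the filters inspect (hypothetical shape).
type Message struct {
	Text      string
	IsDM      bool
	ReplyToID string // sender ID of the message being replied to, if any
}

// Filter reports whether the agent is being addressed.
type Filter interface {
	Addressed(m Message) bool
}

// MentionFilter matches the agent's name or any alias, ignoring case.
type MentionFilter struct{ Names []string }

func (f MentionFilter) Addressed(m Message) bool {
	text := strings.ToLower(m.Text)
	for _, n := range f.Names {
		if strings.Contains(text, strings.ToLower(n)) {
			return true
		}
	}
	return false
}

// ReplyFilter matches replies to one of the agent's own messages.
type ReplyFilter struct{ SelfID string }

func (f ReplyFilter) Addressed(m Message) bool {
	return m.ReplyToID != "" && m.ReplyToID == f.SelfID
}

// FilterChain always responds in DMs and otherwise when any filter matches.
type FilterChain []Filter

func (c FilterChain) ShouldRespond(m Message) bool {
	if m.IsDM {
		return true
	}
	for _, f := range c {
		if f.Addressed(m) {
			return true
		}
	}
	return false
}

func main() {
	chain := FilterChain{MentionFilter{Names: []string{"Roboticus", "robo"}}, ReplyFilter{SelfID: "bot-42"}}
	fmt.Println(chain.ShouldRespond(Message{Text: "hey ROBO, status?"}))  // true
	fmt.Println(chain.ShouldRespond(Message{Text: "lunch anyone?"}))      // false
	fmt.Println(chain.ShouldRespond(Message{Text: "lunch?", IsDM: true})) // true
}
```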

Response Transform Pipeline

  • Three-stage output processing via ResponsePipeline with a pluggable ResponseTransform interface
  • ReasoningExtractor -- strips <think> tags and internal chain-of-thought from responses
  • FormatNormalizer -- standardizes markdown, code blocks, and whitespace across providers
  • ContentGuard -- detects injection markers and security anomalies in output
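The pipeline shape and the first stage might look like this in Go. It is a sketch, assuming each stage maps a string to a string and that stages run in order. The (?s) flag lets "." span newlines, and the non-greedy .*? keeps text between two separate <think> blocks. The Transform method and the Pipeline type are assumptions, and an unclosed <think> tag is left untouched.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// ResponseTransform is one stage of the output pipeline (shape assumed).
type ResponseTransform interface {
	Transform(s string) string
}

// thinkBlock matches <think>...</think> across newlines, non-greedily.
var thinkBlock = regexp.MustCompile(`(?s)<think>.*?</think>`)

// ReasoningExtractor removes chain-of-thought blocks from the final answer.
type ReasoningExtractor struct{}

func (ReasoningExtractor) Transform(s string) string {
	return strings.TrimSpace(thinkBlock.ReplaceAllString(s, ""))
}

// Pipeline applies each stage in order.
type Pipeline []ResponseTransform

func (p Pipeline) Run(s string) string {
	for _, t := range p {
		s = t.Transform(s)
	}
	return s
}

func main() {
	out := Pipeline{ReasoningExtractor{}}.Run("<think>user wants a number\n</think>\nThe answer is 42.")
	fmt.Println(out) // The answer is 42.
}
```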

Context Observatory

  • Full turn inspector with per-turn token allocation and memory tier breakdown
  • Efficiency metrics: tokens-per-turn, cache hit rate, model utilization per session
  • Outcome grading: 1-5 quality scores on individual turns with session-aggregate feedback
  • Heuristic analysis tips and LLM-powered deep analysis on turns and sessions
  • Behavioral recommendations engine with prioritized improvement suggestions

Flexible Network Binding

  • Address-based binding -- bind to a specific address (0.0.0.0 for all interfaces, 127.0.0.1 for loopback only, or a single interface's address)
  • Optional TLS with configurable certificate and key paths
  • Advertise URL for reverse proxy and NAT traversal scenarios
  • Decoupled from any specific VPN or tunnel solution

Obsidian Integration

  • Bidirectional knowledge store -- reads vault content via the KnowledgeSource interface, writes via Tool implementations
  • Full Obsidian support -- YAML frontmatter, case-insensitive wikilink resolution, backlink index, inline #tag extraction
  • Three agent tools -- obsidian_read (Safe), obsidian_write (Caution), obsidian_search (Safe)
  • Preferred destination -- system prompt directive steers document output to the vault when enabled
  • Template engine -- {{variable}} substitution with built-in {{date}} and {{time}} variables
  • obsidian:// URI generation -- clickable links to open notes directly in Obsidian
  • Auto-detect -- opt-in scanning of specified paths for .obsidian directories
  • File watching (optional) -- re-indexes vault on filesystem changes with 500ms debounce

Runtime Management

  • Runtime surfaces API -- enumerate active interaction surfaces (dashboard, CLI, channels, etc.)
  • Device pairing flow -- pair, inspect, and verify runtime-linked devices
  • Peer discovery flow -- discover nearby/known agents and perform explicit verification
  • MCP runtime control -- inspect MCP client status, discover remote tool catalogs, disconnect clients
  • Operational visibility endpoints -- expose runtime topology for automation and troubleshooting

Onboarding Interview

  • Three-phase setup flow -- start, turn-by-turn responses, finish/apply
  • Interactive configuration capture -- gather deployment, model, and policy preferences
  • Deterministic finalize step -- convert interview answers into persistent runtime configuration
  • API-driven onboarding -- supports headless provisioning flows in addition to UI-driven setup

Compatibility Proxy Layer

  • OpenAI-compatible endpoints (/v1/chat/completions, /v1/models)
  • Anthropic-compatible model listing endpoint
  • Format-normalized provider abstraction behind compatibility APIs
  • Migration bridge -- lets existing OpenAI/Anthropic clients route through Roboticus with minimal changes

Operations, Audit & Delivery

  • Inbound channel webhooks -- Telegram/WhatsApp receivers with verification paths
  • Delivery reliability -- dead-letter queue inspection and replay controls
  • Channel health surfaces -- adapter status and operational diagnostics
  • Turn-level audit APIs -- policy decisions and tool traces for security review
  • Approval workflow API -- list/approve/deny sensitive actions with expiry semantics

RAG & Embeddings

Roboticus implements a multi-layer retrieval-augmented generation pipeline spread across three packages. Memories are ingested, indexed for both keyword and vector search, and retrieved into the context window at query time.

1. Five-Tier Memory System

All conversational data is routed into five specialized memory tiers, each backed by its own SQLite table.

internal/db/memory.go

| Tier | Purpose | Key fields |
| --- | --- | --- |
| Working | Active session context (goals, recent summaries) | session-scoped, importance-ranked |
| Episodic | Significant events (tool use, financial ops) | classified, timestamped |
| Semantic | Factual knowledge (key-value with confidence) | upsert on (category, key) |
| Procedural | Tool success/failure tracking | success/failure counters |
| Relationship | Entity trust scores, interaction history | per-entity trust + count |

The MemoryBudgetManager in internal/agent/memory.go allocates a configurable percentage of the total token budget to each tier (default: 30/25/20/15/10).

2. Full-Text Search

Working, episodic, and semantic tiers all feed into an FTS5 virtual table (memory_fts). The fts_search() function queries across all three tiers with a sanitized MATCH query, plus a LIKE fallback for procedural and relationship tables. This is the keyword-based leg of the retrieval pipeline.

internal/db/memory.go
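One common way to sanitize free text for an FTS5 MATCH is to quote every term as an FTS5 string literal, doubling any embedded double quote. Inside quotes, operators such as AND, OR, NOT, NEAR or column filters lose their meaning. Quoted terms separated by spaces combine with FTS5's implicit AND. Whether fts_search() sanitizes exactly this way is not stated, so treat SanitizeMatch as a hypothetical helper. Callers should also skip the query entirely when the result is empty.

```go
package main

import (
	"fmt"
	"strings"
)

// SanitizeMatch quotes each whitespace-separated term as an FTS5 string,
// escaping embedded double quotes by doubling them.
func SanitizeMatch(q string) string {
	terms := strings.Fields(q)
	for i, t := range terms {
		terms[i] = `"` + strings.ReplaceAll(t, `"`, `""`) + `"`
	}
	return strings.Join(terms, " ")
}

func main() {
	// Bound as the parameter in e.g. "... WHERE memory_fts MATCH ? ORDER BY rank".
	fmt.Println(SanitizeMatch(`wallet OR balance"; NEAR(x`))
	// "wallet" "OR" "balance"";" "NEAR(x"
}
```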

3. Embedding Store & Vector Search

Embeddings are stored as JSON-serialized float32 vectors in an embeddings table. The search_similar() function does a brute-force scan: it computes cosine similarity against every stored embedding, filters by a min_similarity threshold, and returns the top-k results.

internal/db/embeddings.go

4. Hybrid Search — The RAG Retrieval Path

hybrid_search() combines both legs:

  • FTS5 keyword match — scores are positional (rank-decayed) and weighted by (1 - hybrid_weight)
  • Vector cosine similarity — scores are weighted by hybrid_weight

Results from both are merged, re-sorted by combined score, and truncated to the limit. The hybrid_weight parameter (default 0.5, configurable in MemoryConfig) controls the balance.

internal/db/embeddings.go

5. Semantic Cache

The SemanticCache operates at the LLM request layer with three lookup levels:

  • L1: Exact hash -- SHA-256 of the prompt text, instant match
  • L2: Semantic similarity -- character n-gram embeddings + cosine similarity (threshold 0.85)
  • L3: Tool-aware TTL -- shorter TTL for tool-involving responses (1/4 of normal)

This avoids redundant LLM calls for semantically equivalent prompts.

internal/llm/cache.go

6. Context Assembly

The build_context() function packs the final prompt within a token budget determined by query complexity (L0=2k, L1=4k, L2=8k, L3=16k tokens). It fills the context window in priority order: system prompt, then retrieved memories (the RAG output), then conversation history (newest first, truncated once the budget is exhausted). When the context exceeds 80% of capacity, soft_trim evicts the oldest non-system messages, and build_compaction_prompt can generate a summary to insert in their place.

internal/agent/context.go

7. Post-Turn Ingestion

After each turn, ingest_turn() classifies the exchange (tool use, financial, social, creative, reasoning) and routes content into the appropriate memory tiers automatically, so future RAG queries have fresh material to retrieve.

internal/agent/memory.go

Current Limitations

Embedding generation itself is still a placeholder: the system stores and searches vectors, but there is no active embedding model integration yet (embedding_provider and embedding_model in the config are optional and unset by default). The semantic cache uses a lightweight character n-gram embedding as a stopgap. A real deployment would need to wire up an embedding provider (a local one such as nomic-embed-text on Ollama, or a remote one such as OpenAI text-embedding-3-small) to generate real vectors for the store_embedding / hybrid_search pipeline.

The brute-force scan in search_similar is also fine for small-to-medium memory stores but would need an index (HNSW or similar) if the embedding count grew into the tens of thousands.

Full Comparison: Roboticus vs OpenClaw

| Dimension | OpenClaw | Roboticus |
| --- | --- | --- |
| Architecture | Node-based gateway control plane + optional platform clients/apps | Single Go binary |
| Languages | Primarily TypeScript/JavaScript; plus Swift/Kotlin for native apps | Go (one language, one toolchain) |
| Memory usage | Varies by enabled channels, models, and companion apps | ~50 MB (1 process) |
| Proxy latency | No official vanilla latency benchmark published | ~2ms (in-process, persistent pool) |
| Cold start | Depends on Node runtime, onboarding state, and enabled services | ~50ms |
| Binary size | npm/pnpm package + Node runtime (not a single static binary) | ~15 MB static binary |
| Supply chain | Large npm dependency graph (plus optional native/platform deps) | 18 auditable Go packages |
| Database | State in ~/.openclaw (JSON/JSONL + SQLite-backed components) | 1 unified SQLite (35 tables, WAL) |
| Model routing | Primary model + fallback model chain | Heuristic complexity routing + rule-based fallback |
| Semantic cache | No documented 3-level semantic cache in vanilla setup | 3-level (exact, embedding, tool TTL) |
| Injection defense | Documented gateway security controls (no published 4-layer pipeline equivalent) | 4-layer defense (regex + HMAC + output + behavioral) |
| Agent-to-agent | Gateway pairing + token/password auth model | Zero-trust (ECDSA, ECDH, AES-256-GCM) |
| Financial | x402 topup only; USDC idle | x402 + yield engine (4-8% APY) |
| Dashboard | Gateway-served web control UI | Embedded SPA (read + write, 74 routes) |
| Plugin system | Skills + extensions + plugin SDK | Dual-format skills + plugin SDK (6 languages) |