Roadmap
Growth areas organized by effort and impact. Each item notes whether it builds on existing code or is greenfield.
v0.10.0 — Current Release
v0.9.x shipped reliability, distribution, and operator hardening (v0.9.0–v0.9.9). v0.10.0 focuses on correctness, safety, and operational maturity — runtime bug fixes from the Go rewrite audit, typestate sessions, model categorization, and dashboard overhaul.
Next focus: telemetry/instrumentation surfacing, effective memory utilization, and subagent management (skill authoring, composition, delegation).
Tier 1 — Wire the Last Mile
Capabilities where the core code exists but isn't fully connected. High impact, low-to-medium effort.
Full streaming pipeline: SSE endpoint (POST /api/agent/message/stream), WebSocket push, StreamAccumulator, per-provider SSE parsing for OpenAI, Anthropic, Google, and Ollama.
Full ApprovalManager with tool classification (Safe/Gated/Blocked), request lifecycle, timeout expiry. HTTP endpoints wired at /api/approvals with approve/deny actions.
Browser package with 12 CDP actions registered in the agent's ToolRegistry. Autonomous web browsing in the ReAct loop via navigate, click, type, screenshot, evaluate, wait, scroll, extract, hover, select, go_back, go_forward.
Shipped in Go rewrite. Discord adapter with webhooks, 2000-char chunking, and embed formatting. Full bidirectional messaging.
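The 2000-character chunking mentioned above could look roughly like the sketch below. The function name `chunkMessage` and the prefer-newline splitting rule are assumptions for illustration, not the adapter's actual code; runes are used as an approximation of Discord's character count.

```go
package main

import (
	"fmt"
	"strings"
)

// discordLimit is the per-message character cap named in the roadmap item.
const discordLimit = 2000

// chunkMessage splits text into pieces of at most limit runes. It prefers to
// cut just after the last newline inside each window so chunks end on line
// boundaries; if there is no newline it cuts at the hard limit.
// (Hypothetical helper, not the project's real API.)
func chunkMessage(text string, limit int) []string {
	runes := []rune(text)
	var chunks []string
	for len(runes) > limit {
		cut := limit
		for i := limit - 1; i > 0; i-- {
			if runes[i] == '\n' {
				cut = i + 1
				break
			}
		}
		chunks = append(chunks, string(runes[:cut]))
		runes = runes[cut:]
	}
	if len(runes) > 0 {
		chunks = append(chunks, string(runes))
	}
	return chunks
}

func main() {
	msg := strings.Repeat("a", 4500)
	for i, c := range chunkMessage(msg, discordLimit) {
		fmt.Printf("chunk %d: %d chars\n", i, len([]rune(c)))
	}
}
```

Splitting on runes rather than bytes avoids cutting a multi-byte character in half.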
EmbeddingClient with support for OpenAI, Ollama, and Google embedding APIs. N-gram fallback when no provider is configured.
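As a rough illustration of an n-gram fallback, the sketch below hashes character trigrams into a fixed-size vector. The hashing scheme (FNV-1a), the dimension, and the names `ngramEmbed` and `dot` are all assumptions; the roadmap does not say how the actual fallback is built.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"strings"
)

// ngramEmbed hashes every character n-gram of text into one of dim buckets
// and L2-normalizes the result. Text shorter than n yields a zero vector.
// (Hypothetical helper; the real fallback may differ.)
func ngramEmbed(text string, n, dim int) []float64 {
	vec := make([]float64, dim)
	runes := []rune(strings.ToLower(text))
	for i := 0; i+n <= len(runes); i++ {
		h := fnv.New32a()
		h.Write([]byte(string(runes[i : i+n])))
		vec[h.Sum32()%uint32(dim)]++
	}
	var norm float64
	for _, v := range vec {
		norm += v * v
	}
	if norm == 0 {
		return vec
	}
	norm = math.Sqrt(norm)
	for i := range vec {
		vec[i] /= norm
	}
	return vec
}

// dot returns the dot product, which equals cosine similarity for the
// unit-length vectors produced by ngramEmbed.
func dot(a, b []float64) float64 {
	var sum float64
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

func main() {
	a := ngramEmbed("hello world", 3, 256)
	b := ngramEmbed("hello word", 3, 256)
	fmt.Printf("similarity: %.3f\n", dot(a, b))
}
```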
Channel adapters can detect media, but end-to-end multimodal ingestion (media download, vision forwarding, transcription, and dashboard rendering) is still incomplete.
5-tier hybrid retrieval (FTS5 + vector cosine) integrated into the agent loop with automatic ingestion and embedding generation.
SMTP outbound via lettre, inbound parsing with threading headers (In-Reply-To/Message-Id), sender filtering. Remaining: IMAP polling for inbound delivery.
SessionScope enum (Agent, Peer, Group) with composite scope keys. find_or_create() respects scope for per-peer and per-group session isolation.
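A minimal sketch of scope-aware lookup follows. The key layout (`peer:<agent>:<id>`), the `Store` type, and the Go-style `FindOrCreate` name are assumptions made for illustration; only the three scope names and the idea of composite keys come from the item above.

```go
package main

import (
	"fmt"
	"strings"
)

// SessionScope mirrors the three scopes named in the roadmap item.
type SessionScope int

const (
	ScopeAgent SessionScope = iota // one session shared across the agent
	ScopePeer                      // one session per remote user
	ScopeGroup                     // one session per group or channel
)

// Session is a hypothetical stand-in for the real session record.
type Session struct {
	Key string
}

// scopeKey builds a composite key. The layout is an assumption, not the
// project's actual format.
func scopeKey(scope SessionScope, agentID, targetID string) string {
	switch scope {
	case ScopePeer:
		return strings.Join([]string{"peer", agentID, targetID}, ":")
	case ScopeGroup:
		return strings.Join([]string{"group", agentID, targetID}, ":")
	default:
		return "agent:" + agentID
	}
}

// Store keeps sessions indexed by composite scope key.
type Store struct {
	sessions map[string]*Session
}

func NewStore() *Store {
	return &Store{sessions: make(map[string]*Session)}
}

// FindOrCreate returns the session for the scope key, creating it on first use.
func (s *Store) FindOrCreate(scope SessionScope, agentID, targetID string) *Session {
	key := scopeKey(scope, agentID, targetID)
	if sess, ok := s.sessions[key]; ok {
		return sess
	}
	sess := &Session{Key: key}
	s.sessions[key] = sess
	return sess
}

func main() {
	st := NewStore()
	fmt.Println(st.FindOrCreate(ScopePeer, "bot", "alice").Key)
	fmt.Println(st.FindOrCreate(ScopeGroup, "bot", "room-1").Key)
}
```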
Composable FilterChain with MentionFilter, ReplyFilter, and ConversationFilter. DM bypass, case-insensitive name matching, default_addressability_chain() factory.
Shipped in v0.9.0. Config section [context.checkpoint] with enabled and every_n_turns. Transactional snapshots with crash recovery via context_checkpoints table.
Full ResponsePipeline with ResponseTransform trait. Ships ReasoningExtractor (<think> tag stripping), FormatNormalizer, and ContentGuard (injection marker detection).
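The <think> tag stripping performed by ReasoningExtractor can be sketched as below. The function name `extractReasoning` and its two return values are assumptions; the real transform presumably implements the ResponseTransform trait named above, whose shape is not shown here.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// thinkBlock matches <think>...</think> spans; (?s) lets "." cross newlines
// and ".*?" keeps each match as short as possible.
var thinkBlock = regexp.MustCompile(`(?s)<think>.*?</think>`)

// extractReasoning removes every <think> block from raw and returns the
// visible answer plus the concatenated reasoning text.
// (Hypothetical helper, not the project's actual transform.)
func extractReasoning(raw string) (answer, reasoning string) {
	var parts []string
	for _, m := range thinkBlock.FindAllString(raw, -1) {
		inner := strings.TrimSuffix(strings.TrimPrefix(m, "<think>"), "</think>")
		parts = append(parts, strings.TrimSpace(inner))
	}
	answer = strings.TrimSpace(thinkBlock.ReplaceAllString(raw, ""))
	return answer, strings.Join(parts, "\n")
}

func main() {
	answer, reasoning := extractReasoning("<think>check units</think>The result is 42.")
	fmt.Println("answer:", answer)
	fmt.Println("reasoning:", reasoning)
}
```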
Per-model efficiency analytics, turn feedback outcomes, recommendations endpoints, and trend tracking for context assembly quality and cost attribution.
CapacityTracker with sliding-window TPM/RPM counters per provider. headroom() scoring (0.0–1.0), is_near_capacity() threshold at 90% utilization.
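One way to read the sliding-window counters and the 90% threshold is sketched below. Timestamps are passed as Unix seconds to keep the example deterministic, and the 60-second window, the Go-style method names, and the "take the larger of TPM and RPM utilization" rule are assumptions rather than the tracker's documented behavior. Limits are assumed to be positive.

```go
package main

import "fmt"

// sample is one recorded request.
type sample struct {
	at     int64 // Unix seconds
	tokens int
}

// CapacityTracker keeps a 60-second sliding window for a single provider.
type CapacityTracker struct {
	tpmLimit, rpmLimit int
	samples            []sample
}

func NewCapacityTracker(tpmLimit, rpmLimit int) *CapacityTracker {
	return &CapacityTracker{tpmLimit: tpmLimit, rpmLimit: rpmLimit}
}

// Record notes a request that consumed the given number of tokens.
func (c *CapacityTracker) Record(now int64, tokens int) {
	c.samples = append(c.samples, sample{at: now, tokens: tokens})
}

// Utilization evicts samples older than 60 s and returns the larger of the
// token and request utilization ratios.
func (c *CapacityTracker) Utilization(now int64) float64 {
	kept := c.samples[:0]
	tokens := 0
	for _, s := range c.samples {
		if now-s.at < 60 {
			kept = append(kept, s)
			tokens += s.tokens
		}
	}
	c.samples = kept
	tpm := float64(tokens) / float64(c.tpmLimit)
	rpm := float64(len(kept)) / float64(c.rpmLimit)
	if rpm > tpm {
		return rpm
	}
	return tpm
}

// Headroom maps utilization to a 0.0 to 1.0 score (1.0 idle, 0.0 saturated).
func (c *CapacityTracker) Headroom(now int64) float64 {
	h := 1 - c.Utilization(now)
	if h < 0 {
		return 0
	}
	return h
}

// IsNearCapacity reports whether utilization has reached 90%.
func (c *CapacityTracker) IsNearCapacity(now int64) bool {
	return c.Utilization(now) >= 0.9
}

func main() {
	t := NewCapacityTracker(1000, 10)
	t.Record(0, 500)
	t.Record(10, 400)
	fmt.Println("near capacity:", t.IsNearCapacity(11))
}
```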
Sessions and Context Explorer render markdown with strict URL sanitization and no raw HTML/script execution.
Shipped in v0.9.0. DeliveryQueue::with_store(db) persists messages to delivery_queue table. recover_from_store() replays on startup. Dead-letter alerting added in v0.10.0.
Shipped in v0.9.1. GlobalRateLimitLayer with per-IP, per-actor, and global limits. Trusted proxy CIDR resolution via X-Forwarded-For/X-Real-IP. Throttle observability via GET /api/stats/throttle.
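Trusted-proxy resolution is commonly done by walking X-Forwarded-For from right to left and stopping at the first address outside the trusted CIDRs. The sketch below shows that approach using the standard net/netip package; the function names are hypothetical, X-Real-IP handling is omitted, and this is not necessarily how GlobalRateLimitLayer resolves clients.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// parsePrefixes converts CIDR strings into prefixes, panicking on bad input.
func parsePrefixes(cidrs ...string) []netip.Prefix {
	out := make([]netip.Prefix, 0, len(cidrs))
	for _, c := range cidrs {
		out = append(out, netip.MustParsePrefix(c))
	}
	return out
}

// isTrusted reports whether addr falls inside any trusted proxy range.
func isTrusted(addr netip.Addr, trusted []netip.Prefix) bool {
	for _, p := range trusted {
		if p.Contains(addr) {
			return true
		}
	}
	return false
}

// clientIP returns the address to rate-limit on. The header is honored only
// when the direct peer is a trusted proxy, and the rightmost untrusted hop
// wins so that client-supplied entries on the left cannot spoof the result.
func clientIP(remoteIP, xff string, trusted []netip.Prefix) string {
	remote, err := netip.ParseAddr(remoteIP)
	if err != nil {
		return remoteIP
	}
	if !isTrusted(remote, trusted) {
		return remote.String()
	}
	hops := strings.Split(xff, ",")
	for i := len(hops) - 1; i >= 0; i-- {
		hop, err := netip.ParseAddr(strings.TrimSpace(hops[i]))
		if err != nil {
			break // malformed entry: stop trusting the header
		}
		if !isTrusted(hop, trusted) {
			return hop.String()
		}
	}
	return remote.String()
}

func main() {
	trusted := parsePrefixes("10.0.0.0/8")
	fmt.Println(clientIP("10.0.0.5", "203.0.113.7, 10.0.0.9", trusted))
}
```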
Shipped in v0.10.0. Typestate session lifecycle with cron-conformant rotation, timezone edge-case tests, and clear operator guarantees.
Optional agent-browser CLI backend with browser backend abstraction, policy preservation, and provenance events.
Shipped in v0.9.6. Tap repo, release.yml jobs for Homebrew formula and Winget manifest updates. brew install roboticus / winget install.
Shipped in v0.9.9. Unified channel health monitoring with POST /api/channels/{platform}/test probe, dashboard integrations panel, and roboticus integrations CLI group.
Shipped in v0.9.0. Four tools: get_runtime_context, get_memory_stats, get_channel_health, get_subagent_status. Info on demand, not info by default.
Shipped in v0.9.9. Configurable L0–L3 token budgets via [context_budget] config, dashboard range sliders, and per-channel minimum complexity level.
Codex CLI plugin shipped in v0.10.0. Claude Code and Codex CLI skills as typed tool interfaces with health checks and policy gating.
Shipped in v0.10.0. GET /openapi.json serves OpenAPI 3.1 spec. GET /docs provides spec access for external Swagger UI viewers.
Shipped in v0.9.9. Dashboard sliders for correctness/cost/speed routing weights, validation warnings, and persistence. Spider graph visualization.
Tier 2 — New Capabilities
Features that require significant new code but have clear implementation paths. Medium-to-high effort.
Shipped in Go rewrite. Logistic regression on prompt embeddings (~11μs overhead) that learns from usage which queries need strong vs. weak models. ~60% cost savings.
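To make the idea concrete, here is a small sketch of a logistic-regression router trained online from usage outcomes. The `Router` type, the plain SGD update, the learning rate, and the 0.5 threshold are illustrative assumptions; the shipped router's features, training loop, and the latency and savings figures above are not reproduced by this toy.

```go
package main

import (
	"fmt"
	"math"
)

// Router is a logistic-regression classifier over prompt embeddings.
// (Hypothetical type for illustration.)
type Router struct {
	weights []float64
	bias    float64
	lr      float64
}

func NewRouter(dim int, lr float64) *Router {
	return &Router{weights: make([]float64, dim), lr: lr}
}

// ProbStrong estimates the probability that a prompt needs the strong model.
func (r *Router) ProbStrong(emb []float64) float64 {
	z := r.bias
	for i, w := range r.weights {
		z += w * emb[i]
	}
	return 1 / (1 + math.Exp(-z))
}

// Update applies one gradient step on the log-loss; neededStrong is the
// outcome observed after the request was served.
func (r *Router) Update(emb []float64, neededStrong bool) {
	y := 0.0
	if neededStrong {
		y = 1.0
	}
	grad := y - r.ProbStrong(emb)
	for i := range r.weights {
		r.weights[i] += r.lr * grad * emb[i]
	}
	r.bias += r.lr * grad
}

// Route picks a model tier by comparing the probability to a threshold.
func (r *Router) Route(emb []float64, threshold float64) string {
	if r.ProbStrong(emb) >= threshold {
		return "strong"
	}
	return "weak"
}

func main() {
	r := NewRouter(2, 0.5)
	hard, easy := []float64{1, 0}, []float64{0, 1}
	for i := 0; i < 50; i++ {
		r.Update(hard, true)
		r.Update(easy, false)
	}
	fmt.Println(r.Route(hard, 0.5), r.Route(easy, 0.5))
}
```

Inference is a single dot product plus a sigmoid, which is why this kind of router adds very little overhead per request.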
Per-request accuracy targets (τ) with Lagrangian optimization to minimize cost while maintaining specified quality floor.
Shipped in v0.9.1. ConfidenceEvaluator scores local model responses via token probability, response length, and self-reported uncertainty. EscalationTracker records events. Wired into infer_with_fallback().
Pre-fetch results for likely tool calls while waiting for LLM response. 30–50% latency reduction for predictable sequences.
Service catalog, payment verification, delivery tracking via ServiceManager. Completes the self-sustaining economic loop.
Real-time pricing awareness with cost_per_million_tokens in routing decisions. Route to cheapest provider meeting quality requirements.
Wasmer-based runtime with WasmPlugin, WasmPluginRegistry, capability model (ReadFilesystem, WriteFilesystem, Network, Environment), memory limits, and JSON I/O ABI.
Shipped in v0.9.0. PromptCompressor gate in context assembly, controlled by config.cache.prompt_compression. 2–20x compression with <5% quality loss.
AgentManifest with TOML loader, SHA-256 hot-reload, capability-based discovery via find_by_capability(). Fields: personality, tools whitelist, model tier, memory budget, cron triggers.
WorkspaceContext with file indexing, WorkspaceManifest (workspace.toml), FileCategory classification (Personality, Config, Schema, Document, Data), and /api/workspace/state endpoint.
Trait-based system integrating Directory, Git, VectorDB, and Graph sources into the RAG pipeline via local ingest plus federated query.
Shipped in v0.9.0. digest_on_close() wired into SessionGovernor for session expiry and rotation. Decay-weighted episodic retrieval added in v0.9.6.
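Decay-weighted retrieval can be illustrated with an exponential half-life applied to each episode's similarity score. The half-life form, the `Episode` fields, and the function names below are assumptions; the roadmap does not specify which decay function v0.9.6 uses.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Episode is a hypothetical episodic memory entry.
type Episode struct {
	Text       string
	Similarity float64 // relevance to the current query, 0.0 to 1.0
	AgeDays    float64
}

// decayWeight halves every halfLifeDays (exponential decay).
func decayWeight(ageDays, halfLifeDays float64) float64 {
	return math.Pow(0.5, ageDays/halfLifeDays)
}

// rankEpisodes returns a copy sorted by similarity multiplied by decay weight,
// highest first.
func rankEpisodes(eps []Episode, halfLifeDays float64) []Episode {
	ranked := append([]Episode(nil), eps...)
	score := func(e Episode) float64 {
		return e.Similarity * decayWeight(e.AgeDays, halfLifeDays)
	}
	sort.SliceStable(ranked, func(i, j int) bool {
		return score(ranked[i]) > score(ranked[j])
	})
	return ranked
}

func main() {
	eps := []Episode{
		{Text: "old but very similar", Similarity: 0.9, AgeDays: 30},
		{Text: "recent and fairly similar", Similarity: 0.6, AgeDays: 1},
	}
	for _, e := range rankEpisodes(eps, 7) {
		fmt.Println(e.Text)
	}
}
```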
Living schema map (hippocampus.go) with register_table, query_schema, describe_table, and lifecycle ops. Gives the agent introspective awareness of its data architecture.
Shipped in v0.9.6. PluginCatalog with CLI flows (roboticus skills catalog list/install/activate) and API endpoints (GET/POST /api/skills/catalog). Multi-registry support with namespace resolution.
Release smoke gates, provenance/signing metadata, and operator-facing telemetry exports for stronger runtime and release confidence.
Shipped in Go rewrite. Full bidirectional Matrix Client v3 integration with optional E2E (Olm/Megolm), UUID transaction IDs, room membership handling, and encrypted sync lifecycle.
Shipped in v0.9.6. Complete revenue opportunity lifecycle with DB-backed restart safety, strategy scoring, EVM swap/tax lifecycle, and operator-visible accounting.
Shipped in v0.9.1. ModelProfile + MetascoreBreakdown with 5-dimension scoring (efficacy, cost, availability, locality, confidence). Category fit added in v0.10.0.
STT/TTS shipped in Go rewrite. Speech-to-text and text-to-speech via the OpenAI API, with local options (Piper, Coqui) and configurable models. Remaining: WebRTC for real-time voice on the dashboard.
Shipped in v0.9.6. Multi-registry config with namespace resolution, semver comparison, priority-based conflict resolution. Enables community skill distribution.
Privacy-preserving on-chain activity ledger for immutable auditability and provenance of agent behavior sequences.
Shipped in v0.9.9. ToolOutputFilterChain with 4 filters (AnsiStripper, ProgressLineFilter, DuplicateLineDeduper, WhitespaceNormalizer). 30–70% token reduction on tool-heavy turns.
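The four filters can be pictured as a chain of string-to-string functions applied in order. The sketch below is an approximation: the regular expressions, the progress-line heuristic (a line consisting only of a percentage), and the function names are assumptions, and it makes no claim about reproducing the token-reduction figures above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Filter transforms tool output; a chain applies filters in order.
type Filter func(string) string

var (
	ansiRe     = regexp.MustCompile(`\x1b\[[0-9;]*[A-Za-z]`) // CSI escape sequences
	progressRe = regexp.MustCompile(`^\s*\d{1,3}%\s*$`)      // assumed heuristic
)

// mapLines rebuilds s from the lines that keep() returns with ok == true.
func mapLines(s string, keep func(prev []string, line string) (string, bool)) string {
	var out []string
	for _, line := range strings.Split(s, "\n") {
		if l, ok := keep(out, line); ok {
			out = append(out, l)
		}
	}
	return strings.Join(out, "\n")
}

// stripANSI removes terminal color and cursor codes.
func stripANSI(s string) string { return ansiRe.ReplaceAllString(s, "") }

// dropProgress removes lines that contain only a percentage.
func dropProgress(s string) string {
	return mapLines(s, func(_ []string, l string) (string, bool) {
		return l, !progressRe.MatchString(l)
	})
}

// dedupeLines collapses consecutive identical lines into one.
func dedupeLines(s string) string {
	return mapLines(s, func(prev []string, l string) (string, bool) {
		return l, len(prev) == 0 || prev[len(prev)-1] != l
	})
}

// normalizeWhitespace trims trailing whitespace and repeated blank lines.
func normalizeWhitespace(s string) string {
	s = mapLines(s, func(prev []string, l string) (string, bool) {
		l = strings.TrimRight(l, " \t\r")
		return l, !(l == "" && len(prev) > 0 && prev[len(prev)-1] == "")
	})
	return strings.TrimSpace(s)
}

// applyChain runs every filter in order.
func applyChain(s string, filters ...Filter) string {
	for _, f := range filters {
		s = f(s)
	}
	return s
}

func main() {
	raw := "\x1b[32mok\x1b[0m\n 50%\n100%\nline\nline\n\n\n\ndone  \n"
	fmt.Printf("%q\n", applyChain(raw, stripANSI, dropProgress, dedupeLines, normalizeWhitespace))
}
```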
Partially shipped in v0.10.0. LandlockConfig for Linux filesystem restriction. macOS sandbox-exec already active. Windows AppContainer pending.
Partially shipped in v0.11.0/v0.11.1. Pipeline Traces, Flight Recorder, and Delegation Outcomes tabs in dashboard. Skill/subagent utilization telemetry. Remaining: per-tool success/latency/cost breakdown, memory retrieval effectiveness scoring.
Per-stage timing across the full request-to-response pipeline: intent classification, memory retrieval, context assembly, prompt compression, model routing, inference, tool execution, response transforms, channel formatting, delivery. Dashboard waterfall chart with sub-spans and regression detection.
Core shipped in v0.11.0. ReactTrace capture (tool calls, retrievals, guards, normalization), pipeline trace persistence, dashboard visualization. Remaining: replay, diff, export (JSON/CSV/markdown), trace search API.
Before/after metric snapshots for all maintenance tasks (MemoryPrune, CacheEvict, SessionGovernor, hygiene sweeps). Outcome correlation tracking retrieval quality and cache hit rates after each task. Dashboard trend charts showing whether maintenance is helping or hurting.
Unified dashboard Observability section: pipeline waterfall view per request, agent performance dashboard with tool/delegation/memory metrics, maintenance outcome trend charts (before/after deltas), memory health heatmap, cross-cutting timeline correlating events with quality changes, and alert badges for regressions.
Retrieval attribution tracking, per-entry memory ROI scoring, stale context detection, knowledge gap identification, and tier-level utilization heatmaps.
Dynamic per-tier budget allocation based on retrieval effectiveness telemetry. Tiers that provide useful context get more budget automatically.
End-to-end skill authoring: interactive scaffolding CLI, testing harness, agent-assisted authoring from natural language, registry publish flow, and dashboard skill editor.
Declarative subagent composition from available skills with dependency resolution, capability coverage validation, template library, and hot-reload.
Performance-informed delegation scoring, explainable routing decisions, automatic fallback chains, cost attribution, and operator override controls.
Shipped in v0.11.1. SemanticClassifier with centroid-based embedding classification replaces keyword matching. Typed categories, per-bank thresholds, TrustLevel, AbstainPolicy, ClassificationTrace.
Shipped in v0.11.1. ProfileRegistry with named configurations, roboticus apps install/uninstall/list CLI, manifest-driven profile creation with skills/themes/subagents. Theme marketplace planned for v0.11.2.
Decompose monolithic dashboard_spa.html into CSS partials, JS modules per feature area, and Go-assembled HTML templates. No framework migration — remains server-rendered SPA.
Decorations manifest: ornamental separators, card borders, background textures, scrollbar styling, animation accents, audio ambience. Workspace theming: bot sprites, station icons, tether lines, particle effects.
Per-turn flow graph: visual nodes for intent classification, cache decision, planner action, retrieval, inference, guards. Decision annotations, timing overlays, fallback visibility. Backed by shared trace data.
Tier 3 — Frontier
Ambitious capabilities that push the architecture into new territory. High effort, high potential.
Partially shipped. Session lifecycle state machine with runtime enforcement — Created/Active/Closed transitions validated at every boundary. Go interfaces enforce valid transitions across the codebase.
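A runtime-validated version of the Created/Active/Closed lifecycle might look like the sketch below. The transition table (including whether Created may go straight to Closed) and all type and method names are assumptions; the shipped design is described as enforcing transitions through Go interfaces, which this table-driven example does not attempt to replicate.

```go
package main

import (
	"errors"
	"fmt"
)

// State is one of the three lifecycle states named in the roadmap item.
type State int

const (
	Created State = iota
	Active
	Closed
)

func (s State) String() string {
	switch s {
	case Created:
		return "Created"
	case Active:
		return "Active"
	case Closed:
		return "Closed"
	}
	return "Unknown"
}

// ErrInvalidTransition is returned when a transition is not in the table.
var ErrInvalidTransition = errors.New("invalid session transition")

// allowed lists legal transitions. This table is an assumption made for the
// sketch: Closed is terminal, and a session may be closed before activation.
var allowed = map[State][]State{
	Created: {Active, Closed},
	Active:  {Closed},
}

// Session is a hypothetical session whose state changes only via Transition.
type Session struct {
	state State
}

// Transition moves to next if the table permits it; otherwise the state is
// left unchanged and an error is returned.
func (s *Session) Transition(next State) error {
	for _, candidate := range allowed[s.state] {
		if candidate == next {
			s.state = next
			return nil
		}
	}
	return fmt.Errorf("%w: %s -> %s", ErrInvalidTransition, s.state, next)
}

// Current returns the present state.
func (s *Session) Current() State { return s.state }

func main() {
	s := &Session{}
	fmt.Println(s.Transition(Active))
	fmt.Println(s.Transition(Closed))
	fmt.Println(s.Transition(Active))
}
```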
Dual-role implementation: McpServerRegistry exposes tools/resources to clients, McpClientManager connects to external MCP servers. export_tools_as_mcp() utility for tool conversion.
Partially shipped in v0.7.0+. Subagent role contracts (subagent vs model-proxy), roster semantics, model-assignment modes, and turn-linked forensics are live. Full workflow orchestration patterns remain.
Spawn child agents with provisioned wallets, delegated tasks, and fund reclamation on completion or timeout.
Shipped: binary embedding storage, HNSW index, content chunking, and persistent cache. Remaining: document ingestion pipeline.
Promoted to Tier 2 as item 2.20. See 2.20 Voice Channels for current spec.
Feature vectors derived from model capabilities, pricing, and benchmarks. Route among unseen models without retraining.
Automatically determine optimal cascade strategy per query type — sometimes skipping the weak model is cheaper than trying it first.
Partially shipped in v0.10.0. StorageBackend trait with SqliteBackend implementation. PostgreSQL backend available as opt-in via --features postgres.
Zero-trust device identity and pairing built on existing ECDSA/ECDH infrastructure with encrypted state sync.
DNS-based agent discovery with SRV/TXT records and mDNS fallback for zero-config LAN scenarios.
Interface-based binding (0.0.0.0, 127.0.0.1, or any interface), optional TLS with configurable cert/key paths, advertise URL for reverse proxy and NAT traversal.
Defense-in-depth remote access: mandatory TLS, OIDC/SAML SSO + MFA, device trust, per-route RBAC, IP reputation, CSRF/CORS hardening, signed session rotation.
Audio-frequency agent-to-agent communication encoding structured data as modulated audio signals over voice channels (WebRTC, phone, Discord voice).