Roadmap

Growth areas organized by effort and impact. Each item notes whether it builds on existing code or is greenfield.

78 roadmap items · 40 shipped · 11 partial · 27 planned

Research Attribution

Roadmap items that adopt or investigate whitepaper ideas credit the source here and must still be validated by Roboticus evidence before product claims are made.

Agent behavioral contracts, guard/RCA contract events, and hard/soft recovery semantics. Source: Agent Behavioral Contracts: Formal Specification and Runtime Enforcement for Reliable Autonomous AI Agents

Context-as-Environment Mode support for the existing memory-as-index/tool architecture, recursive memory/retrieval investigation, and learned decomposition gating. Source: Recursive Language Models

Semantic-collapse defense, late-interaction retrieval, reranking, and agentic RAG roadmap work. Sources: ColBERTv2, Agentic RAG Survey, A-RAG

Learning-loop closure and reusable tool-use procedure synthesis. Source: Autonomous tool-use learning in LLM agents

Additional retrieval references are tracked in the semantic collapse roadmap; attribution is credit, not proof.

v0.10.0 — Current Release

v0.9.x shipped reliability, distribution, and operator hardening (v0.9.0–v0.9.9). v0.10.0 focuses on correctness, safety, and operational maturity — runtime bug fixes from the Go rewrite audit, typestate sessions, model categorization, and dashboard overhaul.

Next focus: telemetry/instrumentation surfacing, effective memory utilization, and subagent management (skill authoring, composition, delegation).

Tier 1 — Wire the Last Mile

Capabilities where the core code exists but isn't fully connected. High impact, low-to-medium effort.

1.1

Streaming LLM Responses · DONE

Full streaming pipeline: SSE endpoint (POST /api/agent/message/stream), WebSocket push, StreamAccumulator, per-provider SSE parsing for OpenAI, Anthropic, Google, and Ollama.
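A minimal Go sketch of SSE parsing that feeds an accumulator, assuming an OpenAI-style stream in which each event's `data:` field carries JSON and a `[DONE]` sentinel ends the stream. The `{"delta": ...}` payload shape and the `accumulateSSE` name are hypothetical and are not the project's StreamAccumulator API. Real providers use different event names and JSON shapes, so each one would need its own payload decoder.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// chunk is a hypothetical payload: each SSE event carries one text delta.
type chunk struct {
	Delta string `json:"delta"`
}

// accumulateSSE reads an SSE stream, joins the "data:" lines of each event,
// and concatenates text deltas until a "[DONE]" sentinel or end of input.
func accumulateSSE(stream string) (string, error) {
	var out strings.Builder
	var data []string
	// flush decodes the buffered event; it reports true on the [DONE] sentinel.
	flush := func() (bool, error) {
		if len(data) == 0 {
			return false, nil
		}
		payload := strings.Join(data, "\n")
		data = data[:0]
		if payload == "[DONE]" {
			return true, nil
		}
		var c chunk
		if err := json.Unmarshal([]byte(payload), &c); err != nil {
			return false, err
		}
		out.WriteString(c.Delta)
		return false, nil
	}
	sc := bufio.NewScanner(strings.NewReader(stream))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case line == "": // a blank line terminates an event
			done, err := flush()
			if err != nil || done {
				return out.String(), err
			}
		case strings.HasPrefix(line, "data:"):
			// Per the SSE format, drop one optional space after the colon.
			data = append(data, strings.TrimPrefix(strings.TrimPrefix(line, "data:"), " "))
		}
		// Other fields (event:, id:, retry:) and ":" comments are ignored here.
	}
	if err := sc.Err(); err != nil {
		return out.String(), err
	}
	_, err := flush() // handle a final event that lacks a trailing blank line
	return out.String(), err
}

func main() {
	stream := "data: {\"delta\":\"Hel\"}\n\ndata: {\"delta\":\"lo\"}\n\ndata: [DONE]\n\n"
	text, err := accumulateSSE(stream)
	fmt.Println(text, err)
}
```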

Medium

1.2

Approval Workflow API · DONE

Full ApprovalManager with tool classification (Safe/Gated/Blocked), request lifecycle, and timeout expiry. HTTP endpoints are wired at /api/approvals with approve/deny actions.

Low

1.3

Browser as Agent Tool · DONE

Browser package with 12 CDP actions registered in the agent's ToolRegistry. Autonomous web browsing in the ReAct loop via navigate, click, type, screenshot, evaluate, wait, scroll, extract, hover, select, go_back, go_forward.

Low

1.4

Discord WebSocket Gateway · DONE

Shipped in the Go rewrite. Discord adapter with webhooks, 2000-character message chunking, and embed formatting, providing full bidirectional messaging.
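A sketch of 2000-character chunking that prefers to break at line boundaries. It counts Unicode code points, which is an assumption about how the limit is measured. `chunkMessage` is a hypothetical helper, not the adapter's real function.

```go
package main

import (
	"fmt"
	"strings"
)

// discordLimit is Discord's per-message content limit in characters.
const discordLimit = 2000

// chunkMessage splits text into pieces of at most limit runes. It breaks just
// after the last newline in each window when one exists, so lines stay intact.
// Otherwise it hard-splits at the limit.
func chunkMessage(text string, limit int) []string {
	runes := []rune(text)
	var chunks []string
	for len(runes) > limit {
		cut := limit
		for i := limit - 1; i > 0; i-- {
			if runes[i] == '\n' {
				cut = i + 1
				break
			}
		}
		chunks = append(chunks, string(runes[:cut]))
		runes = runes[cut:]
	}
	if len(runes) > 0 {
		chunks = append(chunks, string(runes))
	}
	return chunks
}

func main() {
	msg := strings.Repeat("line of text\n", 300) // 3900 runes
	for i, c := range chunkMessage(msg, discordLimit) {
		fmt.Println(i, len([]rune(c)))
	}
}
```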

Low

1.5

Embedding Provider Integration · DONE

EmbeddingClient with support for OpenAI, Ollama, and Google embedding APIs. N-gram fallback when no provider is configured.
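The roadmap does not describe how the n-gram fallback works. The sketch below shows one common approach as an assumption: hash character trigrams into a fixed-size vector, then L2-normalize it so cosine similarity becomes a dot product. `ngramEmbed` and `dot` are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"strings"
)

// ngramEmbed hashes lowercase character trigrams into dim buckets and
// L2-normalizes the counts. Spaces are padded so word edges form trigrams.
func ngramEmbed(text string, dim int) []float64 {
	vec := make([]float64, dim)
	runes := []rune(" " + strings.ToLower(text) + " ")
	for i := 0; i+3 <= len(runes); i++ {
		h := fnv.New32a()
		h.Write([]byte(string(runes[i : i+3])))
		vec[h.Sum32()%uint32(dim)]++
	}
	var norm float64
	for _, v := range vec {
		norm += v * v
	}
	if norm > 0 {
		norm = math.Sqrt(norm)
		for i := range vec {
			vec[i] /= norm
		}
	}
	return vec
}

// dot equals cosine similarity for unit-length vectors.
func dot(a, b []float64) float64 {
	var s float64
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

func main() {
	q := ngramEmbed("memory retrieval", 256)
	fmt.Printf("%.2f\n", dot(q, ngramEmbed("retrieving memories", 256)))
	fmt.Printf("%.2f\n", dot(q, ngramEmbed("discord webhook", 256)))
}
```

Hashed trigrams capture only surface overlap and no real semantics, which fits a "no provider configured" fallback.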

Medium

1.6

Multimodal Message Handling · PLANNED

Channel adapters can detect media, but end-to-end multimodal ingestion (media download, vision forwarding, transcription, and dashboard rendering) is still incomplete.

Medium

1.7

Memory-Augmented Agent Pipeline · DONE

5-tier hybrid retrieval (FTS5 + vector cosine) integrated into the agent loop with automatic ingestion and embedding generation.
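The roadmap does not say how FTS5 and vector-cosine results are combined. This sketch uses reciprocal rank fusion (RRF) as one plausible merge step, with the commonly used constant k = 60. It is a swapped-in technique shown for illustration, not the project's documented method.

```go
package main

import (
	"fmt"
	"sort"
)

// rrf merges ranked ID lists (for example FTS5 hits and vector-cosine hits).
// Each document scores sum(1 / (k + rank)) over the lists, with rank from 1.
func rrf(k float64, lists ...[]string) []string {
	scores := map[string]float64{}
	for _, list := range lists {
		for i, id := range list {
			scores[id] += 1 / (k + float64(i+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b] // deterministic tie-break
	})
	return ids
}

func main() {
	fts := []string{"m3", "m1", "m7"}    // keyword (FTS5) order
	vector := []string{"m1", "m9", "m3"} // cosine-similarity order
	fmt.Println(rrf(60, fts, vector))
}
```

RRF uses only ranks, so it avoids normalizing BM25 scores against cosine similarities, which live on different scales.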

High

1.8

Email Channel Adapter · PARTIAL

SMTP outbound via lettre, inbound parsing with threading headers (In-Reply-To/Message-Id), sender filtering. Remaining: IMAP polling for inbound delivery.
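A sketch of mapping an inbound email onto a conversation using Go's standard `net/mail` package. The thread-root rule (first References ID, then In-Reply-To, then the message's own Message-Id) is an assumption; the roadmap only says that threading headers are parsed. `threadKey` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/mail"
	"strings"
)

// threadKey picks a stable conversation key for an inbound email. Replies
// share their thread root with the original message, so they land in the
// same session.
func threadKey(raw string) (string, error) {
	msg, err := mail.ReadMessage(strings.NewReader(raw))
	if err != nil {
		return "", err
	}
	if refs := strings.Fields(msg.Header.Get("References")); len(refs) > 0 {
		return refs[0], nil // first reference is the thread root
	}
	if irt := strings.TrimSpace(msg.Header.Get("In-Reply-To")); irt != "" {
		return irt, nil
	}
	return strings.TrimSpace(msg.Header.Get("Message-Id")), nil
}

func main() {
	reply := "From: a@example.com\nMessage-Id: <2@example.com>\nIn-Reply-To: <1@example.com>\nReferences: <1@example.com>\n\nThanks!\n"
	key, err := threadKey(reply)
	fmt.Println(key, err)
}
```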

Medium

1.9

Session Scoping and Lifecycle · DONE

SessionScope enum (Agent, Peer, Group) with composite scope keys. find_or_create() respects scope for per-peer and per-group session isolation.

Medium

1.10

Addressability Filter · DONE

Composable FilterChain with MentionFilter, ReplyFilter, and ConversationFilter. DM bypass, case-insensitive name matching, default_addressability_chain() factory.
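A sketch of how a composable chain like this can fit together. The filter names come from the roadmap. Three things are assumptions: the `Message` fields, the "any filter matches" semantics, and the plain substring test used for case-insensitive name matching (a simplification of real mention parsing). `defaultChain` is a hypothetical stand-in for `default_addressability_chain()`.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a hypothetical inbound message shape.
type Message struct {
	Text        string
	IsDM        bool
	ReplyToBot  bool
	InBotThread bool
}

// Filter reports whether a message is addressed to the agent.
type Filter interface {
	Addressed(m Message) bool
}

// MentionFilter matches the agent's name case-insensitively.
type MentionFilter struct{ Name string }

func (f MentionFilter) Addressed(m Message) bool {
	return strings.Contains(strings.ToLower(m.Text), strings.ToLower(f.Name))
}

// ReplyFilter matches direct replies to the agent.
type ReplyFilter struct{}

func (ReplyFilter) Addressed(m Message) bool { return m.ReplyToBot }

// ConversationFilter matches messages in a thread the agent is part of.
type ConversationFilter struct{}

func (ConversationFilter) Addressed(m Message) bool { return m.InBotThread }

// FilterChain accepts DMs outright, otherwise any matching filter.
type FilterChain []Filter

func (c FilterChain) Addressed(m Message) bool {
	if m.IsDM {
		return true // DM bypass
	}
	for _, f := range c {
		if f.Addressed(m) {
			return true
		}
	}
	return false
}

func defaultChain(name string) FilterChain {
	return FilterChain{MentionFilter{Name: name}, ReplyFilter{}, ConversationFilter{}}
}

func main() {
	chain := defaultChain("Roboticus")
	fmt.Println(chain.Addressed(Message{Text: "hey ROBOTICUS, status?"}))
	fmt.Println(chain.Addressed(Message{Text: "lunch anyone?"}))
}
```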

Low

1.11

Context Checkpoint · DONE

Shipped in v0.9.0. Config section [context.checkpoint] with enabled and every_n_turns. Transactional snapshots with crash recovery via context_checkpoints table.

Medium

1.12

Response Transform Pipeline · DONE

Full ResponsePipeline with ResponseTransform trait. Ships ReasoningExtractor (<think> tag stripping), FormatNormalizer, and ContentGuard (injection marker detection).
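A sketch of the kind of `<think>` tag stripping a ReasoningExtractor performs. `extractReasoning` is a hypothetical function, not the project's transform API. In this sketch an unclosed `<think>` tag is left untouched.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// thinkBlock matches <think>...</think> spans. (?s) lets . cross newlines,
// and the non-greedy .*? handles several blocks separately.
var thinkBlock = regexp.MustCompile(`(?s)<think>(.*?)</think>`)

// extractReasoning returns the visible answer with reasoning blocks removed.
// It also returns the reasoning text, joined by newlines, for tracing.
func extractReasoning(s string) (answer, reasoning string) {
	var parts []string
	for _, m := range thinkBlock.FindAllStringSubmatch(s, -1) {
		parts = append(parts, strings.TrimSpace(m[1]))
	}
	answer = strings.TrimSpace(thinkBlock.ReplaceAllString(s, ""))
	return answer, strings.Join(parts, "\n")
}

func main() {
	a, r := extractReasoning("<think>user wants a number\n</think>\nThe answer is 42.")
	fmt.Printf("%q %q\n", a, r)
}
```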

Low-Medium

1.13

Context Observatory · DONE

Per-model efficiency analytics, turn feedback outcomes, recommendations endpoints, and trend tracking for context assembly quality and cost attribution.

Medium

1.14

Capacity-Aware Routing · DONE

CapacityTracker with sliding-window TPM/RPM counters per provider. headroom() scoring (0.0–1.0), is_near_capacity() threshold at 90% utilization.

Medium

1.15

Sessions Markdown Rendering · DONE

Sessions and Context Explorer render markdown with strict URL sanitization and no raw HTML/script execution.

Low-Medium

1.16

Durable Channel Delivery Queue · DONE

Shipped in v0.9.0. DeliveryQueue::with_store(db) persists messages to delivery_queue table. recover_from_store() replays on startup. Dead-letter alerting added in v0.10.0.

Medium

1.17

Production-Grade Abuse Protection · DONE

Shipped in v0.9.1. GlobalRateLimitLayer with per-IP, per-actor, and global limits. Trusted proxy CIDR resolution via X-Forwarded-For/X-Real-IP. Throttle observability via GET /api/stats/throttle.
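A sketch of trusted-proxy client IP resolution using Go's `net/netip` package. The policy is an assumption, not a description of GlobalRateLimitLayer. Forwarding headers count only when the direct peer is a trusted proxy. X-Forwarded-For is walked right to left past trusted hops, so a client cannot spoof its address by prepending entries. X-Real-IP is used only when X-Forwarded-For is absent. `clientIP` and `mustPrefixes` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// mustPrefixes parses trusted proxy CIDRs and panics on invalid input.
func mustPrefixes(cidrs ...string) []netip.Prefix {
	out := make([]netip.Prefix, 0, len(cidrs))
	for _, c := range cidrs {
		out = append(out, netip.MustParsePrefix(c))
	}
	return out
}

// clientIP resolves the address to rate-limit on.
func clientIP(remoteAddr, xff, xRealIP string, trusted []netip.Prefix) netip.Addr {
	isTrusted := func(a netip.Addr) bool {
		for _, p := range trusted {
			if p.Contains(a) {
				return true
			}
		}
		return false
	}
	peer, err := netip.ParseAddrPort(remoteAddr)
	if err != nil {
		return netip.Addr{} // unparseable peer: zero (invalid) address
	}
	if !isTrusted(peer.Addr()) {
		return peer.Addr() // headers from untrusted peers are ignored
	}
	if strings.TrimSpace(xff) != "" {
		hops := strings.Split(xff, ",")
		var leftmost netip.Addr
		for i := len(hops) - 1; i >= 0; i-- {
			a, err := netip.ParseAddr(strings.TrimSpace(hops[i]))
			if err != nil {
				return peer.Addr() // malformed chain: fall back to the proxy
			}
			if !isTrusted(a) {
				return a // first untrusted hop from the right
			}
			leftmost = a
		}
		return leftmost // every hop was a trusted proxy
	}
	if a, err := netip.ParseAddr(strings.TrimSpace(xRealIP)); err == nil {
		return a
	}
	return peer.Addr()
}

func main() {
	trusted := mustPrefixes("10.0.0.0/8", "127.0.0.1/32")
	fmt.Println(clientIP("10.1.2.3:443", "198.51.100.7, 10.0.0.9", "", trusted))
}
```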

Medium

1.18

Cron-Conformant Session Rotation · DONE

Shipped in v0.10.0. Typestate session lifecycle with cron-conformant rotation, timezone edge-case tests, and clear operator guarantees.

Medium

1.19

agent-browser External Runtime · PLANNED

Optional agent-browser CLI backend with browser backend abstraction, policy preservation, and provenance events.

Medium

1.20

Homebrew & Winget Distribution · DONE

Shipped in v0.9.6. A tap repo plus release.yml jobs that update the Homebrew formula and Winget manifest. Install with brew install roboticus or winget install.

Low

1.21

Integrations Management · DONE

Shipped in v0.9.9. Unified channel health monitoring with POST /api/channels/{platform}/test probe, dashboard integrations panel, and roboticus integrations CLI group.

Medium

1.22

Built-in Introspection Skill · DONE

Shipped in v0.9.0. Four tools: get_runtime_context, get_memory_stats, get_channel_health, get_subagent_status. Info on demand, not info by default.

Low

1.23

Context Budget Tuning · DONE

Shipped in v0.9.9. Configurable L0–L3 token budgets via [context_budget] config, dashboard range sliders, and per-channel minimum complexity level.

Low

1.24

Built-in CLI Agent Skills · PARTIAL

Codex CLI plugin shipped in v0.10.0. Claude Code and Codex CLI skills as typed tool interfaces with health checks and policy gating.

Medium

1.25

OpenAPI + Swagger Endpoint · DONE

Shipped in v0.10.0. GET /openapi.json serves the OpenAPI 3.1 spec, and GET /docs exposes it to external Swagger UI viewers.

Low

1.26

Routing Profile + Spider Graph · DONE

Shipped in v0.9.9. Dashboard sliders for correctness/cost/speed routing weights, validation warnings, and persistence. Spider graph visualization.

Medium

Tier 2 — New Capabilities

Features that require significant new code but have clear implementation paths. Medium-to-high effort.

2.1

ML-Based Model Routing · DONE

Shipped in the Go rewrite. Logistic regression on prompt embeddings (~11μs overhead) learns from usage which queries need strong vs. weak models, yielding ~60% cost savings.
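A toy sketch of the technique named here: logistic regression over embedding features, trained online with log-loss SGD and thresholded to choose a model. The function names, learning rate, threshold, and two-feature toy data are all illustrative assumptions and do not come from the shipped router.

```go
package main

import (
	"fmt"
	"math"
)

// strongProb computes sigmoid(w·x + b), the estimated probability that a
// prompt with embedding x needs the strong model.
func strongProb(w []float64, b float64, x []float64) float64 {
	z := b
	for i := range w {
		z += w[i] * x[i]
	}
	return 1 / (1 + math.Exp(-z))
}

// sgdStep applies one log-loss gradient step, (p - y) * x, for a logged
// outcome. y = 1 means the weak model's answer was not good enough.
func sgdStep(w []float64, b *float64, x []float64, y, lr float64) {
	g := strongProb(w, *b, x) - y
	for i := range w {
		w[i] -= lr * g * x[i]
	}
	*b -= lr * g
}

func route(p, threshold float64) string {
	if p >= threshold {
		return "strong"
	}
	return "weak"
}

func main() {
	w, b := make([]float64, 2), 0.0
	// Toy data: feature 0 marks "hard" prompts, feature 1 marks "easy" ones.
	data := []struct {
		x []float64
		y float64
	}{{[]float64{1, 0}, 1}, {[]float64{0, 1}, 0}}
	for epoch := 0; epoch < 200; epoch++ {
		for _, d := range data {
			sgdStep(w, &b, d.x, d.y, 0.5)
		}
	}
	fmt.Println(route(strongProb(w, b, []float64{1, 0}), 0.5), route(strongProb(w, b, []float64{0, 1}), 0.5))
}
```

The threshold is the cost/quality knob: raising it sends more traffic to the weak model.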

High

2.2

Accuracy-Target Routing · PLANNED

Per-request accuracy targets (τ) with Lagrangian optimization to minimize cost while maintaining a specified quality floor.
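One way to make the Lagrangian formulation concrete, under assumed notation: each candidate model $m$ has a per-request cost $c_m$ and a predicted accuracy $\hat{p}_m(q)$ on request $q$, and $\tau_q$ is that request's target. These symbols are illustrative and do not come from an existing design.

```latex
% Constrained choice: cheapest model meeting the accuracy floor.
\min_{m} \; c_m \quad \text{subject to} \quad \hat{p}_m(q) \ge \tau_q

% Lagrangian with multiplier \lambda \ge 0 on the accuracy constraint.
\mathcal{L}(m, \lambda) = c_m - \lambda \bigl(\hat{p}_m(q) - \tau_q\bigr)

% For fixed \lambda, the relaxed choice is
m^{*}_{\lambda}(q) = \arg\min_{m} \bigl(c_m - \lambda\, \hat{p}_m(q)\bigr)
```

At $\lambda = 0$ the rule picks the cheapest model, and raising $\lambda$ shifts the choice toward more accurate models. Because the model set is discrete, the relaxation only ever selects models on the lower convex hull of the (accuracy, cost) points. It can therefore skip a cheaper feasible model, which is a duality gap. A direct "cheapest model with $\hat{p}_m(q) \ge \tau_q$" check is a useful baseline to compare against.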

High

2.3

Tiered Inference Pipeline · DONE

Shipped in v0.9.1. ConfidenceEvaluator scores local model responses via token probability, response length, and self-reported uncertainty. EscalationTracker records events. Wired into infer_with_fallback().

Medium

2.4

Speculative Execution · PLANNED

Pre-fetch results for likely tool calls while waiting for the LLM response, targeting a 30–50% latency reduction for predictable sequences.

Medium

2.5

Service Revenue & Inbound Payments · PLANNED

Service catalog, payment verification, delivery tracking via ServiceManager. Completes the self-sustaining economic loop.

High

2.6

Multi-Provider Cost Arbitrage · PLANNED

Real-time pricing awareness, with cost_per_million_tokens factored into routing decisions. Each request is routed to the cheapest provider that meets its quality requirements.
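A sketch of the routing rule described above: convert a per-million-token price into a per-request cost, then pick the cheapest offer above a quality floor. The `Offer` type, the single blended price, and the 0–1 quality estimate are hypothetical simplifications.

```go
package main

import "fmt"

// Offer is a hypothetical provider quote for one model.
type Offer struct {
	Provider             string
	CostPerMillionTokens float64 // USD per 1M tokens (single blended price)
	Quality              float64 // 0..1 quality estimate for this request class
}

// estimateCost converts the per-million price into a per-request cost.
func estimateCost(o Offer, inTok, outTok int) float64 {
	return float64(inTok+outTok) / 1e6 * o.CostPerMillionTokens
}

// cheapest returns the lowest-cost offer whose quality meets the floor.
// The bool is false when no offer qualifies.
func cheapest(offers []Offer, minQuality float64, inTok, outTok int) (Offer, bool) {
	var best Offer
	found := false
	for _, o := range offers {
		if o.Quality < minQuality {
			continue
		}
		if !found || estimateCost(o, inTok, outTok) < estimateCost(best, inTok, outTok) {
			best, found = o, true
		}
	}
	return best, found
}

func main() {
	offers := []Offer{
		{"provider-a", 3.00, 0.92},
		{"provider-b", 0.50, 0.78},
		{"provider-c", 1.00, 0.88},
	}
	if o, ok := cheapest(offers, 0.85, 2000, 500); ok {
		fmt.Println(o.Provider, estimateCost(o, 2000, 500))
	}
}
```

Real provider pricing usually differs for input and output tokens, so a production version would likely keep separate rates.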

Medium

2.7

WASM Plugin Runtime · DONE

Wasmer-based runtime with WasmPlugin, WasmPluginRegistry, capability model (ReadFilesystem, WriteFilesystem, Network, Environment), memory limits, and JSON I/O ABI.

High

2.8

Prompt Compression · DONE

Shipped in v0.9.0. PromptCompressor gate in context assembly, controlled by config.cache.prompt_compression. 2–20x compression with <5% quality loss.

Medium

2.9

Declarative Agent Manifests · DONE

AgentManifest with TOML loader, SHA-256 hot-reload, capability-based discovery via find_by_capability(). Fields: personality, tools whitelist, model tier, memory budget, cron triggers.

High

2.10

Structured Workspace System · DONE

WorkspaceContext with file indexing, WorkspaceManifest (workspace.toml), FileCategory classification (Personality, Config, Schema, Document, Data), and /api/workspace/state endpoint.

Medium

2.11

Knowledge Source Trait · DONE

Trait-based system integrating Directory, Git, VectorDB, and Graph sources into the RAG pipeline via local ingest plus federated query.

High

2.12

Episodic Digest System · DONE

Shipped in v0.9.0. digest_on_close() wired into SessionGovernor for session expiry and rotation. Decay-weighted episodic retrieval added in v0.9.6.

Medium

2.13

Hippocampus: Self-Describing Schema Map · DONE

Living schema map (hippocampus.go) with register_table, query_schema, describe_table, and lifecycle ops. Gives the agent introspective awareness of its data architecture.

High

2.14

Skills Catalog (CLI + Dashboard) · DONE

Shipped in v0.9.6. PluginCatalog with CLI flows (roboticus skills catalog list/install/activate) and API endpoints (GET/POST /api/skills/catalog). Multi-registry support with namespace resolution.

Medium

2.15

Ops Telemetry + Release Provenance Gate · PLANNED

Release smoke gates, provenance/signing metadata, and operator-facing telemetry exports for stronger runtime and release confidence.

Medium

2.16

Matrix Channel Adapter · DONE

Shipped in the Go rewrite. Full bidirectional Matrix Client v3 integration with optional E2E encryption (Olm/Megolm), UUID transaction IDs, room membership handling, and encrypted sync lifecycle.

High

2.18

Compliance-First Self-Funding · DONE

Shipped in v0.9.6. Complete revenue opportunity lifecycle with DB-backed restart safety, strategy scoring, EVM swap/tax lifecycle, and operator-visible accounting.

High

2.19

Model Metascore Routing Profiles · DONE

Shipped in v0.9.1. ModelProfile + MetascoreBreakdown with 5-dimension scoring (efficacy, cost, availability, locality, confidence). Category fit added in v0.10.0.

High

2.20

Voice Channels · PARTIAL

STT/TTS shipped in the Go rewrite: speech-to-text and text-to-speech via the OpenAI API, with local options (Piper, Coqui) and configurable models. Remaining: WebRTC for real-time voice on the dashboard.

High

2.21

Skill Registry Protocol · DONE

Shipped in v0.9.6. Multi-registry config with namespace resolution, semver comparison, priority-based conflict resolution. Enables community skill distribution.

Medium

2.22

Anchored Agent Audit Ledger · PLANNED

Privacy-preserving on-chain activity ledger for immutable auditability and provenance of agent behavior sequences.

High

2.23

Tool Output Noise Filter · DONE

Shipped in v0.9.9. ToolOutputFilterChain with 4 filters (AnsiStripper, ProgressLineFilter, DuplicateLineDeduper, WhitespaceNormalizer). 30–70% token reduction on tool-heavy turns.
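A sketch of a four-stage cleanup chain that mirrors the filter names above. The function forms are hypothetical, and the ANSI regex covers only CSI sequences such as color codes.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// OutputFilter is one stage of a tool-output cleanup chain.
type OutputFilter func(string) string

// ansiSeq matches CSI escape sequences such as "ESC[32m".
var ansiSeq = regexp.MustCompile(`\x1b\[[0-9;?]*[A-Za-z]`)

func stripANSI(s string) string { return ansiSeq.ReplaceAllString(s, "") }

// dropProgress keeps only the final state of carriage-return progress lines.
// A trailing CR (from CRLF line endings) is removed first.
func dropProgress(s string) string {
	lines := strings.Split(s, "\n")
	for i, l := range lines {
		l = strings.TrimSuffix(l, "\r")
		if j := strings.LastIndex(l, "\r"); j >= 0 {
			l = l[j+1:]
		}
		lines[i] = l
	}
	return strings.Join(lines, "\n")
}

// dedupeLines collapses runs of identical consecutive lines into one.
func dedupeLines(s string) string {
	var out []string
	for _, l := range strings.Split(s, "\n") {
		if len(out) > 0 && out[len(out)-1] == l {
			continue
		}
		out = append(out, l)
	}
	return strings.Join(out, "\n")
}

// normalizeWhitespace trims trailing spaces and leading/trailing blank lines.
func normalizeWhitespace(s string) string {
	lines := strings.Split(s, "\n")
	for i, l := range lines {
		lines[i] = strings.TrimRight(l, " \t")
	}
	return strings.Trim(strings.Join(lines, "\n"), "\n")
}

func applyChain(s string, chain ...OutputFilter) string {
	for _, f := range chain {
		s = f(s)
	}
	return s
}

func main() {
	raw := "\x1b[32mPASS\x1b[0m  \nDownloading 10%\rDownloading 100%\nok\nok\nok\n\n"
	fmt.Printf("%q\n", applyChain(raw, stripANSI, dropProgress, dedupeLines, normalizeWhitespace))
}
```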

Medium

2.24

Cross-Platform Filesystem Sandboxing · PARTIAL

Partially shipped in v0.10.0. LandlockConfig for Linux filesystem restriction. macOS sandbox-exec already active. Windows AppContainer pending.

Medium

2.25

Agent Telemetry Dashboard · PARTIAL

Partially shipped in v0.11.0/v0.11.1. Pipeline Traces, Flight Recorder, and Delegation Outcomes tabs in dashboard. Skill/subagent utilization telemetry. Remaining: per-tool success/latency/cost breakdown, memory retrieval effectiveness scoring.

Medium

2.32

Pipeline Stage Instrumentation · PLANNED

Per-stage timing across the full request-to-response pipeline: intent classification, memory retrieval, context assembly, prompt compression, model routing, inference, tool execution, response transforms, channel formatting, delivery. Dashboard waterfall chart with sub-spans and regression detection.
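A sketch of the per-stage timing data a waterfall chart needs. Each span records its offset from the start of the request as well as its duration. The stage names come from the list above; the `Trace` and `Span` types are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// Span records one pipeline stage's offset from request start and its duration.
type Span struct {
	Stage    string
	Start    time.Duration
	Duration time.Duration
}

// Trace collects spans for a single request.
type Trace struct {
	begin time.Time
	Spans []Span
}

func NewTrace() *Trace { return &Trace{begin: time.Now()} }

// Stage times fn, appends a span, and returns fn's error unchanged.
func (t *Trace) Stage(name string, fn func() error) error {
	start := time.Now()
	err := fn()
	t.Spans = append(t.Spans, Span{name, start.Sub(t.begin), time.Since(start)})
	return err
}

func main() {
	tr := NewTrace()
	_ = tr.Stage("intent_classification", func() error { time.Sleep(2 * time.Millisecond); return nil })
	_ = tr.Stage("memory_retrieval", func() error { time.Sleep(5 * time.Millisecond); return nil })
	for _, s := range tr.Spans {
		fmt.Printf("%-22s +%v %v\n", s.Stage, s.Start.Round(time.Millisecond), s.Duration.Round(time.Millisecond))
	}
}
```

Sub-spans could nest by giving `Span` a child `Trace`. Regression detection would compare per-stage durations against a rolling baseline.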

Medium

2.26

Conversation Flight Recorder · PARTIAL

Core shipped in v0.11.0. ReactTrace capture (tool calls, retrievals, guards, normalization), pipeline trace persistence, dashboard visualization. Remaining: replay, diff, export (JSON/CSV/markdown), trace search API.

Medium

2.33

Background Task Outcome Logging · PLANNED

Before/after metric snapshots for all maintenance tasks (MemoryPrune, CacheEvict, SessionGovernor, hygiene sweeps). Outcome correlation tracking retrieval quality and cache hit rates after each task. Dashboard trend charts showing whether maintenance is helping or hurting.

Medium

2.34

Observability UI · PLANNED

Unified dashboard Observability section: pipeline waterfall view per request, agent performance dashboard with tool/delegation/memory metrics, maintenance outcome trend charts (before/after deltas), memory health heatmap, cross-cutting timeline correlating events with quality changes, and alert badges for regressions.

High

2.27

Memory Utilization Analytics · PLANNED

Retrieval attribution tracking, per-entry memory ROI scoring, stale context detection, knowledge gap identification, and tier-level utilization heatmaps.

Medium

2.28

Adaptive Memory Budget Allocation · PLANNED

Dynamic per-tier budget allocation based on retrieval effectiveness telemetry. Tiers that provide useful context get more budget automatically.

Medium

2.29

Skill Authoring Toolkit · PLANNED

End-to-end skill authoring: interactive scaffolding CLI, testing harness, agent-assisted authoring from natural language, registry publish flow, and dashboard skill editor.

Medium

2.30

Subagent Composition Framework · PLANNED

Declarative subagent composition from available skills with dependency resolution, capability coverage validation, template library, and hot-reload.

Medium

2.31

Delegation Intelligence · PLANNED

Performance-informed delegation scoring, explainable routing decisions, automatic fallback chains, cost attribution, and operator override controls.

Medium

2.35

Semantic Intent Classification · DONE

Shipped in v0.11.1. SemanticClassifier with centroid-based embedding classification replaces keyword matching. Typed categories, per-bank thresholds, TrustLevel, AbstainPolicy, ClassificationTrace.
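A sketch of centroid-based classification with an abstain rule. Each category's centroid is the mean of its example embeddings. A query goes to the most cosine-similar centroid, unless that similarity falls below the centroid's threshold. The types and the single-threshold abstain rule are simplified assumptions; the roadmap's TrustLevel and AbstainPolicy are not modeled here.

```go
package main

import (
	"fmt"
	"math"
)

// Centroid is a category's mean example embedding plus its minimum similarity.
type Centroid struct {
	Category  string
	Vec       []float64
	Threshold float64
}

func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// mean averages example embeddings into a centroid vector.
func mean(vecs ...[]float64) []float64 {
	out := make([]float64, len(vecs[0]))
	for _, v := range vecs {
		for i, x := range v {
			out[i] += x / float64(len(vecs))
		}
	}
	return out
}

// classify returns the best category and its similarity. ok is false when
// the classifier abstains because the best match is below its threshold.
func classify(x []float64, cs []Centroid) (category string, sim float64, ok bool) {
	best, bestSim := -1, math.Inf(-1)
	for i, c := range cs {
		if s := cosine(x, c.Vec); s > bestSim {
			best, bestSim = i, s
		}
	}
	if best < 0 || bestSim < cs[best].Threshold {
		return "", bestSim, false
	}
	return cs[best].Category, bestSim, true
}

func main() {
	cs := []Centroid{
		{"memory", mean([]float64{1, 0.1}, []float64{0.9, 0.2}), 0.9},
		{"scheduling", mean([]float64{0.1, 1}, []float64{0.2, 0.9}), 0.9},
	}
	fmt.Println(classify([]float64{0.95, 0.15}, cs))
	fmt.Println(classify([]float64{1, 1}, cs)) // ambiguous: abstains
}
```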

Medium

2.36

Application Profiles & App Installer · PARTIAL

Shipped in v0.11.1. ProfileRegistry with named configurations, roboticus apps install/uninstall/list CLI, manifest-driven profile creation with skills/themes/subagents. Theme marketplace planned for v0.11.2.

Medium

2.37

Dashboard Modular Architecture · PLANNED

Decompose monolithic dashboard_spa.html into CSS partials, JS modules per feature area, and Go-assembled HTML templates. No framework migration — remains server-rendered SPA.

Medium

2.38

Deep Theme Integration · PLANNED

Decorations manifest: ornamental separators, card borders, background textures, scrollbar styling, animation accents, audio ambience. Workspace theming: bot sprites, station icons, tether lines, particle effects.

High

2.39

Pipeline Decision Flow Visualization · PLANNED

Per-turn flow graph: visual nodes for intent classification, cache decision, planner action, retrieval, inference, guards. Decision annotations, timing overlays, fallback visibility. Backed by shared trace data.

High

Tier 3 — Frontier

Ambitious capabilities that push the architecture into new territory. High effort, high potential.

3.1

Enforced Session Lifecycle Safety · PARTIAL

Partially shipped. Session lifecycle state machine with runtime enforcement — Created/Active/Closed transitions validated at every boundary. Go interfaces enforce valid transitions across the codebase.
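A sketch of runtime-enforced transitions for the Created/Active/Closed lifecycle. The allowed edges (Created to Active, Active to Closed, with Closed terminal) are an assumption. `Session` and `Transition` are hypothetical names, not the project's interfaces.

```go
package main

import "fmt"

type SessionState int

const (
	Created SessionState = iota
	Active
	Closed
)

func (s SessionState) String() string {
	return [...]string{"Created", "Active", "Closed"}[s]
}

// allowed lists the legal transitions; any other edge is rejected.
var allowed = map[SessionState][]SessionState{
	Created: {Active},
	Active:  {Closed},
	Closed:  {},
}

// Session starts in Created (the zero value).
type Session struct{ state SessionState }

// Transition moves to next, or returns an error for an illegal edge.
func (s *Session) Transition(next SessionState) error {
	for _, ok := range allowed[s.state] {
		if ok == next {
			s.state = next
			return nil
		}
	}
	return fmt.Errorf("invalid session transition %v -> %v", s.state, next)
}

func main() {
	s := &Session{}
	for _, next := range []SessionState{Active, Closed, Active} {
		fmt.Println(next, s.Transition(next))
	}
}
```

Compile-time enforcement (a typestate style) would instead use separate types, so that only an active-session type exposes a close method.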

High

3.2

MCP Integration · DONE

Dual-role implementation: McpServerRegistry exposes tools/resources to clients, McpClientManager connects to external MCP servers. export_tools_as_mcp() utility for tool conversion.

High

3.3

Multi-Agent Orchestration · PARTIAL

Partially shipped in v0.7.0+. Subagent role contracts (subagent vs model-proxy), roster semantics, model-assignment modes, and turn-linked forensics are live. Full workflow orchestration patterns remain.

High

3.4

Agent Spawning + Wallet Provisioning · PLANNED

Spawn child agents with provisioned wallets, delegated tasks, and fund reclamation on completion or timeout.

High

3.5

Advanced RAG Infrastructure · PARTIAL

Binary embedding storage, HNSW index, content chunking, persistent cache (all done). Remaining: document ingestion pipeline.
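A sketch of overlapping content chunking, one of the pieces listed as done. It splits on whitespace-delimited words for simplicity; a real pipeline would more likely count model tokens. `chunkWords` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// chunkWords splits text into windows of size words. Each window overlaps
// the previous one by overlap words, so facts spanning a boundary stay
// retrievable. Invalid parameters return nil.
func chunkWords(text string, size, overlap int) []string {
	words := strings.Fields(text)
	if size <= 0 || overlap < 0 || overlap >= size {
		return nil // the stride (size - overlap) would not advance
	}
	var chunks []string
	for start := 0; start < len(words); start += size - overlap {
		end := start + size
		if end > len(words) {
			end = len(words)
		}
		chunks = append(chunks, strings.Join(words[start:end], " "))
		if end == len(words) {
			break
		}
	}
	return chunks
}

func main() {
	for _, c := range chunkWords("one two three four five six seven", 3, 1) {
		fmt.Println(c)
	}
}
```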

High

3.6

Voice Channels · PLANNED

Promoted to Tier 2 as item 2.20. See 2.20 Voice Channels for current spec.

High

3.7

UniRoute Model Vectors · PLANNED

Feature vectors derived from model capabilities, pricing, and benchmarks. Route among unseen models without retraining.

High

3.8

Game-Theoretic Cascade Optimization · PLANNED

Automatically determine optimal cascade strategy per query type — sometimes skipping the weak model is cheaper than trying it first.
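The claim that skipping the weak model can be cheaper follows from a plain expected-cost comparison. This is a simple expected-value argument, not the game-theoretic formulation the item names. It assumes a weak-model cost $c_w$, a strong-model cost $c_s$, and a probability $p$ that the weak model's answer is accepted without escalation.

```latex
% Cascade: always try the weak model, escalate to the strong one on rejection.
\mathbb{E}[\text{cost}_{\text{cascade}}] = c_w + (1 - p)\, c_s

% Direct: send the query straight to the strong model.
\mathbb{E}[\text{cost}_{\text{direct}}] = c_s

% The cascade is cheaper exactly when
c_w + (1 - p)\, c_s < c_s \iff c_w < p\, c_s
```

For example, with $c_w = 0.2\,c_s$ and $p = 0.1$, we get $c_w > p\,c_s$, so going straight to the strong model is cheaper for that query type. Estimating $p$ per query type is what makes the choice adaptive.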

Medium

3.9

Storage Backend Trait · PARTIAL

Partially shipped in v0.10.0. StorageBackend trait with SqliteBackend implementation. PostgreSQL backend available as opt-in via --features postgres.

High

3.10

Cryptographic Device Identity · PLANNED

Zero-trust device identity and pairing built on existing ECDSA/ECDH infrastructure with encrypted state sync.

High

3.11

Agent Discovery Protocol · PLANNED

DNS-based agent discovery with SRV/TXT records and mDNS fallback for zero-config LAN scenarios.

Medium

3.12

Flexible Network Binding · DONE

Interface-based binding (0.0.0.0, 127.0.0.1, or any interface), optional TLS with configurable cert/key paths, advertise URL for reverse proxy and NAT traversal.

Low

3.13

Zero-Trust Global Remote UI Access · PLANNED

Defense-in-depth remote access: mandatory TLS, OIDC/SAML SSO + MFA, device trust, per-route RBAC, IP reputation, CSRF/CORS hardening, signed session rotation.

High

3.14

GibberLink: Agent Voice Protocol · PLANNED

Audio-frequency agent-to-agent communication encoding structured data as modulated audio signals over voice channels (WebRTC, phone, Discord voice).

High