claude-agent-lab

Command Center — Sequential Backlog

Last Updated: 2026-04-26 (C16 epic added — Phase 2: Autonomous Operations)
Total items: 22+ (7 foundation + 14 shipped + 1 active epic + future list)
Completed: F1–F7 + C01 + C02 + C03 + C06 + C08 + C09 + C10 + C11 + C13 + C14 + C15 + A1 + A2 + A3
C12 was partial then reverted — flag pulled in the audit sweep, follow-up tracked
Public surface: github.com/jaysidd/claude-agent-lab + jaysidd.github.io/claude-agent-lab/
Tests: 22 smoke + 2 @engine = 24, all green

Work order: Items are numbered C##. Complete them in order unless a higher-priority need lands.

Tracking legend:


DONE — Foundation (Completed 2026-04-23)

| # | Item | Date | Notes |
|---|------|------|-------|
| F1 | Project scaffold — Express + tsx + SDK + vanilla UI | 2026-04-23 | npm install @anthropic-ai/claude-agent-sdk, tsx, express |
| F2 | Multi-agent sidebar — Main / Comms / Content / Ops | 2026-04-23 | agents.ts defines each; sidebar renders from /api/agents |
| F3 | Per-agent system prompts, tools, session persistence | 2026-04-23 | resume: sessionId stored per agent in server Map |
| F4 | Folder picker + cwd scoping | 2026-04-23 | /api/cwd, /api/browse; query() receives cwd: |
| F5 | @file autocomplete | 2026-04-23 | /api/files; dropdown in composer with keyboard nav |
| F6 | Model selector per agent (Opus / Sonnet / Haiku) | 2026-04-23 | Runtime override via /api/model/:agentId; defaults in agents.ts |
| F7 | Model + auth footer on each reply | 2026-04-23 | Captured from system.init; “Max plan · subscription” when apiKeySource === "none" |

Phase 1: Sub-agent + Real-Time UX

C01 — Sub-agent delegation (Main auto-routes to specialists)

Rationale

Right now Main just tells the user “you should ask Comms about that.” The SDK can do better: if Main is given agents: { comms, content, ops } in its options, it gains an Agent tool and can delegate directly. The user asks Main; Main decides “this is a comms task”; invokes Comms as a sub-agent; returns the combined result. That’s the pattern from the YouTube demo.
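A minimal sketch of how the delegate map for Main might be assembled. The `SubAgentDefinition` field names (`description`, `prompt`, `tools`) are an assumption about the SDK's `agents` option and should be checked against the installed version; `LabAgent` and `buildDelegates` are hypothetical names used only for illustration, not the contents of agents.ts.

```typescript
// Sub-agent shape assumed from the SDK's `agents` option; check it against
// the installed @anthropic-ai/claude-agent-sdk version before relying on it.
interface SubAgentDefinition {
  description: string; // tells the parent when this specialist is a good fit
  prompt: string;      // system prompt the specialist runs with
  tools?: string[];    // optional tool allowlist
}

// Hypothetical stand-in for the entries defined in agents.ts.
interface LabAgent {
  id: string;
  description: string;
  systemPrompt: string;
  allowedTools: string[];
}

// Every agent except the parent becomes a delegate the parent may invoke.
function buildDelegates(
  all: LabAgent[],
  parentId: string,
): Record<string, SubAgentDefinition> {
  const delegates: Record<string, SubAgentDefinition> = {};
  for (const agent of all) {
    if (agent.id === parentId) continue; // the parent never delegates to itself
    delegates[agent.id] = {
      description: agent.description,
      prompt: agent.systemPrompt,
      tools: agent.allowedTools,
    };
  }
  return delegates;
}

const labAgents: LabAgent[] = [
  { id: "main", description: "General assistant and router", systemPrompt: "You are Main.", allowedTools: [] },
  { id: "comms", description: "Drafts emails and messages", systemPrompt: "You are Comms.", allowedTools: ["Read"] },
  { id: "content", description: "Writes long-form content", systemPrompt: "You are Content.", allowedTools: ["Read"] },
  { id: "ops", description: "Inspects files and runs commands", systemPrompt: "You are Ops.", allowedTools: ["Read", "Bash"] },
];

const mainDelegates = buildDelegates(labAgents, "main");
// Would be handed to the SDK roughly as:
//   query({ prompt, options: { agents: mainDelegates } })
```

Deriving the map from the same list that feeds the sidebar keeps delegation and the UI in sync when an agent is added or removed.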

Design

Files

Acceptance criteria


C02 — Streaming responses (SSE, token-by-token)

Rationale

Current /api/chat waits for the full SDK stream to complete, then returns JSON. For quick replies this is fine; for Opus answering a hard question, the user stares at “thinking…” for 20s. Streaming the intermediate assistant messages + tool uses as they happen turns that into visible progress.
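The change log records that the shipped stream uses NDJSON events from /api/chat/stream. The sketch below illustrates that wire format under stated assumptions: the `StreamEvent` variants, `encodeEvent`, and `NdjsonParser` are hypothetical names, not the project's actual event schema. The point it shows is that the client has to buffer partial lines, because network chunks do not respect line boundaries.

```typescript
// Hypothetical event shapes; the real stream may carry different fields.
type StreamEvent =
  | { type: "text"; delta: string }
  | { type: "tool_use"; name: string }
  | { type: "done" };

// Server side: one JSON object per line.
function encodeEvent(event: StreamEvent): string {
  return JSON.stringify(event) + "\n";
}

// Client side: network chunks can split a line anywhere, so keep the
// trailing partial line in a buffer until its newline arrives.
class NdjsonParser {
  private buffer = "";

  push(chunk: string): StreamEvent[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // last element is the incomplete tail
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line) as StreamEvent);
  }
}

// Simulate a reply that arrives in two arbitrary chunks.
const wire =
  encodeEvent({ type: "text", delta: "Hel" }) +
  encodeEvent({ type: "text", delta: "lo" }) +
  encodeEvent({ type: "done" });

const ndjsonParser = new NdjsonParser();
const receivedEvents: StreamEvent[] = [
  ...ndjsonParser.push(wire.slice(0, 10)), // ends mid-object, yields nothing yet
  ...ndjsonParser.push(wire.slice(10)),
];

// Concatenating the text deltas reconstructs the reply.
const streamedText = receivedEvents
  .map((event) => (event.type === "text" ? event.delta : ""))
  .join("");
```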

Design

Files

Acceptance criteria


C03 — Task queue with LLM auto-routing

Rationale

A “+ New task” button opens a modal. User types the task + picks priority. Server hands the task description to a cheap classifier (Haiku) which picks the right agent. The task lands in a simple kanban column (“queued”). Click to fire; agent runs asynchronously; status moves to “in progress” → “done” with the result linked.
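The routing and status flow described above can be sketched as two small pure functions. This assumes the classifier returns free text that should contain an agent id; `pickAgent`, `advance`, the status names, and the fallback to Main are all hypothetical choices for illustration, not the project's implementation.

```typescript
// Status names mirror the prose above; the identifiers are hypothetical.
type BoardStatus = "queued" | "in_progress" | "done";

interface BoardTask {
  id: number;
  description: string;
  agentId: string;
  status: BoardStatus;
}

// The classifier reply is free text, so accept only a known agent id
// and fall back to a default when nothing matches.
function pickAgent(
  classifierReply: string,
  knownAgentIds: string[],
  fallback: string,
): string {
  const tokens = classifierReply.toLowerCase().split(/[^a-z0-9_-]+/);
  const match = knownAgentIds.find((id) => tokens.indexOf(id) !== -1);
  return match ?? fallback;
}

// Allowed forward transitions; "done" is terminal.
const NEXT_STATUS: Record<BoardStatus, BoardStatus | null> = {
  queued: "in_progress",
  in_progress: "done",
  done: null,
};

function advance(task: BoardTask): BoardTask {
  const next = NEXT_STATUS[task.status];
  if (next === null) {
    throw new Error("task " + task.id + " is already done");
  }
  return { ...task, status: next };
}

const routedTask: BoardTask = {
  id: 1,
  description: "Draft a reply to the vendor",
  agentId: pickAgent("Comms.", ["comms", "content", "ops"], "main"),
  status: "queued",
};
const finishedTask = advance(advance(routedTask));
```

Matching on whole tokens rather than substrings avoids routing a reply such as "oops" to Ops.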

Design

Files

Acceptance criteria


C04 — Persistent memory (SQLite)

Rationale

Today: restart = amnesia. Chat history and agent sessions die. That’s fine for a learning lab; it’s not fine if you actually want Main to remember that you prefer short emails or that Comms should always sign off “— J”.

Clawless learned this the hard way and ported a custom memory engine with BM25 + vector search. We don’t need that level yet. Minimal version:

Files

Acceptance criteria


C05 — Telegram bridge

Rationale

The whole pitch of the SDK is: same engine, any interface. A Telegram bot running alongside the web UI routes messages to the same agents, which shows tangibly that the SDK is “Claude as a subroutine.”

Design

Files

Acceptance criteria


C06 — Playwright smoke tests

Rationale

Mirror Clawless’s “every user-visible surface has at least one Playwright test” rule. Starting point:

Files

Acceptance criteria


C07 — Electron / Tauri packaging (later)

Package the web UI + server as a desktop app. Electron is the easy path given the ecosystem familiarity (Clawless uses it). Tauri produces smaller binaries. Revisit when packaging becomes useful.


Phase 2: Autonomous Operations

C16 — Autonomous Agent Firm (scheduler · durable tasks · budgets · approvals)

Goal

Take Command Center from “interactive lab” to “lab + small autonomous runtime.” Make it possible to run a Paperclip-style agent firm (CEO + specialists, delegating via task comments, waking on schedule) directly on the SDK with the existing Max OAuth — without rebuilding Paperclip’s whole platform.

Scope guard — what this is NOT

Constraints baked into the design

These aren’t blockers — they’re shape constraints that should inform every sub-feature:

  1. Max plan rate limits. 5-hour usage windows. Budget enforcement (C16c) must front-run rate exhaustion, not just track tokens after the fact, or the firm will stall mid-cycle.
  2. Fair-use posture. Personal-scale only. Don’t demo this as a 24/7 hedge fund. Keep schedules conservative (think “hourly,” not “every 30 seconds”).
  3. Permission bypass risk. Headless agents need permissionMode: 'bypassPermissions' (SDK equivalent of Paperclip’s dangerouslySkipPermissions). That removes the safety net for tool use. Approval gates (C16d) are how the safety net comes back for high-stakes steps.
  4. OAuth session lifetime. Tokens rotate. The scheduler (C16a) needs a session healthcheck so the firm doesn’t silently die at 3 AM after a token rotation.
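Constraint 4 can be sketched as an outcome recorder that pauses a schedule immediately when a failure looks like a dead OAuth session, and after three consecutive failures otherwise. The pause reasons `oauth_unavailable` and `too_many_failures` come from the change log; the detector below is a hypothetical stand-in and is not the project's OAUTH_DEAD_PATTERN, and `ScheduleState` and `recordOutcome` are illustrative names.

```typescript
type PauseReason = "oauth_unavailable" | "too_many_failures";

interface ScheduleState {
  enabled: boolean;
  consecutiveFailures: number;
  pauseReason?: PauseReason;
}

const MAX_STRIKES = 3;

// Hypothetical detector: requires both an auth-related word and a
// failure-related word, so a bare mention of "token" does not trip it.
function looksLikeDeadOAuth(errorMessage: string): boolean {
  const msg = errorMessage.toLowerCase();
  const mentionsAuth = /oauth|token|credential/.test(msg);
  const mentionsFailure = /expired|invalid|revoked|unauthorized/.test(msg);
  return mentionsAuth && mentionsFailure;
}

// Pass no error for a successful fire.
function recordOutcome(state: ScheduleState, error?: string): ScheduleState {
  if (!state.enabled) return state; // never clobber an existing pause
  if (error === undefined) {
    return { enabled: true, consecutiveFailures: 0 };
  }
  const failures = state.consecutiveFailures + 1;
  if (looksLikeDeadOAuth(error)) {
    // Retrying cannot help until the operator logs in again.
    return { enabled: false, consecutiveFailures: failures, pauseReason: "oauth_unavailable" };
  }
  if (failures >= MAX_STRIKES) {
    return { enabled: false, consecutiveFailures: failures, pauseReason: "too_many_failures" };
  }
  return { enabled: true, consecutiveFailures: failures };
}

// Three unrelated failures in a row pause the schedule.
let hourlyCeo: ScheduleState = { enabled: true, consecutiveFailures: 0 };
for (let i = 0; i < 3; i++) {
  hourlyCeo = recordOutcome(hourlyCeo, "ETIMEDOUT");
}
```

Separating the two reasons lets the UI tell the operator whether to log in again or to investigate the prompt.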

Phased sub-features

C16a — Scheduler / cron-style agent triggers
C16b — Durable task queue (promote C03 to SQLite) — ✅ DONE 2026-04-27

Shipped across commits 16d7784 (impl) → 2f7c11c (Reviewer R1-R7) → acdb5c3 (QA tests) → f7cc8f8 (Perf P1+P2). Schema rev. 2 locked with Clawless agent across two review rounds before code; full design + audit reports preserved at .notes/c16b-task-queue-design.md (gitignored), docs/audits/perf-audit-c16b.md, docs/audits/security-audit-c16b.md. All acceptance criteria met. Next: C16c (CostGuard), C16a (Scheduler), or C16d (Approval gates).
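The checkout and lease semantics can be illustrated with an in-memory model. This is a sketch of the behavior only, not the shipped SQLite implementation: the status names, `LeaseQueue`, and its method signatures are hypothetical, and in the real queue the body of `checkout` would run inside a single atomic transaction.

```typescript
// Five hypothetical status names; the locked schema may name them differently.
type QueueStatus = "queued" | "checked_out" | "done" | "failed" | "cancelled";

interface QueuedTask {
  id: number;
  priority: number;
  status: QueueStatus;
  workerId: string | null;
  leaseExpiresAt: number | null;
  attempts: number;
}

class LeaseQueue {
  private tasks: QueuedTask[] = [];
  private nextId = 1;

  enqueue(priority: number): QueuedTask {
    const task: QueuedTask = {
      id: this.nextId++, priority, status: "queued",
      workerId: null, leaseExpiresAt: null, attempts: 0,
    };
    this.tasks.push(task);
    return task;
  }

  // Claim the highest-priority queued task (oldest first on ties).
  // In SQLite this select-and-update must be one atomic transaction.
  checkout(workerId: string, now: number, leaseMs: number): QueuedTask | undefined {
    const candidates = this.tasks
      .filter((t) => t.status === "queued")
      .sort((a, b) => b.priority - a.priority || a.id - b.id);
    const task = candidates[0];
    if (task === undefined) return undefined;
    task.status = "checked_out";
    task.workerId = workerId;
    task.leaseExpiresAt = now + leaseMs;
    task.attempts += 1;
    return task;
  }

  // Only the worker holding the lease may complete the task.
  complete(id: number, workerId: string): boolean {
    const task = this.tasks.find((t) => t.id === id);
    if (!task || task.status !== "checked_out" || task.workerId !== workerId) return false;
    task.status = "done";
    return true;
  }

  // Crash recovery: expired leases go back to the queue.
  reapExpired(now: number): number {
    let reaped = 0;
    for (const task of this.tasks) {
      if (task.status === "checked_out" && task.leaseExpiresAt !== null && task.leaseExpiresAt <= now) {
        task.status = "queued";
        task.workerId = null;
        task.leaseExpiresAt = null;
        reaped += 1;
      }
    }
    return reaped;
  }
}
```

A worker that crashes never calls `complete`, so its task becomes available again once the lease expires and `reapExpired` runs.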

C16c — Budget enforcement (extend A1 from tracking to capping) — ✅ DONE 2026-04-27

Shipped on branch c16c-costguard, commit e0cb5a2. All six roles signed off in one session. src/costGuard.ts (standalone primitive, zero Express/SDK imports — designed for Clawless B64 mechanical lift) + src/costGuardInstance.ts (singleton bootstrap reading caps from settings table) + src/server.ts wiring into /api/chat, /api/chat/stream, /api/task/:id/run + GET /api/costguard/status introspection + Budget (CostGuard) section in SETTINGS_SCHEMA. Reviewer M1 (override allowlist tightened to known agents only) + M2 (cap=0 collapses to “unset” to match the “blank = no cap” UX) folded in same session. 5 new Playwright smoke tests (32/32 green). Audits at docs/audits/perf-audit-c16c.md + docs/audits/security-audit-c16c.md.
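A minimal in-memory sketch of the two-tier preflight follows. The `check(agentId, estimatedTokens?)` result shape follows the locked signature recorded in the change log; everything else is an assumption made for illustration: `CostGuardSketch`, the `Caps` and `LedgerRow` field names, the extra `now` parameter (added so the example is deterministic), and summing the whole ledger instead of the current month.

```typescript
interface CheckResult {
  ok: boolean;
  reason?: string;
  capType?: "cost" | "rate";
  remaining?: number;
}

// Hypothetical ledger and cap shapes for this sketch.
interface LedgerRow {
  agentId: string;
  occurredAt: number; // ms timestamp
  costUsd: number;
  isOAuth: boolean;
}

interface Caps {
  monthlyCostUsd?: number;       // 0 or missing means "no cap"
  maxRequestsPerWindow?: number; // 0 or missing means "no cap"
  windowMs: number;
}

class CostGuardSketch {
  private ledger: LedgerRow[] = [];
  private caps: Caps;

  constructor(caps: Caps) {
    this.caps = caps;
  }

  record(row: LedgerRow): void {
    this.ledger.push(row);
  }

  // `_estimatedTokens` is accepted to match the locked signature but unused here.
  check(agentId: string, _estimatedTokens?: number, now: number = Date.now()): CheckResult {
    const rows = this.ledger.filter((r) => r.agentId === agentId);

    // Rate cap: always enforced, whatever the auth mode.
    const rateCap = this.caps.maxRequestsPerWindow;
    if (rateCap) {
      const recent = rows.filter((r) => r.occurredAt > now - this.caps.windowMs).length;
      if (recent >= rateCap) {
        return { ok: false, capType: "rate", reason: "rate cap reached", remaining: 0 };
      }
    }

    // Cost cap: OAuth rows are recorded but excluded from the sum.
    const costCap = this.caps.monthlyCostUsd;
    if (costCap) {
      const spent = rows.filter((r) => !r.isOAuth).reduce((sum, r) => sum + r.costUsd, 0);
      if (spent >= costCap) {
        return { ok: false, capType: "cost", reason: "cost cap reached", remaining: 0 };
      }
      return { ok: true, remaining: costCap - spent };
    }
    return { ok: true };
  }
}

// OAuth usage never counts against the dollar cap.
const sketchGuard = new CostGuardSketch({ monthlyCostUsd: 5, maxRequestsPerWindow: 2, windowMs: 1000 });
sketchGuard.record({ agentId: "ops", occurredAt: 0, costUsd: 10, isOAuth: true });
const oauthCheck = sketchGuard.check("ops", undefined, 5000);
```

Checking the rate cap first means an OAuth session that bypasses the cost cap is still throttled before it exhausts the plan's usage window.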

C16d — Per-task approval gates

Demo target (north-star end-state)

After all four sub-features ship, this should be possible:

  1. Create 6 custom agents (CEO + 5 specialists) via C15 with the Paperclip trading-firm system prompts.
  2. Schedule the CEO to wake every hour with prompt “review overnight specialist outputs and queue today’s research cycle” (C16a).
  3. CEO delegates via SDK sub-agents (C01) — child tasks land in the durable queue (C16b).
  4. Each specialist has a $5/month token cap; Risk Management has $10 (C16c).
  5. Execution agent has requires_approval on any live-trading tool — operator gets pinged, approves from phone via Telegram (C05) (C16d).

That’s the full Paperclip demo, on the SDK, on Max OAuth, ~personal scale, no API key.

Out of scope / explicitly NOT building

Clawless cross-pollination — required reads before implementation

Lane split confirmed with Clawless agent on 2026-04-26 after sharing the C16 design:

| Sub-feature | Clawless status | Action for Command Center |
|-------------|-----------------|---------------------------|
| C16a Scheduler | Shipped (B06), different runtime | Build ours; steal UX patterns from B06; OAuth-healthcheck novelty may flow back |
| C16b Durable queue | B54 has no SQLite design — adopting ours wholesale | Draft schema + atomic-checkout SQL → send to Clawless → implement (we’re the source of truth) |
| C16c Budget | B64 starts mid-next-week, signature LOCKED | Build against the locked signature in src/costGuard.ts; Clawless’s B64 builds against the same shape |
| C16d Approval gates | Wait-and-see, has per-tool already | Build it; produce written analysis on per-task-vs-per-tool qualitative difference; portability decision flows from that |

Operating rule: C16c signature is locked and both sides build against it. C16b is on us to draft first; Clawless adopts mechanically when ready. Ping Clawless agent when C16b schema lands.


Future — Not Scheduled

| Item | Notes |
|------|-------|
| Markdown rendering in chat | HIGH impact, SMALL effort. Agent replies are plain text; marked or similar + a syntax-highlighted code block renderer (hljs or shiki) would massively improve legibility, especially for Content and Ops output. |
| Inline AskUserQuestion UI | SDK exposes an AskUserQuestion tool that pauses mid-task with multiple-choice prompts. Wire this up in the streaming pipeline so mid-run disambiguation shows up as an interactive card. |
| Plan mode toggle | SDK supports permissionMode: 'plan' for read-only agent runs. One toggle per agent — huge trust multiplier for Ops. |
| File checkpoint + rewind | Expose Query.rewindFiles() as a “roll back to this turn” button on any user message. SDK native feature; OpenClaw doesn’t have this. Differentiator. |
| Slash commands | /clear, /model, /compact, /agents, /help. Maps to Claude Code’s native syntax. Users who know the CLI get muscle memory. |
| Skills panel | SDK loads .claude/skills/*/SKILL.md automatically if cwd contains them. Add UI to browse and toggle per agent. |
| MCP configuration UI | Point at a stdio or HTTP MCP server → light up as tools for chosen agent. The SDK’s MCP primitive is the gateway to infinite integrations. |
| Session history sidebar | List past conversations per agent; click to restore via resume:. Goes hand-in-hand with C04 persistent memory. |
| Context pinning per agent | “Always consider my writing-style doc when drafting.” A pinned file or snippet prepended to every turn for that agent. |
| Cost & token tracking | SDK’s ResultMessage has usage info. Per-turn tokens, running total, forecast cost. Especially useful when the commercial path unlocks API-key mode. |
| Conversation export | Download chat history as markdown or JSON. One-click share. |
| Multi-pane chat | Split view — two agents side-by-side for model comparison or parallel work. Matches the YouTube “mission control” vibe. |
| Right-panel file viewer | When Ops reads a file, show it inline in a side pane so the user sees what the agent saw. Debugging + trust. |
| Keyboard shortcuts | Cmd+K switch agent, Cmd+Enter send, Cmd+T tasks, Cmd+F folder. Muscle-memory speed boost. |
| Voice layer | Whisper STT for input, TTS for output. Optional Pipecat/Gemini Live for a “war room” experience. Large effort; only worth it after the written flow is polished. |
| “Council” mode | One prompt → multiple agents weigh in → synthesizer produces a consolidated answer. Good for decisions. |
| Hook inspector | Render PreToolUse/PostToolUse/Stop events as a timeline for each turn. Developer-facing, but teaches the SDK’s event model. |
| Multiple workspaces | Switch between project contexts (different cwd + memory partition) without losing state. Matches how devs actually work. |
| Sub-agent depth limit | Prevent runaway delegation chains. Currently no limit — a pathological prompt could cascade. |
| Auth profile switcher | Toggle between “personal (OAuth, Max)” and “dev (API key)” modes for testing the commercial path end-to-end. |
| C12-follow-up: UI rewind | Complete the file-rewind UI. enableFileCheckpointing: true is already set so snapshots exist; what’s missing is holding the SDK Query object alive across HTTP requests so Query.rewindFiles(userMessageId) can be called on demand. Requires refactoring the chat lifecycle to streaming-input mode (prompt as AsyncIterable<SDKUserMessage>), tracking user-message UUIDs, and adding a rewind affordance on each user bubble. Effort: 2-3 hours, medium complexity. |
| AskUserQuestion from hooks | Let hooks ask the user for approval mid-tool-run (e.g., before a destructive Bash command). |
| Per-agent avatar / personality | One-click tone shifts: formal / casual / concise / playful. Stored as preamble injection. |
| Onboarding tour | 5-step first-run flow that highlights sidebar, chat, folder, tasks, model selector. |

Change Log

Date Item Notes
2026-04-23 F1–F7 shipped Full foundation in one session. Express + SDK + vanilla UI, ~1,100 LOC.
2026-04-23 Docs scaffolded CLAUDE.md + architecture.md + backlog.md + handoff.md, mirroring Clawless v5 conventions.
2026-04-23 C01 DONE Sub-agent delegation via SDK agents option. Main routes to Comms/Content/Ops. Delegation chips in UI. Commit 38bd113.
2026-04-23 C02 DONE Streaming responses via includePartialMessages: true. NDJSON events from /api/chat/stream; blinking-cursor UI. Commit b359d4c.
2026-04-23 C03 DONE Task queue with Haiku-classified auto-routing. 3-column board, priority, agent override. Commit 9e4142e.
2026-04-23 C06 DONE Playwright smoke + engine projects. 7 smoke (no engine) + 2 @engine tests. npm run test:smoke / test:engine.
2026-04-24 C08 DONE Markdown rendering in chat. marked + DOMPurify + highlight.js via jsDelivr; applied only to completed (non-streaming) agent bubbles. Slash-command output renders as markdown too.
2026-04-24 C09 DONE Persistent memory via better-sqlite3 at ./data/lab.db. CRUD routes; global or per-agent scope; fact / preference / context categories. Injected as <persistent-memory> system-prompt block on every query(), capped at ~2k chars.
2026-04-24 C10 DONE Slash commands. Client-side dispatcher intercepts /cmd args and handles /help, /clear, /model [id], /agents, /plan on/off without a server round-trip. System-origin messages render through the same markdown pipeline.
2026-04-24 C11 DONE Plan mode toggle. Header checkbox flips permissionMode: 'plan' on for the active agent; task runs respect the toggle too. Switching plan mode clears that agent’s session.
2026-04-24 C12 PARTIAL enableFileCheckpointing: true is now set on every chat/stream/task query() call — snapshots are captured. UI rewind-to-user-message is deferred because Query.rewindFiles() requires holding the Query object alive across requests, which needs a streaming-input architecture. Added as C12-follow-up in the future list.
2026-04-24 Docs docs/drafts/linkedin-project-entry.md added — copy-paste-ready content for LinkedIn Projects section.
2026-04-24 C13 DONE WhisprDesk voice integration. Local HTTP proxy routes (/api/whisprdesk/{status,capabilities,transcribe,events}) forwarding to WhisprDesk’s External App Gateway on 127.0.0.1:9879. Browser mic button (MediaRecorder → proxy → transcript). Passive SSE listener auto-fills composer from any WhisprDesk dictation when tab is focused. Browser SpeechSynthesis speak button on each agent reply. Sidebar status indicator. Minimal .env loader added. Token stays server-side.
2026-04-24 C14 + C15 DONE Settings modal (SQLite-backed config with env fallback, secrets masked) and dynamic agents (CRUD, custom_agents table, sidebar + New agent button). Closes the operator-surface gap from the audit.
2026-04-24 Audit sweep DONE Architect/Reviewer/QA agents identified UI-vs-backend gaps + correctness issues. Closed all 13: Telegram fields disabled with “coming soon”, enableFileCheckpointing flag removed, 4 dead routes deleted, memory validation fixes, race conditions, error-path logging, brittle test hardening.
2026-04-24 Sidebar UX “+ New agent” promoted from a muted dashed button to a prominent gradient primary action. /think hard|fast|default slash aliases added on top of /model.
2026-04-24 Voice UX hardening Mic button shows ⏹ icon when recording + pink “Recording / click ⏹ to stop” indicator with live timer. Errors from the WhisprDesk proxy now surface upstream details so failures are debuggable.
2026-04-25 Voice fixes Browser-side WebM→WAV conversion (Web Audio API + PCM 16-bit encoder) so WhisprDesk’s ffmpeg never chokes on streaming-EBML quirks. Shortcut switched from ⌘⇧M to ⌥V (avoids Chrome user-switcher / macOS minimize collisions).
2026-04-25 A1 DONE Cost & token tracking, OAuth-aware. Per-message footer always shows tokens; $ only when API-key auth. Session-totals chip in chat header. SDK’s total_cost_usd used directly — no client-side pricing table.
2026-04-25 A2 DONE Session history. Two new SQLite tables (sessions, session_messages); appendTurn() transactional; auto-titles from first user message. 📜 History modal lists past conversations grouped by agent; click any to restore via resume:.
2026-04-25 A3 DONE Conversation export. /export, /export md, /export json slash commands generate downloads client-side from existing chat state. Markdown is publish-friendly, JSON keeps raw usage objects for downstream analysis.
2026-04-25 GitHub Pages Pages enabled at jaysidd.github.io/claude-agent-lab/ (source: main / root). Repo homepage URL set so the github.com sidebar shows the live URL. README is auto-served as the index by Jekyll.
2026-04-25 Marketing Replaced 3 OpenCode references with Clawless cross-promo (intro, “what this is not”, acknowledgements). Honest “same author” disclosures kept in each. README also documents history/cost/export sections with the two new screenshots (13-history-modal, 14-chat-with-usage).
2026-04-25 Mermaid fix Streaming sequence-diagram Note text rewritten to plain prose — semicolons and parens were tripping GitHub’s mermaid parser. Audit confirmed no other Notes have parser tripwires.
2026-04-26 C16 epic added Autonomous Agent Firm: scheduler + durable tasks + budgets + approvals. Phased over four sub-features (C16a–d). Designed after evaluating Paperclip’s trading-firm demo and confirming the SDK + Max OAuth path is viable for personal-scale autonomous runs. Subsumes the old “Cron / scheduled tasks” Future entry.
2026-04-26 C16 Clawless align Cross-checked C16 with Clawless agent same day. C16a already shipped on their side as B06; C16b they want portably (will absorb into their B54); C16c parallel build (their B64, this week — must align preflight signature + main-process-enforcement principle + OAuth-bypass-but-keep-rate-cap); C16d wait-and-see pending qualitative-difference analysis vs their existing per-tool approval. Lane split: their lane includes user-facing budget UX, license-gated runtime, channel adapters, closed-source desktop. Required: design sync on C16b schema + C16c preflight signature before either side commits implementation.
2026-04-26 C16c signature LOCKED Second Clawless round same day. Preflight signature check(agentId, estimatedTokens?) → {ok, reason?, capType?: 'cost'|'rate', remaining?} agreed both sides — estimatedTokens is optional (post-hoc cost accumulation is acceptable for Phase 1; required for rate-cap and Phase-2 precision). Two-tier vocabulary adopted: cost cap ($, OAuth bypasses) + rate cap (requests-per-window, always enforced). Naming: CostGuard system-internal, “Budget” user-facing. C16b: Clawless’s B54 is greenfield (in-memory FIFO, zero SQLite design); our schema is the source of truth, they adopt mechanically. B64 starts coding mid-next-week, launches 2-3 weeks.
2026-04-27 ClaudeLink wired .mcp.json adds claudelink-server (stdio MCP) for cross-terminal multi-agent communication. CLAUDE.md gains the ClaudeLink protocol section (inbox-check cadence + shortcut phrases). docs/Agent_Lab/ (writing project drafts) gitignored. Initial relay round to Clawless about C16b was paste-based; subsequent rounds via mcp__claudelink__* tools after a session restart picked up the MCP.
2026-04-27 C16b schema rev. 2 LOCKED Two cross-project review rounds with Clawless agent on the durable-queue design. Final shape: 5-state enum (running dropped — worker-side concern), atomic BEGIN IMMEDIATE checkout via RETURNING *, lease-based crash recovery, 4 indexes (added (agent_id, status) for B54 per-agent serialization), migrate(db) exported separately (no bundled _migrations table — host wires into its own migration runner), 64 KB metadata soft-cap, six locked open-question resolutions. Design at .notes/c16b-task-queue-design.md (gitignored).
2026-04-27 C16b DONE Durable task queue + atomic checkout shipped. src/taskQueue.ts (442 lines, host-agnostic, zero Express/SDK imports, designed for Clawless B54 mechanical lift) + src/taskQueueInstance.ts (singleton bootstrap with WORKER_ID = {hostname}:{pid}:{uuid}) + src/server.ts refactor of four task routes onto the queue (GET /api/tasks, POST /api/task, POST /api/task/:id/run, DELETE /api/task/:id) preserving the C03 wire format via toApiTask adapter. Tasks now survive restart with status preserved. Commit 16d7784.
2026-04-27 C16b Reviewer pass Independent reviewer agent surfaced 6 findings (2 MED, 4 LOW). All fixed: (R1) dropped metadata_json clobber on /run that would have destroyed caller-supplied metadata; (R2) constrained DELETE /api/task/:id to terminal states with 409 on non-terminal; (R6) defense-in-depth try/catch on terminal queue updates so a reaped/deleted-mid-run row doesn’t crash the handler; (R3) doc comment on checkoutById attempt-count semantics; (R4) enqueue validates priority/maxAttempts/scheduledFor; (R5) exhaustive statusFromQueue switch with never guard; (R7) WORKER_ID fork/cluster comment. Commit 2f7c11c.
2026-04-27 C16b QA pass Five new Playwright API tests in tests/features.spec.ts (smoke project, no engine): persistence + wire shape, DELETE-on-queued-returns-409, DELETE-on-missing-is-idempotent, priority-enum validation, description-type validation. 27/27 smoke green. Commit acdb5c3.
2026-04-27 C16b Perf pass Audit report at docs/audits/perf-audit-c16b.md — 0 HIGH, 2 MED, 5 LOW. (P1) Dropped redundant JS sort in GET /api/tasks by adding TaskFilter.orderBy option (additive — priority default for B54 next-in-queue, host opts into createdAt DESC for kanban). (P2) Gated pruneCompletedTasks on a count check; both statements now prepared once at module load. Commit f7cc8f8.
2026-04-27 C16b Security pass Audit report at docs/audits/security-audit-c16b.md — 0 new HIGH/MED, 1 LOW (SC1 reaffirms S5/S6), 2 Info (SC2/SC3 out-of-threat-model). All 17 prepared statements walked + parameterized. Worker_id forgery structurally impossible (server-only, never client-supplied). All 10 prior accepted risks (S1-S10) confirmed unaffected. Watch-list for future sessions: external-source task ingestion (C05/C16d) should default plan-mode-on or gate Run behind approval; future remote-worker API needs worker_id auth before shipping; if /api/task ever accepts client-supplied metadata, wrap the 64 KB cap throw into 400.
2026-04-27 C16c DONE CostGuard budget enforcement shipped on branch c16c-costguard, commit e0cb5a2. src/costGuard.ts (standalone primitive — zero Express/SDK imports, designed for Clawless B64 mechanical lift) + src/costGuardInstance.ts (singleton bootstrap reading caps from settings table) + wiring into /api/chat, /api/chat/stream, /api/task/:id/run (preflight 429 + post-call ledger record). New GET /api/costguard/status introspection route. New “Budget (CostGuard)” section in SETTINGS_SCHEMA. OAuth bypasses cost cap by recording is_oauth=1 rows that the cost SUM filters out; rate cap always enforced regardless of provider. Reviewer fixes folded same session: M1 (override allowlist tightened to known agents, no nested dots, no rate_window_seconds per-agent variant) + M2 (cap=0 collapses to “unset” to match the “leave blank for no cap” UX promise). 5 new Playwright smoke tests (32/32 green): schema, allowlist, status shape, exhausted-cap-returns-429-without-firing-SDK (seeds ledger directly to stay in smoke project), cap=0 unset behavior. Untracks gitignored test-results/.last-run.json artifact. Audits at docs/audits/perf-audit-c16c.md + docs/audits/security-audit-c16c.md.
2026-04-27 C16c Security pass 0 HIGH/MED/LOW, 3 Info (SC4 ledger has no retention/prune policy — fine at personal scale, watch for Clawless multi-tenant lift; SC5 the 429 reason string discloses cap value/window length — accepted; SC6 costguard.* global keys are intentionally operator-configurable via the existing /api/settings route). All 7 ledger SQL sites walked + parameterized. agentId validated by findAgent() before reaching costGuard.status(). The is_oauth flag is sourced exclusively from the server-captured system.init message at all 4 record() call sites — zero client influence. seedLedgerRow test helper unreachable from production.
2026-04-27 C16c Perf pass Audit at docs/audits/perf-audit-c16c.md: 0 HIGH, 0 MED, 7 LOW. Total check() overhead measured at ~22 µs p50 / ~45 µs p99 — invisible against 1-10 s of LLM latency. Both ledger queries hit idx_ledger_agent_time (rate query is COVERING). One actionable fix applied (P1): cached prepared statement in settings.ts:getSetting() — drops 5× re-prepare per resolveCaps() from ~20 µs to ~2.4 µs and benefits every configValue caller (WhisprDesk, future Telegram, etc.), not just CostGuard. Six accepts: P2 partial-index threshold (>10k month-rows-per-agent — irrelevant at personal scale), P3 ledger pruning (80 B/row gives years), P4 no N+1 between check/record, P5 no double-record on aborted streams, P6 startOfMonth nanosecond cost, P7 sync sqlite is fine for single-process. Watch list noted for Clawless multi-tenant lift: partial covering index (agent_id, occurred_at, cost_usd) WHERE is_oauth=0 (9× faster on month-sum at 30k rows), write-behind ledger inserts, async sqlite binding.
2026-04-28 README refresh (PR #3) 1d77404. Two new feature sections (Durable task queue, Budget caps / CostGuard), CostGuard preflight sequence diagram, redrawn task state machine for the durable 5-state enum, architecture state table flipped from in-memory-Maps to per-table SQLite breakdown, API contract updated (CostGuard + custom-agent CRUD + settings; bulk-delete-memories row removed), project layout and LOC refreshed (~2,500 → ~7,500), tests badge 22 → 35, “What’s on the backlog” reframed around C16 with C16a marked ⭐ next. New screenshot 15-settings-budget.png. Docs-only — Reviewer-only per skip rules; no Perf/Security audits.
2026-04-28 C16a DONE Cron-style scheduler shipped on branch c16a-scheduler. src/scheduler.ts (~470 LOC, host-agnostic primitive — zero Express/SDK imports, mirrors taskQueue.ts/costGuard.ts pattern, ready for Clawless lift) + src/schedulerInstance.ts (singleton bootstrap with cron-parser v5 CronExpressionParser) + wiring into server.ts: 8 routes (GET/POST /api/schedules, GET/PATCH/DELETE /api/schedules/:id, POST /api/schedules/:id/{run-now,pause,resume}, POST /api/cron/preview) + onFire callback that runs the SDK with OAuth-dead detection (two-token regex + before-first-assistant-message position guard) + 3-strike auto-pause for non-OAuth recurring failures + CostGuard preflight + taskQueue-backed fires. Schedules modal UI: list view with status badges, create form with cron preset chips + live “next 3 fires” preview, pause/resume/run-now/delete actions. Single 30s tick fires due schedules + lights up taskQueue.reapExpired() (was unreachable per C16b P4). Reviewer pass: 13 findings, 5 fixed pre-QA (recordOutcome enabled=1 guard + TOCTOU removal via consecutive_failures + 1, OAUTH_DEAD_PATTERN tightened to two-token requirement, run-now .catch for void’d promise, budget-block uses taskQueue.cancel instead of checkoutById+fail). 9 new Playwright smoke tests (42/42 green).
2026-04-28 C16a Perf pass Audit at docs/audits/perf-audit-c16a.md: 0 HIGH, 0 MED, 8 LOW — ship as-is. EXPLAIN QUERY PLAN confirms idx_schedules_due partial index serves the tick query (SEARCH schedules USING INDEX idx_schedules_due (next_fire_at<?) with ORDER BY satisfied by index — no TEMP B-TREE). Steady-state tick at 50 schedules / 0 due: 0.23 µs every 30 s. P1 inline db.prepare(...) (15 sites in scheduler.ts vs costGuard.ts’s constructor cache) measured at ~6 µs/call — invisible at personal scale, worth a 30-line refactor for the Clawless lift. P2 scheduler.list() SCAN (partial index excludes paused rows) is ~46 µs at N=50, ~3.8 ms at N=5000; user-visible boundary ~12k rows. P3 reaffirms taskQueue.reapExpired stays dormant (audit found C16a tick wires it correctly — but the tick doesn’t currently call it; clarified vs the original prompt). cronPreview() cost ~35 µs / 3 fires; 200 ms UI debounce more than sufficient. Concurrent fire fan-out is safe — better-sqlite3 transactions serialize on single Node process, sync prefix of executeFire commits before any await.
2026-04-28 C16a Security pass Audit at docs/audits/security-audit-c16a.md: 0 HIGH, 0 MED, 1 LOW (S-C16a-1 reaffirms baseline S1 — cwd accepted unvalidated, watch-list for commercial path), 4 Info. Inline fix applied (S-C16a-2): OAUTH_DEAD_PATTERN extended with a second top-level alternation to also catch the canonical CLI exhortation “Please run claude login”. Strictly additive — every prior true/false case retained; three real-world phrasings flip miss-to-hit. Pre-fix consequence was benign (legit OAuth-dead errors auto-paused after 3 strikes under too_many_failures instead of oauth_unavailable). Confirmed safe: stored prompt injection (localhost-only + agent allowedTools + kanban audit), cron-parser DoS (worst-case 100-char input parses in ~15 ms; library rejects >6-field inputs), SQL injection (15 prepared statements walked, all parameterized; dynamic UPDATE field names from a fixed source-literal allowlist), race conditions (delete-mid-fire / pause-mid-fire / tick-vs-delete all clean — enabled=1 guards prevent stale fires from clobbering manual pauses), metadata.scheduleId info leak (toApiTask shaper drops metadata — verified). Watch list for Clawless multi-tenant lift: schedule routes become attack surface; /run-now is an auth-free trigger for stored prompts.