Last Updated: 2026-04-26 (C16 epic added, Phase 2: Autonomous Operations)

Total items: 19+ (7 foundation + 14 shipped + 1 active epic + future list)

Completed: F1–F7 + C01 + C02 + C03 + C06 + C08 + C09 + C10 + C11 + C13 + C14 + C15 + A1 + A2 + A3

C12 was partial then reverted: flag pulled in the audit sweep, follow-up tracked

Public surface: github.com/jaysidd/claude-agent-lab + jaysidd.github.io/claude-agent-lab/

Tests: 22 smoke + 2 @engine = 24, all green
Work order: Items are numbered C##. Complete them in order unless a higher-priority need lands.
Tracking legend:
- [ ] Not started
- [~] In progress
- [x] Complete
- [CHANGED] Plan modified (reason documented)
- [DROPPED] Removed (reason documented)
| # | Item | Date | Notes |
|---|---|---|---|
| F1 | Project scaffold — Express + tsx + SDK + vanilla UI | 2026-04-23 | npm install @anthropic-ai/claude-agent-sdk, tsx, express |
| F2 | Multi-agent sidebar — Main / Comms / Content / Ops | 2026-04-23 | agents.ts defines each; sidebar renders from /api/agents |
| F3 | Per-agent system prompts, tools, session persistence | 2026-04-23 | resume: sessionId stored per agent in server Map |
| F4 | Folder picker + cwd scoping | 2026-04-23 | /api/cwd, /api/browse; query() receives cwd: |
| F5 | @file autocomplete | 2026-04-23 | /api/files; dropdown in composer with keyboard nav |
| F6 | Model selector per agent (Opus / Sonnet / Haiku) | 2026-04-23 | Runtime override via /api/model/:agentId; defaults in agents.ts |
| F7 | Model + auth footer on each reply | 2026-04-23 | Captured from system.init; “Max plan · subscription” when apiKeySource === "none" |
agents: option, no custom routing code.

Right now Main just tells the user “you should ask Comms about that.” The SDK can do better: if Main is given agents: { comms, content, ops } in its options, it gains an Agent tool and can delegate directly. The user asks Main; Main decides “this is a comms task”, invokes Comms as a sub-agent, and returns the combined result. That’s the pattern from the YouTube demo.
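A minimal sketch of how a router flag could translate into query options. The `LabAgent` and `SubAgentDefinition` shapes, the `buildDelegationOptions` helper, and the sample roster are hypothetical; only `isRouter`, the `agents` map, `allowedTools`, and the `Agent` tool name come from this backlog, and the exact SDK option shape should be checked against its documentation.

```typescript
// Hypothetical shapes for illustration; not the lab's actual agents.ts.
interface LabAgent {
  id: string;
  description: string;
  systemPrompt: string;
  allowedTools: string[];
  isRouter: boolean;
}

interface SubAgentDefinition {
  description: string;
  prompt: string;
  tools: string[];
}

interface DelegationOptions {
  allowedTools: string[];
  agents?: Record<string, SubAgentDefinition>;
}

// Routers receive every other agent as a sub-agent plus the Agent tool;
// non-routers keep their own tool list and get no agents map.
function buildDelegationOptions(active: LabAgent, all: LabAgent[]): DelegationOptions {
  if (!active.isRouter) {
    return { allowedTools: [...active.allowedTools] };
  }
  const agents: Record<string, SubAgentDefinition> = {};
  for (const other of all) {
    if (other.id === active.id) continue;
    agents[other.id] = {
      description: other.description,
      prompt: other.systemPrompt,
      tools: [...other.allowedTools],
    };
  }
  const allowedTools = active.allowedTools.includes("Agent")
    ? [...active.allowedTools]
    : [...active.allowedTools, "Agent"];
  return { allowedTools, agents };
}

const mainAgent: LabAgent = {
  id: "main", description: "General assistant and router",
  systemPrompt: "You are Main.", allowedTools: ["Read"], isRouter: true,
};
const commsAgent: LabAgent = {
  id: "comms", description: "Drafts emails and messages",
  systemPrompt: "You are Comms.", allowedTools: [], isRouter: false,
};
const opsAgent: LabAgent = {
  id: "ops", description: "Runs operational tasks",
  systemPrompt: "You are Ops.", allowedTools: ["Bash", "Read"], isRouter: false,
};
const roster: LabAgent[] = [mainAgent, commsAgent, opsAgent];

const mainOptions = buildDelegationOptions(mainAgent, roster);
console.log(Object.keys(mainOptions.agents ?? {}));
```

Keeping the decision in one pure function means the server route only has to spread the result into the options it already passes to query().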
- agents map on every query() call when the active agent is Main (or any agent flagged as a “router”)
- Agent in Main’s allowedTools
- AGENTS
- Agent tool uses, render a “delegated to X” trace chip
- src/agents.ts: add isRouter: boolean field; Main = true, others = false
- src/server.ts: when agent.isRouter, populate options.agents with the other agents’ definitions
- public/app.js: render delegation trace inline with tool chips

Current /api/chat waits for the full SDK stream to complete, then returns JSON. For quick replies this is fine; for Opus answering a hard question, the user stares at “thinking…” for 20s. Streaming the intermediate assistant messages + tool uses as they happen turns that into visible progress.
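To make the streaming idea concrete, here is a small sketch of Server-Sent Events framing for the planned event types and of how a client could fold deltas into one bubble. The payload field names on `tool_use` and `error`, and the helpers `encodeSse`, `decodeSse`, and `renderBubble`, are assumptions for illustration. The changelog records that the shipped route emits NDJSON instead, so treat this as the planned shape rather than the final wire format.

```typescript
// Event union mirroring the planned types: init, assistant_delta, tool_use, result, error.
type StreamEvent =
  | { type: "init"; session_id: string; model: string; apiKeySource: string }
  | { type: "assistant_delta"; text: string }
  | { type: "tool_use"; name: string; input: unknown }
  | { type: "result"; text: string }
  | { type: "error"; message: string };

// One SSE frame: an event name line, a single data line, then a blank line.
// JSON.stringify escapes newlines, so the payload always fits on one data line.
function encodeSse(event: StreamEvent): string {
  return `event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`;
}

// Split a buffered stream back into events (frames are separated by blank lines).
function decodeSse(buffer: string): StreamEvent[] {
  const events: StreamEvent[] = [];
  for (const frame of buffer.split("\n\n")) {
    const dataLine = frame.split("\n").find((line) => line.startsWith("data: "));
    if (dataLine !== undefined) {
      events.push(JSON.parse(dataLine.slice("data: ".length)) as StreamEvent);
    }
  }
  return events;
}

// Append deltas to a growing bubble and stop at result or error.
function renderBubble(events: StreamEvent[]): string {
  let bubble = "";
  for (const event of events) {
    if (event.type === "assistant_delta") bubble += event.text;
    if (event.type === "result" || event.type === "error") break;
  }
  return bubble;
}

const sampleEvents: StreamEvent[] = [
  { type: "init", session_id: "s1", model: "opus", apiKeySource: "none" },
  { type: "assistant_delta", text: "Hel" },
  { type: "assistant_delta", text: "lo" },
  { type: "result", text: "Hello" },
];
const wire = sampleEvents.map(encodeSse).join("");
console.log(renderBubble(decodeSse(wire))); // prints "Hello"
```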
- POST /api/chat/stream returning Server-Sent Events
- type: init (session_id, model, apiKeySource), assistant_delta (text chunk), tool_use (tool call), result (final text), error
- fetch for EventSource; append deltas to a growing message bubble; close connection on result or error
- /api/chat around as the non-streaming fallback (used by tests)
- src/server.ts: add streaming route; extract SDK message-type normalization into a helper
- public/app.js: sendMessage() becomes sendMessageStreaming(); render partials incrementally

A “+ New task” button opens a modal. User types the task + picks priority. Server hands the task description to a cheap classifier (Haiku) which picks the right agent. The task lands in a simple kanban column (“queued”). Click to fire; agent runs asynchronously; status moves to “in progress” → “done” with the result linked.
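The task flow above can be sketched as a tiny in-memory model: normalize the classifier's reply to a known agent, park the task as queued, then move it through the board. The `Priority` values, the fallback to Main, and the helper names (`pickAgent`, `createTask`, `startTask`, `finishTask`) are assumptions; the classifier call itself is left out, so the reply is passed in as a plain string.

```typescript
type TaskStatus = "queued" | "in_progress" | "done";
type Priority = "low" | "normal" | "high"; // assumed priority levels

interface Task {
  taskId: string;
  description: string;
  priority: Priority;
  assignedAgent: string;
  status: TaskStatus;
  result?: string;
}

const KNOWN_AGENTS = ["main", "comms", "content", "ops"];
const taskStore = new Map<string, Task>();
let taskCounter = 0;

// Accept the classifier's answer only if it names a known agent; otherwise fall back.
function pickAgent(reply: string, agentIds: string[], fallback: string): string {
  const normalized = reply.trim().toLowerCase();
  return agentIds.includes(normalized) ? normalized : fallback;
}

function createTask(description: string, priority: Priority, classifierReply: string): Task {
  taskCounter += 1;
  const task: Task = {
    taskId: `task-${taskCounter}`,
    description,
    priority,
    assignedAgent: pickAgent(classifierReply, KNOWN_AGENTS, "main"),
    status: "queued",
  };
  taskStore.set(task.taskId, task);
  return task;
}

// queued -> in_progress; anything else is rejected so a task cannot fire twice.
function startTask(taskId: string): Task {
  const task = taskStore.get(taskId);
  if (task === undefined) throw new Error(`unknown task ${taskId}`);
  if (task.status !== "queued") throw new Error(`task ${taskId} is already ${task.status}`);
  task.status = "in_progress";
  return task;
}

// in_progress -> done, with the agent's result attached.
function finishTask(taskId: string, result: string): Task {
  const task = taskStore.get(taskId);
  if (task === undefined) throw new Error(`unknown task ${taskId}`);
  if (task.status !== "in_progress") throw new Error(`task ${taskId} is not running`);
  task.status = "done";
  task.result = result;
  return task;
}

const demoTask = createTask("Draft a reply to the vendor", "high", " Comms\n");
console.log(demoTask.assignedAgent, demoTask.status);
```

Validating the classifier output against the roster keeps a hallucinated agent name from ever reaching the board.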
- POST /api/task → {description, priority} → Haiku classifies agent → returns {taskId, assignedAgent, description}
- Map<taskId, Task>
- /api/chat/stream with the chosen agent and task description as the prompt
- src/server.ts: /api/task, /api/tasks GET, /api/task/:id/start POST
- src/taskRouter.ts (new): Haiku prompt for classification
- public/app.js: task panel + modal

Today: restart = amnesia. Chat history and agent sessions die. That’s fine for a learning lab; it’s not fine if you actually want Main to remember that you prefer short emails or that Comms should always sign off “— J”.
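As an illustration of what "remembering" could mean in practice, the sketch below selects the memories that apply to one agent and renders them as a prompt block. The `<persistent-memory>` wrapper and the roughly 2k-character cap are taken from the C09 changelog entry; the `Memory` shape follows the planned table columns, while the line format and the `buildMemoryBlock` helper are assumptions.

```typescript
type MemoryCategory = "fact" | "preference" | "context";

interface Memory {
  id: number;
  content: string;
  agent: string | null; // null means the memory is global
  category: MemoryCategory;
  created_at: string;
}

// Keep global memories plus those scoped to this agent, in the given order,
// and stop adding lines once the character budget would be exceeded.
function buildMemoryBlock(memories: Memory[], agentId: string, maxChars = 2000): string {
  const lines: string[] = [];
  let used = 0;
  for (const memory of memories) {
    if (memory.agent !== null && memory.agent !== agentId) continue;
    const line = `- [${memory.category}] ${memory.content}`;
    if (used + line.length + 1 > maxChars) break;
    lines.push(line);
    used += line.length + 1;
  }
  if (lines.length === 0) return "";
  return `<persistent-memory>\n${lines.join("\n")}\n</persistent-memory>`;
}

const sampleMemories: Memory[] = [
  { id: 1, content: "Prefers short emails", agent: null, category: "preference", created_at: "2026-04-24" },
  { id: 2, content: "Sign off with J", agent: "comms", category: "preference", created_at: "2026-04-24" },
  { id: 3, content: "Deploys happen on Fridays", agent: "ops", category: "fact", created_at: "2026-04-24" },
];

console.log(buildMemoryBlock(sampleMemories, "comms"));
```

The returned string would be appended to the agent's system prompt before each query() call; an empty string means nothing is injected.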
Clawless learned this the hard way and ported a custom memory engine with BM25 + vector search. We don’t need that level yet. Minimal version:
- ~/.claude-agent-lab.db
- memories table: id, content, agent (nullable for global), category (fact/preference/context), created_at
- query() call
- src/db.ts (new): better-sqlite3 wrapper, schema init
- src/memory.ts (new): CRUD + injection helpers
- src/server.ts: wire memory retrieval into /api/chat[/stream]
- public/memory.html + CSS + JS (or a modal): view/add/delete memories

The whole pitch of the SDK is: same engine, any interface. Running alongside the web UI, a Telegram bot routes messages to the same agents. Shows tangibly that the SDK is “Claude as a subroutine.”
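The channel-side routing can be sketched independently of any bot library: a message that starts with a slash and names an agent switches the active agent, and anything else is a prompt for whoever is active. The `ChatState` and `Routed` types and the `routeIncoming` helper are hypothetical, and no Telegram API calls are shown.

```typescript
interface ChatState {
  activeAgent: string;
}

type Routed =
  | { kind: "switched"; agent: string }
  | { kind: "prompt"; agent: string; text: string }
  | { kind: "unknown_command"; command: string };

// "/ops" switches agents; "/ops restart nginx" switches and forwards the rest;
// plain text goes to the currently active agent.
function routeIncoming(text: string, state: ChatState, agentIds: string[]): Routed {
  const trimmed = text.trim();
  const match = /^\/([a-z0-9_]+)(?:\s+([\s\S]+))?$/i.exec(trimmed);
  if (match === null) {
    return { kind: "prompt", agent: state.activeAgent, text: trimmed };
  }
  const command = (match[1] ?? "").toLowerCase();
  if (!agentIds.includes(command)) {
    return { kind: "unknown_command", command };
  }
  state.activeAgent = command;
  const rest = match[2];
  return rest === undefined
    ? { kind: "switched", agent: command }
    : { kind: "prompt", agent: command, text: rest };
}

const chatState: ChatState = { activeAgent: "main" };
const agentIds = ["main", "comms", "content", "ops"];
console.log(routeIncoming("/ops", chatState, agentIds));
console.log(routeIncoming("check disk usage", chatState, agentIds));
```

Because the function only returns a routing decision, the same logic could sit behind the web UI's slash commands as well as a chat channel.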
- src/channels/telegram.ts: long-poll bot using node-telegram-bot-api
- /<agent> commands switch
- src/channels/telegram.ts (new)
- src/server.ts: spawn the Telegram listener on startup if TELEGRAM_BOT_TOKEN env set
- .env.example: document the two env vars needed
- /ops switches to Ops agent

Mirror Clawless’s “every user-visible surface has at least one Playwright test” rule. Starting point:
- @engine)
- @ shows file list from cwd
- playwright.config.ts
- tests/smoke.spec.ts: page load + agents visible (no engine needed)
- tests/chat.spec.ts: @engine tests that hit SDK
- package.json scripts: test, test:smoke, test:engine
- npm run test:smoke passes offline (no SDK calls)
- npm run test:engine passes against a running server with Max OAuth
- @file

Package the web UI + server as a desktop app. Electron is the easy path given the ecosystem familiarity (Clawless uses it). Tauri produces smaller binaries. Revisit when packaging becomes useful.
memory/project_paperclip_comparison.md (positioning vs Paperclip), memory/project_clawless_c16_alignment.md (Clawless lane split + cross-pollination decisions).

Take Command Center from “interactive lab” to “lab + small autonomous runtime.” Make it possible to run a Paperclip-style agent firm (CEO + specialists, delegating via task comments, waking on schedule) directly on the SDK with the existing Max OAuth, without rebuilding Paperclip’s whole platform.
These aren’t blockers — they’re shape constraints that should inform every sub-feature:
- permissionMode: 'bypassPermissions' (SDK equivalent of Paperclip’s dangerouslySkipPermissions). That removes the safety net for tool use. Approval gates (C16d) are how the safety net comes back for high-stakes steps.
- /api/schedules.
- query() call. The novelty is durability + timing, not the agent loop itself.
- src/scheduler.ts (new; uses node-cron or a simple setInterval loop with persisted next-fire-at timestamps), src/server.ts (new routes), data/lab.db (schedules table), public/schedules.html or modal.

Shipped across commits 16d7784 (impl) → 2f7c11c (Reviewer R1-R7) → acdb5c3 (QA tests) → f7cc8f8 (Perf P1+P2). Schema rev. 2 locked with Clawless agent across two review rounds before code; full design + audit reports preserved at .notes/c16b-task-queue-design.md (gitignored), docs/audits/perf-audit-c16b.md, docs/audits/security-audit-c16b.md. All acceptance criteria met. Next: C16c (CostGuard), C16a (Scheduler), or C16d (Approval gates).
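The checkout semantics of the durable queue can be modelled in memory to show the guarantee: a task is claimed by at most one worker, and a crashed worker's claim returns to the queue once its lease lapses. This is an in-memory stand-in, not the SQLite implementation. The three-state `QueueStatus`, the row fields, and the method signatures are assumptions, with `checkoutById` and `reapExpired` borrowed only as names from the changelog.

```typescript
type QueueStatus = "queued" | "checked_out" | "done"; // simplified, assumed states

interface QueueRow {
  id: string;
  status: QueueStatus;
  workerId: string | null;
  leaseExpiresAt: number | null;
  attempts: number;
}

class InMemoryTaskQueue {
  private readonly rows = new Map<string, QueueRow>();

  enqueue(id: string): void {
    this.rows.set(id, { id, status: "queued", workerId: null, leaseExpiresAt: null, attempts: 0 });
  }

  // Claim succeeds only while the row is still queued; the check and the update
  // happen in one synchronous step, which is what a database transaction provides.
  checkoutById(id: string, workerId: string, now: number, leaseMs: number): QueueRow | null {
    const row = this.rows.get(id);
    if (row === undefined || row.status !== "queued") return null;
    row.status = "checked_out";
    row.workerId = workerId;
    row.leaseExpiresAt = now + leaseMs;
    row.attempts += 1;
    return { ...row };
  }

  // Return lapsed leases to the queue so another worker can retry them.
  reapExpired(now: number): number {
    let reaped = 0;
    this.rows.forEach((row) => {
      if (row.status === "checked_out" && row.leaseExpiresAt !== null && row.leaseExpiresAt <= now) {
        row.status = "queued";
        row.workerId = null;
        row.leaseExpiresAt = null;
        reaped += 1;
      }
    });
    return reaped;
  }

  // Only the worker holding the claim may complete the task.
  complete(id: string, workerId: string): boolean {
    const row = this.rows.get(id);
    if (row === undefined || row.status !== "checked_out" || row.workerId !== workerId) return false;
    row.status = "done";
    row.leaseExpiresAt = null;
    return true;
  }
}

const queue = new InMemoryTaskQueue();
queue.enqueue("t1");
console.log(queue.checkoutById("t1", "worker-a", 0, 1000) !== null); // first claim wins
console.log(queue.checkoutById("t1", "worker-b", 10, 1000) !== null); // second claim is refused
```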
- Map<taskId, Task> moves to SQLite at data/lab.db. Tasks survive restart. Add atomic checkout semantics so a scheduled fire can’t grab a task another worker already started.
- src/db.ts (tasks table + transactions), src/taskQueue.ts (new; extract the atomic-checkout primitive into a standalone module from day one, not later, so the lift is mechanical), src/server.ts (refactor task routes), public/app.js (no UI change: same kanban, persistent backing).
- taskQueue.ts has zero Command-Center-specific imports (no Express, no SDK references) so it’s a pure data-layer module.

Shipped on branch c16c-costguard, commit e0cb5a2. All six roles signed off in one session. src/costGuard.ts (standalone primitive, zero Express/SDK imports, designed for Clawless B64 mechanical lift) + src/costGuardInstance.ts (singleton bootstrap reading caps from settings table) + src/server.ts wiring into /api/chat, /api/chat/stream, /api/task/:id/run + GET /api/costguard/status introspection + Budget (CostGuard) section in SETTINGS_SCHEMA. Reviewer M1 (override allowlist tightened to known agents only) + M2 (cap=0 collapses to “unset” to match the “blank = no cap” UX) folded in same session. 5 new Playwright smoke tests (32/32 green). Audits at docs/audits/perf-audit-c16c.md + docs/audits/security-audit-c16c.md.
- /api/chat[/stream] and scheduled fires return a structured “budget exhausted” response before the SDK call.
- src/server.ts, never public/app.js. Mirror the principle even though we don’t have a renderer/skill split.

check(
agentId: string,
estimatedTokens?: number // optional — omit for post-hoc cost accumulation; pass for precise rate-cap and Phase-2 cost-cap predictions
): {
ok: boolean;
reason?: string; // human-readable rejection reason when ok === false
capType?: 'cost' | 'rate'; // which cap tripped
remaining?: number; // dollars for cost, requests for rate
}
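A minimal in-memory sketch of a guard that honours this signature. The two-tier behaviour (OAuth rows skipped by the cost sum, rate cap always enforced, a cap of 0 treated as unset) follows the C16c changelog entry; the `Caps` and `LedgerRow` shapes, the `record` method, the order of the two checks, and the omission of a monthly window on the cost sum are assumptions, and the real module is backed by SQLite rather than an array.

```typescript
type CapType = "cost" | "rate";

interface CheckResult {
  ok: boolean;
  reason?: string;
  capType?: CapType;
  remaining?: number;
}

interface Caps {
  costCapUsd?: number;           // assumed name; 0 or undefined means no cap
  maxRequestsPerWindow?: number; // assumed name; 0 or undefined means no cap
  rateWindowMs: number;
}

interface LedgerRow {
  agentId: string;
  occurredAt: number;
  costUsd: number;
  isOauth: boolean;
}

function positiveOrUnset(value: number | undefined): number | undefined {
  return value !== undefined && value > 0 ? value : undefined;
}

class CostGuardSketch {
  private readonly ledger: LedgerRow[] = [];
  private readonly caps: Caps;
  private readonly now: () => number;

  constructor(caps: Caps, now: () => number = () => Date.now()) {
    this.caps = caps;
    this.now = now;
  }

  // Post-call accounting: every request is recorded, OAuth or not.
  record(agentId: string, costUsd: number, isOauth: boolean): void {
    this.ledger.push({ agentId, occurredAt: this.now(), costUsd, isOauth });
  }

  // Preflight. estimatedTokens is accepted for signature compatibility but unused here.
  check(agentId: string, _estimatedTokens?: number): CheckResult {
    const rows = this.ledger.filter((row) => row.agentId === agentId);

    const rateCap = positiveOrUnset(this.caps.maxRequestsPerWindow);
    if (rateCap !== undefined) {
      const windowStart = this.now() - this.caps.rateWindowMs;
      const recent = rows.filter((row) => row.occurredAt > windowStart).length;
      if (recent >= rateCap) {
        return { ok: false, capType: "rate", remaining: 0, reason: `rate cap of ${rateCap} requests reached` };
      }
    }

    const costCap = positiveOrUnset(this.caps.costCapUsd);
    if (costCap !== undefined) {
      // OAuth rows are excluded from the sum, so subscription usage never trips the cost cap.
      const spent = rows.filter((row) => !row.isOauth).reduce((sum, row) => sum + row.costUsd, 0);
      if (spent >= costCap) {
        return { ok: false, capType: "cost", remaining: 0, reason: `cost cap of $${costCap} reached` };
      }
    }
    return { ok: true };
  }
}

let fakeNow = 0;
const guard = new CostGuardSketch({ costCapUsd: 1, maxRequestsPerWindow: 3, rateWindowMs: 60_000 }, () => fakeNow);
guard.record("ops", 5, true);
console.log(guard.check("ops"));
```

Injecting the clock keeps the primitive free of host dependencies and makes the window logic testable without waiting.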
- CostGuard is the system-internal / agent-to-agent name. User-facing UI keeps “Budget” as the Settings entry-point label.
- src/costGuard.ts (new; preflight primitive matching the locked signature; standalone, no Express/SDK imports so it lifts cleanly to Clawless’s B64), src/server.ts (call costGuard.check() before every query()), public/settings.html (Budget tab; surfaces both cost cap and rate cap per agent).
- costGuard.check() matches the locked signature exactly · enforcement is server-side only · src/costGuard.ts has zero Express/SDK imports.
- requires_approval: true. When the agent reaches a configured “stop point” (mid-task, before a Bash command, before file write; TBD scope), it pauses, posts a comment, and waits. Operator approves/rejects from the kanban; agent resumes or aborts.
- bypassPermissions removes. Without this, autonomous trading (or any destructive action) is one prompt-injection away from disaster.
- src/server.ts (approval endpoints, hook integration), src/db.ts (approvals table), public/app.js (approve/reject UI on task cards).
- PreToolUse hooks: register a hook that intercepts dangerous tools and parks the run on an approval queue. Plan mode (C11) is the per-turn cousin; this is the per-task version.
- requires_approval: true pauses and waits · approve from UI → agent resumes from the same point · reject → agent aborts cleanly with comment · approval state persists across restart · Bash/Write/Edit on production-marked cwd auto-trigger approval regardless of task setting (defense in depth) · written analysis (1 page) on whether per-task is qualitatively different from per-tool approval, with concrete examples; Clawless port decision flows from this.

After all four sub-features ship, this should be possible:
- requires_approval on any live-trading tool: operator gets pinged, approves from phone via Telegram (C05) (C16d).

That’s the full Paperclip demo, on the SDK, on Max OAuth, ~personal scale, no API key.
- cwd per agent is enough at this scale.

Lane split confirmed with Clawless agent on 2026-04-26 after sharing the C16 design:
| Sub-feature | Clawless status | Action for Command Center |
|---|---|---|
| C16a Scheduler | Shipped (B06), different runtime | Build ours; steal UX patterns from B06; OAuth-healthcheck novelty may flow back |
| C16b Durable queue | B54 has no SQLite design — adopting ours wholesale | Draft schema + atomic-checkout SQL → send to Clawless → implement (we’re the source of truth) |
| C16c Budget | B64 starts mid-next-week, signature LOCKED | Build against the locked signature in src/costGuard.ts; Clawless’s B64 builds against the same shape |
| C16d Approval gates | Wait-and-see, has per-tool already | Build it; produce written analysis on per-task-vs-per-tool qualitative difference; portability decision flows from that |
Operating rule: C16c signature is locked and both sides build against it. C16b is on us to draft first; Clawless adopts mechanically when ready. Ping Clawless agent when C16b schema lands.
| Item | Notes |
|---|---|
| Markdown rendering in chat | HIGH impact, SMALL effort. Agent replies are plain text; marked or similar + a syntax-highlighted code block renderer (hljs or shiki) would massively improve legibility, especially for Content and Ops output. |
| Inline AskUserQuestion UI | SDK exposes an AskUserQuestion tool that pauses mid-task with multiple-choice prompts. Wire this up in the streaming pipeline so mid-run disambiguation shows up as an interactive card. |
| Plan mode toggle | SDK supports permissionMode: 'plan' for read-only agent runs. One toggle per agent — huge trust multiplier for Ops. |
| File checkpoint + rewind | Expose Query.rewindFiles() as a “roll back to this turn” button on any user message. SDK native feature; OpenClaw doesn’t have this. Differentiator. |
| Slash commands | /clear, /model, /compact, /agents, /help. Maps to Claude Code’s native syntax. Users who know the CLI get muscle memory. |
| Skills panel | SDK loads .claude/skills/*/SKILL.md automatically if cwd contains them. Add UI to browse and toggle per agent. |
| MCP configuration UI | Point at a stdio or HTTP MCP server → light up as tools for chosen agent. The SDK’s MCP primitive is the gateway to infinite integrations. |
| Session history sidebar | List past conversations per agent; click to restore via resume:. Goes hand-in-hand with C04 persistent memory. |
| Context pinning per agent | “Always consider my writing-style doc when drafting.” A pinned file or snippet prepended to every turn for that agent. |
| Cost & token tracking | SDK’s ResultMessage has usage info. Per-turn tokens, running total, forecast cost. Especially useful when the commercial path unlocks API-key mode. |
| Conversation export | Download chat history as markdown or JSON. One-click share. |
| Multi-pane chat | Split view — two agents side-by-side for model comparison or parallel work. Matches the YouTube “mission control” vibe. |
| Right-panel file viewer | When Ops reads a file, show it inline in a side pane so the user sees what the agent saw. Debugging + trust. |
| Keyboard shortcuts | Cmd+K switch agent, Cmd+Enter send, Cmd+T tasks, Cmd+F folder. Muscle-memory speed boost. |
| Voice layer | Whisper STT for input, TTS for output. Optional Pipecat/Gemini Live for a “war room” experience. Large effort; only worth it after the written flow is polished. |
| “Council” mode | One prompt → multiple agents weigh in → synthesizer produces a consolidated answer. Good for decisions. |
| Hook inspector | Render PreToolUse/PostToolUse/Stop events as a timeline for each turn. Developer-facing, but teaches the SDK’s event model. |
| Multiple workspaces | Switch between project contexts (different cwd + memory partition) without losing state. Matches how devs actually work. |
| Sub-agent depth limit | Prevent runaway delegation chains. Currently no limit — a pathological prompt could cascade. |
| Auth profile switcher | Toggle between “personal (OAuth, Max)” and “dev (API key)” modes for testing the commercial path end-to-end. |
| C12-follow-up: UI rewind | Complete the file-rewind UI. enableFileCheckpointing: true is already set so snapshots exist; what’s missing is holding the SDK Query object alive across HTTP requests so Query.rewindFiles(userMessageId) can be called on demand. Requires refactoring the chat lifecycle to streaming-input mode (prompt as AsyncIterable<SDKUserMessage>), tracking user-message UUIDs, and adding a rewind affordance on each user bubble. Effort: 2-3 hours, medium complexity. |
| AskUserQuestion from hooks | Let hooks ask the user for approval mid-tool-run (e.g., before a destructive Bash command). |
| Per-agent avatar / personality | One-click tone shifts: formal / casual / concise / playful. Stored as preamble injection. |
| Onboarding tour | 5-step first-run flow that highlights sidebar, chat, folder, tasks, model selector. |
| Date | Item | Notes |
|---|---|---|
| 2026-04-23 | F1–F7 shipped | Full foundation in one session. Express + SDK + vanilla UI, ~1,100 LOC. |
| 2026-04-23 | Docs scaffolded | CLAUDE.md + architecture.md + backlog.md + handoff.md, mirroring Clawless v5 conventions. |
| 2026-04-23 | C01 DONE | Sub-agent delegation via SDK agents option. Main routes to Comms/Content/Ops. Delegation chips in UI. Commit 38bd113. |
| 2026-04-23 | C02 DONE | Streaming responses via includePartialMessages: true. NDJSON events from /api/chat/stream; blinking-cursor UI. Commit b359d4c. |
| 2026-04-23 | C03 DONE | Task queue with Haiku-classified auto-routing. 3-column board, priority, agent override. Commit 9e4142e. |
| 2026-04-23 | C06 DONE | Playwright smoke + engine projects. 7 smoke (no engine) + 2 @engine tests. npm run test:smoke / test:engine. |
| 2026-04-24 | C08 DONE | Markdown rendering in chat. marked + DOMPurify + highlight.js via jsDelivr; applied only to completed (non-streaming) agent bubbles. Slash-command output renders as markdown too. |
| 2026-04-24 | C09 DONE | Persistent memory via better-sqlite3 at ./data/lab.db. CRUD routes; global or per-agent scope; fact / preference / context categories. Injected as <persistent-memory> system-prompt block on every query(), capped at ~2k chars. |
| 2026-04-24 | C10 DONE | Slash commands. Client-side dispatcher intercepts /cmd args and handles /help, /clear, /model [id], /agents, /plan on/off without a server round-trip. System-origin messages render through the same markdown pipeline. |
| 2026-04-24 | C11 DONE | Plan mode toggle. Header checkbox flips permissionMode: 'plan' on for the active agent; task runs respect the toggle too. Switching plan mode clears that agent’s session. |
| 2026-04-24 | C12 PARTIAL | enableFileCheckpointing: true is now set on every chat/stream/task query() call — snapshots are captured. UI rewind-to-user-message is deferred because Query.rewindFiles() requires holding the Query object alive across requests, which needs a streaming-input architecture. Added as C12-follow-up in the future list. |
| 2026-04-24 | Docs | docs/drafts/linkedin-project-entry.md added — copy-paste-ready content for LinkedIn Projects section. |
| 2026-04-24 | C13 DONE | WhisprDesk voice integration. Local HTTP proxy routes (/api/whisprdesk/{status,capabilities,transcribe,events}) forwarding to WhisprDesk’s External App Gateway on 127.0.0.1:9879. Browser mic button (MediaRecorder → proxy → transcript). Passive SSE listener auto-fills composer from any WhisprDesk dictation when tab is focused. Browser SpeechSynthesis speak button on each agent reply. Sidebar status indicator. Minimal .env loader added. Token stays server-side. |
| 2026-04-24 | C14 + C15 DONE | Settings modal (SQLite-backed config with env fallback, secrets masked) and dynamic agents (CRUD, custom_agents table, sidebar + New agent button). Closes the operator-surface gap from the audit. |
| 2026-04-24 | Audit sweep DONE | Architect/Reviewer/QA agents identified UI-vs-backend gaps + correctness issues. Closed all 13: Telegram fields disabled with “coming soon”, enableFileCheckpointing flag removed, 4 dead routes deleted, memory validation fixes, race conditions, error-path logging, brittle test hardening. |
| 2026-04-24 | Sidebar UX | “+ New agent” promoted from a muted dashed button to a prominent gradient primary action. /think hard\|fast\|default slash aliases added on top of /model. |
| 2026-04-24 | Voice UX hardening | Mic button shows ⏹ icon when recording + pink “Recording / click ⏹ to stop” indicator with live timer. Errors from the WhisprDesk proxy now surface upstream details so failures are debuggable. |
| 2026-04-25 | Voice fixes | Browser-side WebM→WAV conversion (Web Audio API + PCM 16-bit encoder) so WhisprDesk’s ffmpeg never chokes on streaming-EBML quirks. Shortcut switched from ⌘⇧M to ⌥V (avoids Chrome user-switcher / macOS minimize collisions). |
| 2026-04-25 | A1 DONE | Cost & token tracking, OAuth-aware. Per-message footer always shows tokens; $ only when API-key auth. Session-totals chip in chat header. SDK’s total_cost_usd used directly — no client-side pricing table. |
| 2026-04-25 | A2 DONE | Session history. Two new SQLite tables (sessions, session_messages); appendTurn() transactional; auto-titles from first user message. 📜 History modal lists past conversations grouped by agent; click any to restore via resume:. |
| 2026-04-25 | A3 DONE | Conversation export. /export, /export md, /export json slash commands generate downloads client-side from existing chat state. Markdown is publish-friendly, JSON keeps raw usage objects for downstream analysis. |
| 2026-04-25 | GitHub Pages | Pages enabled at jaysidd.github.io/claude-agent-lab/ (source: main / root). Repo homepage URL set so the github.com sidebar shows the live URL. README is auto-served as the index by Jekyll. |
| 2026-04-25 | Marketing | Replaced 3 OpenCode references with Clawless cross-promo (intro, “what this is not”, acknowledgements). Honest “same author” disclosures kept in each. README also documents history/cost/export sections with the two new screenshots (13-history-modal, 14-chat-with-usage). |
| 2026-04-25 | Mermaid fix | Streaming sequence-diagram Note text rewritten to plain prose — semicolons and parens were tripping GitHub’s mermaid parser. Audit confirmed no other Notes have parser tripwires. |
| 2026-04-26 | C16 epic added | Autonomous Agent Firm: scheduler + durable tasks + budgets + approvals. Phased over four sub-features (C16a–d). Designed after evaluating Paperclip’s trading-firm demo and confirming the SDK + Max OAuth path is viable for personal-scale autonomous runs. Subsumes the old “Cron / scheduled tasks” Future entry. |
| 2026-04-26 | C16 Clawless align | Cross-checked C16 with Clawless agent same day. C16a already shipped on their side as B06; C16b they want portably (will absorb into their B54); C16c parallel build (their B64, this week — must align preflight signature + main-process-enforcement principle + OAuth-bypass-but-keep-rate-cap); C16d wait-and-see pending qualitative-difference analysis vs their existing per-tool approval. Lane split: their lane includes user-facing budget UX, license-gated runtime, channel adapters, closed-source desktop. Required: design sync on C16b schema + C16c preflight signature before either side commits implementation. |
| 2026-04-26 | C16c signature LOCKED | Second Clawless round same day. Preflight signature check(agentId, estimatedTokens?) → {ok, reason?, capType?: 'cost'\|'rate', remaining?} agreed both sides — estimatedTokens is optional (post-hoc cost accumulation is acceptable for Phase 1; required for rate-cap and Phase-2 precision). Two-tier vocabulary adopted: cost cap ($, OAuth bypasses) + rate cap (requests-per-window, always enforced). Naming: CostGuard system-internal, “Budget” user-facing. C16b: Clawless’s B54 is greenfield (in-memory FIFO, zero SQLite design); our schema is the source of truth, they adopt mechanically. B64 starts coding mid-next-week, launches 2-3 weeks. |
| 2026-04-27 | ClaudeLink wired | .mcp.json adds claudelink-server (stdio MCP) for cross-terminal multi-agent communication. CLAUDE.md gains the ClaudeLink protocol section (inbox-check cadence + shortcut phrases). docs/Agent_Lab/ (writing project drafts) gitignored. Initial relay round to Clawless about C16b was paste-based; subsequent rounds via mcp__claudelink__* tools after a session restart picked up the MCP. |
| 2026-04-27 | C16b schema rev. 2 LOCKED | Two cross-project review rounds with Clawless agent on the durable-queue design. Final shape: 5-state enum (running dropped — worker-side concern), atomic BEGIN IMMEDIATE checkout via RETURNING *, lease-based crash recovery, 4 indexes (added (agent_id, status) for B54 per-agent serialization), migrate(db) exported separately (no bundled _migrations table — host wires into its own migration runner), 64 KB metadata soft-cap, six locked open-question resolutions. Design at .notes/c16b-task-queue-design.md (gitignored). |
| 2026-04-27 | C16b DONE | Durable task queue + atomic checkout shipped. src/taskQueue.ts (442 lines, host-agnostic, zero Express/SDK imports, designed for Clawless B54 mechanical lift) + src/taskQueueInstance.ts (singleton bootstrap with WORKER_ID = {hostname}:{pid}:{uuid}) + src/server.ts refactor of four task routes onto the queue (GET /api/tasks, POST /api/task, POST /api/task/:id/run, DELETE /api/task/:id) preserving the C03 wire format via toApiTask adapter. Tasks now survive restart with status preserved. Commit 16d7784. |
| 2026-04-27 | C16b Reviewer pass | Independent reviewer agent surfaced 6 findings (2 MED, 4 LOW). All fixed: (R1) dropped metadata_json clobber on /run that would have destroyed caller-supplied metadata; (R2) constrained DELETE /api/task/:id to terminal states with 409 on non-terminal; (R6) defense-in-depth try/catch on terminal queue updates so a reaped/deleted-mid-run row doesn’t crash the handler; (R3) doc comment on checkoutById attempt-count semantics; (R4) enqueue validates priority/maxAttempts/scheduledFor; (R5) exhaustive statusFromQueue switch with never guard; (R7) WORKER_ID fork/cluster comment. Commit 2f7c11c. |
| 2026-04-27 | C16b QA pass | Five new Playwright API tests in tests/features.spec.ts (smoke project, no engine): persistence + wire shape, DELETE-on-queued-returns-409, DELETE-on-missing-is-idempotent, priority-enum validation, description-type validation. 27/27 smoke green. Commit acdb5c3. |
| 2026-04-27 | C16b Perf pass | Audit report at docs/audits/perf-audit-c16b.md — 0 HIGH, 2 MED, 5 LOW. (P1) Dropped redundant JS sort in GET /api/tasks by adding TaskFilter.orderBy option (additive — priority default for B54 next-in-queue, host opts into createdAt DESC for kanban). (P2) Gated pruneCompletedTasks on a count check; both statements now prepared once at module load. Commit f7cc8f8. |
| 2026-04-27 | C16b Security pass | Audit report at docs/audits/security-audit-c16b.md — 0 new HIGH/MED, 1 LOW (SC1 reaffirms S5/S6), 2 Info (SC2/SC3 out-of-threat-model). All 17 prepared statements walked + parameterized. Worker_id forgery structurally impossible (server-only, never client-supplied). All 10 prior accepted risks (S1-S10) confirmed unaffected. Watch-list for future sessions: external-source task ingestion (C05/C16d) should default plan-mode-on or gate Run behind approval; future remote-worker API needs worker_id auth before shipping; if /api/task ever accepts client-supplied metadata, wrap the 64 KB cap throw into 400. |
| 2026-04-27 | C16c DONE | CostGuard budget enforcement shipped on branch c16c-costguard, commit e0cb5a2. src/costGuard.ts (standalone primitive — zero Express/SDK imports, designed for Clawless B64 mechanical lift) + src/costGuardInstance.ts (singleton bootstrap reading caps from settings table) + wiring into /api/chat, /api/chat/stream, /api/task/:id/run (preflight 429 + post-call ledger record). New GET /api/costguard/status introspection route. New “Budget (CostGuard)” section in SETTINGS_SCHEMA. OAuth bypasses cost cap by recording is_oauth=1 rows that the cost SUM filters out; rate cap always enforced regardless of provider. Reviewer fixes folded same session: M1 (override allowlist tightened to known agents, no nested dots, no rate_window_seconds per-agent variant) + M2 (cap=0 collapses to “unset” to match the “leave blank for no cap” UX promise). 5 new Playwright smoke tests (32/32 green): schema, allowlist, status shape, exhausted-cap-returns-429-without-firing-SDK (seeds ledger directly to stay in smoke project), cap=0 unset behavior. Untracks gitignored test-results/.last-run.json artifact. Audits at docs/audits/perf-audit-c16c.md + docs/audits/security-audit-c16c.md. |
| 2026-04-27 | C16c Security pass | 0 HIGH/MED/LOW, 3 Info (SC4 ledger has no retention/prune policy — fine at personal scale, watch for Clawless multi-tenant lift; SC5 the 429 reason string discloses cap value/window length — accepted; SC6 costguard.* global keys are intentionally operator-configurable via the existing /api/settings route). All 7 ledger SQL sites walked + parameterized. agentId validated by findAgent() before reaching costGuard.status(). The is_oauth flag is sourced exclusively from the server-captured system.init message at all 4 record() call sites — zero client influence. seedLedgerRow test helper unreachable from production. |
| 2026-04-27 | C16c Perf pass | Audit at docs/audits/perf-audit-c16c.md: 0 HIGH, 0 MED, 7 LOW. Total check() overhead measured at ~22 µs p50 / ~45 µs p99 — invisible against 1-10 s of LLM latency. Both ledger queries hit idx_ledger_agent_time (rate query is COVERING). One actionable fix applied (P1): cached prepared statement in settings.ts:getSetting() — drops 5× re-prepare per resolveCaps() from ~20 µs to ~2.4 µs and benefits every configValue caller (WhisprDesk, future Telegram, etc.), not just CostGuard. Six accepts: P2 partial-index threshold (>10k month-rows-per-agent — irrelevant at personal scale), P3 ledger pruning (80 B/row gives years), P4 no N+1 between check/record, P5 no double-record on aborted streams, P6 startOfMonth nanosecond cost, P7 sync sqlite is fine for single-process. Watch list noted for Clawless multi-tenant lift: partial covering index (agent_id, occurred_at, cost_usd) WHERE is_oauth=0 (9× faster on month-sum at 30k rows), write-behind ledger inserts, async sqlite binding. |
| 2026-04-28 | README refresh (PR #3) | 1d77404. Two new feature sections (Durable task queue, Budget caps / CostGuard), CostGuard preflight sequence diagram, redrawn task state machine for the durable 5-state enum, architecture state table flipped from in-memory-Maps to per-table SQLite breakdown, API contract updated (CostGuard + custom-agent CRUD + settings; bulk-delete-memories row removed), project layout and LOC refreshed (~2,500 → ~7,500), tests badge 22 → 35, “What’s on the backlog” reframed around C16 with C16a marked ⭐ next. New screenshot 15-settings-budget.png. Docs-only — Reviewer-only per skip rules; no Perf/Security audits. |
| 2026-04-28 | C16a DONE | Cron-style scheduler shipped on branch c16a-scheduler. src/scheduler.ts (~470 LOC, host-agnostic primitive — zero Express/SDK imports, mirrors taskQueue.ts/costGuard.ts pattern, ready for Clawless lift) + src/schedulerInstance.ts (singleton bootstrap with cron-parser v5 CronExpressionParser) + wiring into server.ts: 8 routes (GET/POST /api/schedules, GET/PATCH/DELETE /api/schedules/:id, POST /api/schedules/:id/{run-now,pause,resume}, POST /api/cron/preview) + onFire callback that runs the SDK with OAuth-dead detection (two-token regex + before-first-assistant-message position guard) + 3-strike auto-pause for non-OAuth recurring failures + CostGuard preflight + taskQueue-backed fires. Schedules modal UI: list view with status badges, create form with cron preset chips + live “next 3 fires” preview, pause/resume/run-now/delete actions. Single 30s tick fires due schedules + lights up taskQueue.reapExpired() (was unreachable per C16b P4). Reviewer pass: 13 findings, 5 fixed pre-QA (recordOutcome enabled=1 guard + TOCTOU removal via consecutive_failures + 1, OAUTH_DEAD_PATTERN tightened to two-token requirement, run-now .catch for void’d promise, budget-block uses taskQueue.cancel instead of checkoutById+fail). 9 new Playwright smoke tests (42/42 green). |
| 2026-04-28 | C16a Perf pass | Audit at docs/audits/perf-audit-c16a.md: 0 HIGH, 0 MED, 8 LOW — ship as-is. EXPLAIN QUERY PLAN confirms idx_schedules_due partial index serves the tick query (SEARCH schedules USING INDEX idx_schedules_due (next_fire_at<?) with ORDER BY satisfied by index — no TEMP B-TREE). Steady-state tick at 50 schedules / 0 due: 0.23 µs every 30 s. P1 inline db.prepare(...) (15 sites in scheduler.ts vs costGuard.ts’s constructor cache) measured at ~6 µs/call — invisible at personal scale, worth a 30-line refactor for the Clawless lift. P2 scheduler.list() SCAN (partial index excludes paused rows) is ~46 µs at N=50, ~3.8 ms at N=5000; user-visible boundary ~12k rows. P3 reaffirms taskQueue.reapExpired stays dormant (audit found C16a tick wires it correctly — but the tick doesn’t currently call it; clarified vs the original prompt). cronPreview() cost ~35 µs / 3 fires; 200 ms UI debounce more than sufficient. Concurrent fire fan-out is safe — better-sqlite3 transactions serialize on single Node process, sync prefix of executeFire commits before any await. |
| 2026-04-28 | C16a Security pass | Audit at docs/audits/security-audit-c16a.md: 0 HIGH, 0 MED, 1 LOW (S-C16a-1 reaffirms baseline S1: cwd accepted unvalidated, watch-list for commercial path), 4 Info. Inline fix applied (S-C16a-2): OAUTH_DEAD_PATTERN extended with a second top-level alternation to also catch the canonical CLI exhortation Please run `claude login`. Strictly additive: every prior true/false case retained; three real-world phrasings flip miss-to-hit. Pre-fix consequence was benign (legit OAuth-dead errors auto-paused after 3 strikes under too_many_failures instead of oauth_unavailable). Confirmed safe: stored prompt injection (localhost-only + agent allowedTools + kanban audit), cron-parser DoS (worst-case 100-char input parses in ~15 ms; library rejects >6-field inputs), SQL injection (15 prepared statements walked, all parameterized; dynamic UPDATE field names from a fixed source-literal allowlist), race conditions (delete-mid-fire / pause-mid-fire / tick-vs-delete all clean; enabled=1 guards prevent stale fires from clobbering manual pauses), metadata.scheduleId info leak (toApiTask shaper drops metadata, verified). Watch list for Clawless multi-tenant lift: schedule routes become attack surface; /run-now is auth-free trigger for stored prompts. |