[codex] Harden execution reliability and heartbeat tooling (#3679)

## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Reliable execution depends on heartbeat routing, issue lifecycle semantics, telemetry, and a fast enough local verification loop to keep regressions visible
> - The remaining commits on this branch were mostly server/runtime correctness fixes plus test and documentation follow-ups in that area
> - Those changes are logically separate from the UI-focused issue-detail and workspace/navigation branches even when they touch overlapping issue APIs
> - This pull request groups the execution reliability, heartbeat, telemetry, and tooling changes into one standalone branch
> - The benefit is a focused review of the control-plane correctness work, including the follow-up fix that restored the implicit comment-reopen helpers after branch splitting

## What Changed

- Hardened issue/heartbeat execution behavior, including self-review stage skipping, deferred mention wakes during active execution, stranded execution recovery, active-run scoping, assignee resolution, and blocked-to-todo wake resumption
- Reduced noisy polling/logging overhead by trimming issue run payloads, compacting persisted run logs, silencing high-volume request logs, and capping heartbeat-run queries in dashboard/inbox surfaces
- Expanded telemetry and status semantics with adapter/model fields on task completion plus clearer status guidance in docs/onboarding material
- Updated test infrastructure and verification defaults with faster route-test module isolation, cheaper default `pnpm test`, e2e isolation from local state, and repo verification follow-ups
- Included docs/release housekeeping from the branch and added a small follow-up commit restoring the implicit comment-reopen helpers that were dropped during branch reconstruction

## Verification

- `pnpm vitest run server/src/__tests__/issue-comment-reopen-routes.test.ts server/src/__tests__/issue-telemetry-routes.test.ts`
- `pnpm vitest run server/src/__tests__/http-log-policy.test.ts server/src/__tests__/heartbeat-run-log.test.ts server/src/__tests__/health.test.ts`
- `server/src/__tests__/activity-service.test.ts`, `server/src/__tests__/heartbeat-comment-wake-batching.test.ts`, and `server/src/__tests__/heartbeat-process-recovery.test.ts` were attempted on this host but the embedded Postgres harness reported init-script/data-dir problems and skipped or failed to start, so they are noted as environment-limited

## Risks

- Medium: this branch changes core issue/heartbeat routing and reopen/wakeup behavior, so regressions would affect agent execution flow rather than isolated UI polish
- Because it also updates verification infrastructure, reviewers should pay attention to whether the new tests are asserting the right failure modes and not just reshaping harness behavior

## Model Used

- OpenAI Codex coding agent (GPT-5-class runtime in Codex CLI; exact deployed model ID is not exposed in this environment), reasoning enabled, tool use and local code execution enabled

## Checklist

- [x] I have included a thinking path that traces from project context to this change
- [x] I have specified the model used (with version and capability details)
- [ ] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-06-14 01:50:39 +09:00 · 2026-04-14 13:34:52 -05:00 · 2026-04-14 13:34:52 -05:00 · 7f893ac4ec
commit 7f893ac4ec
parent e89076148a
106 changed files with 4682 additions and 713 deletions
--- a/doc/DEVELOPING.md
+++ b/doc/DEVELOPING.md
@@ -79,6 +79,29 @@ Allow additional private hostnames (for example custom Tailscale hostnames):
 pnpm paperclipai allowed-hostname dotta-macbook-pro
 ```

+## Test Commands
+
+Use the cheap local default unless you are specifically working on browser flows:
+
+```sh
+pnpm test
+```
+
+`pnpm test` runs the Vitest suite only. For interactive Vitest watch mode use:
+
+```sh
+pnpm test:watch
+```
+
+Browser suites stay separate:
+
+```sh
+pnpm test:e2e
+pnpm test:release-smoke
+```
+
+These browser suites are intended for targeted local verification and CI, not the default agent/human test command.
+
 ## One-Command Local Run

 For a first-time local install, you can bootstrap and run in one command:
--- a/doc/SPEC-implementation.md
+++ b/doc/SPEC-implementation.md
@@ -395,6 +395,8 @@ Side effects:
 - entering `done` sets `completed_at`
 - entering `cancelled` sets `cancelled_at`

+Detailed ownership, execution, blocker, and crash-recovery semantics are documented in `doc/execution-semantics.md`.
+
 ## 8.3 Approval Status

 - `pending -> approved | rejected | cancelled`
--- a/doc/execution-semantics.md
+++ b/doc/execution-semantics.md
@@ -0,0 +1,252 @@
+# Execution Semantics
+
+Status: Current implementation guide
+Date: 2026-04-13
+Audience: Product and engineering
+
+This document explains how Paperclip interprets issue assignment, issue status, execution runs, wakeups, parent/sub-issue structure, and blocker relationships.
+
+`doc/SPEC-implementation.md` remains the V1 contract. This document is the detailed execution model behind that contract.
+
+## 1. Core Model
+
+Paperclip separates four concepts that are easy to blur together:
+
+1. structure: parent/sub-issue relationships
+2. dependency: blocker relationships
+3. ownership: who is responsible for the issue now
+4. execution: whether the control plane currently has a live path to move the issue forward
+
+The system works best when those are kept separate.
+
+## 2. Assignee Semantics
+
+An issue has at most one assignee.
+
+- `assigneeAgentId` means the issue is owned by an agent
+- `assigneeUserId` means the issue is owned by a human board user
+- both cannot be set at the same time
+
+This is a hard invariant. Paperclip is single-assignee by design.
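The single-assignee invariant can be sketched as a small guard. This is an illustrative sketch, not the server's actual validation: only `assigneeAgentId` and `assigneeUserId` come from the text, while the `IssueAssignee` type, the `AssigneeKind` union, and `resolveAssigneeKind` are hypothetical names.

```typescript
// Hypothetical shape: only the two assignee fields named in the document.
type IssueAssignee = {
  assigneeAgentId: string | null;
  assigneeUserId: string | null;
};

type AssigneeKind = "agent" | "user" | "unassigned";

// Returns who owns the issue. Throws when both fields are set, because the
// document describes "both cannot be set" as a hard invariant.
function resolveAssigneeKind(issue: IssueAssignee): AssigneeKind {
  if (issue.assigneeAgentId !== null && issue.assigneeUserId !== null) {
    throw new Error("invariant violated: issue has both an agent and a user assignee");
  }
  if (issue.assigneeAgentId !== null) return "agent";
  if (issue.assigneeUserId !== null) return "user";
  return "unassigned";
}
```

Throwing rather than silently picking one owner keeps a violated invariant visible to the caller.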
+
+## 3. Status Semantics
+
+Paperclip issue statuses are not just UI labels. They imply different expectations about ownership and execution.
+
+### `backlog`
+
+The issue is not ready for active work.
+
+- no execution expectation
+- no pickup expectation
+- safe resting state for future work
+
+### `todo`
+
+The issue is actionable but not actively claimed.
+
+- it may be assigned or unassigned
+- no checkout/execution lock is required yet
+- for agent-assigned work, Paperclip may still need a wake path to ensure the assignee actually sees it
+
+### `in_progress`
+
+The issue is actively owned work.
+
+- requires an assignee
+- for agent-owned issues, this is a strict execution-backed state
+- for user-owned issues, this is a human ownership state and is not backed by heartbeat execution
+
+For agent-owned issues, `in_progress` should not be allowed to become a silent dead state.
+
+### `blocked`
+
+The issue cannot proceed until something external changes.
+
+This is the right state for:
+
+- waiting on another issue
+- waiting on a human decision
+- waiting on an external dependency or system
+- work that automatic recovery could not safely continue
+
+### `in_review`
+
+Execution work is paused because the next move belongs to a reviewer or approver, not the current executor.
+
+### `done`
+
+The work is complete and terminal.
+
+### `cancelled`
+
+The work will not continue and is terminal.
+
+## 4. Agent-Owned vs User-Owned Execution
+
+The execution model differs depending on assignee type.
+
+### Agent-owned issues
+
+Agent-owned issues are part of the control plane's execution loop.
+
+- Paperclip can wake the assignee
+- Paperclip can track runs linked to the issue
+- Paperclip can recover some lost execution state after crashes/restarts
+
+### User-owned issues
+
+User-owned issues are not executed by the heartbeat scheduler.
+
+- Paperclip can track the ownership and status
+- Paperclip cannot rely on heartbeat/run semantics to keep them moving
+- stranded-work reconciliation does not apply to them
+
+This is why `in_progress` can be strict for agents without forcing the same runtime rules onto human-held work.
+
+## 5. Checkout and Active Execution
+
+Checkout is the bridge from issue ownership to active agent execution.
+
+- checkout is required to move an issue into agent-owned `in_progress`
+- `checkoutRunId` represents the issue-ownership lock for the current agent run
+- `executionRunId` represents the currently active execution path for the issue
+
+These are related but not identical:
+
+- `checkoutRunId` answers who currently owns execution rights for the issue
+- `executionRunId` answers which run is actually live right now
+
+Paperclip already clears stale execution locks and can adopt some stale checkout locks when the original run is gone.
+
+## 6. Parent/Sub-Issue vs Blockers
+
+Paperclip uses two different relationships for different jobs.
+
+### Parent/Sub-Issue (`parentId`)
+
+This is structural.
+
+Use it for:
+
+- work breakdown
+- rollup context
+- explaining why a child issue exists
+- waking the parent assignee when all direct children become terminal
+
+Do not treat `parentId` as an execution dependency by itself.
+
+### Blockers (`blockedByIssueIds`)
+
+This is dependency semantics.
+
+Use it for:
+
+- "this issue cannot continue until that issue changes state"
+- explicit waiting relationships
+- automatic wakeups when all blockers resolve
+
+If a parent is truly waiting on a child, model that with blockers. Do not rely on the parent/child relationship alone.
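A minimal sketch of the distinction, assuming a blocker counts as resolved once it is `done` (the text does not say whether `cancelled` also counts). Only `parentId` and `blockedByIssueIds` are names from the document; `IssueNode`, `unresolvedBlockers`, and `shouldWakeOnBlockersResolved` are hypothetical.

```typescript
type IssueStatus =
  | "backlog" | "todo" | "in_progress" | "blocked"
  | "in_review" | "done" | "cancelled";

// Hypothetical minimal issue shape for illustration.
type IssueNode = {
  id: string;
  status: IssueStatus;
  parentId: string | null;
  blockedByIssueIds: string[];
};

// Assumption: only `done` resolves a blocker. Unknown ids are treated as
// unresolved so a missing record never triggers a wake by accident.
function isResolved(blocker: IssueNode | undefined): boolean {
  return blocker !== undefined && blocker.status === "done";
}

// Lists the blockers that still hold the issue back. `parentId` is ignored
// on purpose: structure alone is not a dependency.
function unresolvedBlockers(issue: IssueNode, issuesById: Record<string, IssueNode>): string[] {
  return issue.blockedByIssueIds.filter((id) => !isResolved(issuesById[id]));
}

// A blocker-resolution wake is due only when the issue had explicit blockers
// and none of them remain unresolved.
function shouldWakeOnBlockersResolved(issue: IssueNode, issuesById: Record<string, IssueNode>): boolean {
  return issue.blockedByIssueIds.length > 0 && unresolvedBlockers(issue, issuesById).length === 0;
}
```

In this sketch a parent with unfinished children but an empty `blockedByIssueIds` is never reported as waiting, which mirrors the rule that a real wait must be modeled with blockers.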
+
+## 7. Consistent Execution Path Rules
+
+For agent-assigned, non-terminal, actionable issues, Paperclip should not leave work in a state where nobody is working it and nothing will wake it.
+
+The relevant execution path depends on status.
+
+### Agent-assigned `todo`
+
+This is dispatch state: ready to start, not yet actively claimed.
+
+A healthy dispatch state means at least one of these is true:
+
+- the issue already has a queued/running wake path
+- the issue is intentionally resting in `todo` after a successful agent heartbeat, not after an interrupted dispatch
+- the issue has been explicitly surfaced as stranded
+
+### Agent-assigned `in_progress`
+
+This is active-work state.
+
+A healthy active-work state means at least one of these is true:
+
+- there is an active run for the issue
+- there is already a queued continuation wake
+- the issue has been explicitly surfaced as stranded
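The two health rules above can be read as one predicate. The sketch below is illustrative only: `AgentIssueSnapshot` and its boolean fields are hypothetical stand-ins for whatever the control plane actually queries.

```typescript
// Hypothetical snapshot of an agent-assigned issue; field names are
// illustrative and not taken from the Paperclip schema.
type AgentIssueSnapshot = {
  status: "todo" | "in_progress";
  hasActiveRun: boolean;
  hasQueuedWake: boolean;
  restingAfterSuccessfulHeartbeat: boolean;
  surfacedAsStranded: boolean;
};

// True when at least one of the listed conditions holds for the status.
function hasHealthyExecutionPath(issue: AgentIssueSnapshot): boolean {
  // Explicitly surfaced problems count as healthy in both states,
  // because an operator can see and act on them.
  if (issue.surfacedAsStranded) return true;
  if (issue.status === "todo") {
    // Dispatch state: a queued/running wake path, or an intentional rest.
    return issue.hasQueuedWake || issue.hasActiveRun || issue.restingAfterSuccessfulHeartbeat;
  }
  // Active-work state: a live run or a queued continuation wake.
  return issue.hasActiveRun || issue.hasQueuedWake;
}
```

Note that resting after a successful heartbeat only satisfies the `todo` rule; for `in_progress` it is not one of the listed conditions.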
+
+## 8. Crash and Restart Recovery
+
+Paperclip now treats crash/restart recovery as a stranded-assigned-work problem, not just a stranded-run problem.
+
+There are two distinct failure modes.
+
+### 8.1 Stranded assigned `todo`
+
+Example:
+
+- issue is assigned to an agent
+- status is `todo`
+- the original wake/run died during or after dispatch
+- after restart there is no queued wake and nothing picks the issue back up
+
+Recovery rule:
+
+- if the latest issue-linked run failed, timed out, or was cancelled and no live execution path remains, Paperclip queues one automatic assignment recovery wake
+- if that recovery wake also finishes and the issue is still stranded, Paperclip moves the issue to `blocked` and posts a visible comment
+
+This is a dispatch recovery, not a continuation recovery.
+
+### 8.2 Stranded assigned `in_progress`
+
+Example:
+
+- issue is assigned to an agent
+- status is `in_progress`
+- the live run disappeared
+- after restart there is no active run and no queued continuation
+
+Recovery rule:
+
+- Paperclip queues one automatic continuation wake
+- if that continuation wake also finishes and the issue is still stranded, Paperclip moves the issue to `blocked` and posts a visible comment
+
+This is an active-work continuity recovery.
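Both recovery rules share a "retry once, then escalate" shape, sketched below. This is an assumption-laden illustration: `StrandedCandidate`, `RecoveryAction`, and `decideRecovery` are hypothetical names, and the real reconciler may track the "already retried" fact differently.

```typescript
// Hypothetical outcome of one reconciliation decision.
type RecoveryAction =
  | { type: "none" }
  | { type: "queue_wake"; reason: "assignment_recovery" | "continuation_recovery" }
  | { type: "block_and_comment" };

// Hypothetical view of an agent-assigned issue during reconciliation.
type StrandedCandidate = {
  status: "todo" | "in_progress";
  hasLiveExecutionPath: boolean;
  latestRunOutcome: "succeeded" | "failed" | "timed_out" | "cancelled" | null;
  recoveryWakeAlreadyFinished: boolean;
};

function decideRecovery(issue: StrandedCandidate): RecoveryAction {
  // Nothing to recover while a run or queued wake still exists.
  if (issue.hasLiveExecutionPath) return { type: "none" };
  // The single automatic retry was already spent: escalate visibly.
  if (issue.recoveryWakeAlreadyFinished) return { type: "block_and_comment" };
  if (issue.status === "todo") {
    // Dispatch recovery applies only when the latest linked run ended badly.
    const endedBadly =
      issue.latestRunOutcome === "failed" ||
      issue.latestRunOutcome === "timed_out" ||
      issue.latestRunOutcome === "cancelled";
    return endedBadly ? { type: "queue_wake", reason: "assignment_recovery" } : { type: "none" };
  }
  // Stranded in_progress: queue one continuation wake.
  return { type: "queue_wake", reason: "continuation_recovery" };
}
```

The assignee never changes in any branch, which matches the document's point that recovery preserves ownership.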
+
+## 9. Startup and Periodic Reconciliation
+
+Startup recovery and periodic recovery are different from normal wakeup delivery.
+
+On startup and on the periodic recovery loop, Paperclip now does three things in sequence:
+
+1. reap orphaned `running` runs
+2. resume persisted `queued` runs
+3. reconcile stranded assigned work
+
+That last step is what closes the gap where issue state survives a crash but the wake/run path does not.
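The fixed ordering of the three steps can be sketched as a single pass. The step names below are hypothetical labels for the three listed actions, and the sketch is synchronous only for brevity; the real steps would presumably be asynchronous database operations.

```typescript
// Hypothetical step callbacks; each returns how many records it touched.
type RecoverySteps = {
  reapOrphanedRunningRuns: () => number;
  resumePersistedQueuedRuns: () => number;
  reconcileStrandedAssignedWork: () => number;
};

type RecoveryPassResult = { reaped: number; resumed: number; reconciled: number };

// Runs the three steps strictly in the documented order, so reconciliation
// sees the run state left behind by reaping and resuming.
function runRecoveryPass(steps: RecoverySteps): RecoveryPassResult {
  const reaped = steps.reapOrphanedRunningRuns();
  const resumed = steps.resumePersistedQueuedRuns();
  const reconciled = steps.reconcileStrandedAssignedWork();
  return { reaped, resumed, reconciled };
}
```

The same function could be called once at startup and again from a periodic timer, since the text applies the sequence to both.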
+
+## 10. What This Does Not Mean
+
+These semantics do not change V1 into an auto-reassignment system.
+
+Paperclip still does not:
+
+- automatically reassign work to a different agent
+- infer dependency semantics from `parentId` alone
+- treat human-held work as heartbeat-managed execution
+
+The recovery model is intentionally conservative:
+
+- preserve ownership
+- retry once when the control plane lost execution continuity
+- escalate visibly when the system cannot safely keep going
+
+## 11. Practical Interpretation
+
+For a board operator, the intended meaning is:
+
+- agent-owned `in_progress` should mean "this is live work or clearly surfaced as a problem"
+- agent-owned `todo` should not stay assigned forever after a crash with no remaining wake path
+- parent/sub-issue explains structure
+- blockers explain waiting
+
+That is the execution contract Paperclip should present to operators.
--- a/doc/plans/2026-04-12-vscode-task-interoperability-plan.md
+++ b/doc/plans/2026-04-12-vscode-task-interoperability-plan.md
@@ -0,0 +1,382 @@
+# VS Code Task Interoperability Plan
+
+Status: planning only, no code changes
+Date: 2026-04-12
+Related issue: `PAP-1377`
+
+## Summary
+
+Paperclip should not replace its workspace runtime service model with VS Code tasks.
+It should add a narrow interoperability layer that can discover and adopt supported entries from `.vscode/tasks.json`.
+
+The core product model should stay:
+
+- Paperclip owns long-running workspace services and their desired state
+- Paperclip shows operators exactly which named thing they are starting or stopping
+- Paperclip distinguishes long-running services from one-shot jobs
+
+VS Code tasks should be treated as:
+
+- an import/discovery format for workspace commands
+- a convenience for repos that already maintain `tasks.json`
+- a partial compatibility layer, not a full execution model
+
+## Current State
+
+The current implementation is already service-oriented:
+
+- project workspaces and execution workspaces can store `workspaceRuntime` config plus `desiredState` and per-service `serviceStates`
+- the UI renders one control row per configured service and persists start/stop intent
+- the backend supervises long-running local processes, reuses eligible services, and restores desired services on startup
+
+Relevant files:
+
+- `packages/shared/src/types/workspace-runtime.ts`
+- `server/src/services/workspace-runtime.ts`
+- `server/src/services/project-workspace-runtime-config.ts`
+- `ui/src/components/WorkspaceRuntimeControls.tsx`
+- `ui/src/pages/ProjectWorkspaceDetail.tsx`
+- `ui/src/pages/ExecutionWorkspaceDetail.tsx`
+
+This is directionally correct for Paperclip because it gives the control plane an explicit model for service lifecycle, health, reuse, and restart behavior.
+
+## Problem To Solve
+
+The current UX is still too raw:
+
+- operators have to hand-author runtime JSON
+- a workspace can have multiple attached services, but the higher-level intent is not obvious
+- start/stop controls are visible in multiple places, which makes it easy to lose track of what is being controlled
+- there is no interoperability with repos that already define useful local workflows in `.vscode/tasks.json`
+
+The issue is not that services are the wrong abstraction.
+The issue is that the configuration surface is too low-level and Paperclip does not yet leverage existing workspace metadata.
+
+## Recommendation
+
+Keep Paperclip runtime services as the source of truth for service supervision.
+Add a new workspace command model above the raw JSON layer, with VS Code task discovery as one input.
+
+The product model should become:
+
+1. `Workspace command`
+   A named runnable thing attached to a workspace.
+
+2. `Workspace service`
+   A workspace command that is expected to stay alive and be supervised.
+
+3. `Workspace job`
+   A workspace command that runs once and exits.
+
+4. `Runtime service instance`
+   The live process record that already exists today in Paperclip.
+
+In that model, VS Code tasks are a way to populate workspace commands.
+Only commands that map cleanly to Paperclip service or job semantics should become runnable in Paperclip.
+
+## Why Not Fully Adopt VS Code Tasks
+
+VS Code tasks are broader than Paperclip runtime services.
+They include shell/process tasks, compound tasks, background/watch tasks, presentation settings, extension/task-provider types, variable substitution, and problem-matcher-driven lifecycle.
+
+That creates a bad fit if Paperclip tries to use `tasks.json` as its only runtime model:
+
+- many tasks are one-shot jobs, not long-running services
+- some tasks depend on VS Code task providers or editor-only variable resolution
+- compound task graphs are useful, but they are not the same thing as a supervised service
+- problem matcher readiness is useful metadata, but it is not enough to replace Paperclip's persisted service lifecycle model
+
+The right boundary is interoperability, not replacement.
+
+## Interoperability Contract
+
+Paperclip should support a conservative subset of VS Code tasks and clearly mark unsupported entries.
+
+### Supported in phase 1
+
+- `shell` and `process` tasks with a concrete command Paperclip can resolve
+- optional task `options.cwd`
+- optional task environment values that can be flattened safely
+- task labels and detail text for naming and display
+- `dependsOn` for import-time expansion or display-only dependency hints
+- background/watch-oriented tasks that can reasonably be treated as long-running services
+
+### Maybe supported in later phases
+
+- grouping and default task metadata for better UX
+- selected variable substitution when Paperclip can resolve it safely from workspace context
+- mapping task metadata into Paperclip readiness/expose hints
+- limited compound-task launch flows
+
+### Not supported initially
+
+- extension-provided task types Paperclip cannot execute directly
+- arbitrary VS Code variable substitution semantics
+- problem matcher parsing as the main source of service health
+- full parity with VS Code task execution behavior
+
+## Long-Running Service Detection
+
+Paperclip needs an explicit classification layer instead of assuming every VS Code task is a service.
+
+Recommended classification:
+
+- `service`
+  Explicitly marked by Paperclip metadata, or confidently inferred from background/watch task semantics
+
+- `job`
+  One-shot command expected to exit
+
+- `unsupported`
+  Present in `tasks.json`, but not safely runnable by Paperclip
+
+The important product decision is that service classification must be visible and editable by the operator.
+Inference can help, but it should not be the only source of truth.
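One way to picture the classification layer is a function that only suggests a class and records whether the suggestion was inferred. This is a sketch under assumptions: `label`, `type`, `command`, and `isBackground` are standard `tasks.json` task fields, the namespaced `paperclip.kind` field follows the example later in this plan, and `suggestClassification` with its result shape is hypothetical.

```typescript
// Subset of a tasks.json entry relevant to classification.
type VsCodeTask = {
  label: string;
  type: string;
  command?: string;
  isBackground?: boolean;
  paperclip?: { kind?: "service" | "job" };
};

// `inferred: true` marks a guess the operator should be able to review.
type TaskClassification =
  | { kind: "service"; inferred: boolean }
  | { kind: "job"; inferred: boolean }
  | { kind: "unsupported"; reason: string };

function suggestClassification(task: VsCodeTask): TaskClassification {
  // Only shell/process tasks with a concrete command are runnable here.
  if (task.type !== "shell" && task.type !== "process") {
    return { kind: "unsupported", reason: `task type "${task.type}" is not shell or process` };
  }
  if (task.command === undefined || task.command.trim() === "") {
    return { kind: "unsupported", reason: "task has no concrete command" };
  }
  // Explicit Paperclip metadata wins over inference.
  if (task.paperclip !== undefined && task.paperclip.kind !== undefined) {
    return { kind: task.paperclip.kind, inferred: false };
  }
  // Background tasks are only suggested as services, never silently adopted.
  if (task.isBackground === true) return { kind: "service", inferred: true };
  return { kind: "job", inferred: true };
}
```

Keeping the `inferred` flag on the result is what lets a UI show the guess and offer an edit, rather than treating inference as the source of truth.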
+
+## Proposed Product Shape
+
+### 1. Replace raw-first editing with command-first editing
+
+Project and execution workspace pages should stop making raw runtime JSON the primary editing surface.
+
+Default UI should show:
+
+- workspace commands
+- command type: service or job
+- source: Paperclip or VS Code
+- exact command and cwd
+- current state for services
+- explicit start, stop, restart, and run-now actions
+
+Raw JSON should remain available behind an advanced section.
+
+### 2. Add VS Code task discovery on workspaces
+
+For a workspace with `cwd`, Paperclip should look for `.vscode/tasks.json`.
+
+The workspace UI should show:
+
+- whether a `tasks.json` file was found
+- last parse time
+- supported commands discovered
+- unsupported tasks with reasons
+- whether commands are inherited into execution workspaces
+
+### 3. Make the controlled thing explicit
+
+Start and stop UI should always name the exact entry being controlled.
+
+Examples:
+
+- `Start web`
+- `Stop api`
+- `Run db:migrate`
+
+Avoid generic workspace-level labels when multiple commands exist.
+
+### 4. Separate services from jobs in the UI
+
+Do not mix one-shot jobs and long-running services into one undifferentiated list.
+
+Recommended sections:
+
+- `Services`
+- `Jobs`
+- `Unsupported imported tasks`
+
+That resolves the ambiguity called out in the issue.
+
+## Data Model Direction
+
+Do not replace `workspaceRuntime` immediately.
+Instead add a higher-level representation that can compile down to the existing runtime-service machinery.
+
+Suggested workspace metadata shape:
+
+```ts
+type WorkspaceCommandSource =
+  | { type: "paperclip" }
+  | { type: "vscode_task"; taskLabel: string; taskPath: ".vscode/tasks.json" };
+
+type WorkspaceCommandKind = "service" | "job";
+
+type WorkspaceCommandDefinition = {
+  id: string;
+  name: string;
+  kind: WorkspaceCommandKind;
+  source: WorkspaceCommandSource;
+  command: string | null;
+  cwd: string | null;
+  env?: Record<string, string> | null;
+  autoStart?: boolean;
+  serviceConfig?: {
+    lifecycle?: "shared" | "ephemeral";
+    reuseScope?: "project_workspace" | "execution_workspace" | "run";
+    readiness?: Record<string, unknown> | null;
+    expose?: Record<string, unknown> | null;
+  } | null;
+  importWarnings?: string[];
+  disabledReason?: string | null;
+};
+```
+
+`workspaceRuntime` can then become a derived or advanced representation for service-type commands until the rest of the system is migrated.
+
+## VS Code Mapping Rules
+
+Paperclip should map imported tasks with explicit, documented rules.
+
+Recommended rules:
+
+1. A task becomes a `job` by default.
+2. A task becomes a `service` only when:
+   - Paperclip metadata marks it as a service, or
+   - the task clearly represents a background/watch process and the operator confirms the classification.
+3. Unsupported tasks stay visible but disabled.
+4. Task labels become default command names.
+5. `dependsOn` is preserved as metadata, not silently flattened into hidden behavior.
+
+Paperclip-specific metadata can live in a namespaced field on the imported task definition, for example:
+
+```json
+{
+  "label": "web",
+  "type": "shell",
+  "command": "pnpm dev",
+  "isBackground": true,
+  "paperclip": {
+    "kind": "service",
+    "readiness": {
+      "type": "http",
+      "urlTemplate": "http://127.0.0.1:${port}"
+    },
+    "expose": {
+      "type": "url",
+      "urlTemplate": "http://127.0.0.1:${port}"
+    }
+  }
+}
+```
+
+That gives us interoperability without depending on VS Code-only semantics for service readiness and exposure.
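As a rough illustration of rules 1, 2, 4, and 5, the sketch below maps an already-supported task into a trimmed-down command definition. It assumes `dependsOn` entries are plain label strings, and it invents a `vscode:` id prefix, a `dependsOn` metadata field, and an `operatorConfirmedService` flag purely for the example; none of these are defined by the plan.

```typescript
// Supported task subset; unsupported tasks are assumed to be filtered earlier.
type ImportedTask = {
  label: string;
  command: string;
  isBackground?: boolean;
  dependsOn?: string | string[];
  paperclip?: { kind?: "service" | "job" };
};

// Trimmed, hypothetical variant of WorkspaceCommandDefinition.
type MappedCommand = {
  id: string;
  name: string;
  kind: "service" | "job";
  source: { type: "vscode_task"; taskLabel: string; taskPath: ".vscode/tasks.json" };
  command: string;
  dependsOn: string[]; // preserved as metadata only, never executed implicitly
};

function mapImportedTask(task: ImportedTask, operatorConfirmedService: boolean): MappedCommand {
  // Rule 2: service only via explicit metadata, or background + confirmation.
  const markedService = task.paperclip !== undefined && task.paperclip.kind === "service";
  const confirmedBackground = task.isBackground === true && operatorConfirmedService;
  // Rule 5: keep dependsOn visible instead of flattening it into behavior.
  const dependsOn =
    task.dependsOn === undefined ? [] : Array.isArray(task.dependsOn) ? task.dependsOn : [task.dependsOn];
  return {
    id: `vscode:${task.label}`, // hypothetical id scheme
    name: task.label, // Rule 4: label becomes the default name
    kind: markedService || confirmedBackground ? "service" : "job", // Rule 1: job by default
    source: { type: "vscode_task", taskLabel: task.label, taskPath: ".vscode/tasks.json" },
    command: task.command,
    dependsOn,
  };
}
```

With this shape, a background task that the operator has not confirmed still imports as a `job`, so nothing becomes a supervised service by inference alone.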
+
+## Execution Policy
+
+Project workspaces should be the main place where imported commands are discovered and curated.
+Execution workspaces should inherit that curated command set by default, with optional issue-level overrides.
+
+Recommended precedence:
+
+1. execution workspace override
+2. project workspace command set
+3. imported VS Code tasks from the linked workspace
+4. advanced raw runtime fallback
+
+This matches the existing direction in `doc/plans/2026-03-10-workspace-strategy-and-git-worktrees.md`.
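The precedence list can be sketched as a first-match lookup. This assumes the first configured layer wins as a whole set (the plan does not say whether layers merge per command), and that `null` or a missing key means "not configured" while an empty array is a deliberate choice. The layer names and `resolveCommandSet` are hypothetical.

```typescript
type CommandLayerName =
  | "execution_workspace_override"
  | "project_workspace_commands"
  | "imported_vscode_tasks"
  | "raw_runtime_fallback";

// Each layer holds command names; null or a missing key means not configured.
type CommandLayers = Partial<Record<CommandLayerName, string[] | null>>;

// Highest precedence first, mirroring the numbered list above.
const COMMAND_LAYER_PRECEDENCE: CommandLayerName[] = [
  "execution_workspace_override",
  "project_workspace_commands",
  "imported_vscode_tasks",
  "raw_runtime_fallback",
];

// Returns the first configured layer and reports which one supplied it.
function resolveCommandSet(
  layers: CommandLayers,
): { source: CommandLayerName; commands: string[] } | null {
  for (const name of COMMAND_LAYER_PRECEDENCE) {
    const commands = layers[name];
    if (commands !== undefined && commands !== null) {
      return { source: name, commands };
    }
  }
  return null;
}
```

Returning the `source` alongside the commands keeps it visible to the operator which layer a given command set came from.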
+
+## Implementation Plan
+
+### Phase 1: Discovery and read-only visibility
+
+Goal:
+show imported VS Code tasks in the workspace UI without changing runtime behavior.
+
+Work:
+
+- parse `.vscode/tasks.json` for project workspaces with local `cwd`
+- derive a list of candidate commands plus unsupported items
+- show source, label, command, cwd, and classification
+- show parse warnings and unsupported reasons
+
+Success condition:
+an operator can see what Paperclip would import and why.
+
+### Phase 2: Command model and explicit classification
+
+Goal:
+introduce a first-class workspace command layer above raw runtime JSON.
+
+Work:
+
+- add a persisted command definition model in workspace metadata or a dedicated table
+- allow operator edits to imported command classification
+- separate `service` and `job` in UI
+- keep existing runtime-service storage for live supervised processes
+
+Success condition:
+the workspace UI is command-first, and raw runtime JSON is advanced-only.
+
+### Phase 3: Service execution backed by existing runtime supervisor
+
+Goal:
+run supported imported service commands through the current Paperclip supervisor.
+
+Work:
+
+- compile service commands into the existing runtime service start/stop path
+- persist desired state per named command
+- keep startup restoration behavior for service commands
+- make the active command name explicit everywhere control actions appear
+
+Success condition:
+imported service commands behave like native Paperclip services once adopted.
+
+### Phase 4: Job execution and optional dependency handling
+
+Goal:
+support one-shot imported commands without pretending they are services.
+
+Work:
+
+- add `Run` actions for jobs
+- record output in workspace operations
+- optionally support simple `dependsOn` execution for jobs with clear logging
+
+Success condition:
+one-shot tasks are runnable, but they are not mixed into the service lifecycle model.
+
+### Phase 5: Adapter and execution workspace integration
+
+Goal:
+let agents and issue-scoped workspaces consume the curated command model consistently.
+
+Work:
+
+- expose inherited workspace commands to execution workspaces
+- allow issue-level selection of a default service command when relevant
+- make service selection explicit in issue and workspace views
+
+Success condition:
+agents, operators, and workspaces all refer to the same named commands.
+
+## Non-Goals
+
+- full VS Code task-runner parity
+- support for every VS Code task type
+- removal of Paperclip's own runtime supervision model
+- editor-dependent execution semantics inside the control plane
+
+## Risks
+
+- overfitting Paperclip to VS Code and making the model worse for non-VS-Code repos
+- misclassifying watch tasks as durable services
+- hiding too much detail and making debugging harder
+- allowing imported task graphs to become implicit magic
+
+These risks are manageable if the import layer stays explicit, conservative, and operator-editable.
+
+## Decision
+
+Paperclip should adopt VS Code tasks as an optional workspace command source, not as the canonical runtime model.
+
+The main UX change should be:
+
+- move from raw runtime JSON to named workspace commands
+- separate services from jobs
+- make the exact controlled command explicit
+- let `.vscode/tasks.json` pre-populate those commands when available
+
+## External References
+
+- VS Code tasks documentation: https://code.visualstudio.com/docs/debugtest/tasks
+- Existing Paperclip workspace plan: `doc/plans/2026-03-10-workspace-strategy-and-git-worktrees.md`