[codex] Harden execution reliability and heartbeat tooling (#3679)

## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Reliable execution depends on heartbeat routing, issue lifecycle semantics, telemetry, and a fast enough local verification loop to keep regressions visible
> - The remaining commits on this branch were mostly server/runtime correctness fixes plus test and documentation follow-ups in that area
> - Those changes are logically separate from the UI-focused issue-detail and workspace/navigation branches even when they touch overlapping issue APIs
> - This pull request groups the execution reliability, heartbeat, telemetry, and tooling changes into one standalone branch
> - The benefit is a focused review of the control-plane correctness work, including the follow-up fix that restored the implicit comment-reopen helpers after branch splitting

## What Changed

- Hardened issue/heartbeat execution behavior, including self-review stage skipping, deferred mention wakes during active execution, stranded execution recovery, active-run scoping, assignee resolution, and blocked-to-todo wake resumption
- Reduced noisy polling/logging overhead by trimming issue run payloads, compacting persisted run logs, silencing high-volume request logs, and capping heartbeat-run queries in dashboard/inbox surfaces
- Expanded telemetry and status semantics with adapter/model fields on task completion plus clearer status guidance in docs/onboarding material
- Updated test infrastructure and verification defaults with faster route-test module isolation, cheaper default `pnpm test`, e2e isolation from local state, and repo verification follow-ups
- Included docs/release housekeeping from the branch and added a small follow-up commit restoring the implicit comment-reopen helpers that were dropped during branch reconstruction

## Verification

- `pnpm vitest run server/src/__tests__/issue-comment-reopen-routes.test.ts server/src/__tests__/issue-telemetry-routes.test.ts`
- `pnpm vitest run server/src/__tests__/http-log-policy.test.ts server/src/__tests__/heartbeat-run-log.test.ts server/src/__tests__/health.test.ts`
- `server/src/__tests__/activity-service.test.ts`, `server/src/__tests__/heartbeat-comment-wake-batching.test.ts`, and `server/src/__tests__/heartbeat-process-recovery.test.ts` were attempted on this host but the embedded Postgres harness reported init-script/data-dir problems and skipped or failed to start, so they are noted as environment-limited

## Risks

- Medium: this branch changes core issue/heartbeat routing and reopen/wakeup behavior, so regressions would affect agent execution flow rather than isolated UI polish
- Because it also updates verification infrastructure, reviewers should pay attention to whether the new tests are asserting the right failure modes and not just reshaping harness behavior

## Model Used

- OpenAI Codex coding agent (GPT-5-class runtime in Codex CLI; exact deployed model ID is not exposed in this environment), reasoning enabled, tool use and local code execution enabled

## Checklist

- [x] I have included a thinking path that traces from project context to this change
- [x] I have specified the model used (with version and capability details)
- [ ] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-06-14 01:50:39 +09:00 · 2026-04-14 13:34:52 -05:00 · 2026-04-14 13:34:52 -05:00 · 7f893ac4ec
commit 7f893ac4ec
parent e89076148a
106 changed files with 4682 additions and 713 deletions
--- a/doc/plans/2026-04-12-vscode-task-interoperability-plan.md
+++ b/doc/plans/2026-04-12-vscode-task-interoperability-plan.md
@@ -0,0 +1,382 @@
+# VS Code Task Interoperability Plan
+
+Status: planning only, no code changes
+Date: 2026-04-12
+Related issue: `PAP-1377`
+
+## Summary
+
+Paperclip should not replace its workspace runtime service model with VS Code tasks.
+It should add a narrow interoperability layer that can discover and adopt supported entries from `.vscode/tasks.json`.
+
+The core product model should stay:
+
+- Paperclip owns long-running workspace services and their desired state
+- Paperclip shows operators exactly which named thing they are starting or stopping
+- Paperclip distinguishes long-running services from one-shot jobs
+
+VS Code tasks should be treated as:
+
+- an import/discovery format for workspace commands
+- a convenience for repos that already maintain `tasks.json`
+- a partial compatibility layer, not a full execution model
+
+## Current State
+
+The current implementation is already service-oriented:
+
+- project workspaces and execution workspaces can store `workspaceRuntime` config plus `desiredState` and per-service `serviceStates`
+- the UI renders one control row per configured service and persists start/stop intent
+- the backend supervises long-running local processes, reuses eligible services, and restores desired services on startup
+
+Relevant files:
+
+- `packages/shared/src/types/workspace-runtime.ts`
+- `server/src/services/workspace-runtime.ts`
+- `server/src/services/project-workspace-runtime-config.ts`
+- `ui/src/components/WorkspaceRuntimeControls.tsx`
+- `ui/src/pages/ProjectWorkspaceDetail.tsx`
+- `ui/src/pages/ExecutionWorkspaceDetail.tsx`
+
+This is directionally correct for Paperclip because it gives the control plane an explicit model for service lifecycle, health, reuse, and restart behavior.
+
+## Problem To Solve
+
+The current UX is still too raw:
+
+- operators have to hand-author runtime JSON
+- a workspace can have multiple attached services, but the higher-level intent is not obvious
+- start/stop controls are visible in multiple places, which makes it easy to lose track of what is being controlled
+- there is no interoperability with repos that already define useful local workflows in `.vscode/tasks.json`
+
+The issue is not that services are the wrong abstraction.
+The issue is that the configuration surface is too low-level and Paperclip does not yet leverage existing workspace metadata.
+
+## Recommendation
+
+Keep Paperclip runtime services as the source of truth for service supervision.
+Add a new workspace command model above the raw JSON layer, with VS Code task discovery as one input.
+
+The product model should become:
+
+1. `Workspace command`
+   A named runnable thing attached to a workspace.
+
+2. `Workspace service`
+   A workspace command that is expected to stay alive and be supervised.
+
+3. `Workspace job`
+   A workspace command that runs once and exits.
+
+4. `Runtime service instance`
+   The live process record that already exists today in Paperclip.
+
+In that model, VS Code tasks are a way to populate workspace commands.
+Only commands that map cleanly to Paperclip service or job semantics should become runnable in Paperclip.
+
+## Why Not Fully Adopt VS Code Tasks
+
+VS Code tasks are broader than Paperclip runtime services.
+They include shell/process tasks, compound tasks, background/watch tasks, presentation settings, extension/task-provider types, variable substitution, and problem-matcher-driven lifecycle.
+
+That creates a bad fit if Paperclip tries to use `tasks.json` as its only runtime model:
+
+- many tasks are one-shot jobs, not long-running services
+- some tasks depend on VS Code task providers or editor-only variable resolution
+- compound task graphs are useful, but they are not the same thing as a supervised service
+- problem matcher readiness is useful metadata, but it is not enough to replace Paperclip's persisted service lifecycle model
+
+The right boundary is interoperability, not replacement.
+
+## Interoperability Contract
+
+Paperclip should support a conservative subset of VS Code tasks and clearly mark unsupported entries.
+
+### Supported in phase 1
+
+- `shell` and `process` tasks with a concrete command Paperclip can resolve
+- optional task `options.cwd`
+- optional task environment values that can be flattened safely
+- task labels and detail text for naming and display
+- `dependsOn` for import-time expansion or display-only dependency hints
+- background/watch-oriented tasks that can reasonably be treated as long-running services
+
+### Candidates for later phases
+
+- grouping and default task metadata for better UX
+- selected variable substitution when Paperclip can resolve it safely from workspace context
+- mapping task metadata into Paperclip readiness/expose hints
+- limited compound-task launch flows
+
+### Not supported initially
+
+- extension-provided task types Paperclip cannot execute directly
+- arbitrary VS Code variable substitution semantics
+- problem matcher parsing as the main source of service health
+- full parity with VS Code task execution behavior
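As an illustration only, the phase 1 subset above could be checked by a small function that returns explicit reasons for every rejected task, so unsupported entries can be shown rather than silently dropped. This is a sketch under assumptions: `VsCodeTask`, `SupportResult`, and `checkTaskSupport` are hypothetical names that do not exist in the Paperclip codebase, and the sketch conservatively rejects any command containing a `${...}` variable reference.

```typescript
// Hypothetical minimal view of a tasks.json entry: only the fields this sketch reads.
type VsCodeTask = {
  label?: string;
  type?: string;
  command?: string;
};

// Hypothetical result shape: unsupported tasks always carry human-readable reasons.
type SupportResult =
  | { supported: true }
  | { supported: false; reasons: string[] };

// Matches VS Code-style variable references such as ${workspaceFolder}.
const VARIABLE_PATTERN = /\$\{[^}]+\}/;

function checkTaskSupport(task: VsCodeTask): SupportResult {
  const reasons: string[] = [];

  // Phase 1 only accepts shell and process tasks.
  if (task.type !== "shell" && task.type !== "process") {
    reasons.push(`task type "${task.type ?? "unknown"}" is not shell or process`);
  }

  // The command must be concrete; variable substitution is rejected (assumption).
  if (typeof task.command !== "string" || task.command.trim() === "") {
    reasons.push("no concrete command to resolve");
  } else if (VARIABLE_PATTERN.test(task.command)) {
    reasons.push("command uses variable substitution");
  }

  return reasons.length === 0 ? { supported: true } : { supported: false, reasons };
}
```

Collecting all reasons instead of stopping at the first one fits the plan's requirement that unsupported tasks are listed with reasons in the workspace UI.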
+
+## Long-Running Service Detection
+
+Paperclip needs an explicit classification layer instead of assuming every VS Code task is a service.
+
+Recommended classification:
+
+- `service`
+  Explicitly marked by Paperclip metadata, or confidently inferred from background/watch task semantics
+
+- `job`
+  One-shot command expected to exit
+
+- `unsupported`
+  Present in `tasks.json`, but not safely runnable by Paperclip
+
+The important product decision is that service classification must be visible and editable by the operator.
+Inference can help, but it should not be the only source of truth.
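The classification layer described above might look roughly like the following sketch. It assumes that an operator choice wins over Paperclip metadata, that metadata wins over the default, and that background semantics only produce a suggestion until an operator confirms it. All type and function names here are hypothetical; only `isBackground` and the namespaced `paperclip.kind` field come from the plan itself.

```typescript
type OperatorChoice = "service" | "job";

// Hypothetical minimal task view: only the fields the classifier reads.
type ClassifiableTask = {
  isBackground?: boolean;
  paperclip?: { kind?: OperatorChoice };
};

type Classification = {
  kind: "service" | "job" | "unsupported";
  // Records where the decision came from so the UI can show and edit it.
  basis: "unsupported" | "operator" | "paperclip_metadata" | "default";
  // True when background semantics hint at a service but nobody has confirmed it.
  suggestService: boolean;
};

function classifyTask(
  task: ClassifiableTask,
  runnable: boolean,
  operatorChoice?: OperatorChoice,
): Classification {
  if (!runnable) {
    return { kind: "unsupported", basis: "unsupported", suggestService: false };
  }
  if (operatorChoice) {
    return { kind: operatorChoice, basis: "operator", suggestService: false };
  }
  if (task.paperclip?.kind) {
    return { kind: task.paperclip.kind, basis: "paperclip_metadata", suggestService: false };
  }
  // Default to job; inference alone never promotes a task to a service.
  return { kind: "job", basis: "default", suggestService: task.isBackground === true };
}
```

Keeping `basis` on the result is one way to make the classification visible and editable: the UI can show why a command landed in its bucket and offer the operator a one-click confirmation when `suggestService` is set.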
+
+## Proposed Product Shape
+
+### 1. Replace raw-first editing with command-first editing
+
+Project and execution workspace pages should stop making raw runtime JSON the primary editing surface.
+
+Default UI should show:
+
+- workspace commands
+- command type: service or job
+- source: Paperclip or VS Code
+- exact command and cwd
+- current state for services
+- explicit start, stop, restart, and run-now actions
+
+Raw JSON should remain available behind an advanced section.
+
+### 2. Add VS Code task discovery on workspaces
+
+For a workspace with `cwd`, Paperclip should look for `.vscode/tasks.json`.
+
+The workspace UI should show:
+
+- whether a `tasks.json` file was found
+- last parse time
+- supported commands discovered
+- unsupported tasks with reasons
+- whether commands are inherited into execution workspaces
+
+### 3. Make the controlled thing explicit
+
+Start and stop UI should always name the exact entry being controlled.
+
+Examples:
+
+- `Start web`
+- `Stop api`
+- `Run db:migrate`
+
+Avoid generic workspace-level labels when multiple commands exist.
+
+### 4. Separate services from jobs in the UI
+
+Do not mix one-shot jobs and long-running services into one undifferentiated list.
+
+Recommended sections:
+
+- `Services`
+- `Jobs`
+- `Unsupported imported tasks`
+
+That resolves the services-versus-jobs ambiguity called out in the related issue, `PAP-1377`.
+
+## Data Model Direction
+
+Do not replace `workspaceRuntime` immediately.
+Instead add a higher-level representation that can compile down to the existing runtime-service machinery.
+
+Suggested workspace metadata shape:
+
+```ts
+type WorkspaceCommandSource =
+  | { type: "paperclip" }
+  | { type: "vscode_task"; taskLabel: string; taskPath: ".vscode/tasks.json" };
+
+type WorkspaceCommandKind = "service" | "job";
+
+type WorkspaceCommandDefinition = {
+  id: string;
+  name: string;
+  kind: WorkspaceCommandKind;
+  source: WorkspaceCommandSource;
+  command: string | null;
+  cwd: string | null;
+  env?: Record<string, string> | null;
+  autoStart?: boolean;
+  serviceConfig?: {
+    lifecycle?: "shared" | "ephemeral";
+    reuseScope?: "project_workspace" | "execution_workspace" | "run";
+    readiness?: Record<string, unknown> | null;
+    expose?: Record<string, unknown> | null;
+  } | null;
+  importWarnings?: string[];
+  disabledReason?: string | null;
+};
+```
+
+`workspaceRuntime` can then become a derived or advanced representation for service-type commands until the rest of the system is migrated.
+
+## VS Code Mapping Rules
+
+Paperclip should map imported tasks with explicit, documented rules.
+
+Recommended rules:
+
+1. A task becomes a `job` by default.
+2. A task becomes a `service` only when:
+   - Paperclip metadata marks it as a service, or
+   - the task clearly represents a background/watch process and the operator confirms the classification.
+3. Unsupported tasks stay visible but disabled.
+4. Task labels become default command names.
+5. `dependsOn` is preserved as metadata, not silently flattened into hidden behavior.
+
+Paperclip-specific metadata can live in a namespaced field on the imported task definition, for example:
+
+```json
+{
+  "label": "web",
+  "type": "shell",
+  "command": "pnpm dev",
+  "isBackground": true,
+  "paperclip": {
+    "kind": "service",
+    "readiness": {
+      "type": "http",
+      "urlTemplate": "http://127.0.0.1:${port}"
+    },
+    "expose": {
+      "type": "url",
+      "urlTemplate": "http://127.0.0.1:${port}"
+    }
+  }
+}
+```
+
+That gives us interoperability without depending on VS Code-only semantics for service readiness and exposure.
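To make the field mapping concrete, here is a minimal sketch that turns an imported task like the one above into a trimmed version of the `WorkspaceCommandDefinition` shape suggested earlier. The `ImportedTask` type, the `toWorkspaceCommand` function, and the `vscode:` id prefix are hypothetical. The command kind is assumed to be decided elsewhere and passed in, and `dependsOn` handling is left out.

```typescript
type WorkspaceCommandSource =
  | { type: "paperclip" }
  | { type: "vscode_task"; taskLabel: string; taskPath: ".vscode/tasks.json" };

// Trimmed subset of the suggested definition shape, enough for this sketch.
type WorkspaceCommandDefinition = {
  id: string;
  name: string;
  kind: "service" | "job";
  source: WorkspaceCommandSource;
  command: string | null;
  cwd: string | null;
  serviceConfig?: {
    readiness?: Record<string, unknown> | null;
    expose?: Record<string, unknown> | null;
  } | null;
};

// Hypothetical view of an imported tasks.json entry with the namespaced field.
type ImportedTask = {
  label: string;
  type: string;
  command?: string;
  options?: { cwd?: string };
  paperclip?: {
    readiness?: Record<string, unknown>;
    expose?: Record<string, unknown>;
  };
};

function toWorkspaceCommand(
  task: ImportedTask,
  kind: "service" | "job",
): WorkspaceCommandDefinition {
  return {
    id: `vscode:${task.label}`, // assumed id scheme, for illustration only
    name: task.label, // task labels become default command names
    kind,
    source: { type: "vscode_task", taskLabel: task.label, taskPath: ".vscode/tasks.json" },
    command: task.command ?? null,
    cwd: task.options?.cwd ?? null,
    // Readiness and expose hints only apply to services.
    serviceConfig:
      kind === "service"
        ? {
            readiness: task.paperclip?.readiness ?? null,
            expose: task.paperclip?.expose ?? null,
          }
        : null,
  };
}
```

Readiness and exposure come only from the `paperclip` block, never from problem matchers or other editor-side settings, which keeps the imported definition independent of VS Code-only semantics.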
+
+## Execution Policy
+
+Project workspaces should be the main place where imported commands are discovered and curated.
+Execution workspaces should inherit that curated command set by default, with optional issue-level overrides.
+
+Recommended precedence:
+
+1. execution workspace override
+2. project workspace command set
+3. imported VS Code tasks from the linked workspace
+4. advanced raw runtime fallback
+
+This matches the existing direction in `doc/plans/2026-03-10-workspace-strategy-and-git-worktrees.md`.
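The precedence order above can be sketched as a layered lookup. This is only one possible reading: it assumes precedence is resolved per command name, so a higher layer shadows a same-named command from a lower layer, whereas the plan does not say whether layers merge or replace each other wholesale. `NamedCommand`, `CommandLayers`, and `resolveCommands` are hypothetical names.

```typescript
type NamedCommand = { name: string; command: string };

// One optional list per precedence layer, named after the list above.
type CommandLayers = {
  executionWorkspaceOverride?: NamedCommand[];
  projectWorkspaceCommands?: NamedCommand[];
  importedVsCodeTasks?: NamedCommand[];
  rawRuntimeFallback?: NamedCommand[];
};

type ResolvedCommand = NamedCommand & { origin: keyof CommandLayers };

// Highest precedence first.
const PRECEDENCE: (keyof CommandLayers)[] = [
  "executionWorkspaceOverride",
  "projectWorkspaceCommands",
  "importedVsCodeTasks",
  "rawRuntimeFallback",
];

function resolveCommands(layers: CommandLayers): ResolvedCommand[] {
  const byName = new Map<string, ResolvedCommand>();
  for (const origin of PRECEDENCE) {
    for (const cmd of layers[origin] ?? []) {
      // The first layer to define a name wins; lower layers cannot override it.
      if (!byName.has(cmd.name)) {
        byName.set(cmd.name, { ...cmd, origin });
      }
    }
  }
  return Array.from(byName.values());
}
```

Recording `origin` on each resolved command keeps the result explicit, so an operator can see which layer a given command came from.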
+
+## Implementation Plan
+
+### Phase 1: Discovery and read-only visibility
+
+Goal:
+show imported VS Code tasks in the workspace UI without changing runtime behavior.
+
+Work:
+
+- parse `.vscode/tasks.json` for project workspaces with local `cwd`
+- derive a list of candidate commands plus unsupported items
+- show source, label, command, cwd, and classification
+- show parse warnings and unsupported reasons
+
+Success condition:
+an operator can see what Paperclip would import and why.
+
+### Phase 2: Command model and explicit classification
+
+Goal:
+introduce a first-class workspace command layer above raw runtime JSON.
+
+Work:
+
+- add a persisted command definition model in workspace metadata or a dedicated table
+- allow operator edits to imported command classification
+- separate `service` and `job` in UI
+- keep existing runtime-service storage for live supervised processes
+
+Success condition:
+the workspace UI is command-first, and raw runtime JSON is advanced-only.
+
+### Phase 3: Service execution backed by existing runtime supervisor
+
+Goal:
+run supported imported service commands through the current Paperclip supervisor.
+
+Work:
+
+- compile service commands into the existing runtime service start/stop path
+- persist desired state per named command
+- keep startup restoration behavior for service commands
+- make the active command name explicit everywhere control actions appear
+
+Success condition:
+imported service commands behave like native Paperclip services once adopted.
+
+### Phase 4: Job execution and optional dependency handling
+
+Goal:
+support one-shot imported commands without pretending they are services.
+
+Work:
+
+- add `Run` actions for jobs
+- record output in workspace operations
+- optionally support simple `dependsOn` execution for jobs with clear logging
+
+Success condition:
+one-shot tasks are runnable, but they are not mixed into the service lifecycle model.
+
+### Phase 5: Adapter and execution workspace integration
+
+Goal:
+let agents and issue-scoped workspaces consume the curated command model consistently.
+
+Work:
+
+- expose inherited workspace commands to execution workspaces
+- allow issue-level selection of a default service command when relevant
+- make service selection explicit in issue and workspace views
+
+Success condition:
+agents, operators, and workspaces all refer to the same named commands.
+
+## Non-Goals
+
+- full VS Code task-runner parity
+- support for every VS Code task type
+- removal of Paperclip's own runtime supervision model
+- editor-dependent execution semantics inside the control plane
+
+## Risks
+
+- overfitting Paperclip to VS Code and making the model worse for non-VS-Code repos
+- misclassifying watch tasks as durable services
+- hiding too much detail and making debugging harder
+- allowing imported task graphs to become implicit magic
+
+These risks are manageable if the import layer stays explicit, conservative, and operator-editable.
+
+## Decision
+
+Paperclip should adopt VS Code tasks as an optional workspace command source, not as the canonical runtime model.
+
+The main UX change should be:
+
+- move from raw runtime JSON to named workspace commands
+- separate services from jobs
+- make the exact controlled command explicit
+- let `.vscode/tasks.json` pre-populate those commands when available
+
+## External References
+
+- VS Code tasks documentation: https://code.visualstudio.com/docs/debugtest/tasks
+- Existing Paperclip workspace plan: `doc/plans/2026-03-10-workspace-strategy-and-git-worktrees.md`