[codex] Roll up May 17 branch changes (#6210)

## Thinking Path > - Paperclip is the control plane for autonomous AI companies, so agent work needs visible ownership, recovery, and operator controls. > - This local branch had accumulated several related control-plane reliability and operator-experience fixes across recovery actions, watchdog folding, model-profile defaults, mentions, markdown editing, plugin launchers, and small UI polish. > - The branch needed to be converted into a PR against the current `origin/master` without losing dirty work or including lockfile/workflow churn. > - The safest standalone shape is a single rollup PR because the recovery/server/UI files overlap heavily across the local commits and splitting would create avoidable conflicts. > - This pull request replays the local branch onto latest `origin/master`, preserves the uncommitted work as logical commits, and adds a Zod 4 validator compatibility fix found during verification. > - The benefit is that the May 17 local branch can be reviewed and merged as one coherent, conflict-free branch under the 100-file Greptile limit. ## What Changed - Rebased the local May 17 branch work onto current `origin/master` in a dedicated worktree. - Preserved and committed previously dirty changes for recovery retry handling, plugin/sidebar launcher polish, and `.herenow` ignores. - Added recovery-action behavior for returning source issues to `todo` when retrying source-scoped recovery. - Included the existing local recovery/liveness/watchdog fold, Codex cheap-profile, markdown/mention, duplicate-agent, and UI polish commits from the branch. - Normalized shared validator `z.record(...)` schemas to explicit string-key records for Zod 4 compatibility. - Confirmed the PR has no `pnpm-lock.yaml` or `.github/workflows/*` changes and stays below the 100-file Greptile limit. ## Verification - `pnpm install --frozen-lockfile --ignore-scripts` - `npm run install` in `node_modules/.pnpm/sqlite3@5.1.7/node_modules/sqlite3` to build the local native sqlite3 binding after installing with scripts disabled - `pnpm exec vitest run packages/shared/src/validators/issue.test.ts packages/shared/src/project-mentions.test.ts packages/adapter-utils/src/server-utils.test.ts server/src/__tests__/heartbeat-model-profile.test.ts server/src/__tests__/issue-recovery-actions.test.ts server/src/__tests__/issue-agent-mutation-ownership-routes.test.ts server/src/__tests__/heartbeat-active-run-output-watchdog.test.ts server/src/__tests__/plugin-local-folders.test.ts ui/src/components/IssueRecoveryActionCard.test.tsx ui/src/components/Sidebar.test.tsx ui/src/components/SidebarAccountMenu.test.tsx ui/src/components/IssueProperties.test.tsx ui/src/components/MarkdownEditor.test.tsx ui/src/components/MarkdownBody.test.tsx ui/src/lib/duplicate-agent-payload.test.ts ui/src/pages/Routines.test.tsx` - First pass: 13 files passed with 201 passing tests; 3 server files failed before sqlite3 native binding was built. - After rebuilding sqlite3: `server/src/__tests__/heartbeat-model-profile.test.ts`, `server/src/__tests__/issue-recovery-actions.test.ts`, and `server/src/__tests__/heartbeat-active-run-output-watchdog.test.ts` passed/loaded; embedded Postgres tests were skipped by the local host guard. - `pnpm --filter @paperclipai/shared typecheck` - `pnpm --filter @paperclipai/adapter-utils typecheck` - `pnpm --filter @paperclipai/server typecheck` - `pnpm --filter @paperclipai/ui typecheck` ## Risks - Medium risk: this is a broad rollup PR across recovery semantics, server tests, shared validators, and UI surfaces. - Some embedded Postgres tests skipped locally due the host guard, so CI should provide the stronger database-backed signal. - UI changes were covered by component tests, but no browser screenshot was captured in this PR creation pass. - This branch may overlap with existing recovery/liveness PR work; merge this PR independently or restack/close overlapping branches rather than merging duplicate implementations together. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex, GPT-5-based coding agent, tool-enabled local repository and GitHub workflow, medium reasoning effort. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-06-14 01:50:39 +09:00 · 2026-05-17 17:15:06 -05:00 · 2026-05-17 17:15:06 -05:00 · d734bd43d1
commit d734bd43d1
parent 705c1b8d81
83 changed files with 3675 additions and 180 deletions
--- a/doc/execution-semantics.md
+++ b/doc/execution-semantics.md
@ -184,7 +184,7 @@ A valid recovery action must name:
 - the wake, monitor, timeout, retry, or escalation policy that will move the action forward
 - the resolution outcome when closed, such as restored, delegated, false positive, blocked, escalated, or cancelled

-A source-scoped recovery action is the default form. Use it when the next safe move is to repair the source issue's liveness directly: restore a wake path, clarify disposition, re-establish a monitor, record a false positive, or delegate real follow-up work from the source issue.
+A source-scoped recovery action is the default form. Use it when the next safe move is to repair the source issue's liveness directly: move the source issue back to `todo` so it can be retried, clarify disposition, re-establish a monitor, record a false positive, or delegate real follow-up work from the source issue.

 Use an issue-backed recovery action only when the recovery is genuinely independent work or when source-scoped handling would be unsafe or unclear. Examples include:

@ -196,6 +196,14 @@ Use an issue-backed recovery action only when the recovery is genuinely independ

 A comment or system notice can be evidence for a recovery action, but it is not a recovery action by itself. Comment-only recovery is not a healthy liveness path because it does not define a typed owner, wake or monitor policy, retry bound, timeout, escalation path, or resolution outcome.

+#### Recovery action freshness
+
+Source-scoped recovery actions are snapshots of the source issue's liveness state at the time the action was opened. They must be revalidated after newer durable source activity, including source issue status changes, assignee changes, blocker changes, execution policy or monitor changes, document or work-product updates that define a valid waiting path, and structured resume or disposition updates.
+
+When newer source activity restores a valid live or waiting path, the recovery action is stale and should be folded through the explicit recovery lifecycle instead of being hidden or deleted. Folding means resolving or cancelling the recovery action with a resolution outcome and note that preserve the audit trail.
+
+Plain comments alone do not make a recovery action stale. A comment can provide evidence, but the recovery action should remain visible when the source issue is still stalled and the comment does not create a valid action-path primitive such as a wake, monitor, interaction, approval, blocker, human owner, execution participant, terminal disposition, or delegated follow-up.
+
 ### Agent-assigned `todo`

 This is dispatch state: ready to start, not yet actively claimed.
@ -326,14 +334,15 @@ This is an active-work continuity recovery.

 Startup recovery and periodic recovery are different from normal wakeup delivery.

-On startup and on the periodic recovery loop, Paperclip now does four things in sequence:
+On startup and on the periodic recovery loop, Paperclip now does five things in sequence:

 1. reap orphaned `running` runs
 2. resume persisted `queued` runs
 3. reconcile stranded assigned work
-4. scan silent active runs and create or update explicit watchdog recovery actions
+4. scan silent active runs, revalidate their source issues, and either fold source-resolved watchdogs or create/update explicit watchdog recovery actions
+5. reconcile productivity reviews

-The stranded-work pass closes the gap where issue state survives a crash but the wake/run path does not. The silent-run scan covers the separate case where a live process exists but has stopped producing observable output.
+The stranded-work pass closes the gap where issue state survives a crash but the wake/run path does not. The silent-run scan covers the separate case where a live process exists but has stopped producing observable output. The productivity-review pass is later and separate; it reviews unusual progression patterns on assigned source issues, not stale run handles after a source issue already has a valid disposition.

 ## 10. Silent Active-Run Watchdog

@ -360,6 +369,33 @@ Operators should prefer `snooze` for known time-bounded quiet periods. `continue

 The board can record watchdog decisions. The assigned owner of an issue-backed watchdog evaluation can also record them. Other agents cannot.

+### Source-aware watchdog folding
+
+Active-run watchdog work is source-aware. Before the watchdog creates, refreshes, escalates, or blocks on reviewer work, it must re-read the linked source issue and decide whether the watchdog signal is still about productive source work or only about stale run/process bookkeeping.
+
+Fold watchdog work when all of these are true:
+
+- the run is linked to a source issue in the same company
+- the source issue is terminal (`done` or `cancelled`)
+- durable source activity from the same run proves the source issue reached that terminal disposition after the stale-run or output-silence evidence point
+- there is no independent evidence that the still-running or detached process is doing harmful work, still owns external cleanup that needs an operator decision, or needs a separate security/ownership review
+
+Folding means resolving or cancelling the watchdog recovery action or issue-backed evaluation through the explicit recovery lifecycle. It must preserve the run id, source issue, detected silence or detached-process evidence, terminal source activity, decision reason, and best-effort process cleanup result. It must be idempotent for the `(companyId, runId, sourceIssueId)` signal and must not recursively recover the watchdog evaluation issue itself.
+
+Do not fold watchdog work only because the run is quiet. The watchdog must still create or continue reviewer work when:
+
+- the source issue is still `todo` or `in_progress`, because productive work may still be happening or stuck
+- the source issue remains `in_progress` after a successful run with no valid disposition, because the successful-run handoff path owns that bounded correction
+- the run terminated or disappeared while the source issue remains `in_progress` without a live path, because stranded assigned recovery owns that continuity repair
+- the source issue is terminal but there is no durable same-run terminal activity after the stale evidence point
+- there is independent evidence that the process may still be mutating external state, leaking resources, crossing company or ownership boundaries, or otherwise needs operator review
+
+In the normal non-terminal case, critical silence can still create issue-backed evaluation work and block the source issue when blocking is necessary for correctness. In the source-resolved case, a completed source issue should not acquire a new manager review or blocker merely because an old run handle stayed active; only real unresolved work should block work.
+
+This is distinct from productivity review. Productivity review asks whether an assigned source issue has unusual progression patterns, such as no-comment terminal-run streaks, long active duration, or high churn. Source-resolved watchdog folding asks whether a stale active-run signal outlived a source issue that already reached a valid terminal disposition. One does not substitute for the other.
+
+Detached process cleanup is operational hygiene, not source issue liveness. Cleanup should be best-effort and auditable. If cleanup fails but the source issue is already terminal with same-run durable evidence, Paperclip should preserve the cleanup failure on the run/watchdog audit trail and route only the cleanup concern to bounded recovery when a real owner/action remains.
+
 ## 11. Auto-Recover vs Explicit Recovery vs Human Escalation

 Paperclip uses three different recovery outcomes, depending on how much it can safely infer.