[codex] Add issue monitor liveness controls (#4988)

## Thinking Path

> - Paperclip is a control plane for autonomous AI companies where work must stay observable, governable, and recoverable.
> - The task/heartbeat subsystem owns agent execution continuity, issue state transitions, and visible recovery behavior.
> - Waiting on an external service is not the same as being blocked when the assignee still owns a future check.
> - The gap was that agents had no first-class one-shot monitor state for external-service waits, so recovery could look stalled or require ad hoc comments.
> - This pull request adds bounded issue monitors that can wake the owner, clear exhausted waits, and produce explicit recovery behavior.
> - It also surfaces monitor status in the board UI and documents when to use monitors versus `blocked`.
> - The benefit is clearer liveness semantics for asynchronous waits without weakening single-assignee task ownership.

## What Changed

- Added issue monitor fields, shared types, validators, constants, and an idempotent `0075` migration for scheduled monitor state.
- Added server-side monitor scheduling, dispatch, recovery bounds, activity logging, and external-ref redaction.
- Added board/agent route coverage for monitor permissions and child monitor scheduling.
- Added issue detail/property UI for monitor state, a monitor activity card, and Storybook stories for review surfaces.
- Documented monitor semantics and recovery policy behavior in `doc/execution-semantics.md`.
- Addressed Greptile review feedback by preserving monitor state in skipped-stage builders and making board monitor saves send `scheduledBy: "board"`.
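The bounded-monitor behavior described above can be illustrated as a pure decision function. A monitor either wakes the owner when its check is due or is cleared once it has timed out or run out of attempts.

This is a minimal sketch, not the scheduler shipped in this PR. `evaluateMonitor`, `MonitorSnapshot`, and `MonitorDecision` are hypothetical names. The fields mirror a subset of `IssueExecutionMonitorState` from the diff, and only the two exhaustion values of `IssueExecutionMonitorClearReason` are modeled. The order of the checks (timeout, then attempts, then due time) is an assumption.

```typescript
// Hypothetical local mirror of a subset of IssueExecutionMonitorState.
type MonitorStatus = "scheduled" | "triggered" | "cleared";
// Only the exhaustion-related clear reasons are modeled here.
type ClearReason = "timeout_exceeded" | "max_attempts_exhausted";

interface MonitorSnapshot {
  status: MonitorStatus;
  nextCheckAt: string | null;
  attemptCount: number;
  timeoutAt?: string | null;
  maxAttempts?: number | null;
}

type MonitorDecision =
  | { action: "wait" }
  | { action: "trigger"; attempt: number }
  | { action: "clear"; reason: ClearReason };

// Hypothetical helper: decide what one scheduler tick should do with a monitor.
function evaluateMonitor(monitor: MonitorSnapshot, now: Date): MonitorDecision {
  // Only scheduled monitors that have a check time are eligible.
  if (monitor.status !== "scheduled" || monitor.nextCheckAt === null) {
    return { action: "wait" };
  }
  const nowMs = now.getTime();
  // Bound 1 (assumed checked first): the overall deadline has passed.
  if (monitor.timeoutAt != null && nowMs >= Date.parse(monitor.timeoutAt)) {
    return { action: "clear", reason: "timeout_exceeded" };
  }
  // Bound 2: every allowed attempt has already been used.
  if (monitor.maxAttempts != null && monitor.attemptCount >= monitor.maxAttempts) {
    return { action: "clear", reason: "max_attempts_exhausted" };
  }
  // Not due yet: keep waiting.
  if (nowMs < Date.parse(monitor.nextCheckAt)) {
    return { action: "wait" };
  }
  // Due and within bounds: wake the owner and count the attempt.
  return { action: "trigger", attempt: monitor.attemptCount + 1 };
}
```

Keeping the decision free of I/O makes the bounds easy to unit-test. A caller would then persist the resulting `status`, `attemptCount`, or `clearReason`.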
## Verification

- `pnpm install --frozen-lockfile`
- `pnpm run preflight:workspace-links && pnpm exec vitest run server/src/__tests__/issue-execution-policy-routes.test.ts server/src/__tests__/issue-execution-policy.test.ts server/src/__tests__/issue-monitor-scheduler.test.ts server/src/__tests__/recovery-classifiers.test.ts ui/src/components/IssueMonitorActivityCard.test.tsx ui/src/components/IssueProperties.test.tsx ui/src/lib/activity-format.test.ts`
  - First run passed 5 files and failed to collect 2 server suites because the worktree was missing the optional `acpx/runtime` dependency.
  - After `pnpm install --frozen-lockfile`, reran the 2 failed suites successfully.
- `pnpm exec vitest run server/src/__tests__/issue-monitor-scheduler.test.ts server/src/__tests__/recovery-classifiers.test.ts`
- `pnpm --filter @paperclipai/shared typecheck && pnpm --filter @paperclipai/db typecheck && pnpm --filter @paperclipai/server typecheck && pnpm --filter @paperclipai/ui typecheck`
- `pnpm exec vitest run server/src/__tests__/issue-execution-policy.test.ts ui/src/components/IssueProperties.test.tsx`
- `pnpm --filter @paperclipai/server typecheck && pnpm --filter @paperclipai/ui typecheck`
- `pnpm exec vitest run ui/src/components/IssueMonitorActivityCard.test.tsx ui/src/components/IssueProperties.test.tsx`
- `pnpm --filter @paperclipai/ui typecheck`
- Storybook screenshot captured from `http://127.0.0.1:6006/iframe.html?viewMode=story&id=product-issue-monitor-surfaces--monitor-surfaces` with Playwright.

## Screenshots

![Issue monitor Storybook surfaces](https://raw.githubusercontent.com/paperclipai/paperclip/PAP-2945-when-a-task-is-waiting-for-an-_external-service_-what-state-should-it-be-in-and-what-recovery-method-could-it-h/docs/pr-screenshots/pap-2945/monitor-surfaces.png)

## Risks

- Medium: this changes heartbeat recovery behavior for scheduled external-service waits, so regressions could affect wake timing or recovery issue creation.
- Migration risk is reduced by using `IF NOT EXISTS` for the new issue monitor columns and index.
- External monitor references are treated as secret-adjacent and are intentionally omitted from visible activity/wake payloads.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected; check the roadmap first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 coding agent with repository tool use and terminal execution.

## Checklist

- [x] I have included a thinking path that traces from project context to this change
- [x] I have specified the model used (with version and capability details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after screenshots or Storybook review surfaces
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-06-14 01:50:39 +09:00 · 2026-05-03 08:58:53 -05:00 · 2026-05-03 08:58:53 -05:00 · 57229d0f24
commit 57229d0f24
parent 76f09c8eb6
32 changed files with 19324 additions and 20 deletions
--- a/packages/shared/src/constants.ts
+++ b/packages/shared/src/constants.ts
@@ -221,9 +221,39 @@ export type IssueExecutionPolicyMode = (typeof ISSUE_EXECUTION_POLICY_MODES)[num
 export const ISSUE_EXECUTION_STAGE_TYPES = ["review", "approval"] as const;
 export type IssueExecutionStageType = (typeof ISSUE_EXECUTION_STAGE_TYPES)[number];

+export const ISSUE_MONITOR_SCHEDULED_BY = ["assignee", "board"] as const;
+export type IssueMonitorScheduledBy = (typeof ISSUE_MONITOR_SCHEDULED_BY)[number];
+
+export const ISSUE_EXECUTION_MONITOR_KINDS = ["external_service"] as const;
+export type IssueExecutionMonitorKind = (typeof ISSUE_EXECUTION_MONITOR_KINDS)[number];
+
+export const ISSUE_EXECUTION_MONITOR_RECOVERY_POLICIES = [
+  "wake_owner",
+  "create_recovery_issue",
+  "escalate_to_board",
+] as const;
+export type IssueExecutionMonitorRecoveryPolicy =
+  (typeof ISSUE_EXECUTION_MONITOR_RECOVERY_POLICIES)[number];
+
 export const ISSUE_EXECUTION_STATE_STATUSES = ["idle", "pending", "changes_requested", "completed"] as const;
 export type IssueExecutionStateStatus = (typeof ISSUE_EXECUTION_STATE_STATUSES)[number];

+export const ISSUE_EXECUTION_MONITOR_STATE_STATUSES = ["scheduled", "triggered", "cleared"] as const;
+export type IssueExecutionMonitorStateStatus = (typeof ISSUE_EXECUTION_MONITOR_STATE_STATUSES)[number];
+
+export const ISSUE_EXECUTION_MONITOR_CLEAR_REASONS = [
+  "manual",
+  "triggered",
+  "done",
+  "cancelled",
+  "invalid_status",
+  "invalid_assignee",
+  "dispatch_skipped",
+  "timeout_exceeded",
+  "max_attempts_exhausted",
+] as const;
+export type IssueExecutionMonitorClearReason = (typeof ISSUE_EXECUTION_MONITOR_CLEAR_REASONS)[number];
+
 export const ISSUE_EXECUTION_DECISION_OUTCOMES = ["approved", "changes_requested"] as const;
 export type IssueExecutionDecisionOutcome = (typeof ISSUE_EXECUTION_DECISION_OUTCOMES)[number];

--- a/packages/shared/src/index.ts
+++ b/packages/shared/src/index.ts
@@ -35,7 +35,12 @@ export {
  ISSUE_REFERENCE_SOURCE_KINDS,
  ISSUE_EXECUTION_POLICY_MODES,
  ISSUE_EXECUTION_STAGE_TYPES,
+  ISSUE_MONITOR_SCHEDULED_BY,
+  ISSUE_EXECUTION_MONITOR_KINDS,
+  ISSUE_EXECUTION_MONITOR_RECOVERY_POLICIES,
  ISSUE_EXECUTION_STATE_STATUSES,
+  ISSUE_EXECUTION_MONITOR_STATE_STATUSES,
+  ISSUE_EXECUTION_MONITOR_CLEAR_REASONS,
  ISSUE_EXECUTION_DECISION_OUTCOMES,
  GOAL_LEVELS,
  GOAL_STATUSES,
@@ -136,7 +141,12 @@ export {
  type IssueReferenceSourceKind,
  type IssueExecutionPolicyMode,
  type IssueExecutionStageType,
+  type IssueMonitorScheduledBy,
+  type IssueExecutionMonitorKind,
+  type IssueExecutionMonitorRecoveryPolicy,
  type IssueExecutionStateStatus,
+  type IssueExecutionMonitorStateStatus,
+  type IssueExecutionMonitorClearReason,
  type IssueExecutionDecisionOutcome,
  type GoalLevel,
  type GoalStatus,
@@ -340,6 +350,8 @@ export type {
  IssueReferenceSource,
  IssueRelatedWorkItem,
  IssueRelatedWorkSummary,
+  IssueExecutionMonitorPolicy,
+  IssueExecutionMonitorState,
  IssueRelation,
  IssueRelationIssueSummary,
  IssueExecutionPolicy,
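The new monitor vocabularies are `as const` tuples that `packages/shared/src/index.ts` re-exports. Because of that, the same array can also serve as a runtime allowlist, for example when narrowing a raw `clearReason` value read back from storage.

This is a minimal sketch. `isMonitorClearReason` is a hypothetical helper that is not part of this diff, and the tuple is copied locally so the snippet runs on its own.

```typescript
// Local copy of the tuple added to packages/shared/src/constants.ts in this diff.
const ISSUE_EXECUTION_MONITOR_CLEAR_REASONS = [
  "manual",
  "triggered",
  "done",
  "cancelled",
  "invalid_status",
  "invalid_assignee",
  "dispatch_skipped",
  "timeout_exceeded",
  "max_attempts_exhausted",
] as const;
type IssueExecutionMonitorClearReason = (typeof ISSUE_EXECUTION_MONITOR_CLEAR_REASONS)[number];

// Hypothetical type guard (not in this diff): narrows an unknown value to the
// clear-reason union by checking membership in the const tuple.
function isMonitorClearReason(value: unknown): value is IssueExecutionMonitorClearReason {
  return (
    typeof value === "string" &&
    // Widen the readonly literal tuple so .includes accepts an arbitrary string.
    (ISSUE_EXECUTION_MONITOR_CLEAR_REASONS as readonly string[]).includes(value)
  );
}

const raw: unknown = "timeout_exceeded";
if (isMonitorClearReason(raw)) {
  // raw is typed as IssueExecutionMonitorClearReason inside this branch.
  console.log(`cleared because: ${raw}`);
}
```

The `z.enum(...)` validators in this diff already cover request parsing. A guard like this would be the lighter option for values that never pass through Zod.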
--- a/packages/shared/src/types/index.ts
+++ b/packages/shared/src/types/index.ts
@@ -145,6 +145,8 @@ export type {
  IssueRelatedWorkSummary,
  IssueRelation,
  IssueRelationIssueSummary,
+  IssueExecutionMonitorPolicy,
+  IssueExecutionMonitorState,
  IssueExecutionPolicy,
  IssueExecutionState,
  IssueExecutionStage,
--- a/packages/shared/src/types/issue.ts
+++ b/packages/shared/src/types/issue.ts
@@ -1,5 +1,10 @@
 import type {
+  IssueExecutionMonitorClearReason,
+  IssueExecutionMonitorKind,
+  IssueExecutionMonitorRecoveryPolicy,
+  IssueExecutionMonitorStateStatus,
  IssueExecutionDecisionOutcome,
+  IssueMonitorScheduledBy,
  IssueExecutionPolicyMode,
  IssueReferenceSourceKind,
  IssueExecutionStageType,
@@ -201,10 +206,40 @@ export interface IssueExecutionStage {
  participants: IssueExecutionStageParticipant[];
 }

+export interface IssueExecutionMonitorPolicy {
+  nextCheckAt: string;
+  notes: string | null;
+  scheduledBy: IssueMonitorScheduledBy;
+  kind?: IssueExecutionMonitorKind | null;
+  serviceName?: string | null;
+  externalRef?: string | null;
+  timeoutAt?: string | null;
+  maxAttempts?: number | null;
+  recoveryPolicy?: IssueExecutionMonitorRecoveryPolicy | null;
+}
+
 export interface IssueExecutionPolicy {
  mode: IssueExecutionPolicyMode;
  commentRequired: boolean;
  stages: IssueExecutionStage[];
+  monitor?: IssueExecutionMonitorPolicy | null;
+}
+
+export interface IssueExecutionMonitorState {
+  status: IssueExecutionMonitorStateStatus;
+  nextCheckAt: string | null;
+  lastTriggeredAt: string | null;
+  attemptCount: number;
+  notes: string | null;
+  scheduledBy: IssueMonitorScheduledBy | null;
+  kind?: IssueExecutionMonitorKind | null;
+  serviceName?: string | null;
+  externalRef?: string | null;
+  timeoutAt?: string | null;
+  maxAttempts?: number | null;
+  recoveryPolicy?: IssueExecutionMonitorRecoveryPolicy | null;
+  clearedAt: string | null;
+  clearReason: IssueExecutionMonitorClearReason | null;
 }

 export interface IssueReviewRequest {
@@ -222,6 +257,7 @@ export interface IssueExecutionState {
  completedStageIds: string[];
  lastDecisionId: string | null;
  lastDecisionOutcome: IssueExecutionDecisionOutcome | null;
+  monitor?: IssueExecutionMonitorState | null;
 }

 export interface IssueExecutionDecision {
@@ -270,6 +306,11 @@ export interface Issue {
  assigneeAdapterOverrides: IssueAssigneeAdapterOverrides | null;
  executionPolicy?: IssueExecutionPolicy | null;
  executionState?: IssueExecutionState | null;
+  monitorNextCheckAt?: Date | null;
+  monitorLastTriggeredAt?: Date | null;
+  monitorAttemptCount?: number;
+  monitorNotes?: string | null;
+  monitorScheduledBy?: IssueMonitorScheduledBy | null;
  executionWorkspaceId: string | null;
  executionWorkspacePreference: string | null;
  executionWorkspaceSettings: IssueExecutionWorkspaceSettings | null;
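`IssueExecutionMonitorState` carries an optional `externalRef`. This PR treats that reference as secret-adjacent and keeps it out of visible activity and wake payloads. One way to enforce that at the type level is to derive the payload type with `Omit`.

This is a sketch under assumptions. `MonitorStateSubset` is a trimmed local mirror of the interface above, and `toVisibleMonitorPayload` is a hypothetical helper, not the redaction code in this PR.

```typescript
// Trimmed local mirror of IssueExecutionMonitorState from this diff.
interface MonitorStateSubset {
  status: "scheduled" | "triggered" | "cleared";
  nextCheckAt: string | null;
  attemptCount: number;
  serviceName?: string | null;
  externalRef?: string | null;
}

// Payload type that cannot carry externalRef at all.
type VisibleMonitorPayload = Omit<MonitorStateSubset, "externalRef">;

// Hypothetical helper: copy every field except externalRef before the state is
// attached to an activity entry or wake payload.
function toVisibleMonitorPayload(state: MonitorStateSubset): VisibleMonitorPayload {
  // Destructure externalRef out and keep only the remaining fields.
  const { externalRef: _redacted, ...visible } = state;
  return visible;
}
```

With this shape, code that later tries to read `externalRef` from a `VisibleMonitorPayload` fails to compile instead of silently leaking the reference.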
--- a/packages/shared/src/validators/issue.ts
+++ b/packages/shared/src/validators/issue.ts
@@ -1,9 +1,14 @@
 import { z } from "zod";
 import {
  ISSUE_EXECUTION_DECISION_OUTCOMES,
+  ISSUE_EXECUTION_MONITOR_CLEAR_REASONS,
+  ISSUE_EXECUTION_MONITOR_KINDS,
+  ISSUE_EXECUTION_MONITOR_RECOVERY_POLICIES,
+  ISSUE_EXECUTION_MONITOR_STATE_STATUSES,
  ISSUE_EXECUTION_POLICY_MODES,
  ISSUE_EXECUTION_STAGE_TYPES,
  ISSUE_EXECUTION_STATE_STATUSES,
+  ISSUE_MONITOR_SCHEDULED_BY,
  ISSUE_PRIORITIES,
  clampIssueRequestDepth,
  ISSUE_STATUSES,
@@ -103,10 +108,40 @@ export const issueExecutionStageSchema = z.object({
  participants: z.array(issueExecutionStageParticipantSchema).default([]),
 });

+export const issueExecutionMonitorPolicySchema = z.object({
+  nextCheckAt: z.string().datetime(),
+  notes: z.string().max(500).optional().nullable().default(null),
+  scheduledBy: z.enum(ISSUE_MONITOR_SCHEDULED_BY).optional().default("assignee"),
+  kind: z.enum(ISSUE_EXECUTION_MONITOR_KINDS).optional().nullable().default(null),
+  serviceName: z.string().trim().min(1).max(120).optional().nullable().default(null),
+  externalRef: z.string().trim().min(1).max(500).optional().nullable().default(null),
+  timeoutAt: z.string().datetime().optional().nullable().default(null),
+  maxAttempts: z.number().int().positive().max(100).optional().nullable().default(null),
+  recoveryPolicy: z.enum(ISSUE_EXECUTION_MONITOR_RECOVERY_POLICIES).optional().nullable().default(null),
+});
+
 export const issueExecutionPolicySchema = z.object({
  mode: z.enum(ISSUE_EXECUTION_POLICY_MODES).optional().default("normal"),
  commentRequired: z.boolean().optional().default(true),
  stages: z.array(issueExecutionStageSchema).default([]),
+  monitor: issueExecutionMonitorPolicySchema.optional().nullable(),
+});
+
+export const issueExecutionMonitorStateSchema = z.object({
+  status: z.enum(ISSUE_EXECUTION_MONITOR_STATE_STATUSES),
+  nextCheckAt: z.string().datetime().nullable(),
+  lastTriggeredAt: z.string().datetime().nullable(),
+  attemptCount: z.number().int().nonnegative().default(0),
+  notes: z.string().max(500).nullable(),
+  scheduledBy: z.enum(ISSUE_MONITOR_SCHEDULED_BY).nullable(),
+  kind: z.enum(ISSUE_EXECUTION_MONITOR_KINDS).nullable().optional().default(null),
+  serviceName: z.string().trim().min(1).max(120).nullable().optional().default(null),
+  externalRef: z.string().trim().min(1).max(500).nullable().optional().default(null),
+  timeoutAt: z.string().datetime().nullable().optional().default(null),
+  maxAttempts: z.number().int().positive().max(100).nullable().optional().default(null),
+  recoveryPolicy: z.enum(ISSUE_EXECUTION_MONITOR_RECOVERY_POLICIES).nullable().optional().default(null),
+  clearedAt: z.string().datetime().nullable(),
+  clearReason: z.enum(ISSUE_EXECUTION_MONITOR_CLEAR_REASONS).nullable(),
 });

 export const issueReviewRequestSchema = z.object({
@@ -124,6 +159,7 @@ export const issueExecutionStateSchema = z.object({
  completedStageIds: z.array(z.string().uuid()).default([]),
  lastDecisionId: z.string().uuid().nullable(),
  lastDecisionOutcome: z.enum(ISSUE_EXECUTION_DECISION_OUTCOMES).nullable(),
+  monitor: issueExecutionMonitorStateSchema.optional().nullable(),
 });

 const issueRequestDepthInputSchema = z