[codex] Add issue monitor liveness controls (#4988)

## Thinking Path

> - Paperclip is a control plane for autonomous AI companies where work
must stay observable, governable, and recoverable.
> - The task/heartbeat subsystem owns agent execution continuity, issue
state transitions, and visible recovery behavior.
> - Waiting on an external service is not the same as being blocked when
the assignee still owns a future check.
> - The gap was that agents had no first-class one-shot monitor state
for external-service waits, so recovery could look stalled or require ad
hoc comments.
> - This pull request adds bounded issue monitors that can wake the
owner, clear exhausted waits, and produce explicit recovery behavior.
> - It also surfaces monitor status in the board UI and documents when
to use monitors versus `blocked`.
> - The benefit is clearer liveness semantics for asynchronous waits
without weakening single-assignee task ownership.

## What Changed

- Added issue monitor fields, shared types, validators, constants, and
an idempotent `0075` migration for scheduled monitor state.
- Added server-side monitor scheduling, dispatch, recovery bounds,
activity logging, and external-ref redaction.
- Added board/agent route coverage for monitor permissions and child
monitor scheduling.
- Added issue detail/property UI for monitor state, a monitor activity
card, and Storybook stories for review surfaces.
- Documented monitor semantics and recovery policy behavior in
`doc/execution-semantics.md`.
- Addressed Greptile review feedback by preserving monitor state in
skipped-stage builders and making board monitor saves send `scheduledBy:
"board"`.

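The last bullet is easier to follow with the validator default in mind: the shared monitor policy schema falls back to `scheduledBy: "assignee"` when the field is omitted, so a board-initiated save has to send `"board"` explicitly. A minimal sketch of that behavior, where `buildMonitorPolicy` and the input shape are hypothetical names used only for illustration and are not part of this PR:

```typescript
type IssueMonitorScheduledBy = "assignee" | "board";

interface MonitorPolicyInput {
  nextCheckAt: string; // ISO 8601 timestamp
  notes?: string | null;
  scheduledBy?: IssueMonitorScheduledBy;
}

interface MonitorPolicy {
  nextCheckAt: string;
  notes: string | null;
  scheduledBy: IssueMonitorScheduledBy;
}

// Hypothetical helper: mirrors the schema defaults (notes -> null,
// scheduledBy -> "assignee") without depending on the validator library.
function buildMonitorPolicy(input: MonitorPolicyInput): MonitorPolicy {
  return {
    nextCheckAt: input.nextCheckAt,
    notes: input.notes ?? null,
    scheduledBy: input.scheduledBy ?? "assignee",
  };
}

// A board save that names its actor explicitly.
const boardSave = buildMonitorPolicy({
  nextCheckAt: "2026-05-04T09:00:00.000Z",
  notes: "Waiting on provider export",
  scheduledBy: "board",
});

// Omitting scheduledBy silently attributes the monitor to the assignee.
const implicitSave = buildMonitorPolicy({ nextCheckAt: "2026-05-04T09:00:00.000Z" });
```

Under these assumptions, `implicitSave.scheduledBy` is `"assignee"`, which is the misattribution the review feedback pointed out for board saves.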
## Verification

- `pnpm install --frozen-lockfile`
- `pnpm run preflight:workspace-links && pnpm exec vitest run
server/src/__tests__/issue-execution-policy-routes.test.ts
server/src/__tests__/issue-execution-policy.test.ts
server/src/__tests__/issue-monitor-scheduler.test.ts
server/src/__tests__/recovery-classifiers.test.ts
ui/src/components/IssueMonitorActivityCard.test.tsx
ui/src/components/IssueProperties.test.tsx
ui/src/lib/activity-format.test.ts`
- First run passed 5 files and failed to collect 2 server suites because
the worktree was missing the optional `acpx/runtime` dependency.
- After `pnpm install --frozen-lockfile`, reran the 2 failed suites
successfully.
- `pnpm exec vitest run
server/src/__tests__/issue-monitor-scheduler.test.ts
server/src/__tests__/recovery-classifiers.test.ts`
- `pnpm --filter @paperclipai/shared typecheck && pnpm --filter
@paperclipai/db typecheck && pnpm --filter @paperclipai/server typecheck
&& pnpm --filter @paperclipai/ui typecheck`
- `pnpm exec vitest run
server/src/__tests__/issue-execution-policy.test.ts
ui/src/components/IssueProperties.test.tsx`
- `pnpm --filter @paperclipai/server typecheck && pnpm --filter
@paperclipai/ui typecheck`
- `pnpm exec vitest run
ui/src/components/IssueMonitorActivityCard.test.tsx
ui/src/components/IssueProperties.test.tsx`
- `pnpm --filter @paperclipai/ui typecheck`
- Storybook screenshot captured from
`http://127.0.0.1:6006/iframe.html?viewMode=story&id=product-issue-monitor-surfaces--monitor-surfaces`
with Playwright.

## Screenshots

![Issue monitor Storybook
surfaces](https://raw.githubusercontent.com/paperclipai/paperclip/PAP-2945-when-a-task-is-waiting-for-an-_external-service_-what-state-should-it-be-in-and-what-recovery-method-could-it-h/docs/pr-screenshots/pap-2945/monitor-surfaces.png)

## Risks

- Medium: this changes heartbeat recovery behavior for scheduled
external-service waits, so regressions could affect wake timing or
recovery issue creation.
- Migration risk is reduced by using `IF NOT EXISTS` for the new issue
monitor columns and index.
- External monitor references are treated as secret-adjacent and are
intentionally omitted from visible activity/wake payloads.
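
The redaction described in the last risk can be pictured as stripping `externalRef` before a monitor snapshot reaches an activity or wake payload. The sketch below is an assumption about the shape of that step, not the server's implementation: `redactMonitorForActivity`, `MonitorSnapshot`, and the `hasExternalRef` flag are hypothetical names.

```typescript
interface MonitorSnapshot {
  status: "scheduled" | "triggered" | "cleared";
  nextCheckAt: string | null;
  notes: string | null;
  serviceName?: string | null;
  externalRef?: string | null; // treated as secret-adjacent
}

// The visible shape has no externalRef field at all; a boolean flag
// (an assumption of this sketch) records only whether one was set.
type VisibleMonitorSnapshot = Omit<MonitorSnapshot, "externalRef"> & {
  hasExternalRef: boolean;
};

function redactMonitorForActivity(monitor: MonitorSnapshot): VisibleMonitorSnapshot {
  // Destructure the sensitive field away so it cannot leak via spread.
  const { externalRef, ...visible } = monitor;
  return { ...visible, hasExternalRef: externalRef != null };
}

const visibleSnapshot = redactMonitorForActivity({
  status: "scheduled",
  nextCheckAt: "2026-05-04T09:00:00.000Z",
  notes: null,
  serviceName: "example-service",
  externalRef: "job-123",
});
```

Removing the field by destructuring, rather than overwriting it with a placeholder, keeps the reference out of any serialized form of the payload.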

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 coding agent with repository tool use and terminal
execution.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots or Storybook review surfaces
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
Dotta 2026-05-03 08:58:53 -05:00 committed by GitHub
parent 76f09c8eb6
commit 57229d0f24
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
32 changed files with 19324 additions and 20 deletions

@@ -221,9 +221,39 @@ export type IssueExecutionPolicyMode = (typeof ISSUE_EXECUTION_POLICY_MODES)[num
export const ISSUE_EXECUTION_STAGE_TYPES = ["review", "approval"] as const;
export type IssueExecutionStageType = (typeof ISSUE_EXECUTION_STAGE_TYPES)[number];
export const ISSUE_MONITOR_SCHEDULED_BY = ["assignee", "board"] as const;
export type IssueMonitorScheduledBy = (typeof ISSUE_MONITOR_SCHEDULED_BY)[number];
export const ISSUE_EXECUTION_MONITOR_KINDS = ["external_service"] as const;
export type IssueExecutionMonitorKind = (typeof ISSUE_EXECUTION_MONITOR_KINDS)[number];
export const ISSUE_EXECUTION_MONITOR_RECOVERY_POLICIES = [
"wake_owner",
"create_recovery_issue",
"escalate_to_board",
] as const;
export type IssueExecutionMonitorRecoveryPolicy =
(typeof ISSUE_EXECUTION_MONITOR_RECOVERY_POLICIES)[number];
export const ISSUE_EXECUTION_STATE_STATUSES = ["idle", "pending", "changes_requested", "completed"] as const;
export type IssueExecutionStateStatus = (typeof ISSUE_EXECUTION_STATE_STATUSES)[number];
export const ISSUE_EXECUTION_MONITOR_STATE_STATUSES = ["scheduled", "triggered", "cleared"] as const;
export type IssueExecutionMonitorStateStatus = (typeof ISSUE_EXECUTION_MONITOR_STATE_STATUSES)[number];
export const ISSUE_EXECUTION_MONITOR_CLEAR_REASONS = [
"manual",
"triggered",
"done",
"cancelled",
"invalid_status",
"invalid_assignee",
"dispatch_skipped",
"timeout_exceeded",
"max_attempts_exhausted",
] as const;
export type IssueExecutionMonitorClearReason = (typeof ISSUE_EXECUTION_MONITOR_CLEAR_REASONS)[number];
export const ISSUE_EXECUTION_DECISION_OUTCOMES = ["approved", "changes_requested"] as const;
export type IssueExecutionDecisionOutcome = (typeof ISSUE_EXECUTION_DECISION_OUTCOMES)[number];
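
The constants in this hunk follow the `as const` array plus indexed-access pattern, which keeps one runtime list and one derived union type in sync. A short sketch of how such a list can back a runtime type guard; the guard `isMonitorClearReason` is a hypothetical helper, not something this diff adds:

```typescript
const ISSUE_EXECUTION_MONITOR_CLEAR_REASONS = [
  "manual",
  "triggered",
  "done",
  "cancelled",
  "invalid_status",
  "invalid_assignee",
  "dispatch_skipped",
  "timeout_exceeded",
  "max_attempts_exhausted",
] as const;

// Union of the literal strings above, derived from the array itself.
type IssueExecutionMonitorClearReason =
  (typeof ISSUE_EXECUTION_MONITOR_CLEAR_REASONS)[number];

// Hypothetical guard: narrows untrusted input to the union type.
function isMonitorClearReason(value: unknown): value is IssueExecutionMonitorClearReason {
  return (
    typeof value === "string" &&
    (ISSUE_EXECUTION_MONITOR_CLEAR_REASONS as readonly string[]).includes(value)
  );
}
```

Adding a new reason to the array then widens both the type and the guard without a second edit.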

@@ -35,7 +35,12 @@ export {
ISSUE_REFERENCE_SOURCE_KINDS,
ISSUE_EXECUTION_POLICY_MODES,
ISSUE_EXECUTION_STAGE_TYPES,
ISSUE_MONITOR_SCHEDULED_BY,
ISSUE_EXECUTION_MONITOR_KINDS,
ISSUE_EXECUTION_MONITOR_RECOVERY_POLICIES,
ISSUE_EXECUTION_STATE_STATUSES,
ISSUE_EXECUTION_MONITOR_STATE_STATUSES,
ISSUE_EXECUTION_MONITOR_CLEAR_REASONS,
ISSUE_EXECUTION_DECISION_OUTCOMES,
GOAL_LEVELS,
GOAL_STATUSES,
@@ -136,7 +141,12 @@ export {
type IssueReferenceSourceKind,
type IssueExecutionPolicyMode,
type IssueExecutionStageType,
type IssueMonitorScheduledBy,
type IssueExecutionMonitorKind,
type IssueExecutionMonitorRecoveryPolicy,
type IssueExecutionStateStatus,
type IssueExecutionMonitorStateStatus,
type IssueExecutionMonitorClearReason,
type IssueExecutionDecisionOutcome,
type GoalLevel,
type GoalStatus,
@@ -340,6 +350,8 @@ export type {
IssueReferenceSource,
IssueRelatedWorkItem,
IssueRelatedWorkSummary,
IssueExecutionMonitorPolicy,
IssueExecutionMonitorState,
IssueRelation,
IssueRelationIssueSummary,
IssueExecutionPolicy,

@@ -145,6 +145,8 @@ export type {
IssueRelatedWorkSummary,
IssueRelation,
IssueRelationIssueSummary,
IssueExecutionMonitorPolicy,
IssueExecutionMonitorState,
IssueExecutionPolicy,
IssueExecutionState,
IssueExecutionStage,

@@ -1,5 +1,10 @@
import type {
IssueExecutionMonitorClearReason,
IssueExecutionMonitorKind,
IssueExecutionMonitorRecoveryPolicy,
IssueExecutionMonitorStateStatus,
IssueExecutionDecisionOutcome,
IssueMonitorScheduledBy,
IssueExecutionPolicyMode,
IssueReferenceSourceKind,
IssueExecutionStageType,
@@ -201,10 +206,40 @@ export interface IssueExecutionStage {
participants: IssueExecutionStageParticipant[];
}
export interface IssueExecutionMonitorPolicy {
nextCheckAt: string;
notes: string | null;
scheduledBy: IssueMonitorScheduledBy;
kind?: IssueExecutionMonitorKind | null;
serviceName?: string | null;
externalRef?: string | null;
timeoutAt?: string | null;
maxAttempts?: number | null;
recoveryPolicy?: IssueExecutionMonitorRecoveryPolicy | null;
}
export interface IssueExecutionPolicy {
mode: IssueExecutionPolicyMode;
commentRequired: boolean;
stages: IssueExecutionStage[];
monitor?: IssueExecutionMonitorPolicy | null;
}
export interface IssueExecutionMonitorState {
status: IssueExecutionMonitorStateStatus;
nextCheckAt: string | null;
lastTriggeredAt: string | null;
attemptCount: number;
notes: string | null;
scheduledBy: IssueMonitorScheduledBy | null;
kind?: IssueExecutionMonitorKind | null;
serviceName?: string | null;
externalRef?: string | null;
timeoutAt?: string | null;
maxAttempts?: number | null;
recoveryPolicy?: IssueExecutionMonitorRecoveryPolicy | null;
clearedAt: string | null;
clearReason: IssueExecutionMonitorClearReason | null;
}
export interface IssueReviewRequest {
@@ -222,6 +257,7 @@ export interface IssueExecutionState {
completedStageIds: string[];
lastDecisionId: string | null;
lastDecisionOutcome: IssueExecutionDecisionOutcome | null;
monitor?: IssueExecutionMonitorState | null;
}
export interface IssueExecutionDecision {
@@ -270,6 +306,11 @@ export interface Issue {
assigneeAdapterOverrides: IssueAssigneeAdapterOverrides | null;
executionPolicy?: IssueExecutionPolicy | null;
executionState?: IssueExecutionState | null;
monitorNextCheckAt?: Date | null;
monitorLastTriggeredAt?: Date | null;
monitorAttemptCount?: number;
monitorNotes?: string | null;
monitorScheduledBy?: IssueMonitorScheduledBy | null;
executionWorkspaceId: string | null;
executionWorkspacePreference: string | null;
executionWorkspaceSettings: IssueExecutionWorkspaceSettings | null;
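
The `attemptCount`, `maxAttempts`, and `timeoutAt` fields on `IssueExecutionMonitorState`, together with the `timeout_exceeded` and `max_attempts_exhausted` clear reasons, suggest how a bounded monitor might decide between waking the owner and clearing itself. The scheduler is not part of this excerpt, so the following is only a sketch under stated assumptions: `evaluateMonitorBounds` is a hypothetical function, and both the check order (timeout first) and the inclusive comparisons are guesses.

```typescript
type BoundsClearReason = "timeout_exceeded" | "max_attempts_exhausted";

interface MonitorBoundsInput {
  attemptCount: number;
  maxAttempts?: number | null;
  timeoutAt?: string | null; // ISO 8601 timestamp
}

type BoundsDecision =
  | { action: "dispatch" }
  | { action: "clear"; clearReason: BoundsClearReason };

// Hypothetical check run when a monitor comes due.
function evaluateMonitorBounds(monitor: MonitorBoundsInput, now: Date): BoundsDecision {
  // Assumption: a monitor at or past its timeout is cleared, not dispatched.
  if (monitor.timeoutAt != null && now.getTime() >= Date.parse(monitor.timeoutAt)) {
    return { action: "clear", clearReason: "timeout_exceeded" };
  }
  // Assumption: attemptCount counts completed wakes, so reaching maxAttempts exhausts it.
  if (monitor.maxAttempts != null && monitor.attemptCount >= monitor.maxAttempts) {
    return { action: "clear", clearReason: "max_attempts_exhausted" };
  }
  // Unbounded monitors (both fields null or absent) always dispatch.
  return { action: "dispatch" };
}
```

A discriminated union for the result keeps the clear reason attached to the only branch where it is meaningful.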

@@ -1,9 +1,14 @@
import { z } from "zod";
import {
ISSUE_EXECUTION_DECISION_OUTCOMES,
ISSUE_EXECUTION_MONITOR_CLEAR_REASONS,
ISSUE_EXECUTION_MONITOR_KINDS,
ISSUE_EXECUTION_MONITOR_RECOVERY_POLICIES,
ISSUE_EXECUTION_MONITOR_STATE_STATUSES,
ISSUE_EXECUTION_POLICY_MODES,
ISSUE_EXECUTION_STAGE_TYPES,
ISSUE_EXECUTION_STATE_STATUSES,
ISSUE_MONITOR_SCHEDULED_BY,
ISSUE_PRIORITIES,
clampIssueRequestDepth,
ISSUE_STATUSES,
@@ -103,10 +108,40 @@ export const issueExecutionStageSchema = z.object({
participants: z.array(issueExecutionStageParticipantSchema).default([]),
});
export const issueExecutionMonitorPolicySchema = z.object({
nextCheckAt: z.string().datetime(),
notes: z.string().max(500).optional().nullable().default(null),
scheduledBy: z.enum(ISSUE_MONITOR_SCHEDULED_BY).optional().default("assignee"),
kind: z.enum(ISSUE_EXECUTION_MONITOR_KINDS).optional().nullable().default(null),
serviceName: z.string().trim().min(1).max(120).optional().nullable().default(null),
externalRef: z.string().trim().min(1).max(500).optional().nullable().default(null),
timeoutAt: z.string().datetime().optional().nullable().default(null),
maxAttempts: z.number().int().positive().max(100).optional().nullable().default(null),
recoveryPolicy: z.enum(ISSUE_EXECUTION_MONITOR_RECOVERY_POLICIES).optional().nullable().default(null),
});
export const issueExecutionPolicySchema = z.object({
mode: z.enum(ISSUE_EXECUTION_POLICY_MODES).optional().default("normal"),
commentRequired: z.boolean().optional().default(true),
stages: z.array(issueExecutionStageSchema).default([]),
monitor: issueExecutionMonitorPolicySchema.optional().nullable(),
});
export const issueExecutionMonitorStateSchema = z.object({
status: z.enum(ISSUE_EXECUTION_MONITOR_STATE_STATUSES),
nextCheckAt: z.string().datetime().nullable(),
lastTriggeredAt: z.string().datetime().nullable(),
attemptCount: z.number().int().nonnegative().default(0),
notes: z.string().max(500).nullable(),
scheduledBy: z.enum(ISSUE_MONITOR_SCHEDULED_BY).nullable(),
kind: z.enum(ISSUE_EXECUTION_MONITOR_KINDS).nullable().optional().default(null),
serviceName: z.string().trim().min(1).max(120).nullable().optional().default(null),
externalRef: z.string().trim().min(1).max(500).nullable().optional().default(null),
timeoutAt: z.string().datetime().nullable().optional().default(null),
maxAttempts: z.number().int().positive().max(100).nullable().optional().default(null),
recoveryPolicy: z.enum(ISSUE_EXECUTION_MONITOR_RECOVERY_POLICIES).nullable().optional().default(null),
clearedAt: z.string().datetime().nullable(),
clearReason: z.enum(ISSUE_EXECUTION_MONITOR_CLEAR_REASONS).nullable(),
});
export const issueReviewRequestSchema = z.object({
@@ -124,6 +159,7 @@ export const issueExecutionStateSchema = z.object({
completedStageIds: z.array(z.string().uuid()).default([]),
lastDecisionId: z.string().uuid().nullable(),
lastDecisionOutcome: z.enum(ISSUE_EXECUTION_DECISION_OUTCOMES).nullable(),
monitor: issueExecutionMonitorStateSchema.optional().nullable(),
});
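
For readers who want the monitor policy limits at a glance, the constraints in `issueExecutionMonitorPolicySchema` can be restated as a plain function with no validator library. This is an approximation for illustration only: `checkMonitorPolicyInput` is a hypothetical name, it covers only four of the fields, and `Date.parse` is more permissive than the schema's `z.string().datetime()` check.

```typescript
interface MonitorPolicyCandidate {
  nextCheckAt: string;
  notes?: string | null;
  serviceName?: string | null;
  maxAttempts?: number | null;
}

// Hypothetical restatement of the schema limits; returns a list of problems.
function checkMonitorPolicyInput(input: MonitorPolicyCandidate): string[] {
  const errors: string[] = [];
  // Approximation: the real schema requires a strict ISO 8601 datetime string.
  if (Number.isNaN(Date.parse(input.nextCheckAt))) {
    errors.push("nextCheckAt must be a datetime string");
  }
  if (input.notes != null && input.notes.length > 500) {
    errors.push("notes must be at most 500 characters");
  }
  if (input.serviceName != null) {
    // The schema trims before applying the 1..120 length bounds.
    const trimmed = input.serviceName.trim();
    if (trimmed.length < 1 || trimmed.length > 120) {
      errors.push("serviceName must be 1 to 120 characters after trimming");
    }
  }
  if (input.maxAttempts != null) {
    const n = input.maxAttempts;
    if (!Number.isInteger(n) || n < 1 || n > 100) {
      errors.push("maxAttempts must be an integer from 1 to 100");
    }
  }
  return errors;
}
```

Fields left as `null` or omitted pass these checks, matching the schema's `.optional().nullable()` chains.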
const issueRequestDepthInputSchema = z