Mirror of https://github.com/alkimake/paperclip.git, synced 2026-06-18 03:30:39 +09:00
[codex] Add run liveness continuations (#4083)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - Heartbeat runs are the control-plane record of each agent execution window.
> - Long-running local agents can exhaust context or stop while still holding useful next-step state.
> - Operators need that stop reason, next action, and continuation path to be durable and visible.
> - This pull request adds run liveness metadata, continuation summaries, and UI surfaces for issue run ledgers.
> - The benefit is that interrupted or long-running work can resume with clearer context instead of losing the agent's last useful handoff.

## What Changed

- Added heartbeat-run liveness fields, continuation attempt tracking, and an idempotent `0058` migration.
- Added server services and tests for run liveness, continuation summaries, stop metadata, and activity backfill.
- Wired local and HTTP adapters to surface continuation/liveness context through shared adapter utilities.
- Added shared constants, validators, and heartbeat types for liveness continuation state.
- Added issue-detail UI surfaces for continuation handoffs and the run ledger, with component tests.
- Updated agent runtime docs, heartbeat protocol docs, prompt guidance, onboarding assets, and skills instructions to explain continuation behavior.
- Addressed Greptile feedback by scoping document evidence by run, excluding system continuation-summary documents from liveness evidence, importing shared liveness types, surfacing hidden ledger run counts, documenting bounded retry behavior, and moving run-ledger liveness backfill off the request path.
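The shared liveness types and the bounded retry behavior listed above can be illustrated with a small TypeScript sketch. The seven state names and their display labels are the ones exercised in `IssueRunLedger.test.tsx`. Everything else is a hypothetical illustration of bounded continuation and not the code in this PR: `LIVENESS_LABELS`, `livenessLabel`, `decideContinuation`, the choice of which states count as continuable, and a cap of three attempts (inferred only from the test fixture that reports attempt 3 as exhausted).

```typescript
// Local stand-in for RunLivenessState from @paperclipai/shared, using the
// seven values exercised by the run ledger test.
type RunLivenessState =
  | "advanced"
  | "plan_only"
  | "empty_response"
  | "blocked"
  | "failed"
  | "completed"
  | "needs_followup";

// Display labels matching the strings the ledger test expects to see rendered.
const LIVENESS_LABELS: Record<RunLivenessState, string> = {
  advanced: "Advanced",
  plan_only: "Plan only",
  empty_response: "Empty response",
  blocked: "Blocked",
  failed: "Failed",
  completed: "Completed",
  needs_followup: "Needs follow-up",
};

// Hypothetical cap: the test fixture marks attempt 3 as exhausted, so assume 3.
const MAX_CONTINUATION_ATTEMPTS = 3;

// Hypothetical: states where the run produced output but no concrete action,
// which is when an automatic continuation could plausibly help.
const CONTINUABLE = new Set<RunLivenessState>(["plan_only", "empty_response"]);

function livenessLabel(state: RunLivenessState | null, runStatus: string): string {
  if (state !== null) return LIVENESS_LABELS[state];
  // Live runs are not classified yet; the test expects this wording for them
  // instead of missing-data language.
  return runStatus === "running" ? "Checks after finish" : "No liveness data";
}

interface ContinuationDecision {
  shouldContinue: boolean;
  nextAttempt: number;
  state: RunLivenessState;
}

function decideContinuation(state: RunLivenessState, attempt: number): ContinuationDecision {
  // Runs that advanced, completed, failed, or are blocked never auto-continue.
  if (!CONTINUABLE.has(state)) return { shouldContinue: false, nextAttempt: attempt, state };
  // Once the bounded budget is spent, hand off to a human as needs_followup.
  if (attempt >= MAX_CONTINUATION_ATTEMPTS) {
    return { shouldContinue: false, nextAttempt: attempt, state: "needs_followup" };
  }
  return { shouldContinue: true, nextAttempt: attempt + 1, state };
}
```

Keeping the attempt counter on the run record (the PR's `continuationAttempt` field) is what lets a cap like this survive process restarts. A purely in-memory counter would reset each time the agent was relaunched.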
## Verification

- `pnpm exec vitest run packages/adapter-utils/src/server-utils.test.ts server/src/__tests__/run-continuations.test.ts server/src/__tests__/run-liveness.test.ts server/src/__tests__/activity-service.test.ts server/src/__tests__/documents-service.test.ts server/src/__tests__/issue-continuation-summary.test.ts server/src/services/heartbeat-stop-metadata.test.ts ui/src/components/IssueRunLedger.test.tsx ui/src/components/IssueContinuationHandoff.test.tsx ui/src/components/IssueDocumentsSection.test.tsx`
- `pnpm --filter @paperclipai/db build`
- `pnpm exec vitest run server/src/__tests__/activity-service.test.ts ui/src/components/IssueRunLedger.test.tsx`
- `pnpm --filter @paperclipai/ui typecheck`
- `pnpm --filter @paperclipai/server typecheck`
- `pnpm exec vitest run server/src/__tests__/activity-service.test.ts server/src/__tests__/run-continuations.test.ts ui/src/components/IssueRunLedger.test.tsx`
- `pnpm exec vitest run server/src/__tests__/heartbeat-process-recovery.test.ts -t "treats a plan document update"`
- `pnpm exec vitest run server/src/__tests__/activity-service.test.ts server/src/__tests__/heartbeat-process-recovery.test.ts -t "activity service|treats a plan document update"`
- Remote PR checks on head `e53b1a1d`: `verify`, `e2e`, `policy`, and Snyk all passed.
- Confirmed `public-gh/master` is an ancestor of this branch after fetching `public-gh master`.
- Confirmed `pnpm-lock.yaml` is not included in the branch diff.
- Confirmed migration `0058_wealthy_starbolt.sql` is ordered after `0057` and uses `IF NOT EXISTS` guards for repeat application.
- Greptile inline review threads are resolved.

## Risks

- Medium risk: this touches heartbeat execution, liveness recovery, activity rendering, issue routes, shared contracts, docs, and UI.
- Migration risk is mitigated by additive columns/indexes and idempotent guards.
- Run-ledger liveness backfill is now asynchronous, so the first ledger response can briefly show historical runs without liveness data until the background backfill completes.
- UI screenshot coverage is not included in this packaging pass; validation is currently through focused component tests.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected, so check the roadmap first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5.4, local tool-use coding agent with terminal, git, GitHub connector, GitHub CLI, and Paperclip API access.

## Checklist

- [x] I have included a thinking path that traces from project context to this change
- [x] I have specified the model used (with version and capability details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before requesting merge

Screenshot note: no before/after screenshots were captured in this PR packaging pass; the UI changes are covered by the focused component tests listed above.

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
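The Risks section notes that run-ledger liveness backfill now runs off the request path. The following is a minimal sketch of that pattern. The `LedgerRun` shape, the `backfill` callback, and the `inFlight` de-duplication set are hypothetical and are not taken from the server code. The handler returns the runs it already has and schedules the backfill without awaiting it, which is why null liveness can appear on the first response.

```typescript
// Hypothetical minimal run shape; the real RunForIssue carries many more fields.
interface LedgerRun {
  runId: string;
  livenessState: string | null;
}

// Hypothetical backfill hook that classifies and persists liveness for the given runs.
type Backfill = (runIds: string[]) => Promise<void>;

// Run ids with a backfill already scheduled, so repeated ledger requests
// do not queue duplicate work for the same runs.
const inFlight = new Set<string>();

function listRunsForLedger(runs: LedgerRun[], backfill: Backfill): LedgerRun[] {
  const missing = runs
    .filter((run) => run.livenessState === null && !inFlight.has(run.runId))
    .map((run) => run.runId);

  if (missing.length > 0) {
    missing.forEach((id) => inFlight.add(id));
    // Fire and forget: the response below does not wait for this promise.
    void backfill(missing)
      .catch((err) => console.error("liveness backfill failed", err))
      .finally(() => missing.forEach((id) => inFlight.delete(id)));
  }

  // Returned immediately, so historical runs may still show null liveness
  // until a later request reads the backfilled values.
  return runs;
}
```

The trade-off is the one the PR calls out. Request latency no longer depends on classifying old runs, at the cost of one possibly stale response per run.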
Parent: b9a80dcf22
Commit: 236d11d36f
71 changed files with 18254 additions and 85 deletions
ui/src/components/IssueRunLedger.test.tsx (new file, 271 lines)
@@ -0,0 +1,271 @@

```tsx
// @vitest-environment jsdom

import { act } from "react";
import type { ComponentProps, ReactNode } from "react";
import { createRoot, type Root } from "react-dom/client";
import type { Issue, RunLivenessState } from "@paperclipai/shared";
import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
import type { RunForIssue } from "../api/activity";
import { IssueRunLedgerContent } from "./IssueRunLedger";

vi.mock("@/lib/router", () => ({
  Link: ({ children, to, ...props }: { children: ReactNode; to: string } & ComponentProps<"a">) => (
    <a href={to} {...props}>{children}</a>
  ),
}));

// eslint-disable-next-line @typescript-eslint/no-explicit-any
(globalThis as any).IS_REACT_ACT_ENVIRONMENT = true;

let container: HTMLDivElement;
let root: Root;

beforeEach(() => {
  vi.useFakeTimers();
  vi.setSystemTime(new Date("2026-04-18T20:00:00.000Z"));
  container = document.createElement("div");
  document.body.appendChild(container);
  root = createRoot(container);
});

afterEach(() => {
  act(() => root.unmount());
  container.remove();
  vi.useRealTimers();
});

function render(ui: ReactNode) {
  act(() => {
    root.render(ui);
  });
}

function createRun(overrides: Partial<RunForIssue> = {}): RunForIssue {
  return {
    runId: "run-00000000",
    status: "succeeded",
    agentId: "agent-1",
    adapterType: "codex_local",
    startedAt: "2026-04-18T19:58:00.000Z",
    finishedAt: "2026-04-18T19:59:00.000Z",
    createdAt: "2026-04-18T19:58:00.000Z",
    invocationSource: "assignment",
    usageJson: null,
    resultJson: null,
    livenessState: "advanced",
    livenessReason: "Run produced concrete action evidence: 2 activity event(s)",
    continuationAttempt: 0,
    lastUsefulActionAt: "2026-04-18T19:59:00.000Z",
    nextAction: null,
    ...overrides,
  };
}

function createIssue(overrides: Partial<Issue> = {}): Issue {
  return {
    id: "issue-1",
    companyId: "company-1",
    projectId: null,
    projectWorkspaceId: null,
    goalId: null,
    parentId: null,
    title: "Child issue",
    description: null,
    status: "todo",
    priority: "medium",
    assigneeAgentId: null,
    assigneeUserId: null,
    checkoutRunId: null,
    executionRunId: null,
    executionAgentNameKey: null,
    executionLockedAt: null,
    createdByAgentId: null,
    createdByUserId: null,
    issueNumber: null,
    identifier: "PAP-1",
    requestDepth: 0,
    billingCode: null,
    assigneeAdapterOverrides: null,
    executionWorkspaceId: null,
    executionWorkspacePreference: null,
    executionWorkspaceSettings: null,
    startedAt: null,
    completedAt: null,
    cancelledAt: null,
    hiddenAt: null,
    createdAt: new Date("2026-04-18T19:00:00.000Z"),
    updatedAt: new Date("2026-04-18T19:00:00.000Z"),
    ...overrides,
  };
}

function renderLedger(props: Partial<ComponentProps<typeof IssueRunLedgerContent>> = {}) {
  render(
    <IssueRunLedgerContent
      runs={props.runs ?? []}
      liveRuns={props.liveRuns}
      activeRun={props.activeRun}
      issueStatus={props.issueStatus ?? "in_progress"}
      childIssues={props.childIssues ?? []}
      agentMap={props.agentMap ?? new Map([["agent-1", { name: "CodexCoder" }]])}
    />,
  );
}

describe("IssueRunLedger", () => {
  it("renders every liveness state with exhausted continuation context", () => {
    const states: RunLivenessState[] = [
      "advanced",
      "plan_only",
      "empty_response",
      "blocked",
      "failed",
      "completed",
      "needs_followup",
    ];

    renderLedger({
      runs: states.map((state, index) =>
        createRun({
          runId: `run-${index}0000000`,
          createdAt: `2026-04-18T19:5${index}:00.000Z`,
          livenessState: state,
          livenessReason: state === "needs_followup"
            ? "Run produced useful output but no concrete action evidence; continuation attempts exhausted"
            : `state ${state}`,
          continuationAttempt: state === "needs_followup" ? 3 : 0,
        }),
      ),
    });

    expect(container.textContent).toContain("Advanced");
    expect(container.textContent).toContain("Plan only");
    expect(container.textContent).toContain("Empty response");
    expect(container.textContent).toContain("Blocked");
    expect(container.textContent).toContain("Failed");
    expect(container.textContent).toContain("Completed");
    expect(container.textContent).toContain("Needs follow-up");
    expect(container.textContent).toContain("Exhausted");
    expect(container.textContent).toContain("Continuation attempt 3");
  });

  it("renders historical runs without liveness metadata as unavailable", () => {
    renderLedger({
      runs: [
        createRun({
          livenessState: null,
          livenessReason: null,
          continuationAttempt: undefined,
          lastUsefulActionAt: null,
          nextAction: null,
          resultJson: null,
        }),
      ],
    });

    expect(container.textContent).toContain("No liveness data");
    expect(container.textContent).toContain("Stop Unavailable");
    expect(container.textContent).toContain("Last useful action Unavailable");
  });

  it("shows live runs as pending final checks without missing-data language", () => {
    renderLedger({
      runs: [
        createRun({
          status: "running",
          finishedAt: null,
          livenessState: null,
          livenessReason: null,
          continuationAttempt: 0,
          lastUsefulActionAt: null,
          nextAction: null,
          resultJson: null,
        }),
      ],
    });

    expect(container.textContent).toContain("Running now by CodexCoder");
    expect(container.textContent).toContain("Checks after finish");
    expect(container.textContent).toContain("Last useful action No action recorded yet");
    expect(container.textContent).toContain("Stop Still running");
    expect(container.textContent).not.toContain("Liveness pending");
    expect(container.textContent).not.toContain("initial attempt");
  });

  it("shows timeout, cancel, and budget stop reasons without raw logs", () => {
    renderLedger({
      runs: [
        createRun({
          runId: "run-timeout",
          resultJson: { stopReason: "timeout", timeoutFired: true, effectiveTimeoutSec: 30 },
        }),
        createRun({
          runId: "run-cancel",
          resultJson: { stopReason: "cancelled" },
          createdAt: "2026-04-18T19:57:00.000Z",
        }),
        createRun({
          runId: "run-budget",
          resultJson: { stopReason: "budget_paused" },
          createdAt: "2026-04-18T19:56:00.000Z",
        }),
      ],
    });

    expect(container.textContent).toContain("timeout (30s timeout)");
    expect(container.textContent).toContain("cancelled");
    expect(container.textContent).toContain("budget paused");
  });

  it("surfaces active and completed child issue summaries", () => {
    renderLedger({
      childIssues: [
        createIssue({ id: "child-1", identifier: "PAP-2", title: "Implement worker handoff", status: "in_progress" }),
        createIssue({ id: "child-2", identifier: "PAP-3", title: "Verify final report", status: "done" }),
        createIssue({ id: "child-3", identifier: "PAP-4", title: "Cancelled experiment", status: "cancelled" }),
      ],
    });

    expect(container.textContent).toContain("Child work");
    expect(container.textContent).toContain("1 active, 1 done, 1 cancelled");
    expect(container.textContent).toContain("PAP-2");
    expect(container.textContent).toContain("Implement worker handoff");

    renderLedger({
      childIssues: [
        createIssue({ id: "child-2", identifier: "PAP-3", title: "Verify final report", status: "done" }),
        createIssue({ id: "child-3", identifier: "PAP-4", title: "Cancelled experiment", status: "cancelled" }),
      ],
    });

    expect(container.textContent).toContain("all 2 terminal (1 done, 1 cancelled)");
  });

  it("uses wrapping-friendly markup for long next action text", () => {
    renderLedger({
      runs: [
        createRun({
          nextAction: "Continue investigating this intentionally-long-next-action-token-that-needs-to-wrap-cleanly-on-mobile-and-desktop-without-overlapping-controls.",
        }),
      ],
    });

    const nextAction = [...container.querySelectorAll("span")]
      .find((node) => node.textContent?.includes("intentionally-long-next-action-token"));
    expect(nextAction?.className).toContain("break-words");
    expect(container.textContent).toContain("Next action:");
  });

  it("shows when older runs are clipped from the ledger", () => {
    renderLedger({
      runs: Array.from({ length: 10 }, (_, index) =>
        createRun({
          runId: `run-${index.toString().padStart(8, "0")}`,
          createdAt: `2026-04-18T19:${String(index).padStart(2, "0")}:00.000Z`,
        }),
      ),
    });

    expect(container.textContent).toContain("2 older runs not shown");
  });
});
```
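The ledger test pins down several display rules:

- Stop reasons are rendered with underscores replaced by spaces, plus a timeout suffix.
- Child-issue counts collapse into an "all N terminal" summary once nothing is active.
- A note appears when older runs are clipped.

The helpers below are a hypothetical sketch that reproduces those asserted strings. Several parts are assumptions rather than the component's implementation: the names `formatStopReason`, `summarizeChildren`, and `clipRuns`, the `StopResult` shape, and the visible-row limit of 8 (inferred only from 10 runs yielding "2 older runs not shown").

```typescript
// Hypothetical subset of resultJson fields used by the stop-reason test.
interface StopResult {
  stopReason?: string;
  timeoutFired?: boolean;
  effectiveTimeoutSec?: number;
}

function formatStopReason(result: StopResult | null): string {
  if (!result?.stopReason) return "Unavailable";
  // "budget_paused" -> "budget paused"; raw logs are never shown.
  const reason = result.stopReason.replace(/_/g, " ");
  if (result.timeoutFired && typeof result.effectiveTimeoutSec === "number") {
    return `${reason} (${result.effectiveTimeoutSec}s timeout)`;
  }
  return reason;
}

// Treat "done" and "cancelled" as terminal and every other status as active.
function summarizeChildren(statuses: string[]): string {
  if (statuses.length === 0) return "No child issues";
  const done = statuses.filter((s) => s === "done").length;
  const cancelled = statuses.filter((s) => s === "cancelled").length;
  const active = statuses.length - done - cancelled;
  if (active === 0) return `all ${statuses.length} terminal (${done} done, ${cancelled} cancelled)`;
  return `${active} active, ${done} done, ${cancelled} cancelled`;
}

// Hypothetical limit consistent with 10 runs producing "2 older runs not shown".
const VISIBLE_RUN_LIMIT = 8;

function clipRuns<T extends { createdAt: string }>(
  runs: T[],
  limit: number = VISIBLE_RUN_LIMIT,
): { visible: T[]; hiddenNote: string | null } {
  // Same-format ISO timestamps sort lexicographically; newest first, so the
  // clipped rows are the oldest ones.
  const sorted = [...runs].sort((a, b) => b.createdAt.localeCompare(a.createdAt));
  const hidden = sorted.length - limit;
  return {
    visible: sorted.slice(0, limit),
    hiddenNote: hidden > 0 ? `${hidden} older run${hidden === 1 ? "" : "s"} not shown` : null,
  };
}
```

Reporting the hidden count, rather than silently truncating, is the behavior the PR describes as "surfacing hidden ledger run counts" in response to review feedback.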