paperclip/server/src/__tests__/pi-local-execute.test.ts

208 lines
7.3 KiB
TypeScript
Raw Normal View History

Treat Pi quota exhaustion as a failed run (#2305) ## Thinking Path Paperclip orchestrates AI agent runs and reports their success or failure. The Pi adapter spawns a local Pi process and interprets its JSONL output to determine the run outcome. When Pi hits a quota limit (429 RESOURCE_EXHAUSTED), it retries internally and emits an `auto_retry_end` event with `success: false` — but still exits with code 0. The current adapter trusts the exit code, so Paperclip marks the run as succeeded even though it produced no useful work. This PR teaches the parser to detect quota exhaustion and synthesize a failure. Closes #2234 ## Changes - Parse `auto_retry_end` events with `success: false` into `result.errors` - Parse standalone `error` events into `result.errors` - Synthesize exit code 1 when Pi exits 0 but parsed errors exist - Use the parsed error as `errorMessage` so the failure reason is visible in the UI ## Verification ```bash pnpm vitest run pi-local-execute pnpm vitest run --reporter=verbose 2>&1 | grep pi-local ``` - `parse.test.ts`: covers failed retry, successful retry (no error), standalone error events, and empty error messages - `pi-local-execute.test.ts`: end-to-end test with a fake Pi binary that emits `auto_retry_end` + exits 0, asserts the run is marked failed ## Risks - **Low**: Only affects runs where Pi exits 0 with a parsed error — no change to normal successful or already-failing runs - If Pi emits `auto_retry_end { success: false }` but the run actually produced valid output, this would incorrectly mark it as failed. This seems unlikely given the semantics of the event. ## Model Used - Claude Opus 4.6 (Anthropic) — assisted with test additions and PR template ## Checklist - [x] Thinking path documented - [x] Model specified - [x] Tests pass locally - [x] Test coverage for new parse branches (success path, error events, empty messages) - [x] No UI changes - [x] Risk analysis included --------- Co-authored-by: Dawid Piaskowski <dawid@MacBook-Pro.local> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-06 23:29:41 +02:00
import { describe, expect, it } from "vitest";
import fs from "node:fs/promises";
import os from "node:os";
import path from "node:path";
import { execute } from "@paperclipai/adapter-pi-local/server";
async function writeFakePiCommand(commandPath: string): Promise<void> {
const script = `#!/usr/bin/env node
if (process.argv.includes("--list-models")) {
console.log("provider model");
console.log("google gemini-3-flash-preview");
process.exit(0);
}
console.log(JSON.stringify({ type: "agent_start" }));
console.log(JSON.stringify({ type: "turn_start" }));
console.log(JSON.stringify({ type: "turn_end", message: { role: "assistant", content: "" }, toolResults: [] }));
console.log(JSON.stringify({ type: "agent_end", messages: [] }));
console.log(JSON.stringify({
type: "auto_retry_end",
success: false,
attempt: 3,
finalError: "Cloud Code Assist API error (429): RESOURCE_EXHAUSTED"
}));
process.exit(0);
`;
await fs.writeFile(commandPath, script, "utf8");
await fs.chmod(commandPath, 0o755);
}
fix(pi-local): prepend installed skill bin/ dirs to child PATH (#4331) ## Thinking Path > - Paperclip orchestrates AI agents; each agent runs under an adapter that spawns a model CLI as a child process. > - The pi-local adapter (`packages/adapters/pi-local`) spawns `pi` and inherits the child's shell environment — including `PATH`, which determines what the child's bash tool can execute by name. > - Paperclip skills ship executable helpers under `<skill>/bin/` (e.g. `paperclip-get-issue`) and Reviewer/QA-style `AGENTS.md` files invoke them by name via the agent's bash tool. > - Pi-local builds its runtime env with `ensurePathInEnv({ ...process.env, ...env })` only — it never adds the installed skills' `bin/` dirs to PATH. The pi CLI's `--skill` arg loads each skill's SKILL.md but does not augment PATH. > - Consequence: every bash invocation of a skill helper fails with `exit 127: command not found`. The agent then spends its heartbeat guessing (re-reading SKILL.md, trying `find`, inventing command paths) and either times out or gives up. > - This PR prepends each injected skill's `bin/` directory to the child PATH immediately before runtimeEnv is constructed. > - The benefit: pi_local agents whose AGENTS.md uses any `paperclip-*` skill helper can actually run those helpers. ## What Changed - `packages/adapters/pi-local/src/server/execute.ts`: compute `skillBinDirs` from the already-resolved `piSkillEntries`, dedupe against the existing PATH, prepend them to whichever of `PATH` / `Path` the merged env uses, then build `runtimeEnv`. No new helpers, no adapter-utils changes. ## Verification Manual repro before the fix: 1. Create a pi_local agent wired to a paperclip skill (e.g. paperclip-control). 2. Wake the agent on an in_review issue with an AGENTS.md that starts with `paperclip-get-issue "$PAPERCLIP_TASK_ID"`. 3. Session file: `{ "role": "toolResult", "isError": true, "content": [{ "text": "/bin/bash: paperclip-get-issue: command not found\n\nCommand exited with code 127" }] }`. After the fix: same wake; `paperclip-get-issue` resolves and returns the issue JSON; agent proceeds. Local commands: ``` pnpm --filter @paperclipai/adapter-pi-local typecheck # clean pnpm --filter @paperclipai/adapter-pi-local build # clean pnpm --filter @paperclipai/server exec vitest run \ src/__tests__/pi-local-execute.test.ts \ src/__tests__/pi-local-adapter-environment.test.ts \ src/__tests__/pi-local-skill-sync.test.ts # 5/5 passing ``` No new tests: the existing `pi-local-skill-sync.test.ts` covers skill symlink injection (upstream of the PATH step), and `pi-local-execute.test.ts` covers the spawn path; this change only augments env on the same spawn path. ## Risks Low. Pure PATH augmentation on the child env. Edge cases: - Zero skills installed → no PATH change (guarded by `skillBinDirs.length > 0`). - Duplicate bin dirs already on PATH → deduped; no pollution on re-runs. - Windows `Path` casing → falls back correctly when merged env uses `Path` instead of `PATH`. - Skill dir without `bin/` subdir → joined path simply won't resolve; harmless. No behavioral change for pi_local agents that don't use skill-provided commands. ## Model Used - Claude, `claude-opus-4-7` (1M context), extended thinking enabled, tool use enabled. Walked pi-local/cursor-local/claude-local and adapter-utils to isolate the gap, wrote the inlined fix, and ran typecheck/build/test locally. ## Checklist - [x] Thinking path from project context to this change - [x] Model used specified - [x] Checked ROADMAP.md — no overlap - [x] Tests run locally, passing - [x] Tests added — new case in `server/src/__tests__/pi-local-execute.test.ts`; verified it fails when the fix is reverted - [ ] UI screenshots — N/A (backend adapter change) - [x] Docs updated — N/A (internal adapter, no user-facing docs) - [x] Risks documented - [x] Will address reviewer comments before merge
2026-04-23 11:15:10 -04:00
async function writeEnvDumpPiCommand(commandPath: string, envDumpPath: string): Promise<void> {
const script = `#!/usr/bin/env node
const fs = require("node:fs");
if (process.argv.includes("--list-models")) {
console.log("provider model");
console.log("google gemini-3-flash-preview");
process.exit(0);
}
fs.writeFileSync(${JSON.stringify(envDumpPath)}, process.env.PATH || "");
console.log(JSON.stringify({ type: "agent_start" }));
console.log(JSON.stringify({ type: "turn_start" }));
console.log(JSON.stringify({ type: "turn_end", message: { role: "assistant", content: "" }, toolResults: [] }));
console.log(JSON.stringify({ type: "agent_end", messages: [] }));
process.exit(0);
`;
await fs.writeFile(commandPath, script, "utf8");
await fs.chmod(commandPath, 0o755);
}
Treat Pi quota exhaustion as a failed run (#2305) ## Thinking Path Paperclip orchestrates AI agent runs and reports their success or failure. The Pi adapter spawns a local Pi process and interprets its JSONL output to determine the run outcome. When Pi hits a quota limit (429 RESOURCE_EXHAUSTED), it retries internally and emits an `auto_retry_end` event with `success: false` — but still exits with code 0. The current adapter trusts the exit code, so Paperclip marks the run as succeeded even though it produced no useful work. This PR teaches the parser to detect quota exhaustion and synthesize a failure. Closes #2234 ## Changes - Parse `auto_retry_end` events with `success: false` into `result.errors` - Parse standalone `error` events into `result.errors` - Synthesize exit code 1 when Pi exits 0 but parsed errors exist - Use the parsed error as `errorMessage` so the failure reason is visible in the UI ## Verification ```bash pnpm vitest run pi-local-execute pnpm vitest run --reporter=verbose 2>&1 | grep pi-local ``` - `parse.test.ts`: covers failed retry, successful retry (no error), standalone error events, and empty error messages - `pi-local-execute.test.ts`: end-to-end test with a fake Pi binary that emits `auto_retry_end` + exits 0, asserts the run is marked failed ## Risks - **Low**: Only affects runs where Pi exits 0 with a parsed error — no change to normal successful or already-failing runs - If Pi emits `auto_retry_end { success: false }` but the run actually produced valid output, this would incorrectly mark it as failed. This seems unlikely given the semantics of the event. ## Model Used - Claude Opus 4.6 (Anthropic) — assisted with test additions and PR template ## Checklist - [x] Thinking path documented - [x] Model specified - [x] Tests pass locally - [x] Test coverage for new parse branches (success path, error events, empty messages) - [x] No UI changes - [x] Risk analysis included --------- Co-authored-by: Dawid Piaskowski <dawid@MacBook-Pro.local> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-06 23:29:41 +02:00
describe("pi_local execute", () => {
it("fails the run when Pi exhausts automatic retries despite exiting 0", async () => {
const root = await fs.mkdtemp(path.join(os.tmpdir(), "paperclip-pi-execute-"));
const workspace = path.join(root, "workspace");
const commandPath = path.join(root, "pi");
await fs.mkdir(workspace, { recursive: true });
await writeFakePiCommand(commandPath);
const previousHome = process.env.HOME;
process.env.HOME = root;
try {
const result = await execute({
runId: "run-pi-quota-exhausted",
agent: {
id: "agent-1",
companyId: "company-1",
name: "Pi Agent",
adapterType: "pi_local",
adapterConfig: {},
},
runtime: {
sessionId: null,
sessionParams: null,
sessionDisplayId: null,
taskKey: null,
},
config: {
command: commandPath,
cwd: workspace,
model: "google/gemini-3-flash-preview",
promptTemplate: "Keep working.",
},
context: {},
authToken: "run-jwt-token",
onLog: async () => {},
});
expect(result.exitCode).toBe(1);
expect(result.errorMessage).toContain("RESOURCE_EXHAUSTED");
} finally {
if (previousHome === undefined) delete process.env.HOME;
else process.env.HOME = previousHome;
await fs.rm(root, { recursive: true, force: true });
}
});
fix(pi-local): prepend installed skill bin/ dirs to child PATH (#4331) ## Thinking Path > - Paperclip orchestrates AI agents; each agent runs under an adapter that spawns a model CLI as a child process. > - The pi-local adapter (`packages/adapters/pi-local`) spawns `pi` and inherits the child's shell environment — including `PATH`, which determines what the child's bash tool can execute by name. > - Paperclip skills ship executable helpers under `<skill>/bin/` (e.g. `paperclip-get-issue`) and Reviewer/QA-style `AGENTS.md` files invoke them by name via the agent's bash tool. > - Pi-local builds its runtime env with `ensurePathInEnv({ ...process.env, ...env })` only — it never adds the installed skills' `bin/` dirs to PATH. The pi CLI's `--skill` arg loads each skill's SKILL.md but does not augment PATH. > - Consequence: every bash invocation of a skill helper fails with `exit 127: command not found`. The agent then spends its heartbeat guessing (re-reading SKILL.md, trying `find`, inventing command paths) and either times out or gives up. > - This PR prepends each injected skill's `bin/` directory to the child PATH immediately before runtimeEnv is constructed. > - The benefit: pi_local agents whose AGENTS.md uses any `paperclip-*` skill helper can actually run those helpers. ## What Changed - `packages/adapters/pi-local/src/server/execute.ts`: compute `skillBinDirs` from the already-resolved `piSkillEntries`, dedupe against the existing PATH, prepend them to whichever of `PATH` / `Path` the merged env uses, then build `runtimeEnv`. No new helpers, no adapter-utils changes. ## Verification Manual repro before the fix: 1. Create a pi_local agent wired to a paperclip skill (e.g. paperclip-control). 2. Wake the agent on an in_review issue with an AGENTS.md that starts with `paperclip-get-issue "$PAPERCLIP_TASK_ID"`. 3. Session file: `{ "role": "toolResult", "isError": true, "content": [{ "text": "/bin/bash: paperclip-get-issue: command not found\n\nCommand exited with code 127" }] }`. After the fix: same wake; `paperclip-get-issue` resolves and returns the issue JSON; agent proceeds. Local commands: ``` pnpm --filter @paperclipai/adapter-pi-local typecheck # clean pnpm --filter @paperclipai/adapter-pi-local build # clean pnpm --filter @paperclipai/server exec vitest run \ src/__tests__/pi-local-execute.test.ts \ src/__tests__/pi-local-adapter-environment.test.ts \ src/__tests__/pi-local-skill-sync.test.ts # 5/5 passing ``` No new tests: the existing `pi-local-skill-sync.test.ts` covers skill symlink injection (upstream of the PATH step), and `pi-local-execute.test.ts` covers the spawn path; this change only augments env on the same spawn path. ## Risks Low. Pure PATH augmentation on the child env. Edge cases: - Zero skills installed → no PATH change (guarded by `skillBinDirs.length > 0`). - Duplicate bin dirs already on PATH → deduped; no pollution on re-runs. - Windows `Path` casing → falls back correctly when merged env uses `Path` instead of `PATH`. - Skill dir without `bin/` subdir → joined path simply won't resolve; harmless. No behavioral change for pi_local agents that don't use skill-provided commands. ## Model Used - Claude, `claude-opus-4-7` (1M context), extended thinking enabled, tool use enabled. Walked pi-local/cursor-local/claude-local and adapter-utils to isolate the gap, wrote the inlined fix, and ran typecheck/build/test locally. ## Checklist - [x] Thinking path from project context to this change - [x] Model used specified - [x] Checked ROADMAP.md — no overlap - [x] Tests run locally, passing - [x] Tests added — new case in `server/src/__tests__/pi-local-execute.test.ts`; verified it fails when the fix is reverted - [ ] UI screenshots — N/A (backend adapter change) - [x] Docs updated — N/A (internal adapter, no user-facing docs) - [x] Risks documented - [x] Will address reviewer comments before merge
2026-04-23 11:15:10 -04:00
it("prepends installed skill bin/ dirs to the spawned Pi child PATH", async () => {
const root = await fs.mkdtemp(path.join(os.tmpdir(), "paperclip-pi-path-"));
const workspace = path.join(root, "workspace");
const commandPath = path.join(root, "pi");
const skillDir = path.join(root, "skills", "demo-skill");
const skillBinDir = path.join(skillDir, "bin");
const envDumpPath = path.join(root, "captured-path.txt");
await fs.mkdir(workspace, { recursive: true });
await fs.mkdir(skillBinDir, { recursive: true });
await fs.writeFile(path.join(skillDir, "SKILL.md"), "# demo-skill\n", "utf8");
await writeEnvDumpPiCommand(commandPath, envDumpPath);
const previousHome = process.env.HOME;
process.env.HOME = root;
try {
await execute({
runId: "run-pi-skill-path",
agent: {
id: "agent-skill-path",
companyId: "company-skill-path",
name: "Pi Agent",
adapterType: "pi_local",
adapterConfig: {},
},
runtime: {
sessionId: null,
sessionParams: null,
sessionDisplayId: null,
taskKey: null,
},
config: {
command: commandPath,
cwd: workspace,
model: "google/gemini-3-flash-preview",
promptTemplate: "Keep working.",
paperclipRuntimeSkills: [
{ key: "demo-skill", runtimeName: "demo-skill", source: skillDir, required: true },
],
},
context: {},
authToken: "run-jwt-token",
onLog: async () => {},
});
const capturedPath = await fs.readFile(envDumpPath, "utf8");
const entries = capturedPath.split(path.delimiter);
expect(entries[0]).toBe(skillBinDir);
expect(entries.filter((entry) => entry === skillBinDir)).toHaveLength(1);
} finally {
if (previousHome === undefined) delete process.env.HOME;
else process.env.HOME = previousHome;
await fs.rm(root, { recursive: true, force: true });
}
});
it("does not expose bin/ dirs from skills that are not injected", async () => {
const root = await fs.mkdtemp(path.join(os.tmpdir(), "paperclip-pi-path-neg-"));
const workspace = path.join(root, "workspace");
const commandPath = path.join(root, "pi");
const nonInjectedSkillDir = path.join(root, "skills", "not-injected");
const nonInjectedBinDir = path.join(nonInjectedSkillDir, "bin");
const envDumpPath = path.join(root, "captured-path.txt");
await fs.mkdir(workspace, { recursive: true });
await fs.mkdir(nonInjectedBinDir, { recursive: true });
await fs.writeFile(path.join(nonInjectedSkillDir, "SKILL.md"), "# not-injected\n", "utf8");
await writeEnvDumpPiCommand(commandPath, envDumpPath);
const previousHome = process.env.HOME;
process.env.HOME = root;
try {
await execute({
runId: "run-pi-skill-path-neg",
agent: {
id: "agent-skill-path-neg",
companyId: "company-skill-path-neg",
name: "Pi Agent",
adapterType: "pi_local",
adapterConfig: {},
},
runtime: {
sessionId: null,
sessionParams: null,
sessionDisplayId: null,
taskKey: null,
},
config: {
command: commandPath,
cwd: workspace,
model: "google/gemini-3-flash-preview",
promptTemplate: "Keep working.",
// required:false with no explicit paperclipSkillSync preference →
// resolvePaperclipDesiredSkillNames returns [] → skill is not injected.
paperclipRuntimeSkills: [
{ key: "not-injected", runtimeName: "not-injected", source: nonInjectedSkillDir, required: false },
],
},
context: {},
authToken: "run-jwt-token",
onLog: async () => {},
});
const capturedPath = await fs.readFile(envDumpPath, "utf8");
expect(capturedPath.split(path.delimiter)).not.toContain(nonInjectedBinDir);
} finally {
if (previousHome === undefined) delete process.env.HOME;
else process.env.HOME = previousHome;
await fs.rm(root, { recursive: true, force: true });
}
});
Treat Pi quota exhaustion as a failed run (#2305) ## Thinking Path Paperclip orchestrates AI agent runs and reports their success or failure. The Pi adapter spawns a local Pi process and interprets its JSONL output to determine the run outcome. When Pi hits a quota limit (429 RESOURCE_EXHAUSTED), it retries internally and emits an `auto_retry_end` event with `success: false` — but still exits with code 0. The current adapter trusts the exit code, so Paperclip marks the run as succeeded even though it produced no useful work. This PR teaches the parser to detect quota exhaustion and synthesize a failure. Closes #2234 ## Changes - Parse `auto_retry_end` events with `success: false` into `result.errors` - Parse standalone `error` events into `result.errors` - Synthesize exit code 1 when Pi exits 0 but parsed errors exist - Use the parsed error as `errorMessage` so the failure reason is visible in the UI ## Verification ```bash pnpm vitest run pi-local-execute pnpm vitest run --reporter=verbose 2>&1 | grep pi-local ``` - `parse.test.ts`: covers failed retry, successful retry (no error), standalone error events, and empty error messages - `pi-local-execute.test.ts`: end-to-end test with a fake Pi binary that emits `auto_retry_end` + exits 0, asserts the run is marked failed ## Risks - **Low**: Only affects runs where Pi exits 0 with a parsed error — no change to normal successful or already-failing runs - If Pi emits `auto_retry_end { success: false }` but the run actually produced valid output, this would incorrectly mark it as failed. This seems unlikely given the semantics of the event. ## Model Used - Claude Opus 4.6 (Anthropic) — assisted with test additions and PR template ## Checklist - [x] Thinking path documented - [x] Model specified - [x] Tests pass locally - [x] Test coverage for new parse branches (success path, error events, empty messages) - [x] No UI changes - [x] Risk analysis included --------- Co-authored-by: Dawid Piaskowski <dawid@MacBook-Pro.local> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-06 23:29:41 +02:00
});