Harden Cloudflare sandbox execution (#5967)

## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - Remote-managed adapters need sandbox/environment execution to behave
like real agent runs, not just local host probes.
> - The Cloudflare sandbox path was the weakest leg in the SSH +
Cloudflare QA matrix because bridge execution could truncate output and
time out long-running installs, and the worker instance was
under-provisioned.
> - That made several adapters fail for reasons unrelated to their
actual business logic, which blocks confidence in Paperclip's non-local
environment model.
> - This pull request hardens the Cloudflare bridge/runtime path and
adjusts sandbox probe budgets so adapter verification matches the
measured behavior of the fixed environment.
> - It also corrects the Pi sandbox install command so the QA matrix
exercises a real, supported install path.
> - The benefit is a materially more reliable SSH + Cloudflare adapter
matrix with fewer false negatives and clearer failure boundaries.

## What Changed

- Switched the Cloudflare bridge worker instance type to `standard-2`
for the QA-matrix execution path.
- Raised Cloudflare bridge/plugin-worker timeout budgets and added SSE
keepalives so long-running install/exec calls can complete instead of
dying at the transport layer.
- Fixed Cloudflare bridge-channel command handling to avoid dropped
final stdout chunks on short-lived execs.
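- Made Claude, OpenCode, and Cursor sandbox probe timeouts configurable
and sandbox-aware (`helloProbeTimeoutSec`, plus `versionProbeTimeoutSec`
for Cursor), then tightened the sandbox defaults to the measured
post-fix range: 90s for hello probes and 60s for the Cursor version
probe.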
- Updated the Pi sandbox install command to use the package currently
installed by the official `pi.dev` installer, pinned to a specific npm
version.
- Added/updated tests around Cloudflare bridge behavior and adapter
sandbox probe paths.
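
As a rough illustration of the SSE keepalive item above: in Server-Sent Events, a line that starts with a colon is a comment that clients discard, so a bridge can emit one periodically to keep a quiet stream from being closed as idle. This is a minimal sketch, not the Paperclip bridge code; the function names, the `stdout` event name, and the `keepalive` label are all hypothetical.

```typescript
// Hypothetical helpers for illustration; not the actual bridge implementation.

/** Serialize one SSE event. Multi-line data becomes one `data:` line per line. */
function formatSseEvent(event: string, data: string): string {
  const dataLines = data.split("\n").map((line) => `data: ${line}`);
  // The two trailing empty strings produce the blank line that ends the frame.
  return [`event: ${event}`, ...dataLines, "", ""].join("\n");
}

/** An SSE comment frame: it starts with ":" so clients ignore it. */
function formatSseKeepalive(label: string = "keepalive"): string {
  return `: ${label}\n\n`;
}

interface ParsedSseEvent {
  event: string;
  data: string;
}

/** Minimal parser for complete frames; comment lines are skipped. */
function parseSseFrames(raw: string): ParsedSseEvent[] {
  const events: ParsedSseEvent[] = [];
  for (const frame of raw.split("\n\n")) {
    let event = "message"; // default event name when none is given
    const data: string[] = [];
    for (const line of frame.split("\n")) {
      if (line === "" || line.startsWith(":")) continue; // blank or comment
      if (line.startsWith("event:")) {
        event = line.slice("event:".length).trim();
      } else if (line.startsWith("data:")) {
        // Strip at most one leading space after the colon.
        data.push(line.slice("data:".length).replace(/^ /, ""));
      }
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}
```

A server loop would typically interleave `formatSseKeepalive()` on a timer with real events; because the parser drops comment frames, the consumer only ever sees the command output.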

## Verification

- `pnpm --filter @paperclipai/adapter-claude-local typecheck`
- `pnpm --filter @paperclipai/adapter-opencode-local typecheck`
- `pnpm --filter @paperclipai/adapter-cursor-local typecheck`
- `pnpm vitest run packages/adapters/cursor-local
packages/adapters/claude-local packages/adapters/opencode-local
packages/adapters/pi-local packages/plugins/sandbox-providers/cloudflare
server/src/services/__tests__/plugin-worker-manager.test.ts`
- Manual QA on the dedicated dev instance using the SSH + Cloudflare
environment matrix (`ENV-29` through `ENV-40`). Clean end-to-end passes:
SSH `claude_local`, `codex_local`, `cursor`, `gemini_local`; Cloudflare
`claude_local`, `codex_local`, `cursor`, `gemini_local`.

## Risks

- Cloudflare sandbox cost increases because the bridge worker now runs
on `standard-2` instead of `lite`.
- Higher timeout ceilings can delay surfacing truly hung Cloudflare
bridge calls, even though they remove transport-level false negatives.
- The manual heartbeat matrix still exposed follow-on
execution/sync/disposition bugs in `opencode_local` and `pi_local`;
those are not fixed by this PR.

## Model Used

- OpenAI `gpt-5.4` via Paperclip `codex_local`, reasoning effort `high`,
tool use enabled, repo search enabled.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots (not applicable)
- [x] I have updated relevant documentation to reflect my changes (not
applicable)
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
Devin Foley 2026-05-13 22:00:10 -07:00 committed by GitHub
parent f4bed4a70f
commit 1bd44c8a0d
10 changed files with 113 additions and 12 deletions


@@ -212,6 +212,14 @@ export async function testEnvironment(
 if (maxTurns > 0) args.push("--max-turns", String(maxTurns));
 if (extraArgs.length > 0) args.push(...extraArgs);
+// Sandbox bridges still add lease warmup and transport overhead, but
+// the standard-2 Cloudflare tier now probes fast enough that a 90s
+// budget leaves headroom without masking real hangs.
+const helloProbeTimeoutSec = Math.max(
+  1,
+  asNumber(config.helloProbeTimeoutSec, targetIsSandbox ? 90 : 45),
+);
 const probe = await runAdapterExecutionTargetProcess(
   runId,
   target,
@@ -220,7 +228,7 @@ export async function testEnvironment(
   {
     cwd,
     env,
-    timeoutSec: 45,
+    timeoutSec: helloProbeTimeoutSec,
     graceSec: 5,
     stdin: "Respond with hello.",
     onLog: async () => {},
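
The timeout resolution in the hunks above follows one pattern: take a configured override if present, otherwise pick a default based on whether the target is a sandbox, and clamp the result to at least one second. A minimal standalone sketch of that pattern follows. `asNumberSketch` is a stand-in whose behavior is assumed, not the real `asNumber` from `@paperclipai/adapter-utils`, and `resolveProbeTimeoutSec` is a hypothetical helper that does not exist in the PR.

```typescript
// Stand-in for `asNumber`; assumed behavior: return the value when it is a
// finite number, otherwise return the fallback.
function asNumberSketch(value: unknown, fallback: number): number {
  return typeof value === "number" && Number.isFinite(value) ? value : fallback;
}

interface ProbeTimeoutDefaults {
  sandbox: number; // seconds, used for sandbox targets
  other: number; // seconds, used for every other target
}

// Hypothetical helper mirroring the inline expression in the diff.
function resolveProbeTimeoutSec(
  configured: unknown,
  targetIsSandbox: boolean,
  defaults: ProbeTimeoutDefaults,
): number {
  const fallback = targetIsSandbox ? defaults.sandbox : defaults.other;
  // Clamp to at least 1s so a zero or negative override still yields a
  // positive timeout.
  return Math.max(1, asNumberSketch(configured, fallback));
}

// Defaults taken from the hello-probe hunk above.
const helloDefaults: ProbeTimeoutDefaults = { sandbox: 90, other: 45 };
const sandboxHelloTimeout = resolveProbeTimeoutSec(undefined, true, helloDefaults);
const overriddenTimeout = resolveProbeTimeoutSec(120, true, helloDefaults);
```

Keeping the clamp inside the helper means an operator can raise the budget for a slow environment without being able to configure a non-positive timeout by accident.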


@@ -4,6 +4,7 @@ import type {
   AdapterEnvironmentTestResult,
 } from "@paperclipai/adapter-utils";
 import {
+  asNumber,
   asString,
   asStringArray,
   parseObject,
@@ -98,6 +99,7 @@ export async function testEnvironment(
 let command = asString(config.command, "agent");
 const target = ctx.executionTarget ?? null;
 const targetIsRemote = target?.kind === "remote";
+const targetIsSandbox = target?.kind === "remote" && target.transport === "sandbox";
 const cwd = resolveAdapterExecutionTargetCwd(target, asString(config.cwd, ""), process.cwd());
 const targetLabel = targetIsRemote
   ? ctx.environmentName ?? describeAdapterExecutionTarget(target)
@@ -230,6 +232,12 @@ export async function testEnvironment(
       hint: "Use `agent` or `cursor-agent` to run the automatic installation and auth probe.",
     });
   } else {
+    // Cursor's `agent` binary still pays cold-start overhead in container
+    // sandboxes, but standard-2 probes no longer need a 120s version budget.
+    const versionProbeTimeoutSec = Math.max(
+      1,
+      asNumber(config.versionProbeTimeoutSec, targetIsSandbox ? 60 : 45),
+    );
     const versionProbe = await runAdapterExecutionTargetProcess(
       runId,
       target,
@@ -238,7 +246,7 @@ export async function testEnvironment(
       {
         cwd,
         env,
-        timeoutSec: 45,
+        timeoutSec: versionProbeTimeoutSec,
         graceSec: 5,
         onLog: async () => {},
       },
@@ -295,6 +303,12 @@ export async function testEnvironment(
 if (extraArgs.length > 0) args.push(...extraArgs);
 args.push("Respond with hello.");
+// Sandbox bridges still add cursor CLI cold-start overhead, but the
+// standard-2 tier now completes probes fast enough that 90s is ample.
+const helloProbeTimeoutSec = Math.max(
+  1,
+  asNumber(config.helloProbeTimeoutSec, targetIsSandbox ? 90 : 45),
+);
 const probe = await runAdapterExecutionTargetProcess(
   runId,
   target,
@@ -303,7 +317,7 @@ export async function testEnvironment(
   {
     cwd,
     env,
-    timeoutSec: 45,
+    timeoutSec: helloProbeTimeoutSec,
     graceSec: 5,
     onLog: async () => {},
   },
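
The `targetIsSandbox` check in the hunks above relies on TypeScript narrowing a discriminated union: once `kind === "remote"` holds, `transport` becomes readable. The sketch below shows the same check as a reusable type guard. The `SketchTarget` shape is an assumption made for illustration: only `kind: "remote"` and `transport: "sandbox"` appear in the diff, while the `"local"` and `"ssh"` members are guesses, and the real `AdapterExecutionTarget` type may differ.

```typescript
// Hypothetical target shape. Only `kind: "remote"` and `transport: "sandbox"`
// are visible in the diff; "local" and "ssh" are assumed for illustration.
type SketchTarget =
  | { kind: "local" }
  | { kind: "remote"; transport: "ssh" | "sandbox" };

type SketchSandboxTarget = { kind: "remote"; transport: "sandbox" };

// Type guard equivalent to the inline `targetIsSandbox` expression.
function isSandboxTarget(target: SketchTarget | null): target is SketchSandboxTarget {
  // The optional chain handles a null target; the first comparison narrows
  // the union so that `transport` can be read safely.
  return target?.kind === "remote" && target.transport === "sandbox";
}

// Example: choose the Cursor version-probe default from the diff (60s vs 45s).
function defaultVersionProbeTimeoutSec(target: SketchTarget | null): number {
  return isSandboxTarget(target) ? 60 : 45;
}
```

A guard like this keeps the sandbox test in one place if more adapters need it, at the cost of one extra exported helper.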


@@ -9,6 +9,7 @@ import type {
 import type { AdapterExecutionTarget } from "@paperclipai/adapter-utils/execution-target";
 import {
   asBoolean,
+  asNumber,
   asString,
   asStringArray,
   parseObject,
@@ -72,6 +73,7 @@ export async function testEnvironment(
 const command = asString(config.command, "opencode");
 const target = ctx.executionTarget ?? null;
 const targetIsRemote = target?.kind === "remote";
+const targetIsSandbox = target?.kind === "remote" && target.transport === "sandbox";
 const cwd = resolveAdapterExecutionTargetCwd(target, asString(config.cwd, ""), process.cwd());
 const targetLabel = targetIsRemote
   ? ctx.environmentName ?? describeAdapterExecutionTarget(target)
@@ -334,6 +336,14 @@ export async function testEnvironment(
 if (variant) args.push("--variant", variant);
 if (extraArgs.length > 0) args.push(...extraArgs);
+// Sandbox bridges still add cold-start and transport overhead, but the
+// standard-2 Cloudflare tier now probes quickly enough that 90s keeps
+// useful headroom without letting slow hangs linger.
+const helloProbeTimeoutSec = Math.max(
+  1,
+  asNumber(config.helloProbeTimeoutSec, targetIsSandbox ? 90 : 60),
+);
 try {
   const probe = await runAdapterExecutionTargetProcess(
     runId,
@@ -343,7 +353,7 @@ export async function testEnvironment(
     {
       cwd: runtimeCwd,
       env: runtimeEnv,
-      timeoutSec: 60,
+      timeoutSec: helloProbeTimeoutSec,
       graceSec: 5,
       stdin: "Respond with hello.",
       onLog: async () => {},


@@ -3,7 +3,7 @@ import type { AdapterModelProfileDefinition } from "@paperclipai/adapter-utils";
 export const type = "pi_local";
 export const label = "Pi (local)";
-export const SANDBOX_INSTALL_COMMAND = "npm install -g @mariozechner/pi-coding-agent";
+export const SANDBOX_INSTALL_COMMAND = "npm install -g @earendil-works/pi-coding-agent@0.74.0";
 export const models: Array<{ id: string; label: string }> = [];
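
The new install command pins the package to an exact version rather than whatever `npm` resolves as latest at install time. One way to keep such commands pinned is to build them through a helper that rejects anything other than an exact `x.y.z` version. This is only a sketch: `pinnedGlobalInstall` is hypothetical (the PR hard-codes the string), and the version check is deliberately simplified, so it also rejects valid prerelease versions.

```typescript
// Hypothetical helper; the PR itself hard-codes SANDBOX_INSTALL_COMMAND.
// Simplified check: plain x.y.z only, so prerelease tags are rejected too.
const EXACT_VERSION = /^\d+\.\d+\.\d+$/;

function pinnedGlobalInstall(pkg: string, version: string): string {
  if (!EXACT_VERSION.test(version)) {
    throw new Error(`expected an exact x.y.z version, got "${version}"`);
  }
  // `npm install -g <name>@<version>` installs that specific published version.
  return `npm install -g ${pkg}@${version}`;
}

// Reproduces the command string from the diff above.
const piInstallCommand = pinnedGlobalInstall("@earendil-works/pi-coding-agent", "0.74.0");
```

Rejecting ranges and dist-tags such as `^0.74.0` or `latest` keeps sandbox installs reproducible between QA runs.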