Commit graph

5 commits

Author SHA1 Message Date
Devin Foley
fe3904f434
Stabilize runtime probes and Codex env tests (#5445)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Adapters expose a Test action that probes the configured runtime —
install, resolvability, hello — to give operators a fast yes/no on
whether an environment is healthy
> - The Codex test path was running its hello probe directly without
going through the managed-runtime preparation that production runs use,
so a healthy production setup could still report a probe failure
> - The plugin worker manager wasn't surfacing terminated workers
cleanly, leaving the runtime probe waiting on a dead worker until the
request timed out
> - This pull request routes the Codex test probe through
`prepareAdapterExecutionTargetRuntime` (so it sees the same managed
Codex home production sees), exposes `commandCwd` on
`createCommandManagedRuntimeClient` so callers can target a per-probe
directory without leaking the workspace `remoteCwd`, and propagates
plugin-worker termination as a usable error instead of a hang
> - The benefit is the Codex Test action mirrors production behavior
end-to-end, and probes against a terminated plugin worker fail fast
instead of timing out

## What Changed

- `packages/adapter-utils/src/command-managed-runtime.ts`: rename the
`remoteCwd` knob to `commandCwd` so callers can target a per-probe
directory without inheriting the workspace cwd; matching test coverage
in `command-managed-runtime.test.ts`
- `packages/adapter-utils/src/sandbox-callback-bridge.{ts,test.ts}`:
small fixes to keep callback bridge stop semantics deterministic
- `packages/adapters/codex-local/src/server/test.ts`: thread the Codex
hello probe through `prepareAdapterExecutionTargetRuntime` +
`prepareManagedCodexHome` so the probe sees the same managed home
production sees; new `test.remote.test.ts` covers the remote probe path
- `packages/adapters/cursor-local/src/server/execute.ts`: small
probe-side cleanup that aligns with the new commandCwd contract
- `server/src/services/plugin-worker-manager.ts`: surface plugin-worker
termination as a structured error so callers fail fast; new
`plugin-worker-terminated.cjs` fixture and
`plugin-worker-manager.test.ts` cases pin the behavior

## Verification

- `pnpm vitest run --no-coverage --project @paperclipai/adapter-utils
--project @paperclipai/adapter-codex-local --project
@paperclipai/adapter-cursor-local --project @paperclipai/server` —
1749/1750 passing (1 unrelated skip)
- `pnpm typecheck` clean

## Risks

Low–medium. The `remoteCwd → commandCwd` rename is a parameter renaming
on an internal helper used only by adapter test/execute paths in this
repo. The plugin-worker-terminated path was previously a hang; failing
fast may surface latent timeouts as explicit termination errors in
callers that already expected them.

## Model Used

Claude Opus 4.7 (1M context)

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — new tests cover
commandCwd, plugin-worker termination, and Codex remote test path
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---

> **Stacked PR.** Sits on top of #5444 which adds the per-run runtime
API surface this PR builds on. Cumulative diff against `master` includes
that PR's content; the files touched by *this* PR's commit are listed
under "What Changed" above. Will rebase onto `master` and force-push
once #5444 merges.
2026-05-07 14:52:31 -07:00
Devin Foley
50db8c01d2
Serialize sandbox callback bridge against concurrent heartbeats (#5326)
> **Stacked PR.** This PR's branch carries cumulative content from #5324
(bridge allowlist expand) and #5325 (env sanitization) — the
mutex/sha256 logic in this PR sits on top of both. Reviewers should
focus on the files this PR's commit touches:
`packages/adapter-utils/src/sandbox-callback-bridge.{ts,test.ts}`,
`packages/adapter-utils/src/ssh.ts`, and
`packages/adapter-utils/src/ssh-fixture.test.ts`. Will rebase onto
`master` and force-push once both prerequisite PRs are merged.

## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Each agent that runs in a sandbox or via SSH talks back to the
Paperclip server through a per-lease callback bridge whose entrypoint
script is uploaded to the remote
> - When two heartbeats target the same agent on the same machine
concurrently, both upload the bridge entrypoint and both write to the
same response files — producing torn-write races: `SyntaxError:
Identifier 'randomUUID' has already been declared` from a concatenated
upload, `mv: cannot stat …` from colliding `.json.tmp` writes, and
0-byte commits from a truncated stdin
> - This pull request serializes those operations with a POSIX
`mkdir`-mutex (PID liveness check + atomic rename) at the bridge
entrypoint upload, applies the same lock to the bridge response writer,
forwards stdin into remote ssh commands so the entrypoint payload
arrives intact, and verifies a sha256 of the upload before promoting it
> - The benefit is concurrent heartbeats no longer corrupt each other's
bridge state

## What Changed

- `packages/adapter-utils/src/sandbox-callback-bridge.ts`: serialize
entrypoint upload and response writes via POSIX `mkdir`-mutex with PID
liveness; sha256 the upload before promoting via `mv`; content-skip when
the existing entrypoint already matches
- `packages/adapter-utils/src/ssh.ts`: forward stdin into remote ssh
commands through the SSH managed runtime so `cat > "$remote_upload"`
actually receives the base64-encoded entrypoint
- `packages/adapter-utils/src/ssh-fixture.test.ts`: cover the
stdin-forwarded SSH path
- `packages/adapter-utils/src/sandbox-callback-bridge.test.ts`: cover
the mutex, content-skip, sha256-verify, and atomic-rename paths

## Verification

- `pnpm vitest run --no-coverage --project @paperclipai/adapter-utils`
- `pnpm typecheck` clean
- Manual: two parallel heartbeats targeting the same SSH agent no longer
race on the bridge entrypoint or response files

## Risks

Medium. Serializing previously-parallel operations adds latency on the
contended path (one heartbeat waits on another), bounded by the
entrypoint upload time. The mutex includes PID liveness so a crashed
heartbeat doesn't deadlock subsequent ones. Sha256-verify gives a clear
"torn upload" failure mode instead of silent 0-byte commits.

## Model Used

Claude Opus 4.7 (1M context)

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — tests cover mutex
+ sha256-verify + stdin-forwarded ssh
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-05 20:01:04 -07:00
Devin Foley
36eaf9778f
Expand sandbox callback bridge allowlist to cover the documented heartbeat surface (#5324)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - When an agent runs in an e2b sandbox or other non-managed
environment, it talks back to the Paperclip server through a per-lease
callback bridge that proxies HTTP requests
> - The bridge has an allowlist of method/path patterns it will forward;
anything outside the list is rejected to keep the bridge tight
> - The allowlist had drifted behind what the heartbeat documentation
describes as the supported callback surface — several documented
endpoints (issue updates, agent-side log emit, work-status writes) were
being rejected at the bridge
> - This pull request expands the allowlist to cover the documented
heartbeat surface and adds tests that pin every newly-allowed pattern,
so the doc and the bridge stay in sync
> - The benefit is sandboxed runs no longer hit "method not allowed" /
"path not allowed" rejections on the documented set of callbacks

## What Changed

- `packages/adapter-utils/src/sandbox-callback-bridge.ts`: expand the
method/path allowlist to match the documented heartbeat callback surface
- `packages/adapter-utils/src/sandbox-callback-bridge.test.ts`: add
coverage for every newly-allowed pattern, plus negative cases for
patterns that should still be rejected

## Verification

- `pnpm vitest run --no-coverage --project @paperclipai/adapter-utils`
- `pnpm typecheck` clean
- Manual: previously-rejected callbacks from sandboxed runs now succeed
end-to-end

## Risks

Low. The allowlist only grows; nothing previously allowed is now
blocked. Tests pin both the new allowed patterns and that out-of-doc
patterns stay rejected.

## Model Used

Claude Opus 4.7 (1M context)

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable — new tests cover
added patterns + still-rejected negatives
- [x] If this change affects the UI, I have included before/after
screenshots — N/A (no UI)
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-05 19:30:11 -07:00
Devin Foley
a7b45938b7
Let sandbox providers declare shell defaults (#5114)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Agents execute in sandboxed remote environments served by pluggable
sandbox
>   providers (E2B today, more later)
> - Today every sandbox command runs under `sh -lc` regardless of what
the
>   provider's container actually ships
> - That misses bash-only shell init on E2B (which ships bash) and
prevents
> future providers from declaring a different default — there's no way
for a
>   provider to say "I have bash, use it"
> - This PR adds a `shellCommand` field to sandbox execution targets so
providers
> can declare their preferred shell ("bash" for E2B), threads it through
the
> sandbox-managed-runtime client, callback bridge, and execution-target
shell
>   helper, and validates the value at the lease-metadata boundary
> - The benefit is that sandbox commands run under the right shell on
the right
> provider, and adding new sandbox providers only needs to declare a
shell
>   preference

## What Changed

- Added `packages/adapter-utils/src/sandbox-shell.ts` exporting
`preferredShellForSandbox(shellCommand)` (returns `"bash"` if input is
`"bash"`,
  else `"sh"`)
- Added `shellCommand?: "bash" | "sh" | null` to
`AdapterSandboxExecutionTarget`
  and `CommandManagedRuntimeSpec`; threaded it through
`runAdapterExecutionTargetShellCommand`,
`prepareAdapterExecutionTargetRuntime`,
  and `startAdapterExecutionTargetPaperclipBridge`
- `createCommandManagedRuntimeClient`, `prepareCommandManagedRuntime`,
and
`createCommandManagedSandboxCallbackBridgeQueueClient` now take an
optional
  `shellCommand` and use `preferredShellForSandbox` to pick the shell
- `startSandboxCallbackBridgeServer` accepts a `shellCommand` for its
server
  startup, readiness probe, and stop hook
- E2B sandbox plugin declares `shellCommand: "bash"` in `leaseMetadata`
- `resolveEnvironmentExecutionTarget` reads `shellCommand` from lease
metadata
  (validating against `"bash" | "sh" | null`)
- `environment-runtime.ts` adds `"shellCommand"` to
`INTERNAL_PLUGIN_SANDBOX_CONFIG_KEYS`
so the field round-trips through internal plugin config without leaking
to
  external plugin metadata
- Updated tests in `command-managed-runtime.test.ts`,
  `execution-target-sandbox.test.ts`, `sandbox-callback-bridge.test.ts`,
  `environment-execution-target.test.ts`

## Verification

- `pnpm --filter @paperclipai/adapter-utils test`
- `pnpm --filter @paperclipai/server test --
environment-execution-target`
- `pnpm --filter @paperclipai/sandbox-providers-e2b test`
- Manual QA: boot a Paperclip instance, create an E2B-backed
environment, run a
claude_local agent against it, and confirm the run completes (verifies
bash
  shell semantics flow through the callback bridge end-to-end)

## Risks

- E2B sandbox commands now run under `bash -lc` instead of `sh -lc`.
Bash is a
strict superset for the commands we issue (no busybox-only flags in our
shell
scripts), so risk is low. The shellCommand field is opt-in via lease
metadata —
  providers that don't declare it stay on `sh`.
- New optional field on `CommandManagedRuntimeSpec` and
`AdapterSandboxExecutionTarget`.
  Consumers ignoring the field retain previous behaviour (sh).
- Lease metadata now carries an additional field. Existing leases
without
`shellCommand` resolve to `null` and fall back to sh — backwards
compatible.

## Model Used

- OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI
- Provider: OpenAI
- Used to author the code changes in this PR

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A (no UI changes)
- [ ] I have updated relevant documentation to reflect my changes — N/A
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-05-03 12:19:35 -07:00
Devin Foley
a4ac6ff133
Add sandbox callback bridge for remote environment API access (#4801)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Agents can run inside sandboxed environments like E2B, which are
isolated from the host network
> - Sandboxed agents need to call back to the Paperclip API to report
progress, post comments, and update issue status
> - But sandbox environments cannot reach the Paperclip server directly
because they run in isolated network namespaces
> - This PR adds a callback bridge that proxies API requests from the
sandbox to the Paperclip server, running as a local HTTP server on the
host that forwards authenticated requests
> - The bridge is started automatically when an adapter launches a
sandbox execution, and torn down when the run completes
> - The benefit is sandboxed agents can interact with the Paperclip API
without requiring network-level access to the host, enabling E2B and
similar providers to work end-to-end

## What Changed

- Added `sandbox-callback-bridge.ts` in `packages/adapter-utils/` — a
lightweight HTTP bridge server that accepts requests from sandbox
environments and proxies them to the Paperclip API with authentication
- Added request validation and security policy: the bridge only forwards
requests to the configured API URL, validates content types, enforces
size limits, and rejects non-API paths
- Wired the bridge into all remote adapter execute paths (claude, codex,
cursor, gemini, pi) — the bridge starts before the agent process and the
bridge URL is passed via environment variables
- Updated `environment-execution-target.ts` to prefer the explicit API
URL from environment lease metadata for sandbox callback routing
- Fixed Claude sandbox runtime setup to work with the bridge
configuration
- Added comprehensive test coverage for bridge request handling, policy
enforcement, and sandbox execution integration
- Fixed browser bundling — the bridge module is excluded from the
frontend bundle via the adapter-utils index export

## Verification

- `pnpm test` — all existing and new tests pass, including bridge unit
tests and sandbox execution integration tests
- `pnpm typecheck` — clean
- Manual: configure an E2B environment, run an agent task, verify the
agent can post comments and update issue status through the bridge

## Risks

- Medium. This is a new network-facing component (HTTP server on
localhost). The security policy restricts forwarding to the configured
API URL only and validates all requests, but any proxy introduces attack
surface. The bridge binds to localhost only and is scoped to the
lifetime of a single agent run.

## Model Used

Codex GPT 5.4 high via Paperclip.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
2026-04-29 16:37:34 -07:00