mirror of
https://github.com/alkimake/paperclip.git
synced 2026-06-14 18:10:39 +09:00
10 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
486fb88a15
|
Add Cloudflare sandbox provider plugin (#5687)
> _Stacked on top of #5685 → #5686. Diff against master includes commits from earlier PRs in the stack — review focuses on the two new commits (`Extend sandbox callback bridge for Worker-hosted plugins` + `Add Cloudflare sandbox provider plugin`)._ ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - Each agent runs in a sandbox environment, and operators choose which provider backs that sandbox — today E2B and Daytona are bundled with the platform > - Cloudflare Workers + Durable Objects + the Sandbox SDK offer a credible new option: globally distributed, cheap idle, and operator-deployable as a single Worker > - To plug it in, Paperclip needs (a) a provider plugin that speaks the `PaperclipPluginManifestV1` lifecycle and (b) a small operator-deployed Worker — the **bridge** — that adapts Paperclip's runtime RPCs to the Cloudflare Sandbox SDK > - The plugin extends the existing sandbox-callback-bridge with a `bridge.transport: "worker"` discriminator so the platform routes runtime RPCs through the Worker bridge instead of the in-process runner > - This pull request adds the plugin, the bridge Worker template, and the supporting adapter-utils + server hooks the new transport needs > - The benefit is that operators can run sandboxes on Cloudflare's edge with no new platform code beyond installing the plugin and deploying the Worker ## What Changed **Shared support (`Extend sandbox callback bridge for Worker-hosted plugins`):** - `packages/adapter-utils/src/sandbox-callback-bridge.{ts,test.ts}`: expose `expectedHostHeader` so plugin-side bridge clients can verify the canonical request envelope before forwarding. - `packages/adapter-utils/src/command-managed-runtime.{ts,test.ts}`: relax the always-fresh runner construction so callers can re-use a runner across exec calls (Worker-hosted bridges hold the runner inside a Durable Object). - `server/src/services/environment-runtime.ts` + `environment-runtime.test.ts`: route Worker-hosted bridges through the same env-shaping path as E2B and pin the `requestEnv` contract. - `server/src/services/plugin-environment-driver.ts`: thread an optional `issueId` through the runtime descriptor so bridges can scope leases to the originating issue (used by Cloudflare to map a sandbox to the issue/workflow for billing and audit). - `packages/plugins/sdk/src/protocol.ts`: add `issueId?` to `PluginEnvironmentDriverBaseParams` and the new `bridge.transport: "worker"` discriminator that the new plugin declares. - `server/__tests__/heartbeat-plugin-environment.test.ts`: pin the heartbeat path against the new runtime descriptor. **The Cloudflare plugin itself (`Add Cloudflare sandbox provider plugin`):** - `packages/plugins/sandbox-providers/cloudflare/`: plugin entry, manifest, plugin runtime (lifecycle + bridge client), config parsing, and Vitest coverage. Manifest declares `bridge.transport: "worker"` so the platform routes runtime RPCs through the bridge client. - `bridge-template/`: a Worker template the operator deploys with `wrangler`. Owns Durable Object-backed sessions (`sessions.ts`), exec/stream routes (`exec.ts`, `routes.ts`), and an HMAC auth layer (`auth.ts`) that pins the `Host` header surface. Includes the SDK-contract-correct exec implementation, lease recovery, and chunked stdout/stderr streaming. - Tests cover lease/session handoff (`bridge-template/src/exec.test.ts`, `routes.test.ts`), bridge client request shaping (`src/bridge-client.test.ts`), and end-to-end plugin behavior (`src/plugin.test.ts`) including streamed exec output. 27 tests in total. - `README.md` walks the operator through deploying the bridge Worker, registering the plugin, and configuring the runtime. ## Verification - `pnpm typecheck` - `pnpm exec vitest run --no-coverage packages/adapter-utils/src/sandbox-callback-bridge.test.ts packages/adapter-utils/src/command-managed-runtime.test.ts server/src/__tests__/environment-runtime.test.ts server/src/__tests__/heartbeat-plugin-environment.test.ts` - `(cd packages/plugins/sandbox-providers/cloudflare && pnpm test)` — 27 passing For an operator-side smoke test: 1. Deploy the bridge: `cd packages/plugins/sandbox-providers/cloudflare/bridge-template && wrangler deploy` 2. Register the plugin in your Paperclip instance, point its bridge URL at the deployed Worker, set the HMAC shared secret. 3. Create a sandbox environment whose provider is `cloudflare`, then run a Codex or Claude job against it. ## Risks - Adds a new `bridge.transport: "worker"` code path, but the existing E2B / Daytona transports go through the same shaped helpers and have explicit test coverage that pins their behavior unchanged. - The Worker bridge stores session state in a Durable Object; operator instances must be aware of the corresponding Cloudflare costs (DO requests, storage). Documented in the README. - The `issueId` plumbing is optional throughout — existing plugins that don't supply it continue to work. ## Model Used - Provider: Anthropic - Model: Claude Opus 4.7 (1M context) - Capabilities used: extended reasoning, tool use (Read/Edit/Bash/Grep) ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots — N/A, no UI change - [x] I have updated relevant documentation to reflect my changes (plugin README, bridge-template README) - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing> |
||
|
|
778e775c35
|
Add secrets provider vaults and remote import (#5429)
## Thinking Path
> - Paperclip orchestrates AI-agent companies and needs secrets handling
to work across local development, hosted operators, and governed agent
execution.
> - The affected subsystem is the company-scoped secrets control plane:
database schema, server services/routes, CLI workflows, and the Secrets
settings UI.
> - The gap was that secrets were local-only and operators could not
manage provider vaults or import existing remote references without
exposing plaintext.
> - This branch adds provider vault configuration plus an AWS Secrets
Manager remote-import path while preserving company boundaries, binding
context, and audit trails.
> - I kept the PR to a single branch PR, removed unrelated
lockfile/package drift, rebased the full branch onto the current
`public-gh/master`, and addressed fresh Greptile findings.
> - The benefit is a reviewable implementation of provider-backed
secrets with focused tests covering provider selection, import
conflicts, deleted secret reuse, rotation guards, and AWS signing
behavior.
## What Changed
- Added provider vault support for company secrets, including provider
config storage, default vault handling, health checks, binding usage,
access events, and remote import preview/commit.
- Added an AWS Secrets Manager provider using SigV4 request signing,
bounded request timeouts, namespace guardrails, cached runtime
credential resolution, and external-reference linking without plaintext
reads.
- Added Secrets UI surfaces for vault management and remote import, plus
CLI/API documentation for setup and operations.
- Stabilized routine webhook secret binding paths and SSH
environment-driver fixture bindings discovered during verification.
- Addressed Greptile and CI findings: no lockfile/package drift,
monotonic migration metadata, disabled-vault default races, soft-deleted
secret hiding/recreate behavior, remove behavior with disabled vaults,
soft-deleted external-reference re-import, non-active rotation guards,
managed-secret soft deletion through PATCH, and per-call AWS SDK
credential client churn.
- Rebased this branch onto `public-gh/master` at `
|
||
|
|
5c2f9aba9d
|
Run explicit-environment adapter tests on the requested target instead of falling back to the host (#5277)
## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - When a user clicks "Test" on a configured environment (SSH or sandbox), the agent-test route exercises the adapter against that target > - The route previously fell back to running the probe on the Paperclip host whenever an explicit environment target couldn't be resolved, with the test report still saying "passed" > - That hid two real failure modes: misconfigured environments looked green, and sandbox environments were never actually exercised > - This pull request acquires an ad-hoc lease and realizes a workspace for sandbox/plugin test environments, resolves a sandbox execution target wired to the environment runtime, and returns synthesized diagnostics instead of running a host probe when an explicit env target can't be resolved > - The benefit is the Test action surfaces the real environment state and never silently exercises the wrong machine ## What Changed - `server/routes/agents.ts`: acquire an ad-hoc lease and realize a workspace for sandbox/plugin test environments; resolve a sandbox execution target wired to the environment runtime - Return synthesized diagnostics (no host fallback) when an explicit env target can't be resolved - `server/services/environment-runtime.ts`: small adjustments to support the explicit-env-target case - Clarify test-route messages so they no longer claim a host fallback in explicit env flows - New `agent-test-environment-routes.test.ts` covers the guard and missing-environment path ## Verification - `pnpm vitest run --no-coverage server/src/__tests__/agent-test-environment-routes.test.ts` - `pnpm typecheck` clean - Manual: a deliberately misconfigured sandbox environment now reports diagnostics instead of a misleading host-pass ## Risks Medium — Test route behavior change. Explicit environments that previously appeared to pass via host fallback will now report their real state. This is the desired behavior, but operators should expect to see new failures for environments that were never actually working. ## Model Used Claude Opus 4.7 (1M context) ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable — new tests cover guard + missing-env paths - [x] If this change affects the UI, I have included before/after screenshots — N/A (no UI) - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge |
||
|
|
076067865f
|
Migrate SSH environment callback to bridge (#5116)
> **Stacked PR (part 3 of 7).** Depends on: - PR #5114 - PR #5115 > Diff against `master` includes commits from earlier PRs in the stack — the new commit in this PR is the topmost one. ## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - Agents executing on a remote SSH-backed environment need a way to call back into > the Paperclip control plane (run events, log streaming, signals) > - When the SSH host can't reach the Paperclip host (NAT, firewalls, or simply not > on the same network), the run silently fails or hangs — a recurring class of > failure during SSH testing > - In sandboxed environments we already solved this with a callback bridge that > tunnels back through the existing connection; SSH was the odd one out > - This PR migrates SSH execution to use the same callback bridge, so every > adapter's remote run uses one consistent reverse-channel. Per-adapter SSH glue > is deleted in favour of a shared `CommandManagedRuntimeRunner` built from the > SSH spec > - The benefit is fewer SSH-specific failure modes, a smaller code surface, and > one place to evolve the callback contract going forward ## What Changed - Added `createSshCommandManagedRuntimeRunner` in `packages/adapter-utils/src/ssh.ts` that adapts an SSH spec into a generic command-managed-runtime runner (with cwd, env, and timeout handling) - Removed `paperclipApiUrl` from `SshRemoteExecutionSpec`; the bridge URL now flows through the shared runner - Reworked `execution-target.ts` to use the SSH runner alongside sandbox runners via a unified `CommandManagedRuntimeRunner` interface - Simplified `remote-managed-runtime.ts` and `sandbox-managed-runtime.ts` to consume the shared runner abstraction - Deleted per-adapter SSH callback wiring from claude-local, codex-local, cursor-local, gemini-local, opencode-local, pi-local execute.ts files - Removed `environment-runtime-driver-contract.test.ts` (the contract is now enforced by `environment-execution-target.test.ts`) - Added/updated `execute.remote.test.ts` cases for each adapter to cover the SSH runner path ## Verification - `pnpm --filter @paperclipai/adapter-utils test` - `pnpm test -- execute.remote` (covers all six local adapters' SSH paths) - Manual QA: ran a claude-local agent against an SSH-backed environment, confirmed the agent successfully called back to `/api/agent-callback/*` endpoints during the run ## Risks - Refactor touches all six local adapters. If any adapter had subtle SSH-specific behaviour that wasn't captured in tests, it could regress. Mitigation: each adapter's `execute.remote.test.ts` was extended. - `paperclipApiUrl` removal from `SshRemoteExecutionSpec` is a breaking type change for any internal consumer. Verified no external plugins consume this type. - The new `CommandManagedRuntimeRunner` shape is a public surface in `@paperclipai/adapter-utils`; downstream plugins implementing custom runners may need updates, but no such plugins exist in this repo. ## Model Used - OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI - Provider: OpenAI - Used to author the code changes in this PR ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots — N/A - [ ] I have updated relevant documentation to reflect my changes — N/A - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge |
||
|
|
a7b45938b7
|
Let sandbox providers declare shell defaults (#5114)
## Thinking Path
> - Paperclip orchestrates AI agents for zero-human companies
> - Agents execute in sandboxed remote environments served by pluggable
sandbox
> providers (E2B today, more later)
> - Today every sandbox command runs under `sh -lc` regardless of what
the
> provider's container actually ships
> - That misses bash-only shell init on E2B (which ships bash) and
prevents
> future providers from declaring a different default — there's no way
for a
> provider to say "I have bash, use it"
> - This PR adds a `shellCommand` field to sandbox execution targets so
providers
> can declare their preferred shell ("bash" for E2B), threads it through
the
> sandbox-managed-runtime client, callback bridge, and execution-target
shell
> helper, and validates the value at the lease-metadata boundary
> - The benefit is that sandbox commands run under the right shell on
the right
> provider, and adding new sandbox providers only needs to declare a
shell
> preference
## What Changed
- Added `packages/adapter-utils/src/sandbox-shell.ts` exporting
`preferredShellForSandbox(shellCommand)` (returns `"bash"` if input is
`"bash"`,
else `"sh"`)
- Added `shellCommand?: "bash" | "sh" | null` to
`AdapterSandboxExecutionTarget`
and `CommandManagedRuntimeSpec`; threaded it through
`runAdapterExecutionTargetShellCommand`,
`prepareAdapterExecutionTargetRuntime`,
and `startAdapterExecutionTargetPaperclipBridge`
- `createCommandManagedRuntimeClient`, `prepareCommandManagedRuntime`,
and
`createCommandManagedSandboxCallbackBridgeQueueClient` now take an
optional
`shellCommand` and use `preferredShellForSandbox` to pick the shell
- `startSandboxCallbackBridgeServer` accepts a `shellCommand` for its
server
startup, readiness probe, and stop hook
- E2B sandbox plugin declares `shellCommand: "bash"` in `leaseMetadata`
- `resolveEnvironmentExecutionTarget` reads `shellCommand` from lease
metadata
(validating against `"bash" | "sh" | null`)
- `environment-runtime.ts` adds `"shellCommand"` to
`INTERNAL_PLUGIN_SANDBOX_CONFIG_KEYS`
so the field round-trips through internal plugin config without leaking
to
external plugin metadata
- Updated tests in `command-managed-runtime.test.ts`,
`execution-target-sandbox.test.ts`, `sandbox-callback-bridge.test.ts`,
`environment-execution-target.test.ts`
## Verification
- `pnpm --filter @paperclipai/adapter-utils test`
- `pnpm --filter @paperclipai/server test --
environment-execution-target`
- `pnpm --filter @paperclipai/sandbox-providers-e2b test`
- Manual QA: boot a Paperclip instance, create an E2B-backed
environment, run a
claude_local agent against it, and confirm the run completes (verifies
bash
shell semantics flow through the callback bridge end-to-end)
## Risks
- E2B sandbox commands now run under `bash -lc` instead of `sh -lc`.
Bash is a
strict superset for the commands we issue (no busybox-only flags in our
shell
scripts), so risk is low. The shellCommand field is opt-in via lease
metadata —
providers that don't declare it stay on `sh`.
- New optional field on `CommandManagedRuntimeSpec` and
`AdapterSandboxExecutionTarget`.
Consumers ignoring the field retain previous behaviour (sh).
- Lease metadata now carries an additional field. Existing leases
without
`shellCommand` resolve to `null` and fall back to sh — backwards
compatible.
## Model Used
- OpenAI GPT-5.4 (reasoning effort: high) via Codex CLI
- Provider: OpenAI
- Used to author the code changes in this PR
## Checklist
- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots — N/A (no UI changes)
- [ ] I have updated relevant documentation to reflect my changes — N/A
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
|
||
|
|
c0ce35d1fb
|
Improve E2B plugin configuration UX and fix execution timeouts (#4802)
## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - E2B is a sandbox provider plugin that runs agent code in isolated cloud environments > - Operators configure E2B through the plugin settings page > - But the E2B API key configuration was unclear — the settings field description didn't explain that pasted keys are auto-saved as company secrets, and the fallback to the host `E2B_API_KEY` variable wasn't documented > - Additionally, long-running E2B sandbox commands were timing out because the plugin environment RPC driver used a fixed timeout, and environment commands competed for the single foreground command slot > - This PR clarifies the E2B configuration UX, fixes RPC timeouts for plugin environment execution, and runs E2B environment commands in background mode to avoid blocking the foreground slot > - The benefit is clearer E2B setup for operators and more reliable sandbox command execution ## What Changed - Updated E2B plugin manifest and settings UI to clarify API key configuration — field description now explains that pasted keys are saved as company secrets and documents the `E2B_API_KEY` host fallback - Added test coverage for the plugin settings page rendering - Fixed `plugin-environment-driver.ts` to pass the configured timeout through to RPC calls instead of using a hardcoded default - Updated `environment-runtime.ts` to propagate timeout from the environment lease to the plugin driver - Changed E2B sandbox command execution to use background handles so long-running agent commands don't block the foreground slot needed by the callback bridge ## Verification - `pnpm test` — all existing and new tests pass - `pnpm typecheck` — clean - Manual: navigate to plugin settings, verify E2B API key field shows the updated description text - Manual: run an E2B-backed agent task with a long-running command, verify it completes without RPC timeout ## Risks - Low risk. Configuration UX change is cosmetic. The timeout fix passes an existing value through instead of dropping it. Background command execution is a behavioral change but only affects E2B sandbox commands — the foreground slot is still available for bridge health checks. ## Model Used Codex GPT 5.4 high via Paperclip. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge |
||
|
|
a4ac6ff133
|
Add sandbox callback bridge for remote environment API access (#4801)
## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - Agents can run inside sandboxed environments like E2B, which are isolated from the host network > - Sandboxed agents need to call back to the Paperclip API to report progress, post comments, and update issue status > - But sandbox environments cannot reach the Paperclip server directly because they run in isolated network namespaces > - This PR adds a callback bridge that proxies API requests from the sandbox to the Paperclip server, running as a local HTTP server on the host that forwards authenticated requests > - The bridge is started automatically when an adapter launches a sandbox execution, and torn down when the run completes > - The benefit is sandboxed agents can interact with the Paperclip API without requiring network-level access to the host, enabling E2B and similar providers to work end-to-end ## What Changed - Added `sandbox-callback-bridge.ts` in `packages/adapter-utils/` — a lightweight HTTP bridge server that accepts requests from sandbox environments and proxies them to the Paperclip API with authentication - Added request validation and security policy: the bridge only forwards requests to the configured API URL, validates content types, enforces size limits, and rejects non-API paths - Wired the bridge into all remote adapter execute paths (claude, codex, cursor, gemini, pi) — the bridge starts before the agent process and the bridge URL is passed via environment variables - Updated `environment-execution-target.ts` to prefer the explicit API URL from environment lease metadata for sandbox callback routing - Fixed Claude sandbox runtime setup to work with the bridge configuration - Added comprehensive test coverage for bridge request handling, policy enforcement, and sandbox execution integration - Fixed browser bundling — the bridge module is excluded from the frontend bundle via the adapter-utils index export ## Verification - `pnpm test` — all existing and new tests pass, including bridge unit tests and sandbox execution integration tests - `pnpm typecheck` — clean - Manual: configure an E2B environment, run an agent task, verify the agent can post comments and update issue status through the bridge ## Risks - Medium. This is a new network-facing component (HTTP server on localhost). The security policy restricts forwarding to the configured API URL only and validates all requests, but any proxy introduces attack surface. The bridge binds to localhost only and is scoped to the lifetime of a single agent run. ## Model Used Codex GPT 5.4 high via Paperclip. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge |
||
|
|
4cf612a92d
|
Fix runtime state race, workspace sync, plugin startup, and orphaned leases (#4804)
## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies > - Agents run inside environments that are leased, and the server manages runtime state, workspace configuration, and plugin lifecycle > - Several edge cases caused failures during concurrent operations: a race condition in runtime state insertion could produce duplicate-key errors, reused workspaces didn't sync their configuration when the parent issue was updated, sandbox provider plugins could be queried before registration completed, and orphaned environment leases from failed runs were never released > - This PR fixes these four runtime/environment issues > - The benefit is more reliable concurrent agent execution and proper resource cleanup ## What Changed - `services/heartbeat.ts`: Fixed a race condition where concurrent runtime state inserts could fail with a duplicate-key error by using an upsert pattern - `services/issues.ts`: Sync reused workspace configuration when an issue is updated, so the workspace reflects the latest issue state - `services/environment-runtime.ts`: Fixed a startup race where sandbox provider plugins could be queried before registration completed, by awaiting plugin readiness before resolving environment drivers - `services/heartbeat.ts`: Release environment leases for orphaned runs that lost their process without cleanup ## Verification - `pnpm test` — all existing and new tests pass, including new tests for runtime state upsert and process recovery lease cleanup - `pnpm typecheck` — clean - Manual: trigger concurrent agent runs to verify no duplicate-key failures; verify orphaned leases are released after process loss ## Risks - Low risk. The runtime state upsert changes insert-to-upsert behavior, which could mask a legitimate duplicate if two different runs produce the same key — but this is prevented by the run ID being part of the key. The plugin startup await is bounded by the existing registration timeout. ## Model Used Codex GPT 5.4 high via Paperclip. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge |
||
|
|
5bd0f578fd
|
Generalize sandbox provider core for plugin-only providers (#4449)
## Thinking Path > - Paperclip is a control plane, so optional execution providers should sit at the plugin edge instead of hardcoding provider-specific behavior into core shared/server/ui layers. > - Sandbox environments are already first-class, and the fake provider proves the built-in path; the remaining gap was that real providers still leaked provider-specific config and runtime assumptions into core. > - That coupling showed up in config normalization, secret persistence, capabilities reporting, lease reconstruction, and the board UI form fields. > - As long as core knew about those provider-shaped details, shipping a provider as a pure third-party plugin meant every new provider would still require host changes. > - This pull request generalizes the sandbox provider seam around schema-driven plugin metadata and generic secret-ref handling. > - The runtime and UI now consume provider metadata generically, so core only special-cases the built-in fake provider while third-party providers can live entirely in plugins. ## What Changed - Added generic sandbox-provider capability metadata so plugin-backed providers can expose `configSchema` through shared environment support and the environments capabilities API. - Reworked sandbox config normalization/persistence/runtime resolution to handle schema-declared secret-ref fields generically, storing them as Paperclip secrets and resolving them for probe/execute/release flows. - Generalized plugin sandbox runtime handling so provider validation, reusable-lease matching, lease reconstruction, and plugin worker calls all operate on provider-agnostic config instead of provider-shaped branches. - Replaced hardcoded sandbox provider form fields in Company Settings with schema-driven rendering and blocked agent environment selection from the built-in fake provider. - Added regression coverage for the generic seam across shared support helpers plus environment config, probe, routes, runtime, and sandbox-provider runtime tests. ## Verification - `pnpm vitest --run packages/shared/src/environment-support.test.ts server/src/__tests__/environment-config.test.ts server/src/__tests__/environment-probe.test.ts server/src/__tests__/environment-routes.test.ts server/src/__tests__/environment-runtime.test.ts server/src/__tests__/sandbox-provider-runtime.test.ts` - `pnpm -r typecheck` ## Risks - Plugin sandbox providers now depend more heavily on accurate `configSchema` declarations; incorrect schemas can misclassify secret-bearing fields or omit required config. - Reusable lease matching is now metadata-driven for plugin-backed providers, so providers that fail to persist stable metadata may reprovision instead of resuming an existing lease. - The UI form is now fully schema-driven for plugin-backed sandbox providers; provider manifests without good defaults or descriptions may produce a rougher operator experience. ## Model Used - OpenAI Codex via `codex_local` - Model ID: `gpt-5.4` - Reasoning effort: `high` - Context window observed in runtime session metadata: `258400` tokens - Capabilities used: terminal tool execution, git, and local code/test inspection ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge |
||
|
|
70679a3321
|
Add sandbox environment support (#4415)
## Thinking Path > - Paperclip orchestrates AI agents for zero-human companies. > - The environment/runtime layer decides where agent work executes and how the control plane reaches those runtimes. > - Today Paperclip can run locally and over SSH, but sandboxed execution needs a first-class environment model instead of one-off adapter behavior. > - We also want sandbox providers to be pluggable so the core does not hardcode every provider implementation. > - This branch adds the Sandbox environment path, the provider contract, and a deterministic fake provider plugin. > - That required synchronized changes across shared contracts, plugin SDK surfaces, server runtime orchestration, and the UI environment/workspace flows. > - The result is that sandbox execution becomes a core control-plane capability while keeping provider implementations extensible and testable. ## What Changed - Added sandbox runtime support to the environment execution path, including runtime URL discovery, sandbox execution targeting, orchestration, and heartbeat integration. - Added plugin-provider support for sandbox environments so providers can be supplied via plugins instead of hardcoded server logic. - Added the fake sandbox provider plugin with deterministic behavior suitable for local and automated testing. - Updated shared types, validators, plugin protocol definitions, and SDK helpers to carry sandbox provider and workspace-runtime contracts across package boundaries. - Updated server routes and services so companies can create sandbox environments, select them for work, and execute work through the sandbox runtime path. - Updated the UI environment and workspace surfaces to expose sandbox environment configuration and selection. - Added test coverage for sandbox runtime behavior, provider seams, environment route guards, orchestration, and the fake provider plugin. ## Verification - Ran locally before the final fixture-only scrub: - `pnpm -r typecheck` - `pnpm test:run` - `pnpm build` - Ran locally after the final scrub amend: - `pnpm vitest run server/src/__tests__/runtime-api.test.ts` - Reviewer spot checks: - create a sandbox environment backed by the fake provider plugin - run work through that environment - confirm sandbox provider execution does not inherit host secrets implicitly ## Risks - This touches shared contracts, plugin SDK plumbing, server runtime orchestration, and UI environment/workspace flows, so regressions would likely show up as cross-layer mismatches rather than isolated type errors. - Runtime URL discovery and sandbox callback selection are sensitive to host/bind configuration; if that logic is wrong, sandbox-backed callbacks may fail even when execution succeeds. - The fake provider plugin is intentionally deterministic and test-oriented; future providers may expose capability gaps that this branch does not yet cover. ## Model Used - OpenAI Codex coding agent on a GPT-5-class backend in the Paperclip/Codex harness. Exact backend model ID is not exposed in-session. Tool-assisted workflow with shell execution, file editing, git history inspection, and local test execution. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [ ] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge |