mirror of
https://github.com/alkimake/paperclip.git
synced 2026-06-16 10:50:38 +09:00
261 lines
8 KiB
Markdown
261 lines
8 KiB
Markdown
# PAP-1229 Agent OS Follow-up Plan
|
|
|
|
Date: 2026-04-08
|
|
Related issue: `PAP-1229`
|
|
Companion analysis: `doc/plans/2026-04-08-agent-os-technical-report.md`
|
|
|
|
## Goal
|
|
|
|
Turn the `agent-os` research into a low-risk Paperclip execution plan that preserves Paperclip's control-plane model while testing the few runtime ideas that appear worth adopting.
|
|
|
|
## Decision summary
|
|
|
|
Paperclip should not absorb `agent-os` as a product model or orchestration layer.
|
|
|
|
Paperclip should evaluate `agent-os` in three narrow areas:
|
|
|
|
1. optional agent runtime for selected local adapters
|
|
2. capability-based runtime permission vocabulary
|
|
3. snapshot-backed disposable execution roots
|
|
|
|
Everything else should stay out of scope unless those three experiments produce strong evidence.
|
|
|
|
## Success condition
|
|
|
|
This work is successful when Paperclip has:
|
|
|
|
- a clear yes/no answer on whether `agent-os` is worth supporting as an execution substrate
|
|
- a concrete adapter/runtime experiment with measurable results
|
|
- a proposed runtime capability model that fits current Paperclip adapters
|
|
- a clear decision on whether snapshot-backed execution roots are worth integrating
|
|
|
|
## Non-goals
|
|
|
|
Do not:
|
|
|
|
- replace Paperclip heartbeats, issues, comments, approvals, or budgets with `agent-os` primitives
|
|
- introduce Rust/sidecar requirements for all local execution paths
|
|
- migrate all adapters at once
|
|
- add runtime workflow/queue abstractions to Paperclip core
|
|
|
|
## Existing Paperclip integration points
|
|
|
|
The plan should stay anchored to these existing surfaces:
|
|
|
|
- `packages/adapter-utils/src/types.ts`
|
|
- adapter contract, runtime service reporting, session metadata, and capability normalization targets
|
|
- `server/src/services/heartbeat.ts`
|
|
- execution entry point, log capture, issue comment summaries, and cost reporting
|
|
- `server/src/services/execution-workspaces.ts`
|
|
- current workspace lifecycle and git-oriented cleanup/readiness model
|
|
- `server/src/services/plugin-loader.ts`
|
|
- typed host capability boundary and extension loading patterns
|
|
- local adapter implementations in `packages/adapters/*/src/server/`
|
|
- current execution behavior to compare against an `agent-os`-backed path
|
|
|
|
## Phase plan
|
|
|
|
### Phase 0: constraints and experiment design
|
|
|
|
Objective:
|
|
|
|
- make the evaluation falsifiable before writing integration code
|
|
|
|
Deliverables:
|
|
|
|
- short experiment brief added to this document or a child issue
|
|
- chosen first runtime target: `pi_local` or `opencode_local`
|
|
- baseline metrics definition
|
|
|
|
Questions to lock down:
|
|
|
|
- what exact developer experience should improve
|
|
- what security/isolation property we expect to gain
|
|
- what failure modes are unacceptable
|
|
- whether the prototype is adapter-only or a deeper internal runtime abstraction spike
|
|
|
|
Exit criteria:
|
|
|
|
- a single first target chosen
|
|
- measurable comparison criteria agreed on
|
|
|
|
Recommended metrics:
|
|
|
|
- cold start latency
|
|
- session resume reliability across heartbeats
|
|
- transcript/log quality
|
|
- implementation complexity
|
|
- operational complexity on local dev machines
|
|
|
|
### Phase 1: `agentos_local` spike
|
|
|
|
Objective:
|
|
|
|
- prove that Paperclip can drive one local agent through an `agent-os` runtime without breaking heartbeat semantics
|
|
|
|
Suggested scope:
|
|
|
|
- implement a new experimental adapter, `agentos_local`, or a feature-flagged runtime path under one existing adapter
|
|
- start with `pi_local` or `opencode_local`
|
|
- keep Paperclip's existing heartbeat, issue, workspace, and comment flow authoritative
|
|
|
|
Minimum implementation shape:
|
|
|
|
- adapter accepts model/runtime config
|
|
- `server/src/services/heartbeat.ts` still owns run lifecycle
|
|
- execution result still maps into existing `AdapterExecutionResult`
|
|
- session state still fits current `sessionParams` / `sessionDisplayId` flow
|
|
|
|
What to verify:
|
|
|
|
- checkout and heartbeat flow still work end to end
|
|
- resume across multiple heartbeats works
|
|
- logs/transcripts remain readable in the UI
|
|
- failure paths surface cleanly in issue comments and run logs
|
|
|
|
Exit criteria:
|
|
|
|
- one agent type can run reliably through the new path
|
|
- documented comparison against the existing local adapter path
|
|
- explicit recommendation: continue, pause, or abandon
|
|
|
|
### Phase 2: capability-based runtime permissions
|
|
|
|
Objective:
|
|
|
|
- introduce a Paperclip-native capability vocabulary without coupling the product to `agent-os`
|
|
|
|
Suggested scope:
|
|
|
|
- extend adapter config schema vocabulary for runtime permissions
|
|
- prototype normalized capabilities such as:
|
|
- `fs.read`
|
|
- `fs.write`
|
|
- `network.fetch`
|
|
- `network.listen`
|
|
- `process.spawn`
|
|
- `env.read`
|
|
|
|
Integration targets:
|
|
|
|
- `packages/adapter-utils/src/types.ts`
|
|
- adapter config-schema support
|
|
- server-side runtime config validation
|
|
- future board-facing UI for permissions, if needed
|
|
|
|
What to avoid:
|
|
|
|
- building a full human policy UI before the vocabulary is proven useful
|
|
- forcing every adapter to implement capability enforcement immediately
|
|
|
|
Exit criteria:
|
|
|
|
- documented capability schema
|
|
- one adapter path using it meaningfully
|
|
- clear compatibility story for non-`agent-os` adapters
|
|
|
|
### Phase 3: snapshot-backed execution root experiment
|
|
|
|
Objective:
|
|
|
|
- determine whether a layered/snapshotted root model improves some Paperclip workloads
|
|
|
|
Suggested scope:
|
|
|
|
- evaluate it only for disposable or non-repo-heavy tasks first
|
|
- keep git worktree-based repo editing as the default for codebase tasks
|
|
|
|
Promising use cases:
|
|
|
|
- routine-style runs
|
|
- ephemeral preview/test environments
|
|
- isolated document/artifact generation
|
|
- tasks that do not need full git history or branch semantics
|
|
|
|
Integration targets:
|
|
|
|
- `server/src/services/execution-workspaces.ts`
|
|
- workspace realization paths called from `server/src/services/heartbeat.ts`
|
|
|
|
Exit criteria:
|
|
|
|
- clear statement on which workload classes benefit
|
|
- clear statement on which workloads should stay on worktrees
|
|
- go/no-go decision for broader implementation
|
|
|
|
### Phase 4: typed host tool evaluation
|
|
|
|
Objective:
|
|
|
|
- identify where Paperclip should prefer explicit typed tools over ambient shell access
|
|
|
|
Suggested scope:
|
|
|
|
- compare `agent-os` host-toolkit ideas with existing plugin and runtime-service surfaces
|
|
- choose 1-2 sensitive operations that should become typed tools
|
|
|
|
Good candidates:
|
|
|
|
- git metadata/status inspection
|
|
- runtime service inspection
|
|
- deployment/preview status retrieval
|
|
- generated artifact publishing
|
|
|
|
Exit criteria:
|
|
|
|
- one concrete proposal for typed-tool adoption in Paperclip
|
|
- clear statement on whether this belongs in plugins, adapters, or core services
|
|
|
|
## Recommended sequencing
|
|
|
|
Recommended order:
|
|
|
|
1. Phase 0
|
|
2. Phase 1
|
|
3. Phase 2
|
|
4. Phase 3
|
|
5. Phase 4
|
|
|
|
Reasoning:
|
|
|
|
- Phase 1 is the fastest way to invalidate or validate the entire `agent-os` direction
|
|
- Phase 2 is valuable even if Phase 1 is abandoned
|
|
- Phase 3 should wait until there is confidence that the runtime approach is operationally worthwhile
|
|
- Phase 4 is useful independently but should be informed by what Phase 1 and Phase 2 expose
|
|
|
|
## Risks
|
|
|
|
### Technical risk
|
|
|
|
- `agent-os` introduces Rust sidecar and packaging complexity that may outweigh runtime benefits
|
|
|
|
### Product risk
|
|
|
|
- runtime experimentation could blur the boundary between Paperclip as control plane and Paperclip as execution platform
|
|
|
|
### Integration risk
|
|
|
|
- session semantics, log formatting, and failure behavior may degrade relative to current local adapters
|
|
|
|
### Scope risk
|
|
|
|
- a small runtime spike could expand into an adapter-system rewrite if not kept tightly bounded
|
|
|
|
## Guardrails
|
|
|
|
To keep this effort controlled:
|
|
|
|
- keep all experiments behind a clearly experimental adapter or feature flag
|
|
- do not change issue/comment/approval/budget semantics to suit the runtime
|
|
- measure against current local adapters instead of judging in isolation
|
|
- stop after Phase 1 if the operational burden is already clearly too high
|
|
|
|
## Proposed next action
|
|
|
|
The next concrete action should be a small implementation spike issue:
|
|
|
|
- title: `Prototype experimental agentos_local runtime for one local adapter`
|
|
- target adapter: `opencode_local` unless `pi_local` is materially easier
|
|
- expected output: code spike, short verification notes, and a continue/stop recommendation
|
|
|
|
If leadership wants planning only and no spike yet, this document is the handoff artifact for that decision.
|