[codex] Add LLM Wiki plugin host support (#5597)

## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - The plugin system needs host contracts and runtime support before
large plugins can integrate cleanly.
> - The source branch mixed the LLM Wiki package with supporting
host/runtime work, managed plugin skills, root-level storage spaces, and
a bookmarks reference plugin.
> - [PAP-9173](/PAP/issues/PAP-9173) asked for the current branch to be
split by file boundary: plugin package separately from everything else.
> - [PAP-9188](/PAP/issues/PAP-9188) clarified that LLM Wiki may have
plugin-local spaces, but Paperclip core should not reorganize top-level
local storage into spaces.
> - Follow-up review clarified that the bookmarks example should not
ship in this PR either.
> - This pull request contains the
non-`packages/plugins/plugin-llm-wiki/` host/runtime work, keeps runtime
state under the selected Paperclip instance root, and no longer includes
the bookmarks example.

## What Changed

- Added/updated plugin host contracts, SDK types, worker RPC plumbing,
managed plugin skill support, and related server tests.
- Removed the bookmarks example plugin package and its
bundled-example/workspace references.
- Removed the root-level local spaces CLI/migration surface and restored
instance-root runtime defaults for config, db, logs, storage, secrets,
workspaces, projects, and adapter homes.
- Replaced shared root `space-paths` helpers with `home-paths` helpers
for core runtime storage.
- Tightened stranded recovery unique-conflict detection so concurrent
recovery scans reuse the raced recovery issue when Postgres errors are
wrapped.
- Kept `packages/plugins/plugin-llm-wiki/` out of this PR diff;
plugin-local spaces remain in the stacked plugin-only PR.

## Verification

- `pnpm exec vitest run cli/src/__tests__/data-dir.test.ts
cli/src/__tests__/home-paths.test.ts cli/src/__tests__/onboard.test.ts
packages/shared/src/home-paths.test.ts
packages/db/src/runtime-config.test.ts
server/src/__tests__/agent-instructions-service.test.ts
server/src/__tests__/claude-local-execute.test.ts
server/src/__tests__/codex-local-execute.test.ts`
- `pnpm exec vitest run packages/db/src/runtime-config.test.ts`
- `pnpm exec vitest run
server/src/__tests__/plugin-routes-authz.test.ts`
- `pnpm --filter @paperclipai/server typecheck`
- `pnpm exec vitest run
server/src/__tests__/heartbeat-process-recovery.test.ts -t "reuses the
raced stranded recovery issue"` skipped locally because embedded
Postgres did not initialize on this macOS temp host; the code path was
typechecked and is covered by Linux CI.
- Boundary check: no core references remain for `PAPERCLIP_SPACE_ID`,
`spaces migrate-default`, `@paperclipai/shared/space-paths`,
`registerSpacesCommands`, or the removed bookmarks example.
- Previous PR head `4f23e034` had green GitHub checks: `verify`, all
four serialized server shards, `e2e`, `Canary Dry Run`, `policy`, Snyk,
and `Greptile Review`. Current head `582f466d` is re-running checks
after the bookmarks deletion.

## Risks

- Plugin host changes touch shared runtime paths, so regressions would
most likely appear in adapter startup, plugin loading, or local dev path
defaults.
- Removing the bookmarks example also removes one demonstration of
plugin database namespaces plus local-folder persistence; remaining
plugin examples still cover bundled example discovery and plugin host
flows.
- The plugin package itself is intentionally deferred to the stacked
plugin-only PR, where LLM Wiki plugin-local spaces live.
- Existing installs that tested the transient root-level spaces CLI
should stop using it; this PR intentionally removes that unsupported
migration surface before merge.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI GPT-5 Codex via Codex CLI, tool use and local code execution
enabled; context window not exposed.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass, except where noted above
for host-specific embedded Postgres initialization
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

Stacked follow-up: PR #5592 contains only
`packages/plugins/plugin-llm-wiki/` and targets this branch.

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
This commit is contained in:
Dotta 2026-05-10 07:34:12 -05:00 committed by GitHub
parent eb12c42009
commit 0096b56a1c
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
40 changed files with 1892 additions and 224 deletions

View file

@ -0,0 +1,135 @@
# LLM Wiki Paperclip Asset And Work-Product Security Gate
Status: accepted Phase 5 policy
Date: 2026-05-06
Owner: Security engineering
Scope: Paperclip-derived ingestion into the LLM Wiki before any asset or work-product content indexing ships
## Decision
Phase 5 remains **fail-closed** for Paperclip assets and work products.
- Paperclip-derived **text extraction is allowed only** for issue titles/descriptions, issue comments, and issue documents.
- Paperclip **assets/attachments** and **issue work products** are **metadata-only** in Phase 5.
- **Linked summaries** and **content extraction** for assets/work products are **not approved** in Phase 5.
- No implementation may fetch `/api/assets/:id/content`, dereference a work-product `url`, scrape preview pages, or embed binary/blob content into source bundles or source snapshots.
This keeps the secure path easier than the insecure one and avoids broadening the wiki into a second content-distribution channel.
## Allowed Source Kinds
These source kinds may contribute body text to Paperclip-derived source bundles:
| Source kind | Allowed body fields | Reason |
| --- | --- | --- |
| Issue | `title`, `description`, identifier/status metadata | First-party Paperclip text under company ACL |
| Comment | `body` | First-party Paperclip text under company ACL |
| Document | `body`, `title`, `key`, revision metadata | First-party Paperclip text under company ACL |
## Assets And Work Products
### Assets / attachments
Allowed in Phase 5:
- metadata-only references built from allowlisted structured fields already stored in Paperclip
- recommended fields: `issueId`, `issueCommentId`, `attachmentId`, `assetId`, `originalFilename`, `contentType`, `byteSize`, `sha256`, `createdAt`, `createdByAgentId`, `createdByUserId`
Disallowed in Phase 5:
- fetching asset bytes from `/api/assets/:id/content`
- parsing any blob body, including `text/plain`, `text/markdown`, `application/json`, images, SVG, PDFs, archives, or office formats
- storing `contentPath` in wiki source bundles or source snapshots
- model summarization of attachment bodies
### Work products
Allowed in Phase 5:
- metadata-only references built from allowlisted structured fields already stored in Paperclip
- recommended fields: `issueId`, `workProductId`, `type`, `provider`, `title`, `status`, `reviewState`, `healthStatus`, `externalId`, `isPrimary`, `createdAt`, `updatedAt`
- optional boolean/derived metadata such as `hasUrl: true`
Disallowed in Phase 5:
- fetching or crawling the work-product `url`
- scraping preview pages, artifacts, pull requests, branches, commits, or custom provider targets through the wiki ingestion path
- storing raw `url` values in wiki source bundles or source snapshots
- model-authored linked summaries derived from off-record content
## MIME Allowlists And Size Caps
No MIME allowlist is approved for asset content extraction in Phase 5 because **no asset body extraction is approved at all**.
- Every asset MIME type is treated as opaque for Paperclip-derived indexing.
- Existing upload limits remain storage concerns, not ingestion approvals.
- Work-product destinations are also opaque regardless of MIME type or size.
Any future issue that wants blob parsing must define:
- a positive MIME allowlist
- per-type parser strategy
- per-source size caps
- sandbox/isolation requirements
- prompt-injection handling
- regression tests for refusal paths
## Redaction Rules
Metadata-only means **structured facts only**, not capability-bearing links.
- Do not persist `contentPath` for assets.
- Do not persist raw work-product `url` values.
- Do not persist query strings, fragments, signed URL tokens, or userinfo.
- Prefer stable identifiers (`assetId`, `workProductId`, `externalId`) over links.
This addresses Sensitive Information Disclosure, Unsafe Consumption of APIs, and Insecure Output Handling risks.
## Provenance Rules
Every metadata-only reference must preserve enough provenance to explain where it came from without reading the underlying content:
- `companyId`
- `issueId`
- attachment/work-product id
- producer identity when available
- timestamps
- an explicit `metadata_only` marker in any future reference/snapshot schema
## Review-Required Behavior
Human review is **not** required for plain metadata-only references that stay inside the allowlisted fields above.
Human review **is required**, with a separate security sign-off issue, before enabling any of the following:
- asset body extraction
- work-product URL fetching
- linked summaries generated from asset/work-product content
- storing raw blob links or raw remote URLs in wiki source material
- non-default-space routing for Paperclip-derived asset/work-product references
## Security Rationale
This gate exists because the current host surfaces have different trust properties:
- issue/comment/document text is first-party Paperclip content already exposed through company-scoped issue/document APIs
- asset content is a blob download surface (`/api/assets/:id/content`) and can carry prompt-injection or parser-risk payloads
- work products can point at arbitrary destinations through `url`, which reintroduces SSRF, token leakage, and prompt-injection risk if dereferenced automatically
Relevant threat classes:
- OWASP LLM Top 10: Prompt Injection, Sensitive Information Disclosure, Insecure Output Handling, Excessive Agency
- OWASP API Top 10: SSRF, Unsafe Consumption of APIs, Broken Object Property Level Authorization
- Saltzer & Schroeder: Least Privilege, Fail Securely, Complete Mediation, Secure Defaults
## Follow-Up Implementation Scope
A follow-up implementation issue is justified only for **metadata-only references**.
That implementation must:
- keep assets/work products out of source-bundle body text
- never fetch blob bytes or remote URLs
- redact capability-bearing link fields
- mark references as `metadata_only`
- ship tests proving source bundles/snapshots never contain `contentPath` or raw work-product `url` fields