mirror of
https://github.com/alkimake/paperclip.git
synced 2026-06-16 02:40:39 +09:00
[codex] Add LLM Wiki plugin host support (#5597)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - The plugin system needs host contracts and runtime support before large plugins can integrate cleanly.
> - The source branch mixed the LLM Wiki package with supporting host/runtime work, managed plugin skills, root-level storage spaces, and a bookmarks reference plugin.
> - [PAP-9173](/PAP/issues/PAP-9173) asked for the current branch to be split by file boundary: plugin package separately from everything else.
> - [PAP-9188](/PAP/issues/PAP-9188) clarified that LLM Wiki may have plugin-local spaces, but Paperclip core should not reorganize top-level local storage into spaces.
> - Follow-up review clarified that the bookmarks example should not ship in this PR either.
> - This pull request contains the non-`packages/plugins/plugin-llm-wiki/` host/runtime work, keeps runtime state under the selected Paperclip instance root, and no longer includes the bookmarks example.

## What Changed

- Added/updated plugin host contracts, SDK types, worker RPC plumbing, managed plugin skill support, and related server tests.
- Removed the bookmarks example plugin package and its bundled-example/workspace references.
- Removed the root-level local spaces CLI/migration surface and restored instance-root runtime defaults for config, db, logs, storage, secrets, workspaces, projects, and adapter homes.
- Replaced shared root `space-paths` helpers with `home-paths` helpers for core runtime storage.
- Tightened stranded recovery unique-conflict detection so concurrent recovery scans reuse the raced recovery issue when Postgres errors are wrapped.
- Kept `packages/plugins/plugin-llm-wiki/` out of this PR diff; plugin-local spaces remain in the stacked plugin-only PR.

## Verification

- `pnpm exec vitest run cli/src/__tests__/data-dir.test.ts cli/src/__tests__/home-paths.test.ts cli/src/__tests__/onboard.test.ts packages/shared/src/home-paths.test.ts packages/db/src/runtime-config.test.ts server/src/__tests__/agent-instructions-service.test.ts server/src/__tests__/claude-local-execute.test.ts server/src/__tests__/codex-local-execute.test.ts`
- `pnpm exec vitest run packages/db/src/runtime-config.test.ts`
- `pnpm exec vitest run server/src/__tests__/plugin-routes-authz.test.ts`
- `pnpm --filter @paperclipai/server typecheck`
- `pnpm exec vitest run server/src/__tests__/heartbeat-process-recovery.test.ts -t "reuses the raced stranded recovery issue"` skipped locally because embedded Postgres did not initialize on this macOS temp host; the code path was typechecked and is covered by Linux CI.
- Boundary check: no core references remain for `PAPERCLIP_SPACE_ID`, `spaces migrate-default`, `@paperclipai/shared/space-paths`, `registerSpacesCommands`, or the removed bookmarks example.
- Previous PR head `4f23e034` had green GitHub checks: `verify`, all four serialized server shards, `e2e`, `Canary Dry Run`, `policy`, Snyk, and `Greptile Review`. Current head `582f466d` is re-running checks after the bookmarks deletion.

## Risks

- Plugin host changes touch shared runtime paths, so regressions would most likely appear in adapter startup, plugin loading, or local dev path defaults.
- Removing the bookmarks example also removes one demonstration of plugin database namespaces plus local-folder persistence; remaining plugin examples still cover bundled example discovery and plugin host flows.
- The plugin package itself is intentionally deferred to the stacked plugin-only PR, where LLM Wiki plugin-local spaces live.
- Existing installs that tested the transient root-level spaces CLI should stop using it; this PR intentionally removes that unsupported migration surface before merge.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected; check the roadmap first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI GPT-5 Codex via Codex CLI, tool use and local code execution enabled; context window not exposed.

## Checklist

- [x] I have included a thinking path that traces from project context to this change
- [x] I have specified the model used (with version and capability details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work
- [x] I have run tests locally and they pass, except where noted above for host-specific embedded Postgres initialization
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before requesting merge

Stacked follow-up: PR #5592 contains only `packages/plugins/plugin-llm-wiki/` and targets this branch.

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
This commit is contained in:
parent
eb12c42009
commit
0096b56a1c
40 changed files with 1892 additions and 224 deletions
23
doc/CLI.md
@@ -204,7 +204,28 @@ pnpm paperclipai heartbeat run --agent-id <agent-id> [--api-base http://localhos

## Local Storage Defaults

Default local instance root is `~/.paperclip/instances/default`:
Local Paperclip data lives under the selected instance root. `PAPERCLIP_HOME` chooses the home directory and `PAPERCLIP_INSTANCE_ID` chooses the instance.

```text
~/.paperclip/                          # PAPERCLIP_HOME
└── instances/
    └── default/                       # instance root (PAPERCLIP_INSTANCE_ID)
        ├── config.json                # runtime config
        ├── .env                       # instance env file
        ├── db/                        # embedded PostgreSQL data
        ├── data/
        │   ├── storage/               # local_disk uploads
        │   └── backups/               # automatic DB backups
        ├── logs/
        ├── secrets/
        │   └── master.key             # local_encrypted master key
        ├── workspaces/                # default agent workspaces
        ├── projects/                  # project execution workspaces
        ├── companies/                 # per-company adapter homes (e.g. codex-home)
        └── codex-home/                # per-instance codex home (when not company-scoped)
```

Default paths for the canonical install:

- config: `~/.paperclip/instances/default/config.json`
- embedded db: `~/.paperclip/instances/default/db`

@@ -157,6 +157,27 @@ See `doc/DOCKER.md` for API key wiring (`OPENAI_API_KEY` / `ANTHROPIC_API_KEY`)

For a separate review-oriented container that keeps `codex`/`claude` login state in Docker volumes and checks out PRs into an isolated scratch workspace, see `doc/UNTRUSTED-PR-REVIEW.md`.

## Local Instance Layout

Every local install keeps runtime state directly under the selected instance root:

```text
~/.paperclip/instances/default/          # instance root
  config.json                            # runtime config
  .env                                   # instance env file
  db/                                    # embedded PostgreSQL data
  data/
    storage/                             # local_disk uploads
    backups/                             # automatic DB backups
  logs/
  secrets/master.key                     # local_encrypted master key
  workspaces/<agent-id>/                 # default agent workspaces
  projects/                              # project execution workspaces
  companies/<company-id>/codex-home/     # per-company codex_local home
```

`PAPERCLIP_HOME` and `PAPERCLIP_INSTANCE_ID` override the home root and instance id respectively. `paperclipai onboard` echoes the resolved values in its banner (`Local home: <home> | instance: <id> | config: <path>`) so you can confirm where state will land before continuing.
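
A minimal sketch of how the two variables might combine into the instance root, assuming the documented defaults (`~/.paperclip` and `default`). `resolveInstancePaths`, `InstancePaths`, and the `userHome` parameter are hypothetical names for illustration, not the actual `home-paths` helper API, and the sketch joins POSIX-style paths by hand rather than expanding `~`.

```typescript
// Hypothetical helper: mirrors the documented layout, not the real home-paths API.
type Env = Record<string, string | undefined>;

interface InstancePaths {
  instanceRoot: string;
  config: string;
  db: string;
  storage: string;
}

function resolveInstancePaths(env: Env, userHome: string): InstancePaths {
  // PAPERCLIP_HOME overrides the home root; the documented default is ~/.paperclip.
  const home = (env["PAPERCLIP_HOME"]?.trim() || `${userHome}/.paperclip`).replace(/\/+$/, "");
  // PAPERCLIP_INSTANCE_ID overrides the instance id; the documented default is "default".
  const instanceId = env["PAPERCLIP_INSTANCE_ID"]?.trim() || "default";
  const instanceRoot = `${home}/instances/${instanceId}`;
  return {
    instanceRoot,
    config: `${instanceRoot}/config.json`,
    db: `${instanceRoot}/db`,
    storage: `${instanceRoot}/data/storage`,
  };
}

// With no overrides, everything lands under the default instance root.
const defaultPaths = resolveInstancePaths({}, "/home/alice");
console.log(defaultPaths.config); // /home/alice/.paperclip/instances/default/config.json

// Both overrides set: state moves to the custom home and the "dev" instance.
const devPaths = resolveInstancePaths(
  { PAPERCLIP_HOME: "/custom/path", PAPERCLIP_INSTANCE_ID: "dev" },
  "/home/alice",
);
console.log(devPaths.db); // /custom/path/instances/dev/db
```

Taking the environment and home directory as parameters keeps the function pure, which makes the override behavior easy to check in a unit test without touching real process state.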

## Database in Dev (Auto-Handled)

For local development, leave `DATABASE_URL` unset.

@@ -164,7 +185,7 @@ The server will automatically use embedded PostgreSQL and persist data at:

- `~/.paperclip/instances/default/db`

Override home and instance:
Override home or instance:

```sh
PAPERCLIP_HOME=/custom/path PAPERCLIP_INSTANCE_ID=dev pnpm paperclipai run

@@ -280,7 +301,7 @@ paperclipai worktree init --from-data-dir ~/.paperclip
paperclipai worktree init --force
```

Repair an already-created repo-managed worktree and reseed its isolated instance from the main default install:
Repair an already-created repo-managed worktree and reseed its isolated instance from the main default install. Point `--from-config` at the instance config:

```sh
cd /path/to/paperclip/.paperclip/worktrees/PAP-884-ai-commits-component

135
doc/plans/2026-05-06-llm-wiki-paperclip-asset-security-gate.md
Normal file
@@ -0,0 +1,135 @@
# LLM Wiki Paperclip Asset And Work-Product Security Gate

Status: accepted Phase 5 policy
Date: 2026-05-06
Owner: Security engineering
Scope: Paperclip-derived ingestion into the LLM Wiki before any asset or work-product content indexing ships

## Decision

Phase 5 remains **fail-closed** for Paperclip assets and work products.

- Paperclip-derived **text extraction is allowed only** for issue titles/descriptions, issue comments, and issue documents.
- Paperclip **assets/attachments** and **issue work products** are **metadata-only** in Phase 5.
- **Linked summaries** and **content extraction** for assets/work products are **not approved** in Phase 5.
- No implementation may fetch `/api/assets/:id/content`, dereference a work-product `url`, scrape preview pages, or embed binary/blob content into source bundles or source snapshots.

This keeps the secure path easier than the insecure one and avoids broadening the wiki into a second content-distribution channel.

## Allowed Source Kinds

These source kinds may contribute body text to Paperclip-derived source bundles:

| Source kind | Allowed body fields | Reason |
| --- | --- | --- |
| Issue | `title`, `description`, identifier/status metadata | First-party Paperclip text under company ACL |
| Comment | `body` | First-party Paperclip text under company ACL |
| Document | `body`, `title`, `key`, revision metadata | First-party Paperclip text under company ACL |
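
The decision and the table of allowed source kinds amount to a fail-closed lookup: three source kinds may contribute body text, assets and work products are metadata-only, and anything unrecognized is refused. A minimal sketch of that lookup follows; the kind strings and the names `IngestionMode` and `ingestionModeFor` are assumptions for illustration and do not come from the plugin code.

```typescript
// Hypothetical names; only the policy outcomes are taken from this document.
type IngestionMode = "body_text" | "metadata_only" | "denied";

const INGESTION_POLICY = new Map<string, IngestionMode>([
  ["issue", "body_text"],
  ["comment", "body_text"],
  ["document", "body_text"],
  ["asset", "metadata_only"],
  ["work_product", "metadata_only"],
]);

// Fail closed: a kind that is missing from the policy map is denied, never guessed.
function ingestionModeFor(kind: string): IngestionMode {
  return INGESTION_POLICY.get(kind) ?? "denied";
}

console.log(ingestionModeFor("comment")); // body_text
console.log(ingestionModeFor("asset")); // metadata_only
console.log(ingestionModeFor("preview_page")); // denied
```

Using an explicit map with a `denied` fallback means that adding a new source kind elsewhere in the system cannot silently enable body extraction for it.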

## Assets And Work Products

### Assets / attachments

Allowed in Phase 5:

- metadata-only references built from allowlisted structured fields already stored in Paperclip
- recommended fields: `issueId`, `issueCommentId`, `attachmentId`, `assetId`, `originalFilename`, `contentType`, `byteSize`, `sha256`, `createdAt`, `createdByAgentId`, `createdByUserId`

Disallowed in Phase 5:

- fetching asset bytes from `/api/assets/:id/content`
- parsing any blob body, including `text/plain`, `text/markdown`, `application/json`, images, SVG, PDFs, archives, or office formats
- storing `contentPath` in wiki source bundles or source snapshots
- model summarization of attachment bodies
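
One way to honor the asset rules is to build the reference by copying allowlisted fields only, so `contentPath` and any unexpected field are dropped by construction rather than filtered out afterwards. This is a sketch under assumptions: the field names come from the recommended list, while `toAssetReference` and the input row shape are hypothetical.

```typescript
// Field names are the recommended ones from this policy; everything else is hypothetical.
const ASSET_REFERENCE_FIELDS = [
  "issueId",
  "issueCommentId",
  "attachmentId",
  "assetId",
  "originalFilename",
  "contentType",
  "byteSize",
  "sha256",
  "createdAt",
  "createdByAgentId",
  "createdByUserId",
] as const;

type AssetReferenceField = (typeof ASSET_REFERENCE_FIELDS)[number];
type AssetReference = Partial<Record<AssetReferenceField, unknown>>;

// Copy allowlisted fields only; nothing is fetched and no blob body is read.
function toAssetReference(row: Record<string, unknown>): AssetReference {
  const reference: AssetReference = {};
  for (const field of ASSET_REFERENCE_FIELDS) {
    const value = row[field];
    if (value !== undefined) {
      reference[field] = value;
    }
  }
  return reference;
}

const assetReference = toAssetReference({
  assetId: "asset-1",
  originalFilename: "report.pdf",
  contentType: "application/pdf",
  byteSize: 2048,
  contentPath: "/api/assets/asset-1/content", // capability-bearing link, not allowlisted
});
console.log(Object.keys(assetReference)); // contentPath is absent
```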

### Work products

Allowed in Phase 5:

- metadata-only references built from allowlisted structured fields already stored in Paperclip
- recommended fields: `issueId`, `workProductId`, `type`, `provider`, `title`, `status`, `reviewState`, `healthStatus`, `externalId`, `isPrimary`, `createdAt`, `updatedAt`
- optional boolean/derived metadata such as `hasUrl: true`

Disallowed in Phase 5:

- fetching or crawling the work-product `url`
- scraping preview pages, artifacts, pull requests, branches, commits, or custom provider targets through the wiki ingestion path
- storing raw `url` values in wiki source bundles or source snapshots
- model-authored linked summaries derived from off-record content
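
The work-product rules can be sketched the same way, with the raw `url` reduced to the derived boolean `hasUrl`. The field names are the recommended ones; `toWorkProductReference` and the input row shape are hypothetical, and the example URL is a placeholder.

```typescript
// Field names are the recommended ones from this policy; everything else is hypothetical.
const WORK_PRODUCT_REFERENCE_FIELDS = [
  "issueId",
  "workProductId",
  "type",
  "provider",
  "title",
  "status",
  "reviewState",
  "healthStatus",
  "externalId",
  "isPrimary",
  "createdAt",
  "updatedAt",
] as const;

type WorkProductField = (typeof WORK_PRODUCT_REFERENCE_FIELDS)[number];
type WorkProductReference = Partial<Record<WorkProductField, unknown>> & { hasUrl: boolean };

function toWorkProductReference(row: Record<string, unknown>): WorkProductReference {
  // Record only whether a URL exists; the raw value is never copied or dereferenced.
  const url = row["url"];
  const reference: WorkProductReference = {
    hasUrl: typeof url === "string" && url.length > 0,
  };
  for (const field of WORK_PRODUCT_REFERENCE_FIELDS) {
    const value = row[field];
    if (value !== undefined) {
      reference[field] = value;
    }
  }
  return reference;
}

const workProductReference = toWorkProductReference({
  workProductId: "wp-1",
  type: "pull_request",
  title: "Add host support",
  url: "https://example.invalid/pr/1?token=secret", // placeholder; must not be persisted
});
console.log(workProductReference.hasUrl); // true
```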

## MIME Allowlists And Size Caps

No MIME allowlist is approved for asset content extraction in Phase 5 because **no asset body extraction is approved at all**.

- Every asset MIME type is treated as opaque for Paperclip-derived indexing.
- Existing upload limits remain storage concerns, not ingestion approvals.
- Work-product destinations are also opaque regardless of MIME type or size.

Any future issue that wants blob parsing must define:

- a positive MIME allowlist
- per-type parser strategy
- per-source size caps
- sandbox/isolation requirements
- prompt-injection handling
- regression tests for refusal paths

## Redaction Rules

Metadata-only means **structured facts only**, not capability-bearing links.

- Do not persist `contentPath` for assets.
- Do not persist raw work-product `url` values.
- Do not persist query strings, fragments, signed URL tokens, or userinfo.
- Prefer stable identifiers (`assetId`, `workProductId`, `externalId`) over links.

This addresses Sensitive Information Disclosure, Unsafe Consumption of APIs, and Insecure Output Handling risks.

## Provenance Rules

Every metadata-only reference must preserve enough provenance to explain where it came from without reading the underlying content:

- `companyId`
- `issueId`
- attachment/work-product id
- producer identity when available
- timestamps
- an explicit `metadata_only` marker in any future reference/snapshot schema
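
A possible shape for such a reference, with a check that reports which provenance fields are missing. This is only a sketch: the policy leaves the schema to a future issue, so `MetadataOnlyReference`, `sourceId`, `producerId`, and `missingProvenance` are all assumed names.

```typescript
// Hypothetical schema; the policy only lists the facts that must be preserved.
interface MetadataOnlyReference {
  companyId: string;
  issueId: string;
  sourceId: string; // attachment id or work-product id
  producerId?: string; // agent or user identity, when available
  createdAt: string;
  updatedAt?: string;
  metadata_only: true; // explicit marker required by the provenance rules
}

// Returns the names of required provenance fields that are absent or empty.
function missingProvenance(candidate: Partial<MetadataOnlyReference>): string[] {
  const required = ["companyId", "issueId", "sourceId", "createdAt"] as const;
  const missing: string[] = required.filter((key) => !candidate[key]);
  if (candidate.metadata_only !== true) {
    missing.push("metadata_only");
  }
  return missing;
}

console.log(missingProvenance({ issueId: "issue-1" }));
// [ 'companyId', 'sourceId', 'createdAt', 'metadata_only' ]
```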

## Review-Required Behavior

Human review is **not** required for plain metadata-only references that stay inside the allowlisted fields above.

Human review **is required**, with a separate security sign-off issue, before enabling any of the following:

- asset body extraction
- work-product URL fetching
- linked summaries generated from asset/work-product content
- storing raw blob links or raw remote URLs in wiki source material
- non-default-space routing for Paperclip-derived asset/work-product references

## Security Rationale

This gate exists because the current host surfaces have different trust properties:

- issue/comment/document text is first-party Paperclip content already exposed through company-scoped issue/document APIs
- asset content is a blob download surface (`/api/assets/:id/content`) and can carry prompt-injection or parser-risk payloads
- work products can point at arbitrary destinations through `url`, which reintroduces SSRF, token leakage, and prompt-injection risk if dereferenced automatically

Relevant threat classes:

- OWASP LLM Top 10: Prompt Injection, Sensitive Information Disclosure, Insecure Output Handling, Excessive Agency
- OWASP API Top 10: SSRF, Unsafe Consumption of APIs, Broken Object Property Level Authorization
- Saltzer & Schroeder: Least Privilege, Fail Securely, Complete Mediation, Secure Defaults

## Follow-Up Implementation Scope

A follow-up implementation issue is justified only for **metadata-only references**.

That implementation must:

- keep assets/work products out of source-bundle body text
- never fetch blob bytes or remote URLs
- redact capability-bearing link fields
- mark references as `metadata_only`
- ship tests proving source bundles/snapshots never contain `contentPath` or raw work-product `url` fields
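
The last requirement could be tested with a recursive key scan over a serialized bundle or snapshot. The sketch below assumes bundles are plain JSON-like objects; `findForbiddenKeys` and the sample bundle shape are hypothetical, and a real test suite would run the scan against bundles produced by the actual ingestion path.

```typescript
// Hypothetical guard: collects the paths of forbidden keys anywhere in a JSON-like value.
const FORBIDDEN_KEYS = new Set(["contentPath", "url"]);

function findForbiddenKeys(value: unknown, path: string = "$"): string[] {
  const hits: string[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, index) => {
      hits.push(...findForbiddenKeys(item, `${path}[${index}]`));
    });
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
      const childPath = `${path}.${key}`;
      if (FORBIDDEN_KEYS.has(key)) {
        hits.push(childPath);
      }
      hits.push(...findForbiddenKeys(child, childPath));
    }
  }
  return hits;
}

const sampleBundle = {
  references: [
    { assetId: "asset-1", metadata_only: true },
    { workProductId: "wp-1", url: "https://example.invalid/pr/1" }, // violation
  ],
};
console.log(findForbiddenKeys(sampleBundle)); // [ '$.references[1].url' ]
```

Reporting the path of each hit, rather than a bare boolean, makes a failing regression test point directly at the field that leaked.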