[codex] Add skills CLI and catalog management (#6782)

## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies through company-scoped control-plane workflows.
> - Agents need reusable, inspectable skills that can be installed, reset, audited, exported, and assigned without bespoke local setup.
> - The existing skill truth model needed cleanup so bundled skills, optional catalog skills, runtime skills, and adapter-provided skills have clear provenance.
> - Operators also need a practical CLI and board UI for discovering and managing company skills.
> - This pull request adds the skills CLI, packaged skills catalog, company skills APIs, and catalog-aware board UI.
> - The benefit is a more reusable Paperclip company setup where skills are portable, auditable, and easier for operators and agents to manage.

## What Changed

- Added `paperclipai skills` CLI commands and coverage for catalog listing, installing, resetting, and inspecting company skills.
- Added a packaged `@paperclipai/skills-catalog` workspace with bundled and optional skill content plus validation/build tests.
- Added shared company-skill types and validators used across CLI, server, and UI contracts.
- Added server catalog APIs/services for company skill catalog operations, reset semantics, audit behavior, and portability provenance.
- Updated adapter skill handling so runtime/catalog provenance remains explicit across local adapters.
- Added board UI support for browsing and managing catalog-backed company skills.
- Updated docs for the skills CLI/catalog flow and the company skills Paperclip skill reference.
- Rebased the branch onto current `paperclipai/paperclip:master`; no `pnpm-lock.yaml`, `.github/workflows`, or migration files are included in the final PR diff.

## Verification

- Passed: `pnpm run preflight:workspace-links && pnpm exec vitest run cli/src/__tests__/skills.test.ts packages/skills-catalog/src/catalog-builder.test.ts packages/skills-catalog/src/shipped-catalog.test.ts packages/shared/src/validators/company-skill.test.ts packages/adapter-utils/src/server-utils.test.ts packages/plugins/create-paperclip-plugin/src/entrypoints.test.ts server/src/__tests__/company-skills-catalog-service.test.ts server/src/__tests__/company-skills-routes.test.ts server/src/__tests__/company-portability.test.ts`.
- Passed: `pnpm exec vitest run server/src/__tests__/workspace-runtime.test.ts -t "default branch|origin/master|symbolic-ref"`.
- Attempted: full `server/src/__tests__/workspace-runtime.test.ts`. Four provisioning tests failed while seeding an isolated worktree database from the local Paperclip instance because the local plugin schema dump contains a duplicate-column foreign key (`plugin_content_machine_18a7bc327b.content_case_signals`). The default-branch tests touched by the rebase conflict passed in the focused run above.
- Checked final diff: no `pnpm-lock.yaml`, no `.github/workflows`, and no migration-file changes relative to `master`.

## Risks

- Medium: this is a broad skills/catalog change touching CLI, server APIs, shared contracts, adapter skill sync, and UI.
- Catalog validation and reset semantics need careful reviewer attention because they affect reusable company setup and portability.
- No database migrations are included in this PR, so there is no migration ordering/idempotency risk in the final diff.
- No lockfile is included by design; dependency resolution will be handled by the repository lockfile workflow.

## Model Used

- OpenAI Codex coding agent based on GPT-5, running in Paperclip via the `codex_local` adapter with shell, git, GitHub CLI, and code-editing tool access. Exact hosted model build/context-window metadata is not exposed in this runtime.

## Checklist

- [x] I have included a thinking path that traces from project context to this change
- [x] I have specified the model used (with version and capability details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work
- [x] I have run targeted tests locally and documented the local workspace-runtime seed failure above
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, screenshots were intentionally omitted per PAP-10124 instructions; UI behavior is covered by tests and reviewer inspection
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-06-14 01:50:39 +09:00 · 2026-05-28 07:33:51 -10:00 · 2026-05-28 07:33:51 -10:00 · 9eac727cf1
commit 9eac727cf1
parent 8da50dbcf8
77 changed files with 9704 additions and 530 deletions
--- a/packages/skills-catalog/catalog/bundled/quality/qa-acceptance/SKILL.md
+++ b/packages/skills-catalog/catalog/bundled/quality/qa-acceptance/SKILL.md
@@ -0,0 +1,93 @@
+---
+name: qa-acceptance
+description: Produce QA acceptance criteria and a manual validation plan for a feature change — golden path, edge cases, error states, performance limits, and explicit pass/fail evidence.
+key: paperclipai/bundled/quality/qa-acceptance
+recommendedForRoles:
+  - qa
+  - engineer
+  - product
+tags:
+  - qa
+  - acceptance
+  - validation
+  - testing
+---
+
+# QA Acceptance
+
+Write acceptance criteria that a reviewer can run against the running app to decide pass or fail without asking the author. The criteria are the contract: automated tests cover correctness; QA covers feature-level behavior.
+
+## When to use
+
+- A feature change is heading to QA and needs a written validation plan.
+- A reviewer is asked to verify a PR that touches user-visible behavior.
+- An incident postmortem requires a regression check before the incident is closed, to prevent a reopen.
+- A release candidate needs a pre-cut smoke pass.
+
+## When not to use
+
+- The change is unit-test-only (utility refactor, internal naming). Acceptance criteria are unnecessary churn.
+- You are asked to write tests against API contracts. Use contract testing, not feature QA.
+
+## Acceptance criteria format
+
+Each criterion is a single, independently verifiable statement:
+
+```md
+- **Given** <starting state>, **when** <action>, **then** <observable outcome>.
+```
+
+Example:
+
+```md
+- **Given** a CSV export with 0 rows, **when** the user clicks Export, **then** the file downloads with only the header row and the UI shows "Exported 0 rows".
+```
+
+Avoid criteria that combine multiple `when`s or `then`s. Split them.
+
+## What every plan must cover
+
+1. **Golden path.** The most common successful flow, end to end.
+2. **Empty and minimum states.** Zero items, one item, missing optional inputs.
+3. **Boundary inputs.** Max length strings, max numeric values, unicode, RTL text where applicable.
+4. **Error states.** Network failure, permission denied, validation failures, conflict (409), not found (404).
+5. **Concurrency and ordering.** Two users acting at once, race against background jobs, refresh during mutation.
+6. **Performance envelope.** The largest realistic input the change must handle without UI hangs or timeouts.
+7. **Backward compatibility.** Existing data, existing URLs, persisted user preferences continue to work.
+8. **Telemetry and audit.** Events, logs, or activity entries the change is supposed to emit.
+
+If a section is genuinely not applicable, write "N/A: <why>"; do not silently omit it.
+
+## Evidence
+
+Each criterion needs evidence on the verification pass:
+
+- Screenshot or short clip for UI behavior.
+- Copied console / network output for API behavior.
+- Log snippet or activity row for telemetry.
+- Timing measurement for performance criteria.
+
+"Looks good to me" without evidence is not a pass.
+
+## Quarantine and follow-up
+
+- A failing criterion blocks acceptance unless explicitly waived by the owner with a tracked follow-up issue.
+- "Known issue" without a linked follow-up is not a waiver.
+- If you add a new criterion mid-pass, restart the pass — partial coverage hides regressions.
+
+## Handoff back to the author
+
+Return the validation plan with three sections:
+
+- **Pass.** Criteria that passed, with one-line evidence summaries.
+- **Fail.** Criteria that failed, with the exact reproduction.
+- **Blocked.** Criteria you could not run, with why.
+
+The author owns turning failures into either fixes or accepted deferrals.
+
+## Anti-patterns
+
+- Acceptance phrased as a test plan ("write a Cypress test for X"). Acceptance is what is true after the change ships; tests are how you check.
+- Criteria that depend on inspecting implementation details (selectors, query plans). Stay observable.
+- Long checklists with no priority. Mark must-pass criteria distinctly from nice-to-have.
+- Validation reports that say "passed" with no evidence. Reviewers cannot audit those.
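
The Given/when/then format and the Pass/Fail/Blocked handoff described in this skill lend themselves to a mechanical lint. The following is a minimal TypeScript sketch, assuming each criterion arrives as a single Markdown line. `checkCriterion`, `buildHandoff`, and the result types are hypothetical names used for illustration; they are not part of this PR or the Paperclip codebase.

```typescript
// Hypothetical helpers for illustration only; not part of the Paperclip codebase.

type CriterionCheck = { ok: true } | { ok: false; reason: string };

// Count non-overlapping occurrences of a literal substring.
function countOccurrences(text: string, needle: string): number {
  return text.split(needle).length - 1;
}

// A criterion must contain exactly one bold Given, when, and then, in that order.
function checkCriterion(line: string): CriterionCheck {
  for (const keyword of ["Given", "when", "then"]) {
    const count = countOccurrences(line, `**${keyword}**`);
    if (count !== 1) {
      return { ok: false, reason: `expected exactly one **${keyword}**, found ${count}` };
    }
  }
  const given = line.indexOf("**Given**");
  const when = line.indexOf("**when**");
  const then = line.indexOf("**then**");
  if (!(given < when && when < then)) {
    return { ok: false, reason: "keywords must appear in Given, when, then order" };
  }
  return { ok: true };
}

type Outcome = "pass" | "fail" | "blocked";

interface CriterionResult {
  criterion: string;
  outcome: Outcome;
  // Evidence summary for a pass, reproduction for a fail, reason for a block.
  note: string;
}

// Group results into the three handoff sections, rejecting entries with no note,
// since a result without evidence or explanation cannot be audited.
function buildHandoff(results: CriterionResult[]): Record<Outcome, CriterionResult[]> {
  const report: Record<Outcome, CriterionResult[]> = { pass: [], fail: [], blocked: [] };
  for (const result of results) {
    if (result.note.trim() === "") {
      throw new Error(`"${result.criterion}" has no evidence or explanation`);
    }
    report[result.outcome].push(result);
  }
  return report;
}
```

Substring counting is a deliberately blunt heuristic: it flags doubled `when` or `then` clauses that should be split, but it cannot judge whether the stated outcome is actually observable. That judgment stays with the reviewer.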