[codex] Add skills CLI and catalog management (#6782)

## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies through
company-scoped control-plane workflows.
> - Agents need reusable, inspectable skills that can be installed,
reset, audited, exported, and assigned without bespoke local setup.
> - The existing skill truth model needed cleanup so bundled skills,
optional catalog skills, runtime skills, and adapter-provided skills
have clear provenance.
> - Operators also need a practical CLI and board UI for discovering and
managing company skills.
> - This pull request adds the skills CLI, packaged skills catalog,
company skills APIs, and catalog-aware board UI.
> - The benefit is a more reusable Paperclip company setup where skills
are portable, auditable, and easier for operators and agents to manage.

## What Changed

- Added `paperclipai skills` CLI commands and coverage for catalog
listing, installing, resetting, and inspecting company skills.
- Added a packaged `@paperclipai/skills-catalog` workspace with bundled
and optional skill content plus validation/build tests.
- Added shared company-skill types and validators used across CLI,
server, and UI contracts.
- Added server catalog APIs/services for company skill catalog
operations, reset semantics, audit behavior, and portability provenance.
- Updated adapter skill handling so runtime/catalog provenance remains
explicit across local adapters.
- Added board UI support for browsing and managing catalog-backed
company skills.
- Updated docs for the skills CLI/catalog flow and the company skills
Paperclip skill reference.
- Rebased the branch onto current `paperclipai/paperclip:master`; no
`pnpm-lock.yaml`, `.github/workflows`, or migration files are included
in the final PR diff.

## Verification

- Passed: `pnpm run preflight:workspace-links && pnpm exec vitest run
cli/src/__tests__/skills.test.ts
packages/skills-catalog/src/catalog-builder.test.ts
packages/skills-catalog/src/shipped-catalog.test.ts
packages/shared/src/validators/company-skill.test.ts
packages/adapter-utils/src/server-utils.test.ts
packages/plugins/create-paperclip-plugin/src/entrypoints.test.ts
server/src/__tests__/company-skills-catalog-service.test.ts
server/src/__tests__/company-skills-routes.test.ts
server/src/__tests__/company-portability.test.ts`.
- Passed: `pnpm exec vitest run
server/src/__tests__/workspace-runtime.test.ts -t "default
branch|origin/master|symbolic-ref"`.
- Attempted: full `server/src/__tests__/workspace-runtime.test.ts`. Four
provisioning tests failed while seeding an isolated worktree database
from the local Paperclip instance because the local plugin schema dump
contains a duplicate-column foreign key
(`plugin_content_machine_18a7bc327b.content_case_signals`). The
default-branch tests touched by the rebase conflict passed in the
focused run above.
- Checked final diff: no `pnpm-lock.yaml`, no `.github/workflows`, and
no migration-file changes relative to `master`.

## Risks

- Medium: this is a broad skills/catalog change touching CLI, server
APIs, shared contracts, adapter skill sync, and UI.
- Catalog validation and reset semantics need careful reviewer attention
because they affect reusable company setup and portability.
- No database migrations are included in this PR, so there is no
migration ordering/idempotency risk in the final diff.
- No lockfile is included by design; dependency resolution will be
handled by the repository lockfile workflow.

## Model Used

- OpenAI Codex coding agent based on GPT-5, running in Paperclip via the
`codex_local` adapter with shell, git, GitHub CLI, and code-editing tool
access. Exact hosted model build/context-window metadata is not exposed
in this runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run targeted tests locally and documented the local
workspace-runtime seed failure above
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, screenshots were intentionally
omitted per PAP-10124 instructions; UI behavior is covered by tests and
reviewer inspection
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
This commit is contained in:
Dotta 2026-05-28 07:33:51 -10:00 committed by GitHub
parent 8da50dbcf8
commit 9eac727cf1
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
77 changed files with 9704 additions and 530 deletions

View file

@ -0,0 +1,93 @@
---
name: agent-browser
description: Drive a real browser to inspect or interact with a web page or app — navigate, take screenshots, read console and network, fill simple forms — for verification tasks, not unattended automation.
key: paperclipai/optional/browser/agent-browser
recommendedForRoles:
- qa
- engineer
- researcher
tags:
- browser
- puppeteer
- playwright
- verification
---
# Agent Browser
Use a controlled browser to verify behavior, capture evidence, or extract information from web pages that a static fetch cannot reach (SPAs, login-gated pages, dynamic content). This skill is about supervised verification, not unattended scraping.
## When to use
- You need a screenshot of a deployed page or a local dev server to confirm a UI change.
- You need to read JavaScript-rendered content that `curl`/`wget` will not see.
- A user reports a UI bug and you need to reproduce it interactively to capture console errors, network requests, or layout state.
- You need to walk through a short flow (load page, click, observe) to verify acceptance criteria.
## When not to use
- The page is reachable as static HTML. Use `curl`/HTTP fetch — it is cheaper, faster, and more reliable.
- The task is unattended large-scale scraping. That belongs to a dedicated scraper with rate limits, robots.txt handling, and a real user agent policy — not this skill.
- The site is behind authentication you do not own credentials for, or whose terms of service prohibit automation.
- The site involves sensitive accounts (banking, healthcare, government) where automation risks lockout or compliance issues.
## Before launching the browser
- Confirm the URL and what state should be true after navigation.
- Decide what evidence is needed: full-page screenshot, viewport screenshot, console log, network trace, HTML snapshot, extracted text.
- Decide the viewport size that matters for the task (mobile vs desktop). Default to a desktop size unless the task is mobile-specific.
- For local dev servers, confirm the server is running and the port is what you expect.
## Driving the browser
A typical verification session:
1. **Launch with a real-looking user agent** when the target is the public internet; an unrealistic UA flags automation traffic.
2. **Set a sane viewport** (e.g., 1366×768 desktop, 390×844 iPhone-ish).
3. **Navigate and wait for the right signal.** Prefer waiting for a specific selector or network-idle over arbitrary sleeps.
4. **Capture evidence immediately** after the wait condition succeeds, before any interaction perturbs the state.
5. **Interact deliberately.** One click at a time, with a wait between actions; re-screenshot after each meaningful state change.
6. **Read the console and network panels** for unexpected errors, 4xx/5xx responses, or slow requests.
7. **Close the browser cleanly** when done. Long-running browser sessions leak memory and hold ports.
## What evidence to record
For a verification task, deliver:
- A full-page or viewport screenshot of each meaningful state.
- The console log, filtered to warnings/errors.
- Any non-2xx network response with the URL, status, and a short response body excerpt.
- A short narration: "Navigated to X, observed Y, clicked Z, observed W."
For a UI bug repro, also record:
- The exact reproduction steps the user can follow.
- Viewport size and (where relevant) device pixel ratio.
- Whether the bug reproduces on first load vs after interaction.
## Login-gated pages
- Prefer programmatic auth (API token, magic link) over UI login.
- If UI login is the only path, the user must provide credentials explicitly for this run. Never reuse credentials outside the session.
- Do not store credentials in the session log, screenshot, or returned output.
## Performance and politeness
- Throttle to one navigation per few seconds when touching shared infra.
- Respect `robots.txt` for public sites you are inspecting at any volume.
- Cancel navigations if a page exceeds a reasonable timeout (e.g., 30s); the page is broken or rate-limiting you.
- Do not retry forever on failure. Retry once with a longer timeout, then escalate.
## Common failure modes
- **Selector not found.** Page changed, or you are waiting before render. Take a screenshot to see actual state; adjust the selector.
- **Click does nothing.** The element is offscreen, covered by a modal, or in a shadow DOM. Scroll into view or pierce the shadow root.
- **Headless detection.** Some sites detect headless Chrome and serve a different page. Use a non-headless mode or a fingerprint-realistic configuration only when authorized.
- **Cross-origin iframe blocking.** Iframes you do not own cannot be inspected; the page must offer the data outside the iframe or the task is infeasible.
## Anti-patterns
- Long unsupervised browser sessions that drift from the original task.
- Scraping behind authentication you do not own.
- Captioning a screenshot with "looks good" without saying what state was loaded and what selectors confirmed it.
- Treating a passing screenshot as proof of correctness across viewports you did not actually test.

View file

@ -0,0 +1,128 @@
---
name: release-announcement
description: Write a release announcement — changelog, blog post, in-app note, or social post — that leads with user impact, names the audience, and includes upgrade/migration steps without filler.
key: paperclipai/optional/content/release-announcement
recommendedForRoles:
- devrel
- product
- writer
tags:
- release
- changelog
- announcement
- communication
---
# Release Announcement
Write the channel-appropriate announcement for a release without churn. Different surfaces need different shapes: a changelog entry is not a blog post is not a social card. The bar is: a reader of the chosen surface can decide in under 30 seconds whether this release affects them, and if so what to do.
## When to use
- A version, feature, or fix is shipping and needs writeup for at least one surface.
- A previously private feature is going GA.
- A breaking change needs broadcast before users hit it.
## When not to use
- An internal-only change with no user impact. Update internal docs; do not announce.
- The release is incomplete (still in active development). Wait until it ships, even if marketing wants the post.
## Determine the audience and channel first
| Audience | Best channel | Tone |
|---|---|---|
| Existing power users | Changelog, in-app note | Terse, factual, links |
| Engineering teams adopting your API | Release notes, dev blog | Examples, migration steps, version pins |
| Prospective customers | Landing page, marketing blog | Story arc, problem → solution, social proof |
| Broad audience | Social post, email newsletter | One-sentence pitch, link to depth |
| Internal team | Slack/Discord post | What changed, who to ping if it breaks |
Pick the audience for *this* writeup. One release often needs several writeups; do not blend them.
## Universal structure
Whatever the channel, lead with:
1. **What changed.** One sentence in the user's vocabulary.
2. **Who it affects.** Which user role / use case.
3. **What to do.** Migrate now / opt-in / no action needed.
Everything else is depth that supports those three.
## Channel templates
### Changelog entry (terse)
```md
## v1.42.0 — 2026-05-26
### Added
- <feature><one-line user benefit>. ([#1234](link))
### Changed
- <change><one-line impact>. ([#1235](link))
### Fixed
- <bug><one-line user-visible symptom>. ([#1236](link))
### Deprecated
- <thing>. Replaced by <thing>. Removal planned for v<x>.
### Breaking
- <change>. **Migration:** <one-line> or <link to guide>.
```
### Release notes (for adopters)
Same as changelog, plus:
- Migration guide section with before/after code.
- Compatibility table (versions, runtimes, OS).
- Known issues and workarounds.
- Acknowledgements (contributors, reporters of fixed bugs).
### Dev blog post (300800 words)
- **Hook (1 paragraph):** the problem the release solves, in a real-world scenario.
- **What's new (35 bullets with sub-paragraphs):** features, with one code or screenshot example each.
- **Upgrade (1 paragraph):** how to upgrade, what to check.
- **What's next:** one sentence about the next direction. Avoid promises.
### In-app note
- 1 sentence.
- 1 link.
- Dismiss after seen.
### Social post
- 1 sentence pitch.
- 1 link.
- 1 image or short clip.
- No threadbait. If it needs a thread, write a blog post instead.
## Writing rules
- Lead with the user, not the team. `You can now export to CSV` beats `We've added CSV export`.
- Numbers beat adjectives. `60% faster cold start` beats `much faster`. Cite the methodology.
- Show, don't just tell. One code snippet, one screenshot — more is noise.
- Date the post. Undated release content rots fastest.
- Link the migration path explicitly. Do not bury it.
- Mark breaking changes with `**Breaking:**` prefix. Repeat in the email/social channel.
## Avoid
- "We are excited to announce" filler.
- Lists of changes that mix user-visible and internal items.
- Marketing claims without a way to verify.
- Promised dates for unshipped work.
- Pre-announcing something the team has not yet committed to ship.
## Post-publish checklist
- Changelog is in source control alongside the release.
- Blog post date matches actual ship date.
- All links work (release tag, PRs, docs sections).
- Breaking changes are also in the upgrade guide, not only the post.
- Internal team is notified before the public post goes live, not after.

View file

@ -0,0 +1,121 @@
---
name: design-critique
description: Give a structured product design critique — user job clarity, hierarchy, affordance, error states, accessibility, and consistency — focused on what to change, in what order, and why.
key: paperclipai/optional/product/design-critique
recommendedForRoles:
- designer
- product
- engineer
tags:
- design
- product
- ux
- review
---
# Product Design Critique
A structured critique pass for a screen, flow, or component. The output is a prioritized list of changes a designer or engineer can act on — not adjectives. Critique is not redesign; recommend, do not rebuild.
## When to use
- A designer or engineer asks for feedback on a screen, mock, or live UI.
- A feature is shipping and someone wants a final UX read.
- A flow is suspected of causing user drop-off and you want a pre-research read before instrumentation.
## When not to use
- The user wants a redesign. That is a design project, not a critique.
- The work is so early that no concrete artifact exists. Sketch with them instead of critiquing air.
- You have no context on the user job. Ask for it first; design critique without user context devolves into taste.
## Pre-critique context
Before opening a screen, get:
- **Who is the user.** Specific role and competence, not "users".
- **What job they are doing on this screen.** One sentence.
- **What success looks like.** What the user can do after this screen that they could not before.
- **Where this screen sits in the larger flow.** What precedes and follows.
If any of these is missing, ask. Critique without these is opinion.
## The pass (in order)
1. **Clarity of the user job.**
- Within 3 seconds of opening, is it obvious what this screen is for?
- Does the primary action match the user's actual job, or a designer's preferred path?
2. **Visual hierarchy.**
- The most important thing on the screen should be the most prominent (size, weight, position, color).
- Secondary actions should look secondary. Tertiary should be findable but not loud.
- Headings should chunk content into the right groups for the task.
3. **Affordance and signifiers.**
- Clickable things look clickable.
- Disabled things look disabled and explain why on hover/focus.
- Drag, scroll, or swipe interactions are discoverable, not hidden.
4. **States.**
- Empty state (no data) is designed, not a blank rectangle.
- Loading state communicates progress, not just spins.
- Error states say what went wrong and what to do next, in the user's words.
- Success state confirms without celebrating banal actions.
5. **Inputs and forms.**
- Labels visible, not just placeholders.
- Validation runs at the right time (on blur, not on every keystroke unless the user is in a known-format field).
- Required fields marked.
- Field order matches the user's mental order, not the database order.
6. **Accessibility.**
- Sufficient color contrast (WCAG AA at minimum; AAA where reasonable).
- Focus order is logical for keyboard navigation.
- Interactive elements are reachable without a mouse.
- Critical information is not color-only (icons, text, position back it up).
- Touch targets at least 44×44 px on mobile.
7. **Consistency.**
- Tokens, components, and patterns match the rest of the product.
- "Borrowed" patterns from other products are intentional, not accidental drift.
8. **Copy.**
- Buttons are verbs that name the outcome ("Save changes" beats "Submit").
- Microcopy explains, does not decorate.
- Tone matches the product voice.
9. **Edge cases.**
- Long content (long names, many items, RTL languages).
- Tiny content (one item, zero items).
- Slow network and offline behavior.
- Permissions denied.
## Output format
Group findings by severity, then by category. Each finding is one issue and one suggested fix.
```md
## Design critique: <screen name>
### Must-fix (blocks ship)
- **<category>:** <one-line issue>. **Try:** <one-line suggestion>.
### Should-fix (before broader rollout)
- **<category>:** <one-line issue>. **Try:** <one-line suggestion>.
### Nice-to-fix (when there's room)
- **<category>:** <one-line issue>. **Try:** <one-line suggestion>.
### Strengths to keep
- <one-line thing the design got right>
```
Always include the "strengths to keep" section. It is not flattery — it is signal to the designer about what not to change in the next round.
## Anti-patterns
- "I would do it differently" without saying what or why. That is preference, not critique.
- Long critiques that bury must-fix items under nice-to-haves.
- Suggesting net-new features under the guise of a critique.
- Ignoring user context and grading on taste.
- Treating a critique as approval. State approval explicitly if asked; otherwise critique is feedback, not sign-off.