Merge public-gh/master into paperclip-company-import-export

This commit is contained in:
dotta 2026-03-18 09:57:26 -05:00
commit 9e19f1d005
49 changed files with 3997 additions and 2501 deletions

View file

@ -0,0 +1,424 @@
# Docker Release Browser E2E Plan
## Context
Today release smoke testing for published Paperclip packages is manual and shell-driven:
```sh
HOST_PORT=3232 DATA_DIR=./data/release-smoke-canary PAPERCLIPAI_VERSION=canary ./scripts/docker-onboard-smoke.sh
HOST_PORT=3233 DATA_DIR=./data/release-smoke-stable PAPERCLIPAI_VERSION=latest ./scripts/docker-onboard-smoke.sh
```
That is useful because it exercises the same public install surface users hit:
- Docker
- `npx paperclipai@canary`
- `npx paperclipai@latest`
- authenticated bootstrap flow
But it still leaves the most important release questions to a human with a browser:
- can I sign in with the smoke credentials?
- do I land in onboarding?
- can I complete onboarding?
- does the initial CEO agent actually get created and run?
The repo already has two adjacent pieces:
- `tests/e2e/onboarding.spec.ts` covers the onboarding wizard against the local source tree
- `scripts/docker-onboard-smoke.sh` boots a published Docker install and auto-bootstraps authenticated mode, but only verifies the API/session layer
What is missing is one deterministic browser test that joins those two paths.
## Goal
Add a release-grade Docker-backed browser E2E that validates the published `canary` and `latest` installs end to end:
1. boot the published package in Docker
2. sign in with known smoke credentials
3. verify the user is routed into onboarding
4. complete onboarding in the browser
5. verify the first CEO agent exists
6. verify the initial CEO run was triggered and reached a terminal or active state
Then wire that test into GitHub Actions so release validation is no longer manual-only.
## Recommendation In One Sentence
Turn the current Docker smoke script into a machine-friendly test harness, add a dedicated Playwright release-smoke spec that drives the authenticated browser flow against published Docker installs, and run it in GitHub Actions for both `canary` and `latest`.
## What We Have Today
### Existing local browser coverage
`tests/e2e/onboarding.spec.ts` already proves the onboarding wizard can:
- create a company
- create a CEO agent
- create an initial issue
- optionally observe task progress
That is a good base, but it does not validate the public npm package, Docker path, authenticated login flow, or release dist-tags.
### Existing Docker smoke coverage
`scripts/docker-onboard-smoke.sh` already does useful setup work:
- builds `Dockerfile.onboard-smoke`
- runs `paperclipai@${PAPERCLIPAI_VERSION}` inside Docker
- waits for health
- signs up or signs in a smoke admin user
- generates and accepts the bootstrap CEO invite in authenticated mode
- verifies a board session and `/api/companies`
That means the hard bootstrap problem is mostly solved already. The main gap is that the script is human-oriented and never hands control to a browser test.
### Existing CI shape
The repo already has:
- `.github/workflows/e2e.yml` for manual Playwright runs against local source
- `.github/workflows/release.yml` for canary publish on `master` and manual stable promotion
So the right move is to extend the current test/release system, not create a parallel one.
## Product Decision
### 1. The release smoke should stay deterministic and token-free
The first version should not require OpenAI, Anthropic, or external agent credentials.
Use the onboarding flow with a deterministic adapter that can run on a stock GitHub runner and inside the published Docker install. The existing `process` adapter with a trivial command is the right base path for this release gate.
That keeps this test focused on:
- release packaging
- auth/bootstrap
- UI routing
- onboarding contract
- agent creation
- heartbeat invocation plumbing
Later we can add a second credentialed smoke lane for real model-backed agents.
### 2. Smoke credentials become an explicit test contract
The current defaults in `scripts/docker-onboard-smoke.sh` should be treated as stable test fixtures:
- email: `smoke-admin@paperclip.local`
- password: `paperclip-smoke-password`
The browser test should log in with those exact values unless overridden by env vars.
### 3. Published-package smoke and source-tree E2E stay separate
Keep two lanes:
- source-tree E2E for feature development
- published Docker release smoke for release confidence
They overlap on onboarding assertions, but they guard different failure classes.
## Proposed Design
## 1. Add a CI-friendly Docker smoke harness
Refactor `scripts/docker-onboard-smoke.sh` so it can run in two modes:
- interactive mode
- current behavior
- streams logs and waits in foreground for manual inspection
- CI mode
- starts the container
- waits for health and authenticated bootstrap
- prints machine-readable metadata
- exits while leaving the container running for Playwright
Recommended shape:
- keep `scripts/docker-onboard-smoke.sh` as the public entry point
- add a `SMOKE_DETACH=true` or `--detach` mode
- emit a JSON blob or `.env` file containing:
- `SMOKE_BASE_URL`
- `SMOKE_ADMIN_EMAIL`
- `SMOKE_ADMIN_PASSWORD`
- `SMOKE_CONTAINER_NAME`
- `SMOKE_DATA_DIR`
The workflow and Playwright tests can then consume the emitted metadata instead of scraping logs.
### Why this matters
The current script always tails logs and then blocks on `wait "$LOG_PID"`. That is convenient for manual smoke testing, but it is the wrong shape for CI orchestration.
## 2. Add a dedicated Playwright release-smoke spec
Create a second Playwright entry point specifically for published Docker installs, for example:
- `tests/release-smoke/playwright.config.ts`
- `tests/release-smoke/docker-auth-onboarding.spec.ts`
This suite should not use Playwright `webServer`, because the app server will already be running inside Docker.
### Browser scenario
The first release-smoke scenario should validate:
1. open `/`
2. unauthenticated user is redirected to `/auth`
3. sign in using the smoke credentials
4. authenticated user lands on onboarding when no companies exist
5. onboarding wizard appears with the expected step labels
6. create a company
7. create the first agent using `process`
8. create the initial issue
9. finish onboarding and open the created issue
10. verify via API:
- company exists
- CEO agent exists
- issue exists and is assigned to the CEO
11. verify the first heartbeat run was triggered:
- either by checking issue status changed from initial state, or
- by checking agent/runs API shows a run for the CEO, or
- both
The test should tolerate the run completing quickly. For this reason, the assertion should accept:
- `queued`
- `running`
- `succeeded`
and similarly for issue progression if the issue status changes before the assertion runs.
### Why a separate spec instead of reusing `tests/e2e/onboarding.spec.ts`
The local-source test and release-smoke test have different assumptions:
- different server lifecycle
- different auth path
- different deployment mode
- published npm package instead of local workspace code
Trying to force both through one spec will make both worse.
## 3. Add a release-smoke workflow in GitHub Actions
Add a workflow dedicated to this surface, ideally reusable:
- `.github/workflows/release-smoke.yml`
Recommended triggers:
- `workflow_dispatch`
- `workflow_call`
Recommended inputs:
- `paperclip_version`
- `canary` or `latest`
- `host_port`
- optional, default runner-safe port
- `artifact_name`
- optional for clearer uploads
### Job outline
1. checkout repo
2. install Node/pnpm
3. install Playwright browser dependencies
4. launch Docker smoke harness in detached mode with the chosen dist-tag
5. run the release-smoke Playwright suite against the returned base URL
6. always collect diagnostics:
- Playwright report
- screenshots
- trace
- `docker logs`
- harness metadata file
7. stop and remove container
### Why a reusable workflow
This lets us:
- run the smoke manually on demand
- call it from `release.yml`
- reuse the same job for both `canary` and `latest`
## 4. Integrate it into release automation incrementally
### Phase A: Manual workflow only
First ship the workflow as manual-only so the harness and test can be stabilized without blocking releases.
### Phase B: Run automatically after canary publish
After `publish_canary` succeeds in `.github/workflows/release.yml`, call the reusable release-smoke workflow with:
- `paperclip_version=canary`
This proves the just-published public canary really boots and onboards.
### Phase C: Run automatically after stable publish
After `publish_stable` succeeds, call the same workflow with:
- `paperclip_version=latest`
This gives us post-publish confirmation that the stable dist-tag is healthy.
### Important nuance
Testing `latest` from npm cannot happen before stable publish, because the package under test does not exist under `latest` yet. So the `latest` smoke is a post-publish verification, not a pre-publish gate.
If we later want a true pre-publish stable gate, that should be a separate source-ref or locally built package smoke job.
## 5. Make diagnostics first-class
This workflow is only valuable if failures are fast to debug.
Always capture:
- Playwright HTML report
- Playwright trace on failure
- final screenshot on failure
- full `docker logs` output
- emitted smoke metadata
- optional `curl /api/health` snapshot
Without that, the test will become a flaky black box and people will stop trusting it.
## Implementation Plan
## Phase 1: Harness refactor
Files:
- `scripts/docker-onboard-smoke.sh`
- optionally `scripts/lib/docker-onboard-smoke.sh` or similar helper
- `doc/DOCKER.md`
- `doc/RELEASING.md`
Tasks:
1. Add detached/CI mode to the Docker smoke script.
2. Make the script emit machine-readable connection metadata.
3. Keep the current interactive manual mode intact.
4. Add reliable cleanup commands for CI.
Acceptance:
- a script invocation can start the published Docker app, auto-bootstrap it, and return control to the caller with enough metadata for browser automation
## Phase 2: Browser release-smoke suite
Files:
- `tests/release-smoke/playwright.config.ts`
- `tests/release-smoke/docker-auth-onboarding.spec.ts`
- root `package.json`
Tasks:
1. Add a dedicated Playwright config for external server testing.
2. Implement login + onboarding + CEO creation flow.
3. Assert a CEO run was created or completed.
4. Add a root script such as:
- `test:release-smoke`
Acceptance:
- the suite passes locally against both:
- `PAPERCLIPAI_VERSION=canary`
- `PAPERCLIPAI_VERSION=latest`
## Phase 3: GitHub Actions workflow
Files:
- `.github/workflows/release-smoke.yml`
Tasks:
1. Add manual and reusable workflow entry points.
2. Install Chromium and runner dependencies.
3. Start Docker smoke in detached mode.
4. Run the release-smoke Playwright suite.
5. Upload diagnostics artifacts.
Acceptance:
- a maintainer can run the workflow manually for either `canary` or `latest`
## Phase 4: Release workflow integration
Files:
- `.github/workflows/release.yml`
- `doc/RELEASING.md`
Tasks:
1. Trigger release smoke automatically after canary publish.
2. Trigger release smoke automatically after stable publish.
3. Document expected behavior and failure handling.
Acceptance:
- canary releases automatically produce a published-package browser smoke result
- stable releases automatically produce a `latest` browser smoke result
## Phase 5: Future extension for real model-backed agent validation
Not part of the first implementation, but this should be the next layer after the deterministic lane is stable.
Possible additions:
- a second Playwright project gated on repo secrets
- real `claude_local` or `codex_local` adapter validation in Docker-capable environments
- assertion that the CEO posts a real task/comment artifact
- stable release holdback until the credentialed lane passes
This should stay optional until the token-free lane is trustworthy.
## Acceptance Criteria
The plan is complete when the implemented system can demonstrate all of the following:
1. A published `paperclipai@canary` Docker install can be smoke-tested by Playwright in CI.
2. A published `paperclipai@latest` Docker install can be smoke-tested by Playwright in CI.
3. The test logs into authenticated mode with the smoke credentials.
4. The test sees onboarding for a fresh instance.
5. The test completes onboarding in the browser.
6. The test verifies the initial CEO agent was created.
7. The test verifies at least one CEO heartbeat run was triggered.
8. Failures produce actionable artifacts rather than just a red job.
## Risks And Decisions To Make
### 1. Fast process runs may finish before the UI visibly updates
That is expected. The assertions should prefer API polling for run existence/status rather than only visual indicators.
### 2. `latest` smoke is post-publish, not preventive
This is a real limitation of testing the published dist-tag itself. It is still valuable, but it should not be confused with a pre-publish gate.
### 3. We should not overcouple the test to cosmetic onboarding text
The important contract is flow success, created entities, and run creation. Use visible labels sparingly and prefer stable semantic selectors where possible.
### 4. Keep the smoke adapter path boring
For release safety, the first test should use the most boring runnable adapter possible. This is not the place to validate every adapter.
## Recommended First Slice
If we want the fastest path to value, ship this in order:
1. add detached mode to `scripts/docker-onboard-smoke.sh`
2. add one Playwright spec for authenticated login + onboarding + CEO run verification
3. add manual `release-smoke.yml`
4. once stable, wire canary into `release.yml`
5. after that, wire stable `latest` smoke into `release.yml`
That gives release confidence quickly without turning the first version into a large CI redesign.

View file

@ -0,0 +1,426 @@
# Paperclip Memory Service Plan
## Goal
Define a Paperclip memory service and surface API that can sit above multiple memory backends, while preserving Paperclip's control-plane requirements:
- company scoping
- auditability
- provenance back to Paperclip work objects
- budget / cost visibility
- plugin-first extensibility
This plan is based on the external landscape summarized in `doc/memory-landscape.md` and on the current Paperclip architecture in:
- `doc/SPEC-implementation.md`
- `doc/plugins/PLUGIN_SPEC.md`
- `doc/plugins/PLUGIN_AUTHORING_GUIDE.md`
- `packages/plugins/sdk/src/types.ts`
## Recommendation In One Sentence
Paperclip should not embed one opinionated memory engine into core. It should add a company-scoped memory control plane with a small normalized adapter contract, then let built-ins and plugins implement the provider-specific behavior.
## Product Decisions
### 1. Memory is company-scoped by default
Every memory binding belongs to exactly one company.
That binding can then be:
- the company default
- an agent override
- a project override later if we need it
No cross-company memory sharing in the initial design.
### 2. Providers are selected by key
Each configured memory provider gets a stable key inside a company, for example:
- `default`
- `mem0-prod`
- `local-markdown`
- `research-kb`
Agents and services resolve the active provider by key, not by hard-coded vendor logic.
### 3. Plugins are the primary provider path
Built-ins are useful for a zero-config local path, but most providers should arrive through the existing Paperclip plugin runtime.
That keeps the core small and matches the current direction that optional knowledge-like systems live at the edges.
### 4. Paperclip owns routing, provenance, and accounting
Providers should not decide how Paperclip entities map to governance.
Paperclip core should own:
- who is allowed to call a memory operation
- which company / agent / project scope is active
- what issue / run / comment / document the operation belongs to
- how usage gets recorded
### 5. Automatic memory should be narrow at first
Automatic capture is useful, but broad silent capture is dangerous.
Initial automatic hooks should be:
- post-run capture from agent runs
- issue comment / document capture when the binding enables it
- pre-run recall for agent context hydration
Everything else should start explicit.
## Proposed Concepts
### Memory provider
A built-in or plugin-supplied implementation that stores and retrieves memory.
Examples:
- local markdown + vector index
- mem0 adapter
- supermemory adapter
- MemOS adapter
### Memory binding
A company-scoped configuration record that points to a provider and carries provider-specific config.
This is the object selected by key.
### Memory scope
The normalized Paperclip scope passed into a provider request.
At minimum:
- `companyId`
- optional `agentId`
- optional `projectId`
- optional `issueId`
- optional `runId`
- optional `subjectId` for external/user identity
### Memory source reference
The provenance handle that explains where a memory came from.
Supported source kinds should include:
- `issue_comment`
- `issue_document`
- `issue`
- `run`
- `activity`
- `manual_note`
- `external_document`
### Memory operation
A normalized write, query, browse, or delete action performed through Paperclip.
Paperclip should log every operation, whether the provider is local or external.
## Required Adapter Contract
The required core should be small enough to fit `memsearch`, `mem0`, `Memori`, `MemOS`, or `OpenViking`.
```ts
export interface MemoryAdapterCapabilities {
profile?: boolean;
browse?: boolean;
correction?: boolean;
asyncIngestion?: boolean;
multimodal?: boolean;
providerManagedExtraction?: boolean;
}
export interface MemoryScope {
companyId: string;
agentId?: string;
projectId?: string;
issueId?: string;
runId?: string;
subjectId?: string;
}
export interface MemorySourceRef {
kind:
| "issue_comment"
| "issue_document"
| "issue"
| "run"
| "activity"
| "manual_note"
| "external_document";
companyId: string;
issueId?: string;
commentId?: string;
documentKey?: string;
runId?: string;
activityId?: string;
externalRef?: string;
}
export interface MemoryUsage {
provider: string;
model?: string;
inputTokens?: number;
outputTokens?: number;
embeddingTokens?: number;
costCents?: number;
latencyMs?: number;
details?: Record<string, unknown>;
}
export interface MemoryWriteRequest {
bindingKey: string;
scope: MemoryScope;
source: MemorySourceRef;
content: string;
metadata?: Record<string, unknown>;
mode?: "append" | "upsert" | "summarize";
}
export interface MemoryRecordHandle {
providerKey: string;
providerRecordId: string;
}
export interface MemoryQueryRequest {
bindingKey: string;
scope: MemoryScope;
query: string;
topK?: number;
intent?: "agent_preamble" | "answer" | "browse";
metadataFilter?: Record<string, unknown>;
}
export interface MemorySnippet {
handle: MemoryRecordHandle;
text: string;
score?: number;
summary?: string;
source?: MemorySourceRef;
metadata?: Record<string, unknown>;
}
export interface MemoryContextBundle {
snippets: MemorySnippet[];
profileSummary?: string;
usage?: MemoryUsage[];
}
export interface MemoryAdapter {
key: string;
capabilities: MemoryAdapterCapabilities;
write(req: MemoryWriteRequest): Promise<{
records?: MemoryRecordHandle[];
usage?: MemoryUsage[];
}>;
query(req: MemoryQueryRequest): Promise<MemoryContextBundle>;
get(handle: MemoryRecordHandle, scope: MemoryScope): Promise<MemorySnippet | null>;
forget(handles: MemoryRecordHandle[], scope: MemoryScope): Promise<{ usage?: MemoryUsage[] }>;
}
```
This contract intentionally does not force a provider to expose its internal graph, filesystem, or ontology.
## Optional Adapter Surfaces
These should be capability-gated, not required:
- `browse(scope, filters)` for file-system / graph / timeline inspection
- `correct(handle, patch)` for natural-language correction flows
- `profile(scope)` when the provider can synthesize stable preferences or summaries
- `sync(source)` for connectors or background ingestion
- `explain(queryResult)` for providers that can expose retrieval traces
## What Paperclip Should Persist
Paperclip should not mirror the full provider memory corpus into Postgres unless the provider is a Paperclip-managed local provider.
Paperclip core should persist:
- memory bindings and overrides
- provider keys and capability metadata
- normalized memory operation logs
- provider record handles returned by operations when available
- source references back to issue comments, documents, runs, and activity
- usage and cost data
For external providers, the memory payload itself can remain in the provider.
## Hook Model
### Automatic hooks
These should be low-risk and easy to reason about:
1. `pre-run hydrate`
Before an agent run starts, Paperclip may call `query(... intent = "agent_preamble")` using the active binding.
2. `post-run capture`
After a run finishes, Paperclip may write a summary or transcript-derived note tied to the run.
3. `issue comment / document capture`
When enabled on the binding, Paperclip may capture selected issue comments or issue documents as memory sources.
### Explicit hooks
These should be tool- or UI-driven first:
- `memory.search`
- `memory.note`
- `memory.forget`
- `memory.correct`
- `memory.browse`
### Not automatic in the first version
- broad web crawling
- silent import of arbitrary repo files
- cross-company memory sharing
- automatic destructive deletion
- provider migration between bindings
## Agent UX Rules
Paperclip should give agents both automatic recall and explicit tools, with simple guidance:
- use `memory.search` when the task depends on prior decisions, people, projects, or long-running context that is not in the current issue thread
- use `memory.note` when a durable fact, preference, or decision should survive this run
- use `memory.correct` when the user explicitly says prior context is wrong
- rely on post-run auto-capture for ordinary session residue so agents do not have to write memory notes for every trivial exchange
This keeps memory available without forcing every agent prompt to become a memory-management protocol.
## Browse And Inspect Surface
Paperclip needs a first-class UI for memory, otherwise providers become black boxes.
The initial browse surface should support:
- active binding by company and agent
- recent memory operations
- recent write sources
- query results with source backlinks
- filters by agent, issue, run, source kind, and date
- provider usage / cost / latency summaries
When a provider supports richer browsing, the plugin can add deeper views through the existing plugin UI surfaces.
## Cost And Evaluation
Every adapter response should be able to return usage records.
Paperclip should roll up:
- memory inference tokens
- embedding tokens
- external provider cost
- latency
- query count
- write count
It should also record evaluation-oriented metrics where possible:
- recall hit rate
- empty query rate
- manual correction count
- per-binding success / failure counts
This is important because a memory system that "works" but silently burns budget is not acceptable in Paperclip.
## Suggested Data Model Additions
At the control-plane level, the likely new core tables are:
- `memory_bindings`
- company-scoped key
- provider id / plugin id
- config blob
- enabled status
- `memory_binding_targets`
- target type (`company`, `agent`, later `project`)
- target id
- binding id
- `memory_operations`
- company id
- binding id
- operation type (`write`, `query`, `forget`, `browse`, `correct`)
- scope fields
- source refs
- usage / latency / cost
- success / error
Provider-specific long-form state should stay in plugin state or the provider itself unless a built-in local provider needs its own schema.
## Recommended First Built-In
The best zero-config built-in is a local markdown-first provider with optional semantic indexing.
Why:
- it matches Paperclip's local-first posture
- it is inspectable
- it is easy to back up and debug
- it gives the system a baseline even without external API keys
The design should still treat that built-in as just another provider behind the same control-plane contract.
## Rollout Phases
### Phase 1: Control-plane contract
- add memory binding models and API types
- add plugin capability / registration surface for memory providers
- add operation logging and usage reporting
### Phase 2: One built-in + one plugin example
- ship a local markdown-first provider
- ship one hosted adapter example to validate the external-provider path
### Phase 3: UI inspection
- add company / agent memory settings
- add a memory operation explorer
- add source backlinks to issues and runs
### Phase 4: Automatic hooks
- pre-run hydrate
- post-run capture
- selected issue comment / document capture
### Phase 5: Rich capabilities
- correction flows
- provider-native browse / graph views
- project-level overrides if needed
- evaluation dashboards
## Open Questions
- Should project overrides exist in V1 of the memory service, or should we force company default + agent override first?
- Do we want Paperclip-managed extraction pipelines at all, or should built-ins be the only place where Paperclip owns extraction?
- Should memory usage extend the current `cost_events` model directly, or should memory operations keep a parallel usage log and roll up into `cost_events` secondarily?
- Do we want provider install / binding changes to require approvals for some companies?
## Bottom Line
The right abstraction is:
- Paperclip owns memory bindings, scopes, provenance, governance, and usage reporting.
- Providers own extraction, ranking, storage, and provider-native memory semantics.
That gives Paperclip a stable "memory service" without locking the product to one memory philosophy or one vendor.

View file

@ -0,0 +1,488 @@
# Release Automation and Versioning Simplification Plan
## Context
Paperclip's current release flow is documented in `doc/RELEASING.md` and implemented through:
- `.github/workflows/release.yml`
- `scripts/release-lib.sh`
- `scripts/release-start.sh`
- `scripts/release-preflight.sh`
- `scripts/release.sh`
- `scripts/create-github-release.sh`
Today the model is:
1. pick `patch`, `minor`, or `major`
2. create `release/X.Y.Z`
3. draft `releases/vX.Y.Z.md`
4. publish one or more canaries from that release branch
5. publish stable from that same branch
6. push tag + create GitHub Release
7. merge the release branch back to `master`
That is workable, but it creates friction in exactly the places that should be cheap:
- deciding `patch` vs `minor` vs `major`
- cutting and carrying release branches
- manually publishing canaries
- thinking about changelog generation for canaries
- handling npm credentials safely in a public repo
The target state from this discussion is simpler:
- every push to `master` publishes a canary automatically
- stable releases are promoted deliberately from a vetted commit
- versioning is date-driven instead of semantics-driven
- stable publishing is secure even in a public open-source repository
- changelog generation happens only for real stable releases
## Recommendation In One Sentence
Move Paperclip to semver-compatible calendar versioning, auto-publish canaries from `master`, promote stable from a chosen tested commit, and use npm trusted publishing plus GitHub environments so no long-lived npm or LLM token needs to live in Actions.
## Core Decisions
### 1. Use calendar versions, but keep semver syntax
The repo and npm tooling still assume semver-shaped version strings in many places. That does not mean Paperclip must keep semver as a product policy. It does mean the version format should remain semver-valid.
Recommended format:
- stable: `YYYY.MDD.P`
- canary: `YYYY.MDD.P-canary.N`
Examples:
- first stable on March 17, 2026: `2026.317.0`
- third canary on the `2026.317.0` line: `2026.317.0-canary.2`
Why this shape:
- it removes `patch/minor/major` decisions
- it is valid semver syntax
- it stays compatible with npm, dist-tags, and existing semver validators
- it is close to the format you actually want
Important constraints:
- the middle numeric slot should be `MDD`, where `M` is the month and `DD` is the zero-padded day
- `2026.03.17` is not the format to use
- numeric semver identifiers do not allow leading zeroes
- `2026.3.17.1` is not the format to use
- semver has three numeric components, not four
- the practical semver-safe equivalent is `2026.317.0-canary.8`
This is effectively CalVer on semver rails.
### 2. Accept that CalVer changes the compatibility contract
This is not semver in spirit anymore. It is semver in syntax only.
That tradeoff is probably acceptable for Paperclip, but it should be explicit:
- consumers no longer infer compatibility from `major/minor/patch`
- release notes become the compatibility signal
- downstream users should prefer exact pins or deliberate upgrades
This is especially relevant for public library packages like `@paperclipai/shared`, `@paperclipai/db`, and the adapter packages.
### 3. Drop release branches for normal publishing
If every merge to `master` publishes a canary, the current `release/X.Y.Z` train model becomes more ceremony than value.
Recommended replacement:
- `master` is the only canary train
- every push to `master` can publish a canary
- stable is published from a chosen commit or canary tag on `master`
This matches the workflow you actually want:
- merge continuously
- let npm always have a fresh canary
- choose a known-good canary later and promote that commit to stable
### 4. Promote by source ref, not by "renaming" a canary
This is the most important mechanical constraint.
npm can move dist-tags, but it does not let you rename an already-published version. That means:
- you can move `latest` to `paperclipai@1.2.3`
- you cannot turn `paperclipai@2026.317.0-canary.8` into `paperclipai@2026.317.0`
So "promote canary to stable" really means:
1. choose the commit or canary tag you trust
2. rebuild from that exact commit
3. publish it again with the stable version string
Because of that, the stable workflow should take a source ref, not just a bump type.
Recommended stable input:
- `source_ref`
- commit SHA, or
- a canary git tag such as `canary/v2026.317.1-canary.8`
### 5. Only stable releases get release notes, tags, and GitHub Releases
Canaries should stay lightweight:
- publish to npm under `canary`
- optionally create a lightweight or annotated git tag
- do not create GitHub Releases
- do not require `releases/v*.md`
- do not spend LLM tokens
Stable releases should remain the public narrative surface:
- git tag `v2026.317.0`
- GitHub Release `v2026.317.0`
- stable changelog file `releases/v2026.317.0.md`
## Security Model
### Recommendation
Use npm trusted publishing with GitHub Actions OIDC, then disable token-based publishing access for the packages.
Why:
- no long-lived `NPM_TOKEN` in repo or org secrets
- no personal npm token in Actions
- short-lived credentials minted only for the authorized workflow
- automatic npm provenance for public packages in public repos
This is the cleanest answer to the open-repo security concern.
### Concrete controls
#### 1. Use one release workflow file
Use one workflow filename for both canary and stable publishing:
- `.github/workflows/release.yml`
Why:
- npm trusted publishing is configured per workflow filename
- npm currently allows one trusted publisher configuration per package
- GitHub environments can still provide separate canary/stable approval rules inside the same workflow
#### 2. Use separate GitHub environments
Recommended environments:
- `npm-canary`
- `npm-stable`
Recommended policy:
- `npm-canary`
- allowed branch: `master`
- no human reviewer required
- `npm-stable`
- allowed branch: `master`
- required reviewer enabled
- prevent self-review enabled
- admin bypass disabled
Stable should require an explicit second human gate even if the workflow is manually dispatched.
#### 3. Lock down workflow edits
Add or tighten `CODEOWNERS` coverage for:
- `.github/workflows/*`
- `scripts/release*`
- `doc/RELEASING.md`
This matters because trusted publishing authorizes a workflow file. The biggest remaining risk is not secret exfiltration from forks. It is a maintainer-approved change to the release workflow itself.
#### 4. Remove traditional npm token access after OIDC works
After trusted publishing is verified:
- set package publishing access to require 2FA and disallow tokens
- revoke any legacy automation tokens
That eliminates the "someone stole the npm token" class of failure.
### What not to do
- do not put your personal Claude or npm token in GitHub Actions
- do not run release logic from `pull_request_target`
- do not make stable publishing depend on a repo secret if OIDC can handle it
- do not create canary GitHub Releases
## Changelog Strategy
### Recommendation
Generate stable changelogs only, and keep LLM-assisted changelog generation out of CI for now.
Reasoning:
- canaries happen too often
- canaries do not need polished public notes
- putting a personal Claude token into Actions is not worth the risk
- stable release cadence is low enough that a human-in-the-loop step is acceptable
Recommended stable path:
1. pick a canary commit or tag
2. run changelog generation locally from a trusted machine
3. commit `releases/vYYYY.MDD.P.md`
4. run stable promotion
If the notes are not ready yet, a fallback is acceptable:
- publish stable
- create a minimal GitHub Release
- update `releases/vYYYY.MDD.P.md` immediately afterward
But the better steady-state is to have the stable notes committed before stable publish.
### Future option
If you later want CI-assisted changelog drafting, do it with:
- a dedicated service account
- a token scoped only for changelog generation
- a manual workflow
- a dedicated environment with required reviewers
That is phase-two hardening work, not a phase-one requirement.
## Proposed Future Workflow
### Canary workflow
Trigger:
- `push` on `master`
Steps:
1. checkout the merged `master` commit
2. run verification on that exact commit
3. compute canary version for current UTC date
4. version public packages to `YYYY.MDD.P-canary.N`
5. publish to npm with dist-tag `canary`
6. create a canary git tag for traceability
Recommended canary tag format:
- `canary/v2026.317.1-canary.4`
Outputs:
- npm canary published
- git tag created
- no GitHub Release
- no changelog file required
### Stable workflow
Trigger:
- `workflow_dispatch`
Inputs:
- `source_ref`
- optional `stable_date`
- `dry_run`
Steps:
1. checkout `source_ref`
2. run verification on that exact commit
3. compute the next stable patch slot for the UTC date or provided override
4. fail if `vYYYY.MDD.P` already exists
5. require `releases/vYYYY.MDD.P.md`
6. version public packages to `YYYY.MDD.P`
7. publish to npm under `latest`
8. create git tag `vYYYY.MDD.P`
9. push tag
10. create GitHub Release from `releases/vYYYY.MDD.P.md`
Outputs:
- stable npm release
- stable git tag
- GitHub Release
- clean public changelog surface
## Implementation Guidance
### 1. Replace bump-type version math with explicit version computation
The current release scripts depend on:
- `patch`
- `minor`
- `major`
That logic should be replaced with:
- `compute_canary_version_for_date`
- `compute_stable_version_for_date`
For example:
- `next_stable_version(2026-03-17) -> 2026.317.0`
- `next_canary_for_utc_date(2026-03-17) -> 2026.317.0-canary.0`
### 2. Stop requiring `release/X.Y.Z`
These current invariants should be removed from the happy path:
- "must run from branch `release/X.Y.Z`"
- "stable and canary for `X.Y.Z` come from the same release branch"
- `release-start.sh`
Replace them with:
- canary must run from `master`
- stable may run from a pinned `source_ref`
### 3. Keep Changesets only if it stays helpful
The current system uses Changesets to:
- rewrite package versions
- maintain package-level `CHANGELOG.md` files
- publish packages
With CalVer, Changesets may still be useful for publish orchestration, but it should no longer own version selection.
Recommended implementation order:
1. keep `changeset publish` if it works with explicitly-set versions
2. replace version computation with a small explicit versioning script
3. if Changesets keeps fighting the model, remove it from release publishing entirely
Paperclip's release problem is now "publish the whole fixed package set at one explicit version", not "derive the next semantic bump from human intent".
### 4. Add a dedicated versioning script
Recommended new script:
- `scripts/set-release-version.mjs`
Responsibilities:
- set the version in all public publishable packages
- update any internal exact-version references needed for publishing
- update CLI version strings
- avoid broad string replacement across unrelated files
This is safer than keeping a bump-oriented changeset flow and then forcing it into a date-based scheme.
### 5. Keep rollback based on dist-tags
`rollback-latest.sh` should stay, but it should stop assuming a semver meaning beyond syntax.
It should continue to:
- repoint `latest` to a prior stable version
- never unpublish
## Tradeoffs and Risks
### 1. The stable patch slot is now part of the version contract
With `YYYY.MDD.P`, same-day hotfixes are supported, but the stable patch slot is now part of the visible version format.
That is the right tradeoff because:
1. npm still gets semver-valid versions
2. same-day hotfixes stay possible
3. chronological ordering still works as long as the day is zero-padded inside `MDD`
### 2. Public package consumers lose semver intent signaling
This is the main downside of CalVer.
If that becomes a problem, one alternative is:
- use CalVer for the CLI package only
- keep semver for library packages
That is more complex operationally, so I would not start there unless package consumers actually need it.
### 3. Auto-canary means more publish traffic
Publishing on every `master` merge means:
- more npm versions
- more git tags
- more registry noise
That is acceptable if canaries stay clearly separate:
- npm dist-tag `canary`
- no GitHub Release
- no external announcement
## Rollout Plan
### Phase 1: Security foundation
1. Create `release.yml`
2. Configure npm trusted publishers for all public packages
3. Create `npm-canary` and `npm-stable` environments
4. Add `CODEOWNERS` protection for release files
5. Verify OIDC publishing works
6. Disable token-based publishing access and revoke old tokens
### Phase 2: Canary automation
1. Add canary workflow on `push` to `master`
2. Add explicit calendar-version computation
3. Add canary git tagging
4. Remove changelog requirement from canaries
5. Update `doc/RELEASING.md`
### Phase 3: Stable promotion
1. Add manual stable workflow with `source_ref`
2. Require stable notes file
3. Publish stable + tag + GitHub Release
4. Update rollback docs and scripts
5. Retire release-branch assumptions
### Phase 4: Cleanup
1. Remove `release-start.sh` from the primary path
2. Remove `patch/minor/major` from maintainer docs
3. Decide whether to keep or remove Changesets from publishing
4. Document the CalVer compatibility contract publicly
## Concrete Recommendation
Paperclip should adopt this model:
- stable versions: `YYYY.MDD.P`
- canary versions: `YYYY.MDD.P-canary.N`
- canaries auto-published on every push to `master`
- stables manually promoted from a chosen tested commit or canary tag
- no release branches in the default path
- no canary changelog files
- no canary GitHub Releases
- no Claude token in GitHub Actions
- no npm automation token in GitHub Actions
- npm trusted publishing plus GitHub environments for release security
That gets rid of the annoying part of semver without fighting npm, makes canaries cheap, keeps stables deliberate, and materially improves the security posture of the public repository.
## External References
- npm trusted publishing: https://docs.npmjs.com/trusted-publishers/
- npm dist-tags: https://docs.npmjs.com/adding-dist-tags-to-packages/
- npm semantic versioning guidance: https://docs.npmjs.com/about-semantic-versioning/
- GitHub environments and deployment protection rules: https://docs.github.com/en/actions/how-tos/deploy/configure-and-manage-deployments/manage-environments
- GitHub secrets behavior for forks: https://docs.github.com/en/actions/how-tos/write-workflows/choose-what-workflows-do/use-secrets