Merge public-gh/master into paperclip-company-import-export

2026-06-20 04:20:38 +09:00 · 2026-03-18 09:57:26 -05:00 · 2026-03-18 09:57:26 -05:00 · 9e19f1d005
commit 9e19f1d005
parent 154a4a7ac1 731c9544b3
49 changed files with 3997 additions and 2501 deletions
--- a/doc/plans/2026-03-17-docker-release-browser-e2e.md
+++ b/doc/plans/2026-03-17-docker-release-browser-e2e.md
@ -0,0 +1,424 @@
+# Docker Release Browser E2E Plan
+
+## Context
+
+Today release smoke testing for published Paperclip packages is manual and shell-driven:
+
+```sh
+HOST_PORT=3232 DATA_DIR=./data/release-smoke-canary PAPERCLIPAI_VERSION=canary ./scripts/docker-onboard-smoke.sh
+HOST_PORT=3233 DATA_DIR=./data/release-smoke-stable PAPERCLIPAI_VERSION=latest ./scripts/docker-onboard-smoke.sh
+```
+
+That is useful because it exercises the same public install surface users hit:
+
+- Docker
+- `npx paperclipai@canary`
+- `npx paperclipai@latest`
+- authenticated bootstrap flow
+
+But it still leaves the most important release questions to a human with a browser:
+
+- can I sign in with the smoke credentials?
+- do I land in onboarding?
+- can I complete onboarding?
+- does the initial CEO agent actually get created and run?
+
+The repo already has two adjacent pieces:
+
+- `tests/e2e/onboarding.spec.ts` covers the onboarding wizard against the local source tree
+- `scripts/docker-onboard-smoke.sh` boots a published Docker install and auto-bootstraps authenticated mode, but only verifies the API/session layer
+
+What is missing is one deterministic browser test that joins those two paths.
+
+## Goal
+
+Add a release-grade Docker-backed browser E2E that validates the published `canary` and `latest` installs end to end:
+
+1. boot the published package in Docker
+2. sign in with known smoke credentials
+3. verify the user is routed into onboarding
+4. complete onboarding in the browser
+5. verify the first CEO agent exists
+6. verify the initial CEO run was triggered and reached a terminal or active state
+
+Then wire that test into GitHub Actions so release validation is no longer manual-only.
+
+## Recommendation In One Sentence
+
+Turn the current Docker smoke script into a machine-friendly test harness, add a dedicated Playwright release-smoke spec that drives the authenticated browser flow against published Docker installs, and run it in GitHub Actions for both `canary` and `latest`.
+
+## What We Have Today
+
+### Existing local browser coverage
+
+`tests/e2e/onboarding.spec.ts` already proves the onboarding wizard can:
+
+- create a company
+- create a CEO agent
+- create an initial issue
+- optionally observe task progress
+
+That is a good base, but it does not validate the public npm package, Docker path, authenticated login flow, or release dist-tags.
+
+### Existing Docker smoke coverage
+
+`scripts/docker-onboard-smoke.sh` already does useful setup work:
+
+- builds `Dockerfile.onboard-smoke`
+- runs `paperclipai@${PAPERCLIPAI_VERSION}` inside Docker
+- waits for health
+- signs up or signs in a smoke admin user
+- generates and accepts the bootstrap CEO invite in authenticated mode
+- verifies a board session and `/api/companies`
+
+That means the hard bootstrap problem is mostly solved already. The main gap is that the script is human-oriented and never hands control to a browser test.
+
+### Existing CI shape
+
+The repo already has:
+
+- `.github/workflows/e2e.yml` for manual Playwright runs against local source
+- `.github/workflows/release.yml` for canary publish on `master` and manual stable promotion
+
+So the right move is to extend the current test/release system, not create a parallel one.
+
+## Product Decision
+
+### 1. The release smoke should stay deterministic and token-free
+
+The first version should not require OpenAI, Anthropic, or external agent credentials.
+
+Use the onboarding flow with a deterministic adapter that can run on a stock GitHub runner and inside the published Docker install. The existing `process` adapter with a trivial command is the right base path for this release gate.
+
+That keeps this test focused on:
+
+- release packaging
+- auth/bootstrap
+- UI routing
+- onboarding contract
+- agent creation
+- heartbeat invocation plumbing
+
+Later we can add a second credentialed smoke lane for real model-backed agents.
+
+### 2. Smoke credentials become an explicit test contract
+
+The current defaults in `scripts/docker-onboard-smoke.sh` should be treated as stable test fixtures:
+
+- email: `smoke-admin@paperclip.local`
+- password: `paperclip-smoke-password`
+
+The browser test should log in with those exact values unless overridden by env vars.
+
+### 3. Published-package smoke and source-tree E2E stay separate
+
+Keep two lanes:
+
+- source-tree E2E for feature development
+- published Docker release smoke for release confidence
+
+They overlap on onboarding assertions, but they guard different failure classes.
+
+## Proposed Design
+
+### 1. Add a CI-friendly Docker smoke harness
+
+Refactor `scripts/docker-onboard-smoke.sh` so it can run in two modes:
+
+- interactive mode
+  - current behavior
+  - streams logs and waits in foreground for manual inspection
+- CI mode
+  - starts the container
+  - waits for health and authenticated bootstrap
+  - prints machine-readable metadata
+  - exits while leaving the container running for Playwright
+
+Recommended shape:
+
+- keep `scripts/docker-onboard-smoke.sh` as the public entry point
+- add a `SMOKE_DETACH=true` or `--detach` mode
+- emit a JSON blob or `.env` file containing:
+  - `SMOKE_BASE_URL`
+  - `SMOKE_ADMIN_EMAIL`
+  - `SMOKE_ADMIN_PASSWORD`
+  - `SMOKE_CONTAINER_NAME`
+  - `SMOKE_DATA_DIR`
+
+The workflow and Playwright tests can then consume the emitted metadata instead of scraping logs.
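+
+As a sketch of the consuming side, assume the harness writes a plain `.env`-style file of `KEY=value` lines containing exactly the keys listed above. A Playwright setup step could then load it as shown below. The function name `parseSmokeMetadata` and the fail-fast behavior are hypothetical and are not part of the existing script.
+
+```ts
+// Hypothetical key list, matching the metadata bullets above.
+const REQUIRED_KEYS = [
+  "SMOKE_BASE_URL",
+  "SMOKE_ADMIN_EMAIL",
+  "SMOKE_ADMIN_PASSWORD",
+  "SMOKE_CONTAINER_NAME",
+  "SMOKE_DATA_DIR",
+] as const;
+
+type SmokeMetadata = Record<(typeof REQUIRED_KEYS)[number], string>;
+
+// Parse `KEY=value` lines (assumed format), skipping blanks and `#` comments.
+// Only the first `=` splits, so values such as passwords may contain `=`.
+function parseSmokeMetadata(text: string): SmokeMetadata {
+  const values: Record<string, string> = {};
+  for (const rawLine of text.split(/\r?\n/)) {
+    const line = rawLine.trim();
+    if (line === "" || line.startsWith("#")) continue;
+    const eq = line.indexOf("=");
+    if (eq <= 0) throw new Error(`Malformed metadata line: ${line}`);
+    values[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
+  }
+  // Fail fast so a broken harness run shows up as a clear error, not a browser timeout.
+  const missing = REQUIRED_KEYS.filter((key) => !values[key]);
+  if (missing.length > 0) {
+    throw new Error(`Missing smoke metadata: ${missing.join(", ")}`);
+  }
+  const result = {} as SmokeMetadata;
+  for (const key of REQUIRED_KEYS) result[key] = values[key];
+  return result;
+}
+```
+
+A Playwright `globalSetup` or the config file itself could read this file once and pass `SMOKE_BASE_URL` as `use.baseURL`. That avoids the `webServer` option entirely.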
+
+#### Why this matters
+
+The current script always tails logs and then blocks on `wait "$LOG_PID"`. That is convenient for manual smoke testing, but it is the wrong shape for CI orchestration.
+
+### 2. Add a dedicated Playwright release-smoke spec
+
+Create a second Playwright entry point specifically for published Docker installs, for example:
+
+- `tests/release-smoke/playwright.config.ts`
+- `tests/release-smoke/docker-auth-onboarding.spec.ts`
+
+This suite should not use Playwright `webServer`, because the app server will already be running inside Docker.
+
+#### Browser scenario
+
+The first release-smoke scenario should validate:
+
+1. open `/`
+2. unauthenticated user is redirected to `/auth`
+3. sign in using the smoke credentials
+4. authenticated user lands on onboarding when no companies exist
+5. onboarding wizard appears with the expected step labels
+6. create a company
+7. create the first agent using `process`
+8. create the initial issue
+9. finish onboarding and open the created issue
+10. verify via API:
+    - company exists
+    - CEO agent exists
+    - issue exists and is assigned to the CEO
+11. verify the first heartbeat run was triggered:
+    - either by checking that the issue status changed from its initial state, or
+    - by checking that the agent/runs API shows a run for the CEO, or
+    - both
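+
+Step 10 can be written as a pure check over JSON the test has already fetched. This keeps the assertion independent of onboarding copy. The sketch below uses assumed, trimmed-down response shapes: `role`, `assigneeAgentId`, and the `"ceo"` role value are hypothetical field names, not confirmed Paperclip API fields.
+
+```ts
+// Hypothetical, trimmed-down payload shapes; the real API may differ.
+interface Company { id: string; name: string }
+interface Agent { id: string; companyId: string; role: string }
+interface Issue { id: string; companyId: string; assigneeAgentId: string | null; status: string }
+
+// Returns human-readable problems; an empty array means the entity checks passed.
+function verifyOnboardingEntities(
+  companyName: string,
+  companies: Company[],
+  agents: Agent[],
+  issues: Issue[],
+): string[] {
+  const company = companies.find((c) => c.name === companyName);
+  if (!company) return [`company "${companyName}" not found`];
+
+  const problems: string[] = [];
+  const ceo = agents.find((a) => a.companyId === company.id && a.role === "ceo");
+  if (!ceo) problems.push("no CEO agent for the new company");
+
+  const companyIssues = issues.filter((i) => i.companyId === company.id);
+  if (companyIssues.length === 0) {
+    problems.push("no initial issue created");
+  } else if (ceo && !companyIssues.some((i) => i.assigneeAgentId === ceo.id)) {
+    problems.push("initial issue is not assigned to the CEO");
+  }
+  return problems;
+}
+```
+
+Returning every problem at once, instead of throwing on the first, makes the Playwright failure message more useful.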
+
+The test should tolerate the run completing quickly, so the run-status assertion should accept any of:
+
+- `queued`
+- `running`
+- `succeeded`
+
+The issue assertion needs the same tolerance. If the issue status has already moved past its initial state by the time the assertion runs, the test should count that as progress, not as a failure.
+
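+One way to implement the tolerant run assertion is to poll the API instead of reading the UI once. With polling, a `process` run that finishes in milliseconds still passes. This sketch assumes a caller-supplied `fetchStatus` function that reads the CEO's latest run status; the endpoint is not specified here. The failure state names are hypothetical.
+
+```ts
+// Run states that prove the heartbeat fired, per the list above.
+const ACCEPTED_RUN_STATES = new Set(["queued", "running", "succeeded"]);
+// Hypothetical terminal failure states: fail fast instead of waiting for the timeout.
+const FAILED_RUN_STATES = new Set(["failed", "cancelled"]);
+
+async function waitForAcceptedRunStatus(
+  fetchStatus: () => Promise<string | null>, // null means no run exists yet
+  { timeoutMs = 30_000, intervalMs = 500 } = {},
+): Promise<string> {
+  const deadline = Date.now() + timeoutMs;
+  let last: string | null = null;
+  while (Date.now() < deadline) {
+    last = await fetchStatus();
+    if (last !== null && ACCEPTED_RUN_STATES.has(last)) return last;
+    if (last !== null && FAILED_RUN_STATES.has(last)) {
+      throw new Error(`CEO run ended in state "${last}"`);
+    }
+    await new Promise((resolve) => setTimeout(resolve, intervalMs));
+  }
+  throw new Error(`CEO run never reached an accepted state (last seen: ${last ?? "none"})`);
+}
+```
+
+Playwright's built-in `expect.poll` could replace the hand-written loop. The explicit version simply makes the accepted and failing states visible in one place.
+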
+#### Why a separate spec instead of reusing `tests/e2e/onboarding.spec.ts`
+
+The local-source test and release-smoke test have different assumptions:
+
+- different server lifecycle
+- different auth path
+- different deployment mode
+- published npm package instead of local workspace code
+
+Trying to force both through one spec will make both worse.
+
+### 3. Add a release-smoke workflow in GitHub Actions
+
+Add a workflow dedicated to this surface, ideally reusable:
+
+- `.github/workflows/release-smoke.yml`
+
+Recommended triggers:
+
+- `workflow_dispatch`
+- `workflow_call`
+
+Recommended inputs:
+
+- `paperclip_version`
+  - `canary` or `latest`
+- `host_port`
+  - optional; defaults to a runner-safe port
+- `artifact_name`
+  - optional; gives uploaded artifacts a clearer name
+
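+The inputs could map onto the environment variables the smoke script already reads (`PAPERCLIPAI_VERSION`, `HOST_PORT`, `DATA_DIR`) roughly as shown below. `SMOKE_DETACH` is the proposed detached-mode switch. The `3232` default is borrowed from the manual canary command. The helper itself is hypothetical.
+
+```ts
+interface ReleaseSmokeInputs {
+  paperclip_version?: string;
+  host_port?: string;
+  artifact_name?: string;
+}
+
+// Maps workflow inputs onto env vars the smoke script reads, plus the proposed SMOKE_DETACH.
+function toHarnessEnv(inputs: ReleaseSmokeInputs): { env: Record<string, string>; artifactName: string } {
+  const version = inputs.paperclip_version ?? "";
+  if (version !== "canary" && version !== "latest") {
+    throw new Error(`paperclip_version must be "canary" or "latest", got "${version}"`);
+  }
+  // Hypothetical default taken from the manual canary command; `||` also covers "".
+  const port = Number(inputs.host_port || "3232");
+  if (!Number.isInteger(port) || port < 1024 || port > 65535) {
+    throw new Error(`invalid host_port: ${inputs.host_port}`);
+  }
+  return {
+    env: {
+      PAPERCLIPAI_VERSION: version,
+      HOST_PORT: String(port),
+      DATA_DIR: `./data/release-smoke-${version}`,
+      SMOKE_DETACH: "true",
+    },
+    artifactName: inputs.artifact_name || `release-smoke-${version}`,
+  };
+}
+```
+
+Validating `paperclip_version` up front keeps a typo in a manual dispatch from silently testing an unintended dist-tag.
+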
+#### Job outline
+
+1. checkout repo
+2. install Node/pnpm
+3. install Playwright browser dependencies
+4. launch Docker smoke harness in detached mode with the chosen dist-tag
+5. run the release-smoke Playwright suite against the returned base URL
+6. always collect diagnostics:
+   - Playwright report
+   - screenshots
+   - trace
+   - `docker logs`
+   - harness metadata file
+7. stop and remove container
+
+#### Why a reusable workflow
+
+This lets us:
+
+- run the smoke manually on demand
+- call it from `release.yml`
+- reuse the same job for both `canary` and `latest`
+
+### 4. Integrate it into release automation incrementally
+
+#### Phase A: Manual workflow only
+
+First ship the workflow as manual-only so the harness and test can be stabilized without blocking releases.
+
+#### Phase B: Run automatically after canary publish
+
+After `publish_canary` succeeds in `.github/workflows/release.yml`, call the reusable release-smoke workflow with:
+
+- `paperclip_version=canary`
+
+This proves the just-published public canary really boots and onboards.
+
+#### Phase C: Run automatically after stable publish
+
+After `publish_stable` succeeds, call the same workflow with:
+
+- `paperclip_version=latest`
+
+This gives us post-publish confirmation that the stable dist-tag is healthy.
+
+#### Important nuance
+
+Testing `latest` from npm cannot happen before stable publish, because the package under test does not exist under `latest` yet. So the `latest` smoke is a post-publish verification, not a pre-publish gate.
+
+If we later want a true pre-publish stable gate, that should be a separate source-ref or locally built package smoke job.
+
+### 5. Make diagnostics first-class
+
+This workflow is only valuable if failures are fast to debug.
+
+Always capture:
+
+- Playwright HTML report
+- Playwright trace on failure
+- final screenshot on failure
+- full `docker logs` output
+- emitted smoke metadata
+- optional `curl /api/health` snapshot
+
+Without that, the test will become a flaky black box and people will stop trusting it.
+
+## Implementation Plan
+
+### Phase 1: Harness refactor
+
+Files:
+
+- `scripts/docker-onboard-smoke.sh`
+- optionally `scripts/lib/docker-onboard-smoke.sh` or similar helper
+- `doc/DOCKER.md`
+- `doc/RELEASING.md`
+
+Tasks:
+
+1. Add detached/CI mode to the Docker smoke script.
+2. Make the script emit machine-readable connection metadata.
+3. Keep the current interactive manual mode intact.
+4. Add reliable cleanup commands for CI.
+
+Acceptance:
+
+- a script invocation can start the published Docker app, auto-bootstrap it, and return control to the caller with enough metadata for browser automation
+
+### Phase 2: Browser release-smoke suite
+
+Files:
+
+- `tests/release-smoke/playwright.config.ts`
+- `tests/release-smoke/docker-auth-onboarding.spec.ts`
+- root `package.json`
+
+Tasks:
+
+1. Add a dedicated Playwright config for external server testing.
+2. Implement login + onboarding + CEO creation flow.
+3. Assert a CEO run was created or completed.
+4. Add a root script such as:
+   - `test:release-smoke`
+
+Acceptance:
+
+- the suite passes locally against both:
+  - `PAPERCLIPAI_VERSION=canary`
+  - `PAPERCLIPAI_VERSION=latest`
+
+### Phase 3: GitHub Actions workflow
+
+Files:
+
+- `.github/workflows/release-smoke.yml`
+
+Tasks:
+
+1. Add manual and reusable workflow entry points.
+2. Install Chromium and runner dependencies.
+3. Start Docker smoke in detached mode.
+4. Run the release-smoke Playwright suite.
+5. Upload diagnostics artifacts.
+
+Acceptance:
+
+- a maintainer can run the workflow manually for either `canary` or `latest`
+
+### Phase 4: Release workflow integration
+
+Files:
+
+- `.github/workflows/release.yml`
+- `doc/RELEASING.md`
+
+Tasks:
+
+1. Trigger release smoke automatically after canary publish.
+2. Trigger release smoke automatically after stable publish.
+3. Document expected behavior and failure handling.
+
+Acceptance:
+
+- canary releases automatically produce a published-package browser smoke result
+- stable releases automatically produce a `latest` browser smoke result
+
+### Phase 5: Future extension for real model-backed agent validation
+
+Not part of the first implementation, but this should be the next layer after the deterministic lane is stable.
+
+Possible additions:
+
+- a second Playwright project gated on repo secrets
+- real `claude_local` or `codex_local` adapter validation in Docker-capable environments
+- assertion that the CEO posts a real task/comment artifact
+- stable release holdback until the credentialed lane passes
+
+This should stay optional until the token-free lane is trustworthy.
+
+## Acceptance Criteria
+
+The plan is complete when the implemented system can demonstrate all of the following:
+
+1. A published `paperclipai@canary` Docker install can be smoke-tested by Playwright in CI.
+2. A published `paperclipai@latest` Docker install can be smoke-tested by Playwright in CI.
+3. The test logs into authenticated mode with the smoke credentials.
+4. The test sees onboarding for a fresh instance.
+5. The test completes onboarding in the browser.
+6. The test verifies the initial CEO agent was created.
+7. The test verifies at least one CEO heartbeat run was triggered.
+8. Failures produce actionable artifacts rather than just a red job.
+
+## Risks And Decisions To Make
+
+### 1. Fast process runs may finish before the UI visibly updates
+
+That is expected. The assertions should prefer API polling for run existence/status rather than only visual indicators.
+
+### 2. `latest` smoke is post-publish, not preventive
+
+This is a real limitation of testing the published dist-tag itself. It is still valuable, but it should not be confused with a pre-publish gate.
+
+### 3. We should not overcouple the test to cosmetic onboarding text
+
+The important contract is flow success, created entities, and run creation. Use visible labels sparingly and prefer stable semantic selectors where possible.
+
+### 4. Keep the smoke adapter path boring
+
+For release safety, the first test should use the most boring runnable adapter possible. This is not the place to validate every adapter.
+
+## Recommended First Slice
+
+If we want the fastest path to value, ship this in order:
+
+1. add detached mode to `scripts/docker-onboard-smoke.sh`
+2. add one Playwright spec for authenticated login + onboarding + CEO run verification
+3. add manual `release-smoke.yml`
+4. once stable, wire canary into `release.yml`
+5. after that, wire stable `latest` smoke into `release.yml`
+
+That gives release confidence quickly without turning the first version into a large CI redesign.
--- a/doc/plans/2026-03-17-memory-service-surface-api.md
+++ b/doc/plans/2026-03-17-memory-service-surface-api.md
@ -0,0 +1,426 @@
+# Paperclip Memory Service Plan
+
+## Goal
+
+Define a Paperclip memory service and surface API that can sit above multiple memory backends, while preserving Paperclip's control-plane requirements:
+
+- company scoping
+- auditability
+- provenance back to Paperclip work objects
+- budget / cost visibility
+- plugin-first extensibility
+
+This plan is based on the external landscape summarized in `doc/memory-landscape.md` and on the current Paperclip architecture in:
+
+- `doc/SPEC-implementation.md`
+- `doc/plugins/PLUGIN_SPEC.md`
+- `doc/plugins/PLUGIN_AUTHORING_GUIDE.md`
+- `packages/plugins/sdk/src/types.ts`
+
+## Recommendation In One Sentence
+
+Paperclip should not embed one opinionated memory engine into core. It should add a company-scoped memory control plane with a small normalized adapter contract, then let built-ins and plugins implement the provider-specific behavior.
+
+## Product Decisions
+
+### 1. Memory is company-scoped by default
+
+Every memory binding belongs to exactly one company.
+
+That binding can then be:
+
+- the company default
+- an agent override
+- a project override later if we need it
+
+No cross-company memory sharing in the initial design.
+
+### 2. Providers are selected by key
+
+Each configured memory provider gets a stable key inside a company, for example:
+
+- `default`
+- `mem0-prod`
+- `local-markdown`
+- `research-kb`
+
+Agents and services resolve the active provider by key, not by hard-coded vendor logic.
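+
+A sketch of how resolution could work under these rules follows. An enabled agent override wins; otherwise the company default applies. A binding owned by another company is never returned. The record shapes loosely mirror the `memory_bindings` and `memory_binding_targets` tables suggested later in this plan. All names are hypothetical.
+
+```ts
+// Hypothetical record shapes, loosely mirroring the proposed tables.
+interface MemoryBinding { id: string; companyId: string; key: string; enabled: boolean }
+interface BindingTarget { targetType: "company" | "agent"; targetId: string; bindingId: string }
+
+function resolveActiveBinding(
+  companyId: string,
+  agentId: string | undefined,
+  bindings: MemoryBinding[],
+  targets: BindingTarget[],
+): MemoryBinding | null {
+  // Only enabled bindings owned by this company are candidates: no cross-company sharing.
+  const owned = new Map(
+    bindings.filter((b) => b.companyId === companyId && b.enabled).map((b) => [b.id, b] as const),
+  );
+  const pick = (type: BindingTarget["targetType"], id: string): MemoryBinding | null => {
+    const target = targets.find((t) => t.targetType === type && t.targetId === id && owned.has(t.bindingId));
+    return target ? owned.get(target.bindingId)! : null;
+  };
+  // Agent override first, then the company default.
+  return (agentId ? pick("agent", agentId) : null) ?? pick("company", companyId);
+}
+```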
+
+### 3. Plugins are the primary provider path
+
+Built-ins are useful for a zero-config local path, but most providers should arrive through the existing Paperclip plugin runtime.
+
+That keeps the core small and matches the current direction that optional knowledge-like systems live at the edges.
+
+### 4. Paperclip owns routing, provenance, and accounting
+
+Providers should not decide how Paperclip entities map to governance.
+
+Paperclip core should own:
+
+- who is allowed to call a memory operation
+- which company / agent / project scope is active
+- what issue / run / comment / document the operation belongs to
+- how usage gets recorded
+
+### 5. Automatic memory should be narrow at first
+
+Automatic capture is useful, but broad silent capture is dangerous.
+
+Initial automatic hooks should be:
+
+- post-run capture from agent runs
+- issue comment / document capture when the binding enables it
+- pre-run recall for agent context hydration
+
+Everything else should start explicit.
+
+## Proposed Concepts
+
+### Memory provider
+
+A built-in or plugin-supplied implementation that stores and retrieves memory.
+
+Examples:
+
+- local markdown + vector index
+- mem0 adapter
+- supermemory adapter
+- MemOS adapter
+
+### Memory binding
+
+A company-scoped configuration record that points to a provider and carries provider-specific config.
+
+This is the object selected by key.
+
+### Memory scope
+
+The normalized Paperclip scope passed into a provider request.
+
+At minimum:
+
+- `companyId`
+- optional `agentId`
+- optional `projectId`
+- optional `issueId`
+- optional `runId`
+- optional `subjectId` for external/user identity
+
+### Memory source reference
+
+The provenance handle that explains where a memory came from.
+
+Supported source kinds should include:
+
+- `issue_comment`
+- `issue_document`
+- `issue`
+- `run`
+- `activity`
+- `manual_note`
+- `external_document`
+
+### Memory operation
+
+A normalized write, query, browse, or delete action performed through Paperclip.
+
+Paperclip should log every operation, whether the provider is local or external.
+
+## Required Adapter Contract
+
+The required core should be small enough that `memsearch`, `mem0`, `Memori`, `MemOS`, or `OpenViking` could each implement it.
+
+```ts
+export interface MemoryAdapterCapabilities {
+  profile?: boolean;
+  browse?: boolean;
+  correction?: boolean;
+  asyncIngestion?: boolean;
+  multimodal?: boolean;
+  providerManagedExtraction?: boolean;
+}
+
+export interface MemoryScope {
+  companyId: string;
+  agentId?: string;
+  projectId?: string;
+  issueId?: string;
+  runId?: string;
+  subjectId?: string;
+}
+
+export interface MemorySourceRef {
+  kind:
+    | "issue_comment"
+    | "issue_document"
+    | "issue"
+    | "run"
+    | "activity"
+    | "manual_note"
+    | "external_document";
+  companyId: string;
+  issueId?: string;
+  commentId?: string;
+  documentKey?: string;
+  runId?: string;
+  activityId?: string;
+  externalRef?: string;
+}
+
+export interface MemoryUsage {
+  provider: string;
+  model?: string;
+  inputTokens?: number;
+  outputTokens?: number;
+  embeddingTokens?: number;
+  costCents?: number;
+  latencyMs?: number;
+  details?: Record<string, unknown>;
+}
+
+export interface MemoryWriteRequest {
+  bindingKey: string;
+  scope: MemoryScope;
+  source: MemorySourceRef;
+  content: string;
+  metadata?: Record<string, unknown>;
+  mode?: "append" | "upsert" | "summarize";
+}
+
+export interface MemoryRecordHandle {
+  providerKey: string;
+  providerRecordId: string;
+}
+
+export interface MemoryQueryRequest {
+  bindingKey: string;
+  scope: MemoryScope;
+  query: string;
+  topK?: number;
+  intent?: "agent_preamble" | "answer" | "browse";
+  metadataFilter?: Record<string, unknown>;
+}
+
+export interface MemorySnippet {
+  handle: MemoryRecordHandle;
+  text: string;
+  score?: number;
+  summary?: string;
+  source?: MemorySourceRef;
+  metadata?: Record<string, unknown>;
+}
+
+export interface MemoryContextBundle {
+  snippets: MemorySnippet[];
+  profileSummary?: string;
+  usage?: MemoryUsage[];
+}
+
+export interface MemoryAdapter {
+  key: string;
+  capabilities: MemoryAdapterCapabilities;
+  write(req: MemoryWriteRequest): Promise<{
+    records?: MemoryRecordHandle[];
+    usage?: MemoryUsage[];
+  }>;
+  query(req: MemoryQueryRequest): Promise<MemoryContextBundle>;
+  get(handle: MemoryRecordHandle, scope: MemoryScope): Promise<MemorySnippet | null>;
+  forget(handles: MemoryRecordHandle[], scope: MemoryScope): Promise<{ usage?: MemoryUsage[] }>;
+}
+```
+
+This contract intentionally does not force a provider to expose its internal graph, filesystem, or ontology.
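+
+To show that the contract is small enough to implement in a few dozen lines, here is a toy in-memory adapter. It redeclares a trimmed subset of the interfaces above so the block stands alone. It uses naive term-overlap scoring as a stand-in for real retrieval. It illustrates the contract only and is not a proposed built-in.
+
+```ts
+interface MemoryScope { companyId: string; agentId?: string }
+interface MemoryRecordHandle { providerKey: string; providerRecordId: string }
+interface MemorySnippet { handle: MemoryRecordHandle; text: string; score?: number }
+interface MemoryWriteRequest { bindingKey: string; scope: MemoryScope; content: string }
+interface MemoryQueryRequest { bindingKey: string; scope: MemoryScope; query: string; topK?: number }
+
+class InMemoryAdapter {
+  readonly key = "toy-in-memory";
+  readonly capabilities = { browse: false, correction: false };
+  private records = new Map<string, { scope: MemoryScope; text: string }>();
+  private nextId = 1;
+
+  async write(req: MemoryWriteRequest): Promise<{ records: MemoryRecordHandle[] }> {
+    const providerRecordId = String(this.nextId++);
+    this.records.set(providerRecordId, { scope: req.scope, text: req.content });
+    return { records: [{ providerKey: this.key, providerRecordId }] };
+  }
+
+  async query(req: MemoryQueryRequest): Promise<{ snippets: MemorySnippet[] }> {
+    const terms = new Set(req.query.toLowerCase().split(/\W+/).filter(Boolean));
+    const snippets: MemorySnippet[] = [];
+    this.records.forEach((rec, id) => {
+      // Never return another company's records, whatever the query says.
+      if (rec.scope.companyId !== req.scope.companyId) return;
+      // Score = number of words in the record that appear in the query.
+      const score = rec.text.toLowerCase().split(/\W+/).filter((w) => terms.has(w)).length;
+      if (score > 0) {
+        snippets.push({ handle: { providerKey: this.key, providerRecordId: id }, text: rec.text, score });
+      }
+    });
+    snippets.sort((a, b) => (b.score ?? 0) - (a.score ?? 0));
+    return { snippets: snippets.slice(0, req.topK ?? 5) };
+  }
+
+  async get(handle: MemoryRecordHandle, scope: MemoryScope): Promise<MemorySnippet | null> {
+    const rec = this.records.get(handle.providerRecordId);
+    return rec && rec.scope.companyId === scope.companyId ? { handle, text: rec.text } : null;
+  }
+
+  async forget(handles: MemoryRecordHandle[], scope: MemoryScope): Promise<{ usage?: unknown[] }> {
+    for (const h of handles) {
+      const rec = this.records.get(h.providerRecordId);
+      if (rec && rec.scope.companyId === scope.companyId) this.records.delete(h.providerRecordId);
+    }
+    return {};
+  }
+}
+```
+
+The company check lives inside every method, but in the proposed design Paperclip's routing layer would enforce scope before the adapter is ever called. The adapter-level check is defense in depth.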
+
+## Optional Adapter Surfaces
+
+These should be capability-gated, not required:
+
+- `browse(scope, filters)` for file-system / graph / timeline inspection
+- `correct(handle, patch)` for natural-language correction flows
+- `profile(scope)` when the provider can synthesize stable preferences or summaries
+- `sync(source)` for connectors or background ingestion
+- `explain(queryResult)` for providers that can expose retrieval traces
+
+## What Paperclip Should Persist
+
+Paperclip should not mirror the full provider memory corpus into Postgres unless the provider is a Paperclip-managed local provider.
+
+Paperclip core should persist:
+
+- memory bindings and overrides
+- provider keys and capability metadata
+- normalized memory operation logs
+- provider record handles returned by operations when available
+- source references back to issue comments, documents, runs, and activity
+- usage and cost data
+
+For external providers, the memory payload itself can remain in the provider.
+
+## Hook Model
+
+### Automatic hooks
+
+These should be low-risk and easy to reason about:
+
+1. `pre-run hydrate`
+   Before an agent run starts, Paperclip may call `query({ ..., intent: "agent_preamble" })` using the active binding.
+
+2. `post-run capture`
+   After a run finishes, Paperclip may write a summary or transcript-derived note tied to the run.
+
+3. `issue comment / document capture`
+   When enabled on the binding, Paperclip may capture selected issue comments or issue documents as memory sources.
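+
+As a sketch of what `pre-run hydrate` could do with the returned bundle, the function below renders it into a bounded text preamble with source backlinks. That way the agent, and a human reading the run log, can see where each recalled fact came from. The character budget, the heading text, and the output format are assumptions.
+
+```ts
+interface SourceRef { kind: string; issueId?: string; runId?: string }
+interface Snippet { text: string; score?: number; source?: SourceRef }
+interface ContextBundle { snippets: Snippet[]; profileSummary?: string }
+
+function describeSource(source?: SourceRef): string {
+  if (!source) return "unknown source";
+  if (source.issueId) return `${source.kind} on issue ${source.issueId}`;
+  if (source.runId) return `${source.kind} from run ${source.runId}`;
+  return source.kind;
+}
+
+// Returns "" when there is nothing to inject, so callers can skip the preamble entirely.
+function renderPreamble(bundle: ContextBundle, maxChars = 2000): string {
+  const lines: string[] = ["## Recalled memory"];
+  if (bundle.profileSummary) lines.push(`Profile: ${bundle.profileSummary}`);
+  let used = lines.join("\n").length;
+  for (const s of bundle.snippets) {
+    const line = `- ${s.text} (${describeSource(s.source)})`;
+    // Stop before exceeding the budget rather than truncating mid-snippet.
+    if (used + 1 + line.length > maxChars) break;
+    lines.push(line);
+    used += 1 + line.length;
+  }
+  return lines.length > 1 ? lines.join("\n") : "";
+}
+```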
+
+### Explicit hooks
+
+These should be tool- or UI-driven first:
+
+- `memory.search`
+- `memory.note`
+- `memory.forget`
+- `memory.correct`
+- `memory.browse`
+
+### Not automatic in the first version
+
+- broad web crawling
+- silent import of arbitrary repo files
+- cross-company memory sharing
+- automatic destructive deletion
+- provider migration between bindings
+
+## Agent UX Rules
+
+Paperclip should give agents both automatic recall and explicit tools, with simple guidance:
+
+- use `memory.search` when the task depends on prior decisions, people, projects, or long-running context that is not in the current issue thread
+- use `memory.note` when a durable fact, preference, or decision should survive this run
+- use `memory.correct` when the user explicitly says prior context is wrong
+- rely on post-run auto-capture for ordinary session residue so agents do not have to write memory notes for every trivial exchange
+
+This keeps memory available without forcing every agent prompt to become a memory-management protocol.
+
+## Browse And Inspect Surface
+
+Paperclip needs a first-class UI for memory, otherwise providers become black boxes.
+
+The initial browse surface should support:
+
+- active binding by company and agent
+- recent memory operations
+- recent write sources
+- query results with source backlinks
+- filters by agent, issue, run, source kind, and date
+- provider usage / cost / latency summaries
+
+When a provider supports richer browsing, the plugin can add deeper views through the existing plugin UI surfaces.
+
+## Cost And Evaluation
+
+Every adapter response should be able to return usage records.
+
+Paperclip should roll up:
+
+- memory inference tokens
+- embedding tokens
+- external provider cost
+- latency
+- query count
+- write count
+
+It should also record evaluation-oriented metrics where possible:
+
+- recall hit rate
+- empty query rate
+- manual correction count
+- per-binding success / failure counts
+
+This is important because a memory system that "works" but silently burns budget is not acceptable in Paperclip.
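+
+A sketch of the per-binding rollup over logged operations follows. Two choices here are assumptions: treating input plus output tokens as "memory inference tokens", and counting a successful query with zero snippets as "empty". The shapes are a trimmed `MemoryUsage` plus a hypothetical `OperationRecord`.
+
+```ts
+interface MemoryUsage {
+  provider: string;
+  inputTokens?: number;
+  outputTokens?: number;
+  embeddingTokens?: number;
+  costCents?: number;
+  latencyMs?: number;
+}
+
+interface OperationRecord {
+  bindingKey: string;
+  type: "write" | "query" | "forget" | "browse" | "correct";
+  ok: boolean;
+  resultCount?: number; // snippets returned, for queries
+  usage: MemoryUsage[];
+}
+
+interface BindingRollup {
+  inferenceTokens: number;
+  embeddingTokens: number;
+  costCents: number;
+  latencyMs: number;
+  queryCount: number;
+  writeCount: number;
+  emptyQueryCount: number;
+  failureCount: number;
+}
+
+function rollUpByBinding(ops: OperationRecord[]): Map<string, BindingRollup> {
+  const rollups = new Map<string, BindingRollup>();
+  for (const op of ops) {
+    let r = rollups.get(op.bindingKey);
+    if (!r) {
+      r = { inferenceTokens: 0, embeddingTokens: 0, costCents: 0, latencyMs: 0,
+            queryCount: 0, writeCount: 0, emptyQueryCount: 0, failureCount: 0 };
+      rollups.set(op.bindingKey, r);
+    }
+    for (const u of op.usage) {
+      // Assumption: "memory inference tokens" = input + output tokens.
+      r.inferenceTokens += (u.inputTokens ?? 0) + (u.outputTokens ?? 0);
+      r.embeddingTokens += u.embeddingTokens ?? 0;
+      r.costCents += u.costCents ?? 0;
+      r.latencyMs += u.latencyMs ?? 0;
+    }
+    if (op.type === "query") {
+      r.queryCount += 1;
+      if (op.ok && (op.resultCount ?? 0) === 0) r.emptyQueryCount += 1;
+    }
+    if (op.type === "write") r.writeCount += 1;
+    if (!op.ok) r.failureCount += 1;
+  }
+  return rollups;
+}
+
+// Share of queries that returned nothing; 0 when there were no queries.
+const emptyQueryRate = (r: BindingRollup): number =>
+  r.queryCount === 0 ? 0 : r.emptyQueryCount / r.queryCount;
+```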
+
+## Suggested Data Model Additions
+
+At the control-plane level, the likely new core tables are:
+
+- `memory_bindings`
+  - company-scoped key
+  - provider id / plugin id
+  - config blob
+  - enabled status
+
+- `memory_binding_targets`
+  - target type (`company`, `agent`, later `project`)
+  - target id
+  - binding id
+
+- `memory_operations`
+  - company id
+  - binding id
+  - operation type (`write`, `query`, `forget`, `browse`, `correct`)
+  - scope fields
+  - source refs
+  - usage / latency / cost
+  - success / error
+
+Provider-specific long-form state should stay in plugin state or the provider itself unless a built-in local provider needs its own schema.
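+
+Every operation should produce a `memory_operations` row whether it succeeds or fails, so the provider call could be wrapped once at the routing layer. A sketch follows, with a hypothetical row shape and an injected `sink` standing in for the database insert.
+
+```ts
+type OperationType = "write" | "query" | "forget" | "browse" | "correct";
+
+// Hypothetical row shape for the proposed memory_operations table.
+interface MemoryOperationRow {
+  companyId: string;
+  bindingId: string;
+  operationType: OperationType;
+  scope: Record<string, string | undefined>;
+  latencyMs: number;
+  success: boolean;
+  error?: string;
+}
+
+// Records one row per provider call, on success and on failure, then passes the result through.
+async function logged<T>(
+  base: Omit<MemoryOperationRow, "latencyMs" | "success" | "error">,
+  sink: (row: MemoryOperationRow) => void,
+  call: () => Promise<T>,
+): Promise<T> {
+  const started = Date.now();
+  try {
+    const result = await call();
+    sink({ ...base, latencyMs: Date.now() - started, success: true });
+    return result;
+  } catch (err) {
+    const message = err instanceof Error ? err.message : String(err);
+    sink({ ...base, latencyMs: Date.now() - started, success: false, error: message });
+    throw err; // the caller still sees the provider failure
+  }
+}
+```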
+
+## Recommended First Built-In
+
+The best zero-config built-in is a local markdown-first provider with optional semantic indexing.
+
+Why:
+
+- it matches Paperclip's local-first posture
+- it is inspectable
+- it is easy to back up and debug
+- it gives the system a baseline even without external API keys
+
+The design should still treat that built-in as just another provider behind the same control-plane contract.
+
+## Rollout Phases
+
+### Phase 1: Control-plane contract
+
+- add memory binding models and API types
+- add plugin capability / registration surface for memory providers
+- add operation logging and usage reporting
+
+### Phase 2: One built-in + one plugin example
+
+- ship a local markdown-first provider
+- ship one hosted adapter example to validate the external-provider path
+
+### Phase 3: UI inspection
+
+- add company / agent memory settings
+- add a memory operation explorer
+- add source backlinks to issues and runs
+
+### Phase 4: Automatic hooks
+
+- pre-run hydrate
+- post-run capture
+- selected issue comment / document capture
+
+### Phase 5: Rich capabilities
+
+- correction flows
+- provider-native browse / graph views
+- project-level overrides if needed
+- evaluation dashboards
+
+## Open Questions
+
+- Should project overrides exist in V1 of the memory service, or should we force company default + agent override first?
+- Do we want Paperclip-managed extraction pipelines at all, or should built-ins be the only place where Paperclip owns extraction?
+- Should memory usage extend the current `cost_events` model directly, or should memory operations keep a parallel usage log and roll up into `cost_events` secondarily?
+- Do we want provider install / binding changes to require approvals for some companies?
+
+## Bottom Line
+
+The right abstraction is:
+
+- Paperclip owns memory bindings, scopes, provenance, governance, and usage reporting.
+- Providers own extraction, ranking, storage, and provider-native memory semantics.
+
+That gives Paperclip a stable "memory service" without locking the product to one memory philosophy or one vendor.
--- a/doc/plans/2026-03-17-release-automation-and-versioning.md
+++ b/doc/plans/2026-03-17-release-automation-and-versioning.md
@ -0,0 +1,488 @@
+# Release Automation and Versioning Simplification Plan
+
+## Context
+
+Paperclip's current release flow is documented in `doc/RELEASING.md` and implemented through:
+
+- `.github/workflows/release.yml`
+- `scripts/release-lib.sh`
+- `scripts/release-start.sh`
+- `scripts/release-preflight.sh`
+- `scripts/release.sh`
+- `scripts/create-github-release.sh`
+
+Today the model is:
+
+1. pick `patch`, `minor`, or `major`
+2. create `release/X.Y.Z`
+3. draft `releases/vX.Y.Z.md`
+4. publish one or more canaries from that release branch
+5. publish stable from that same branch
+6. push tag + create GitHub Release
+7. merge the release branch back to `master`
+
+That is workable, but it creates friction in exactly the places that should be cheap:
+
+- deciding `patch` vs `minor` vs `major`
+- cutting and carrying release branches
+- manually publishing canaries
+- thinking about changelog generation for canaries
+- handling npm credentials safely in a public repo
+
+The target state from this discussion is simpler:
+
+- every push to `master` publishes a canary automatically
+- stable releases are promoted deliberately from a vetted commit
+- versioning is date-driven instead of semantics-driven
+- stable publishing is secure even in a public open-source repository
+- changelog generation happens only for real stable releases
+
+## Recommendation In One Sentence
+
+Move Paperclip to semver-compatible calendar versioning, auto-publish canaries from `master`, promote stable from a chosen tested commit, and use npm trusted publishing plus GitHub environments so no long-lived npm or LLM token needs to live in Actions.
+
+## Core Decisions
+
+### 1. Use calendar versions, but keep semver syntax
+
+The repo and npm tooling still assume semver-shaped version strings in many places. That does not mean Paperclip must keep semver as a product policy. It does mean the version format should remain semver-valid.
+
+Recommended format:
+
+- stable: `YYYY.MDD.P`
+- canary: `YYYY.MDD.P-canary.N`
+
+Examples:
+
+- first stable on March 17, 2026: `2026.317.0`
+- third canary on the `2026.317.0` line (counting from `canary.0`): `2026.317.0-canary.2`
+
+Why this shape:
+
+- it removes `patch/minor/major` decisions
+- it is valid semver syntax
+- it stays compatible with npm, dist-tags, and existing semver validators
+- it is close to the format you actually want
+
+Important constraints:
+
+- the middle numeric slot should be `MDD`, where `M` is the month and `DD` is the zero-padded day
+- `2026.03.17` is not the format to use
+  - numeric semver identifiers do not allow leading zeroes
+- `2026.3.17.1` is not the format to use
+  - semver has three numeric components, not four
+- the practical semver-safe way to carry a per-build counter is a prerelease suffix, for example `2026.317.0-canary.8`
+
+This is effectively CalVer on semver rails.
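+
+The version rule is mechanical enough to encode directly; the helper names below are hypothetical. Because the middle slot equals `100 * month + day`, it rises monotonically through the year (October 5 becomes `1005`). Semver compares numeric identifiers numerically, so semver ordering matches date ordering. The regex is adapted from the one suggested on semver.org, with build metadata omitted.
+
+```ts
+// YYYY.MDD.P: month unpadded, day zero-padded.
+function calverStable(year: number, month: number, day: number, patch = 0): string {
+  if (month < 1 || month > 12 || day < 1 || day > 31) throw new RangeError("invalid month/day");
+  const dd = day < 10 ? `0${day}` : `${day}`;
+  return `${year}.${month}${dd}.${patch}`;
+}
+
+// YYYY.MDD.P-canary.N, with N counting canaries on that line from 0.
+function calverCanary(stableVersion: string, n: number): string {
+  return `${stableVersion}-canary.${n}`;
+}
+
+// Adapted from the semver.org regex: no leading zeroes, exactly three numeric parts.
+const SEMVER =
+  /^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?$/;
+```
+
+For example, `calverStable(2026, 3, 17)` returns `2026.317.0`, and `calverCanary("2026.317.0", 2)` returns `2026.317.0-canary.2`. Both pass `SEMVER`, while `2026.03.17` and `2026.3.17.1` do not.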
+
+### 2. Accept that CalVer changes the compatibility contract
+
+This is not semver in spirit anymore. It is semver in syntax only.
+
+That tradeoff is probably acceptable for Paperclip, but it should be explicit:
+
+- consumers no longer infer compatibility from `major/minor/patch`
+- release notes become the compatibility signal
+- downstream users should prefer exact pins or deliberate upgrades
+
+This is especially relevant for public library packages like `@paperclipai/shared`, `@paperclipai/db`, and the adapter packages.
+
+### 3. Drop release branches for normal publishing
+
+If every merge to `master` publishes a canary, the current `release/X.Y.Z` train model becomes more ceremony than value.
+
+Recommended replacement:
+
+- `master` is the only canary train
+- every push to `master` can publish a canary
+- stable is published from a chosen commit or canary tag on `master`
+
+This matches the workflow you actually want:
+
+- merge continuously
+- let npm always have a fresh canary
+- choose a known-good canary later and promote that commit to stable
+
+### 4. Promote by source ref, not by "renaming" a canary
+
+This is the most important mechanical constraint.
+
+npm can move dist-tags, but it does not let you rename an already-published version. That means:
+
+- you can move `latest` to `paperclipai@1.2.3`
+- you cannot turn `paperclipai@2026.317.0-canary.8` into `paperclipai@2026.317.0`
+
+So "promote canary to stable" really means:
+
+1. choose the commit or canary tag you trust
+2. rebuild from that exact commit
+3. publish it again with the stable version string
+
+Because of that, the stable workflow should take a source ref, not just a bump type.
+
+Recommended stable input:
+
+- `source_ref`
+  - commit SHA, or
+  - a canary git tag such as `canary/v2026.317.1-canary.8`
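+
+A minimal sketch of validating that input before checkout. It assumes a commit SHA is 7 to 40 lowercase hex characters and that canary tags follow the `canary/v<version>` pattern above. `parseSourceRef` and its return shape are hypothetical. Rejecting branch names keeps stable publishes pinned to an exact source.
+
+```ts
+type SourceRef =
+  | { kind: "sha"; sha: string }
+  | { kind: "canary-tag"; tag: string; version: string };
+
+// Hypothetical input check for the stable workflow's `source_ref`.
+function parseSourceRef(ref: string): SourceRef {
+  const trimmed = ref.trim();
+  if (/^[0-9a-f]{7,40}$/.test(trimmed)) {
+    return { kind: "sha", sha: trimmed };
+  }
+  const match = /^canary\/v(\d+\.\d+\.\d+-canary\.\d+)$/.exec(trimmed);
+  if (match) {
+    return { kind: "canary-tag", tag: trimmed, version: match[1] };
+  }
+  // Moving refs such as `master` are rejected so stable always rebuilds a pinned commit.
+  throw new Error(`source_ref must be a commit SHA or canary tag, got "${ref}"`);
+}
+
+console.log(parseSourceRef("canary/v2026.317.0-canary.8"));
+```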
+
+### 5. Only stable releases get release notes, tags, and GitHub Releases
+
+Canaries should stay lightweight:
+
+- publish to npm under `canary`
+- optionally create a lightweight or annotated git tag
+- do not create GitHub Releases
+- do not require `releases/v*.md`
+- do not spend LLM tokens
+
+Stable releases should remain the public narrative surface:
+
+- git tag `v2026.317.0`
+- GitHub Release `v2026.317.0`
+- stable changelog file `releases/v2026.317.0.md`
+
+## Security Model
+
+### Recommendation
+
+Use npm trusted publishing with GitHub Actions OIDC, then disable token-based publishing access for the packages.
+
+Why:
+
+- no long-lived `NPM_TOKEN` in repo or org secrets
+- no personal npm token in Actions
+- short-lived credentials minted only for the authorized workflow
+- automatic npm provenance for public packages in public repos
+
+This is the cleanest answer to the open-repo security concern.
+
+### Concrete controls
+
+#### 1. Use one release workflow file
+
+Use one workflow filename for both canary and stable publishing:
+
+- `.github/workflows/release.yml`
+
+Why:
+
+- npm trusted publishing is configured per workflow filename
+- npm currently allows one trusted publisher configuration per package
+- GitHub environments can still provide separate canary/stable approval rules inside the same workflow
+
+#### 2. Use separate GitHub environments
+
+Recommended environments:
+
+- `npm-canary`
+- `npm-stable`
+
+Recommended policy:
+
+- `npm-canary`
+  - allowed branch: `master`
+  - no human reviewer required
+- `npm-stable`
+  - allowed branch: `master`
+  - required reviewer enabled
+  - prevent self-review enabled
+  - admin bypass disabled
+
+Stable should require an explicit second human gate even if the workflow is manually dispatched.
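+
+Inside the single `release.yml`, the split can be thought of as a routing rule from trigger to environment. This TypeScript sketch is illustrative only. `environmentFor` is hypothetical, and its arguments mirror GitHub's `github.event_name` and `github.ref` context values. In a real workflow, the branch restriction is enforced by each environment's deployment rules rather than by script code.
+
+```ts
+type ReleaseEnvironment = "npm-canary" | "npm-stable";
+
+// Hypothetical trigger-to-environment routing for one shared release workflow.
+function environmentFor(eventName: string, ref: string): ReleaseEnvironment {
+  // Both environments only allow `master` (see the policy above).
+  if (ref !== "refs/heads/master") {
+    throw new Error(`release jobs only run from master, got ${ref}`);
+  }
+  switch (eventName) {
+    case "push":
+      return "npm-canary"; // automatic, no reviewer
+    case "workflow_dispatch":
+      return "npm-stable"; // pauses for a required reviewer
+    default:
+      throw new Error(`unsupported trigger: ${eventName}`);
+  }
+}
+
+console.log(environmentFor("push", "refs/heads/master")); // npm-canary
+console.log(environmentFor("workflow_dispatch", "refs/heads/master")); // npm-stable
+```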
+
+#### 3. Lock down workflow edits
+
+Add or tighten `CODEOWNERS` coverage for:
+
+- `.github/workflows/*`
+- `scripts/release*`
+- `doc/RELEASING.md`
+
+This matters because trusted publishing authorizes a workflow file. The biggest remaining risk is not secret exfiltration from forks. It is a maintainer-approved change to the release workflow itself.
+
+#### 4. Remove traditional npm token access after OIDC works
+
+After trusted publishing is verified:
+
+- set package publishing access to require 2FA and disallow tokens
+- revoke any legacy automation tokens
+
+That eliminates the "someone stole the npm token" class of failure.
+
+### What not to do
+
+- do not put your personal Claude or npm token in GitHub Actions
+- do not run release logic from `pull_request_target`
+- do not make stable publishing depend on a repo secret if OIDC can handle it
+- do not create canary GitHub Releases
+
+## Changelog Strategy
+
+### Recommendation
+
+Generate stable changelogs only, and keep LLM-assisted changelog generation out of CI for now.
+
+Reasoning:
+
+- canaries happen too often
+- canaries do not need polished public notes
+- putting a personal Claude token into Actions is not worth the risk
+- stable release cadence is low enough that a human-in-the-loop step is acceptable
+
+Recommended stable path:
+
+1. pick a canary commit or tag
+2. run changelog generation locally from a trusted machine
+3. commit `releases/vYYYY.MDD.P.md`
+4. run stable promotion
+
+If the notes are not ready yet, a fallback is acceptable:
+
+- publish stable
+- create a minimal GitHub Release
+- update `releases/vYYYY.MDD.P.md` immediately afterward
+
+But the better steady-state is to have the stable notes committed before stable publish.
+
+### Future option
+
+If you later want CI-assisted changelog drafting, do it with:
+
+- a dedicated service account
+- a token scoped only for changelog generation
+- a manual workflow
+- a dedicated environment with required reviewers
+
+That is later hardening work, not a requirement for the initial rollout.
+
+## Proposed Future Workflow
+
+### Canary workflow
+
+Trigger:
+
+- `push` on `master`
+
+Steps:
+
+1. checkout the merged `master` commit
+2. run verification on that exact commit
+3. compute canary version for current UTC date
+4. version public packages to `YYYY.MDD.P-canary.N`
+5. publish to npm with dist-tag `canary`
+6. create a canary git tag for traceability
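+
+The steps after verification can be sketched as an ordered list of commands. This assumes the version has already been computed and written into each `package.json` (steps 3 and 4), that each public package is published with `npm publish <dir> --tag canary`, and that an annotated tag is used. `planCanaryRelease` and the package directories are hypothetical.
+
+```ts
+// Hypothetical: ordered commands for one canary publish (steps 5 and 6).
+// Verification and version writing (steps 2-4) are assumed to have run already.
+function planCanaryRelease(version: string, packageDirs: string[], commitSha: string): string[] {
+  if (!/-canary\.\d+$/.test(version)) {
+    throw new Error(`not a canary version: ${version}`);
+  }
+  const tag = `canary/v${version}`;
+  return [
+    // The `canary` dist-tag leaves `latest` pointing at the last stable release.
+    ...packageDirs.map((dir) => `npm publish ${dir} --tag canary`),
+    // Tag the exact commit that was published, for traceability.
+    `git tag -a ${tag} ${commitSha} -m "canary ${version}"`,
+    `git push origin ${tag}`,
+  ];
+}
+
+// Directories are passed as ./-prefixed paths so npm treats them as folders.
+for (const cmd of planCanaryRelease("2026.317.0-canary.4", ["./packages/shared", "./cli"], "9e19f1d005")) {
+  console.log(cmd);
+}
+```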
+
+Recommended canary tag format:
+
+- `canary/v2026.317.1-canary.4`
+
+Outputs:
+
+- npm canary published
+- git tag created
+- no GitHub Release
+- no changelog file required
+
+### Stable workflow
+
+Trigger:
+
+- `workflow_dispatch`
+
+Inputs:
+
+- `source_ref`
+- optional `stable_date`
+- `dry_run`
+
+Steps:
+
+1. checkout `source_ref`
+2. run verification on that exact commit
+3. compute the next stable patch slot for the UTC date or provided override
+4. fail if `vYYYY.MDD.P` already exists
+5. require `releases/vYYYY.MDD.P.md`
+6. version public packages to `YYYY.MDD.P`
+7. publish to npm under `latest`
+8. create git tag `vYYYY.MDD.P`
+9. push tag
+10. create GitHub Release from `releases/vYYYY.MDD.P.md`
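+
+Steps 4 and 5 are preconditions that should fail the run before anything is published. A minimal sketch follows. It assumes the caller passes in the computed version, the existing git tags (for example from `git tag --list`), and the paths under `releases/`. `checkStablePreconditions` is hypothetical.
+
+```ts
+// Hypothetical guard for stable steps 4 and 5; runs before any publish.
+function checkStablePreconditions(
+  version: string,            // computed in step 3, e.g. "2026.317.0"
+  existingTags: string[],     // e.g. output of `git tag --list`
+  releaseNoteFiles: string[], // repo-relative paths, e.g. "releases/v2026.317.0.md"
+): { tag: string; notes: string } {
+  const tag = `v${version}`;
+  // Step 4: never reuse a stable tag, e.g. when `stable_date` was overridden.
+  if (existingTags.includes(tag)) {
+    throw new Error(`${tag} already exists; pick the next patch slot`);
+  }
+  // Step 5: stable notes must be committed before publishing.
+  const notes = `releases/${tag}.md`;
+  if (!releaseNoteFiles.includes(notes)) {
+    throw new Error(`missing ${notes}; commit release notes first`);
+  }
+  return { tag, notes };
+}
+
+console.log(checkStablePreconditions("2026.317.1", ["v2026.317.0"], ["releases/v2026.317.1.md"]));
+```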
+
+Outputs:
+
+- stable npm release
+- stable git tag
+- GitHub Release
+- clean public changelog surface
+
+## Implementation Guidance
+
+### 1. Replace bump-type version math with explicit version computation
+
+The current release scripts depend on:
+
+- `patch`
+- `minor`
+- `major`
+
+That logic should be replaced with:
+
+- `compute_canary_version_for_date`
+- `compute_stable_version_for_date`
+
+For example:
+
+- `compute_stable_version_for_date(2026-03-17) -> 2026.317.0`
+- `compute_canary_version_for_date(2026-03-17) -> 2026.317.0-canary.0`
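+
+A minimal TypeScript sketch of the two computations follows. It assumes the list of already-published versions is available (for example from the npm registry). It also assumes that a canary targets the next free stable slot for its date, which is consistent with canary tags like `canary/v2026.317.1-canary.4`. The camelCase names are hypothetical renderings of the functions proposed above.
+
+```ts
+// Encode a UTC date as the `YYYY.MDD` prefix (month unpadded, day padded).
+function datePrefix(date: Date): string {
+  const month = date.getUTCMonth() + 1;
+  const day = String(date.getUTCDate()).padStart(2, "0");
+  return `${date.getUTCFullYear()}.${month}${day}`;
+}
+
+// Next stable version: one past the highest stable patch slot already used for this date.
+function computeStableVersionForDate(date: Date, published: string[]): string {
+  const prefix = `${datePrefix(date)}.`;
+  const slots = published
+    .filter((v) => v.startsWith(prefix) && !v.includes("-"))
+    .map((v) => Number(v.slice(prefix.length)));
+  const patch = slots.length === 0 ? 0 : Math.max(...slots) + 1;
+  return `${prefix}${patch}`;
+}
+
+// Next canary: targets the next stable slot and increments N within that slot.
+function computeCanaryVersionForDate(date: Date, published: string[]): string {
+  const canaryPrefix = `${computeStableVersionForDate(date, published)}-canary.`;
+  const ns = published
+    .filter((v) => v.startsWith(canaryPrefix))
+    .map((v) => Number(v.slice(canaryPrefix.length)));
+  const n = ns.length === 0 ? 0 : Math.max(...ns) + 1;
+  return `${canaryPrefix}${n}`;
+}
+
+const march17 = new Date(Date.UTC(2026, 2, 17));
+console.log(computeStableVersionForDate(march17, [])); // 2026.317.0
+console.log(computeCanaryVersionForDate(march17, [])); // 2026.317.0-canary.0
+```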
+
+### 2. Stop requiring `release/X.Y.Z`
+
+These current invariants should be removed from the happy path:
+
+- "must run from branch `release/X.Y.Z`"
+- "stable and canary for `X.Y.Z` come from the same release branch"
+- `release-start.sh`
+
+Replace them with:
+
+- canary must run from `master`
+- stable may run from a pinned `source_ref`
+
+### 3. Keep Changesets only if it stays helpful
+
+The current system uses Changesets to:
+
+- rewrite package versions
+- maintain package-level `CHANGELOG.md` files
+- publish packages
+
+With CalVer, Changesets may still be useful for publish orchestration, but it should no longer own version selection.
+
+Recommended implementation order:
+
+1. keep `changeset publish` if it works with explicitly-set versions
+2. replace version computation with a small explicit versioning script
+3. if Changesets keeps fighting the model, remove it from release publishing entirely
+
+Paperclip's release problem is now "publish the whole fixed package set at one explicit version", not "derive the next semantic bump from human intent".
+
+### 4. Add a dedicated versioning script
+
+Recommended new script:
+
+- `scripts/set-release-version.mjs`
+
+Responsibilities:
+
+- set the version in all public publishable packages
+- update any internal exact-version references needed for publishing
+- update CLI version strings
+- avoid broad string replacement across unrelated files
+
+This is safer than keeping a bump-oriented changeset flow and then forcing it into a date-based scheme.
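+
+The core of such a script is small. This sketch operates on already-parsed `package.json` objects so it stays self-contained, and it leaves out file I/O and CLI version strings. It is written in TypeScript for the type annotations, even though the planned script is `.mjs`. `setReleaseVersion` and the `PackageJson` shape are hypothetical. Because it edits only the `version` and dependency fields, it cannot touch unrelated files the way a broad string replacement could.
+
+```ts
+interface PackageJson {
+  name: string;
+  version: string;
+  private?: boolean;
+  dependencies?: Record<string, string>;
+  peerDependencies?: Record<string, string>;
+}
+
+// Hypothetical core of scripts/set-release-version.mjs: set one explicit version
+// on every public package and pin internal references to that same version.
+function setReleaseVersion(packages: PackageJson[], version: string): PackageJson[] {
+  const publicNames = new Set(packages.filter((p) => !p.private).map((p) => p.name));
+  const pin = (deps?: Record<string, string>): Record<string, string> | undefined =>
+    deps &&
+    Object.fromEntries(
+      Object.entries(deps).map(([name, range]): [string, string] =>
+        // Only packages in the release set are rewritten; third-party ranges stay as-is.
+        [name, publicNames.has(name) ? version : range],
+      ),
+    );
+  return packages.map((p) =>
+    p.private
+      ? p // private packages are not published, so they keep their version
+      : { ...p, version, dependencies: pin(p.dependencies), peerDependencies: pin(p.peerDependencies) },
+  );
+}
+```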
+
+### 5. Keep rollback based on dist-tags
+
+`rollback-latest.sh` should stay, but it should stop assuming a semver meaning beyond syntax.
+
+It should continue to:
+
+- repoint `latest` to a prior stable version
+- never unpublish
+
+## Tradeoffs and Risks
+
+### 1. The stable patch slot is now part of the version contract
+
+With `YYYY.MDD.P`, same-day hotfixes are supported, but the stable patch slot is now part of the visible version format.
+
+That is the right tradeoff because:
+
+1. npm still gets semver-valid versions
+2. same-day hotfixes stay possible
+3. chronological ordering still works as long as the day is zero-padded inside `MDD`
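+
+A quick check of why the padding matters follows, treating the `MDD` slot as an integer the way semver compares numeric identifiers. `mdd` is a hypothetical helper, and the unpadded variant exists only to show the failure mode.
+
+```ts
+// Encode month/day as the numeric MDD slot; `padDay` toggles zero-padding of the day.
+function mdd(month: number, day: number, padDay = true): number {
+  const d = padDay ? String(day).padStart(2, "0") : String(day);
+  return Number(`${month}${d}`);
+}
+
+// Padded: later dates always produce larger numbers.
+console.log(mdd(1, 31) < mdd(2, 1));  // true  (131 < 201)
+console.log(mdd(9, 30) < mdd(10, 1)); // true  (930 < 1001)
+
+// Unpadded: ordering breaks, and distinct dates can collide.
+console.log(mdd(1, 31, false) < mdd(2, 1, false));    // false (131 vs 21)
+console.log(mdd(1, 11, false) === mdd(11, 1, false)); // true  (both 111)
+```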
+
+### 2. Public package consumers lose semver intent signaling
+
+This is the main downside of CalVer.
+
+If that becomes a problem, one alternative is:
+
+- use CalVer for the CLI package only
+- keep semver for library packages
+
+That is more complex operationally, so I would not start there unless package consumers actually need it.
+
+### 3. Auto-canary means more publish traffic
+
+Publishing on every `master` merge means:
+
+- more npm versions
+- more git tags
+- more registry noise
+
+That is acceptable if canaries stay clearly separate:
+
+- npm dist-tag `canary`
+- no GitHub Release
+- no external announcement
+
+## Rollout Plan
+
+### Phase 1: Security foundation
+
+1. Create `release.yml`
+2. Configure npm trusted publishers for all public packages
+3. Create `npm-canary` and `npm-stable` environments
+4. Add `CODEOWNERS` protection for release files
+5. Verify OIDC publishing works
+6. Disable token-based publishing access and revoke old tokens
+
+### Phase 2: Canary automation
+
+1. Add canary workflow on `push` to `master`
+2. Add explicit calendar-version computation
+3. Add canary git tagging
+4. Remove changelog requirement from canaries
+5. Update `doc/RELEASING.md`
+
+### Phase 3: Stable promotion
+
+1. Add manual stable workflow with `source_ref`
+2. Require stable notes file
+3. Publish stable + tag + GitHub Release
+4. Update rollback docs and scripts
+5. Retire release-branch assumptions
+
+### Phase 4: Cleanup
+
+1. Remove `release-start.sh` from the primary path
+2. Remove `patch/minor/major` from maintainer docs
+3. Decide whether to keep or remove Changesets from publishing
+4. Document the CalVer compatibility contract publicly
+
+## Concrete Recommendation
+
+Paperclip should adopt this model:
+
+- stable versions: `YYYY.MDD.P`
+- canary versions: `YYYY.MDD.P-canary.N`
+- canaries auto-published on every push to `master`
+- stables manually promoted from a chosen tested commit or canary tag
+- no release branches in the default path
+- no canary changelog files
+- no canary GitHub Releases
+- no Claude token in GitHub Actions
+- no npm automation token in GitHub Actions
+- npm trusted publishing plus GitHub environments for release security
+
+That removes semver's bump-type bookkeeping without fighting npm, makes canaries cheap, keeps stables deliberate, and materially improves the security posture of the public repository.
+
+## External References
+
+- npm trusted publishing: https://docs.npmjs.com/trusted-publishers/
+- npm dist-tags: https://docs.npmjs.com/adding-dist-tags-to-packages/
+- npm semantic versioning guidance: https://docs.npmjs.com/about-semantic-versioning/
+- GitHub environments and deployment protection rules: https://docs.github.com/en/actions/how-tos/deploy/configure-and-manage-deployments/manage-environments
+- GitHub secrets behavior for forks: https://docs.github.com/en/actions/how-tos/write-workflows/choose-what-workflows-do/use-secrets