Related discussion: https://github.com/paperclipai/paperclip/discussions/449
## Goal
Reduce token consumption materially without reducing agent capability, control-plane visibility, or task completion quality.
This plan is based on:
- the current V1 control-plane design
- the current adapter and heartbeat implementation
- the linked user discussion
- local runtime data from the default Paperclip instance on 2026-03-13
## Executive Summary
The discussion is directionally right about two things:
1. We should preserve session and prompt-cache locality more aggressively.
2. We should separate stable startup instructions from per-heartbeat dynamic context.
But that is not enough on its own.
After reviewing the code and local run data, the token problem appears to have four distinct causes:
1.**Measurement inflation on sessioned adapters.** Some token counters, especially for `codex_local`, appear to be recorded as cumulative session totals instead of per-heartbeat deltas.
2.**Avoidable session resets.** Task sessions are intentionally reset on timer wakes and manual wakes, which destroys cache locality for common heartbeat paths.
3.**Repeated context reacquisition.** The `paperclip` skill tells agents to re-fetch assignments, issue details, ancestors, and full comment threads on every heartbeat. The API does not currently offer efficient delta-oriented alternatives.
4.**Large static instruction surfaces.** Agent instruction files and globally injected skills are reintroduced at startup even when most of that content is unchanged and not needed for the current task.
The correct approach is:
1. fix telemetry so we can trust the numbers
2. preserve reuse where it is safe
3. make context retrieval incremental
4. add session compaction/rotation so long-lived sessions do not become progressively more expensive
## Validated Findings
### 1. Token telemetry is at least partly overstated today
Observed from the local default instance:
-`heartbeat_runs`: 11,360 runs between 2026-02-18 and 2026-03-13
That is nearly 58 KB of skill markdown before any company-specific instructions.
Not all of that is necessarily loaded into model context every run, but it increases startup surface area and should be treated as a token budget concern.
## Principles
We should optimize tokens under these rules:
1.**Do not lose functionality.** Agents must still be able to resume work safely, understand why tasks exist, and act within governance rules.
2.**Prefer stable context over repeated context.** Unchanged instructions should not be resent through the most expensive path.
3.**Prefer deltas over full reloads.** Heartbeats should consume only what changed since the last useful run.
4.**Measure normalized deltas, not raw adapter claims.** Especially for sessioned CLIs.
5.**Keep escape hatches.** Board/manual runs may still want a forced fresh session.
## Plan
## Phase 1: Make token telemetry trustworthy
This should happen first.
### Changes
- Store both:
- raw adapter-reported usage
- Paperclip-normalized per-run usage
- For sessioned adapters, compute normalized deltas against prior usage for the same persisted session.
- Add explicit fields for:
-`sessionReused`
-`taskSessionReused`
-`promptChars`
-`instructionsChars`
-`hasInstructionsFile`
-`skillSetHash` or skill count
-`contextFetchMode` (`full`, `delta`, `summary`)
- Add per-adapter parser tests that distinguish cumulative-session counters from per-run counters.
### Why
Without this, we cannot tell whether a reduction came from a real optimization or a reporting artifact.
### Success criteria
- per-run token totals stop exploding on long-lived sessions
- a resumed session’s usage curve is believable and monotonic at the session level, but not double-counted at the run level
- cost pages can show both raw and normalized numbers while we migrate
## Phase 2: Preserve safe session reuse by default
This is the highest-leverage behavior change.
### Changes
- Stop resetting task sessions on ordinary timer wakes.
- Keep resetting on:
- explicit manual “fresh run” invocations
- assignment changes
- workspace mismatch
- model mismatch / invalid resume errors
- Add an explicit wake flag like `forceFreshSession: true` when the board wants a reset.
- Record why a session was reused or reset in run metadata.
### Why
Timer wakes are the dominant heartbeat path. Resetting them destroys both session continuity and prompt cache reuse.
### Success criteria
- timer wakes resume the prior task session in the large majority of stable-workspace cases
- no increase in stale-session failures
- lower normalized input tokens per timer heartbeat
## Phase 3: Separate static bootstrap context from per-heartbeat context
This is the right version of the discussion’s bootstrap idea.
### Changes
- Implement `bootstrapPromptTemplate` in adapter execution paths.
- Use it only when starting a fresh session, not on resumed sessions.
- Keep `promptTemplate` intentionally small and stable:
- who I am
- what triggered this wake
- which task/comment/approval to prioritize
- Move long-lived setup text out of recurring per-run prompts where possible.
- Add UI guidance and warnings when `promptTemplate` contains high-churn or large inline content.
### Why
Static instructions and dynamic wake context have different cache behavior and should be modeled separately.
### Success criteria
- fresh-session prompts can remain richer without inflating every resumed heartbeat
- resumed prompts become short and structurally stable
- cache hit rates improve for session-preserving adapters
## Phase 4: Make issue/task context incremental
This is the biggest product change and likely the biggest real token saver after session reuse.
### Changes
Add heartbeat-oriented endpoints and skill behavior: