mirror of
https://github.com/alkimake/paperclip.git
synced 2026-06-14 01:50:39 +09:00
[codex] Document terminal bench dispatch config (#4961)
## Thinking Path

> - Paperclip agents rely on skills for repeatable operating procedures
> - The Terminal-Bench loop skill needs to preserve enough dispatch configuration to reproduce real heartbeat behavior
> - A bare benchmark command can create unassigned work with no heartbeat-enabled agent, which is a harness setup failure rather than product evidence
> - The Paperclip heartbeat skill also needs to keep escalation biased toward agent-owned follow-through
> - This pull request documents dispatch runner config requirements and strengthens the agent follow-through rule
> - The benefit is fewer misleading benchmark loops and clearer agent operating guidance

## What Changed

- Documented `PAPERCLIP_HARBOR_RUNNER_CONFIG` / runner dispatch config as required Terminal-Bench loop input.
- Updated the Terminal-Bench loop smoke check to require the dispatch config mention.
- Added stronger Paperclip skill guidance to avoid asking humans for work an agent can perform.

## Verification

- `pnpm smoke:terminal-bench-loop-skill`

## Risks

- Low risk: documentation and smoke expectation changes only. The stricter smoke assertion is intentional so future edits do not drop the dispatch config requirement.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected; check the roadmap first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 coding agent, tool use and local command execution. Exact context window was not exposed in the runtime.

## Checklist

- [x] I have included a thinking path that traces from project context to this change
- [x] I have specified the model used (with version and capability details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
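The failure mode described in the thinking path (a bare benchmark command that dispatches nothing) can be caught before a loop starts. The following is a minimal sketch, not code from this commit: `checkDispatchConfig` and the `DispatchCheck` type are hypothetical names, and treating an empty or whitespace-only value as missing is an assumption. Only the variable name `PAPERCLIP_HARBOR_RUNNER_CONFIG` comes from the PR description.

```typescript
// Hypothetical helper, not part of the Paperclip codebase.
type DispatchCheck =
  | { kind: "configured"; value: string }
  | { kind: "missing"; reason: string };

// Classify a run by whether the dispatch runner config is present.
// Assumption: an unset, empty, or whitespace-only value counts as missing.
function checkDispatchConfig(env: Record<string, string | undefined>): DispatchCheck {
  const value = env["PAPERCLIP_HARBOR_RUNNER_CONFIG"]?.trim();
  if (!value) {
    return {
      kind: "missing",
      reason:
        "PAPERCLIP_HARBOR_RUNNER_CONFIG is not set; record this as a harness/setup failure, not product evidence",
    };
  }
  return { kind: "configured", value };
}
```

In a Node script the function could be called with `process.env`. Returning a tagged result instead of throwing lets the caller decide whether to abort the loop or record a no-dispatch smoke run.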
This commit is contained in:
parent d7719423e9
commit 685ee84e4a
3 changed files with 18 additions and 2 deletions
@@ -91,6 +91,7 @@ async function assertLocalSkillPackage() {
     "diagnosis",
     "blockedByIssueIds",
     "PAPERCLIPAI_CMD",
+    "PAPERCLIP_HARBOR_RUNNER_CONFIG",
   ]) {
     assert(markdown.includes(expected), `Skill smoke expected ${skillPath} to mention ${expected}`);
   }
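The hunk above extends a list of strings that the skill markdown must mention. As a standalone illustration of that pattern, here is a small sketch; `findMissingMentions` is a hypothetical name and the sample markdown is invented for the example. Unlike the loop in the diff, which asserts on each entry in turn, this variant collects every missing entry so a single run reports them all.

```typescript
// Hypothetical helper illustrating the "required mentions" check.
// Returns every required string that does not appear in the markdown.
function findMissingMentions(markdown: string, required: readonly string[]): string[] {
  return required.filter((expected) => !markdown.includes(expected));
}

// Invented sample document for illustration only.
const sampleSkill = [
  "Record the diagnosis and any blockedByIssueIds.",
  "Invoke the CLI through PAPERCLIPAI_CMD.",
].join("\n");

const missingMentions = findMissingMentions(sampleSkill, [
  "diagnosis",
  "blockedByIssueIds",
  "PAPERCLIPAI_CMD",
  "PAPERCLIP_HARBOR_RUNNER_CONFIG",
]);
```

Here `missingMentions` contains only `"PAPERCLIP_HARBOR_RUNNER_CONFIG"`, because the sample text never names the dispatch config.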
@@ -194,6 +195,8 @@ async function main() {
         `- Manifest: ${artifactRoot}/manifest.json`,
         `- Results JSONL: ${artifactRoot}/results.jsonl`,
         `- Harbor raw job folder: ${artifactRoot}/harbor/raw-job`,
+        "- Dispatch config: PAPERCLIP_HARBOR_RUNNER_CONFIG=<omitted - harness/setup no-dispatch smoke>",
+        "- Heartbeat-enabled agents: 0 (harness/setup no-dispatch; not a product signal)",
         "",
         "No benchmark process, Harbor job, model call, or provider call was started.",
       ].join("\n"),
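The mocked run document above hard-codes the no-dispatch case. A real loop would presumably record the actual value when one is supplied; the sketch below shows one way that could look. `buildRunDocLines` and its `runnerConfig` parameter are hypothetical and do not appear in this commit. Only the line formats and the placeholder text are taken from the diff.

```typescript
// Hypothetical helper: builds the artifact lines of a run document.
// Assumption: a missing or blank runner config is recorded with the same
// placeholder the smoke script uses for its no-dispatch run.
function buildRunDocLines(artifactRoot: string, runnerConfig?: string): string[] {
  const trimmed = runnerConfig?.trim();
  const dispatch = trimmed ? trimmed : "<omitted - harness/setup no-dispatch smoke>";
  return [
    `- Manifest: ${artifactRoot}/manifest.json`,
    `- Results JSONL: ${artifactRoot}/results.jsonl`,
    `- Harbor raw job folder: ${artifactRoot}/harbor/raw-job`,
    `- Dispatch config: PAPERCLIP_HARBOR_RUNNER_CONFIG=${dispatch}`,
  ];
}
```

Because the variable name is always written into the document, a check such as `body.includes("PAPERCLIP_HARBOR_RUNNER_CONFIG")` passes in both the configured and the no-dispatch case, while the value after `=` still shows which one occurred.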
@@ -309,6 +312,7 @@ async function main() {
       `Expected iteration issue to be in_review, got ${verifiedIteration.status}`,
     );
     assert(verifiedRunDoc.body.includes(`${artifactRoot}/results.jsonl`), "Expected run doc to include mocked results path");
+    assert(verifiedRunDoc.body.includes("PAPERCLIP_HARBOR_RUNNER_CONFIG"), "Expected run doc to record dispatch config");
     assert(
       verifiedDiagnosisDoc.body.includes("Exact stop point") && verifiedDiagnosisDoc.body.includes("Next-action owner"),
       "Expected diagnosis doc to include exact stop point and next-action owner",