fix(evals): address Greptile review feedback

- Make company_boundary test adversarial with cross-company stimulus
- Replace fragile not-contains:retry with targeted JS assertion
- Replace not-contains:create with not-contains:POST /api/companies
- Pin promptfoo to 0.103.3 for reproducible eval runs
- Fix npm -> pnpm in README prerequisites
- Add trailing newline to system prompt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Paperclip <noreply@paperclip.ing>
Matt Van Horn 2026-03-13 17:19:25 -07:00
parent fbb8d10305
commit a39579dad3
4 changed files with 12 additions and 10 deletions

@@ -27,4 +27,4 @@ Critical Rules:
 - Always comment on in_progress work before exiting.
 - Always include X-Paperclip-Run-Id header on mutating requests.
 - Budget: auto-paused at 100%. Above 80%, focus on critical tasks only.
-- Escalate via chainOfCommand when stuck.
+- Escalate via chainOfCommand when stuck.