mirror of
https://github.com/aaddrick/claude-desktop-debian.git
synced 2026-05-17 00:26:21 +03:00
docs(testing): session 2 plan/inventory + rotate session 3 prompt
- runner-implementation-plan.md: new "Status (post-execution)" sub-section for session 2 listing the 10 new specs and the four reclassification notes (S28 → Tier 1, T38 framing, T23 tool choice, S19 honest-stub note). Session 1 sub-section preserved verbatim below for comparison.
- README.md: 50-spec inventory (was 40), new T-rows (T10, T16, T23, T25, T26, T38) and S-rows (S10, S19, S25, S28) interleaved into the existing tables. Substrate-primitives paragraph extended with dbus-monitor, mock-then-call, ipcMain registry introspection, safeStorage round-trip, extraEnv precedence.
- runner-implementation-followup-prompt.md: rewritten for session 3 — deferred items (T31, T32, S06, S11, S14), Tier 3 → Tier 2 reframes (T22, T35, T37), asar fingerprint cleanups (T24, T30, T33), the focus-shifter primitive build, and the mock-then-call extension for T24 as an alternative to its asar form. Includes the "known mechanism-recipe table" cumulating sessions 1+2.
- runner-implementation-prompt.md: deleted (session 1's prompt, superseded by the followup that's been the rolling document since session 1 ended).

Co-Authored-By: Claude <claude@anthropic.com>
521
docs/testing/runner-implementation-followup-prompt.md
Normal file
@@ -0,0 +1,521 @@
# test-harness runner implementation — session 3 prompt

This file is meant to be **copied verbatim into a fresh Claude Code
session** as the initial user message. Don't paraphrase it; the
orchestration depends on the exact directives below.

You're picking up after a runner-implementation session that landed 10
new specs (5 Tier 2 + 4 Tier 2-reframes + 1 Tier 1 reclass), lifting
harness coverage from 40/76 (53%) to 50/76 (66%). Three commits on
`docs/compat-matrix`:

- `XXX` — `test(harness): session 2 runners + lib/claudeai mock helper`
  (10 new spec files; `installShowItemInFolderMock` added to
  `lib/claudeai.ts` mirroring the `installOpenDialogMock` pattern;
  README inventory + plan-doc status section updated).

(Substitute the actual SHA after committing — the user reviews and
commits at the end of every session.)

The plan doc at
[`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
captures the tier classification and execution-time reclassifications.
Its "Status (post-execution)" section is the source of truth for what's
done and what's deferred — read **session 2** then **session 1**
sub-sections.

This session is a continuation, not a restart. Start by reading the
plan doc's status sections.

### Big new findings from session 2

1. **Mock-then-call beats invoke-then-cleanup for Tier 2 reframes of
   side-effecting Electron APIs.** T17's existing
   `installOpenDialogMock` pattern was extended to T25 via a new
   `installShowItemInFolderMock` in `lib/claudeai.ts`. Net: no host
   file-manager pop-up during the run, AND the assertion strengthens
   from "didn't throw" to "the egress was reached + the path arg
   flowed through verbatim". Apply this pattern to any future
   `shell.*` / `dialog.*` Tier 2 reframes (T24 `shell.openExternal`
   would mock cleanly the same way).

2. **`gdbus monitor --dest <name>` only sees signals OWNED BY that
   destination, not method calls TO it.** T23 had to switch from the
   plan's gdbus suggestion to `dbus-monitor` (eavesdrop match rule)
   to observe `org.freedesktop.Notifications.Notify` calls from
   Electron. If T27 / T22 / S24 ever ship Tier 2 reframes that need
   to observe method calls on a service, use `dbus-monitor`.

3. **`ipcMain._invokeHandlers` channel naming carries a build-stable
   UUID prefix:** `$eipc_message$_<UUID>_$_claude.web_$_<name>`. T38
   anchors on the `_$_<name>` suffix to survive UUID rotation; the
   prefix is captured as diagnostic. Useful precedent for any future
   IPC-introspection probes.

4. **Closure-local minified helpers are NOT reachable from
   globalThis.** S28's plan called for inspector-eval against
   `Sbn()`, but `Sbn` is a closure-local — couldn't be invoked. S28
   reclassified to Tier 1 (asar fingerprint of the classifier
   expression). For any future "Tier 2 reframe via inspector-eval
   against minified helper X" entry: confirm reachability before
   classifying.

5. **`safeStorage` on Linux uses random IVs.** Only decrypted
   plaintext is comparable across encrypt calls; ciphertext bytes
   are not deterministic. S25 compares plaintexts.

6. **`extraEnv` precedence.** `lib/electron.ts:317-323` spreads in
   order: `process.env`, `LAUNCHER_INJECTED_ENV`, `isolation?.env`,
   `waylandEnv`, then `opts.extraEnv`, then `CI: '1'`. Override
   wins. S19 leans on this; load-bearing for future tests that
   need to override isolation defaults.
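
The spread order in finding 6 can be illustrated with a small standalone sketch. This is not the code in `lib/electron.ts`; the function name, the `Env` type, and the sample values are assumptions made for illustration. It only shows the object-spread rule the finding relies on: a later spread overrides an earlier one, so `extraEnv` beats the isolation defaults, while a literal placed after all spreads (here `CI`) cannot be overridden by `extraEnv`.

```typescript
// Hypothetical stand-in for the launch-env merge described in finding 6.
type Env = Record<string, string | undefined>;

function buildLaunchEnv(layers: {
	base: Env;
	launcherInjected: Env;
	isolationEnv?: Env;
	waylandEnv: Env;
	extraEnv?: Env;
}): Env {
	// Later spreads win; the trailing literal wins over everything.
	return {
		...layers.base,
		...layers.launcherInjected,
		...layers.isolationEnv,
		...layers.waylandEnv,
		...layers.extraEnv,
		CI: '1',
	};
}

// Sample values are invented for the demonstration.
const env = buildLaunchEnv({
	base: { HOME: '/home/user', CI: '0' },
	launcherInjected: {},
	isolationEnv: { CLAUDE_CONFIG_DIR: '/tmp/isolation/config' },
	waylandEnv: {},
	extraEnv: { CLAUDE_CONFIG_DIR: '/tmp/override' },
});
```

Under these assumptions, `env.CLAUDE_CONFIG_DIR` ends up as the `extraEnv` value, which is the behaviour S19 is described as depending on.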

### Authoritative reference

Read these in order before fanning out:

- [`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
  — tier classification + status section. Read both **session 2**
  and **session 1** "Status (post-execution)" sub-sections. The
  Tier-3 list (line ~342) is the candidate pool for further
  reframes.
- [`tools/test-harness/README.md`](../../tools/test-harness/README.md)
  — runner conventions, the now-50-spec inventory, primitives in
  `lib/`, isolation defaults, the CDP-gate workaround, the
  `seedFromHost` reference.
- [`docs/testing/cases/README.md`](cases/README.md) — case-doc
  structure and the four anchor scopes.
- [`tools/test-harness/src/lib/`](../../tools/test-harness/src/lib/)
  — the existing primitives. Notable additions since session 2:
  - `claudeai.ts` — `installShowItemInFolderMock` /
    `getShowItemInFolderCalls` (mirrors `installOpenDialogMock`).
    If 3+ tests start using mock-then-call, consider extracting to
    `lib/electron-mocks.ts` — but don't pre-extract.
- [`tools/test-harness/src/runners/`](../../tools/test-harness/src/runners/)
  — every existing spec is a template. Notable session 2 templates:
  - `T16_code_tab_loads.spec.ts` / `T26_routines_page_renders.spec.ts`
    — seedFromHost + post-login renderer-side AX nav. T16 uses the
    existing `CodeTab.activate()`; T26 inlines a similar AX walker
    for the sidebar. Pattern for any further "click an AX-tree
    button after login" test.
  - `T25_show_item_in_folder_no_throw.spec.ts` — mock-then-call
    pattern (mirrors T17). Use this shape for any future Tier 2
    reframe of a side-effecting Electron API.
  - `T38_open_in_editor_handler_registered.spec.ts` — IPC handler
    registry introspection via `ipcMain._invokeHandlers`. Pattern
    for any "is this handler wired up" check.
  - `T23_notification_reaches_dbus.spec.ts` — dbus-monitor
    subprocess + inspector-fired notification + buffer scan.
  - `T10_cowork_daemon_respawn.spec.ts` — H04 extension: spawn,
    SIGKILL, poll for new pid. Pattern for any "service auto-respawn
    contract" test.
  - `S25_safestorage_token_persists.spec.ts` — two-launch with
    shared isolation handle + safeStorage round-trip via tmpfile.
  - `S28_worktree_permission_classifier.spec.ts` — single-regex
    asar fingerprint for a multi-string-OR classifier expression.
- [`docs/testing/cases/*.md`](cases/) — the spec each runner
  asserts. The **Code anchors:** field tells you exactly where
  upstream implements the feature.

### Tests in scope this session

Five categories, in priority order:

| # | Tests | Source files | Notes |
|---|---|---|---|
| **A** Deferred from session 2 | T31, T32, S06, S11, S14 | `code-tab-workflow.md` (T31, T32), `shortcuts-and-input.md` (S06, S11, S14) | T31/T32 need a Code-tab session OPEN; S06 needs Wayland row; S11/S14 need a new focus-shifter primitive |
| **B** Tier 3 → Tier 2 reframes (read-only) | T22, T35, T37 | `code-tab-workflow.md` (T22), `extensibility.md` (T35, T37) | Each can ship as a *reads-from-disk-or-IPC-registry* probe without writing to the user's account |
| **C** Asar fingerprint cleanups | T24, T30, T33 | `code-tab-handoff.md` (T24), `code-tab-workflow.md` (T30), `extensibility.md` (T33) | Each has a load-bearing string set in `index.js` that pins the wiring without needing a launch |
| **D** New primitive — focus-shifter | (unblocks A's S11/S14) | `lib/input.ts` | xdotool / ydotool focus-stealing helper. Build only if S11/S14 worth shipping this session |
| **E** Mock-then-call extension | T24 (mock form) | `code-tab-handoff.md` (T24) | Mirror of T25's pattern but for `shell.openExternal` — handler reaches the egress with the right URL |

Realistic ceiling: **~6-8 new specs** this session. Don't try all
13 — Categories A and B are heavier than session 2's mix because
they need either a Code-tab session opened OR a new primitive built
first.

### Detailed scope per category

#### Category A — deferred items (5)

- **T31 — Side chat opens.** Needs: `seedFromHost` + Code-tab
  session OPEN (env pill → Local → choose folder → wait for session
  load). After session loads, send `Ctrl+;` via ydotool OR find the
  IPC handler `startSideChat` in `ipcMain._invokeHandlers` (T38
  pattern) and assert it's registered + invokable. The lighter form
  is the IPC-registry probe; the heavier form is full
  click-chain-into-side-chat.
- **T32 — Slash command menu.** Needs: same Code-tab session OPEN
  preamble as T31. Then trigger `/` in the prompt textarea and
  assert the slash menu renders (AX-tree query for menuitem* nodes
  in the prompt area). Heavier than T31 because the slash menu is
  rendered server-side by claude.ai's bundle.
- **S06 — URL handler segfault on native Wayland.** Needs:
  `CLAUDE_HARNESS_USE_WAYLAND=1` row + `coredumpctl info
  claude-desktop` observation after firing
  `xdg-open 'claude://chat/new'`. Skip cleanly if not on a
  Wayland row.
- **S11 / S14 — focus-shifter delivery.** Needs: `lib/input.ts`
  with `focusOtherWindow()` (xdotool on X11; skip on Wayland or
  use compositor-specific). Then S11 / S14 launch app, focus
  another window, fire shortcut, assert popup appears. Build the
  primitive in one PR (Category D), then both runners in a second
  PR.

#### Category B — Tier 3 → Tier 2 reframes (3)

These each have a slice that doesn't write to the user's real
account:

- **T22 — PR monitoring (read-only half).** The Tier 3 form opens a
  PR; the Tier 2 reframe is "after `seedFromHost`, IPC handler
  `getPrChecks` is registered + the `gh CLI not found in PATH`
  string is in the bundle". The handler-registered probe is the
  shippable form; the missing-`gh` warning string is a static
  fingerprint. Both ship as one runner.
- **T35 — MCP server config picked up.** Reframe: place a fixture
  `claude_desktop_config.json` under the isolation's configDir
  (no host config touch needed — fresh isolation), then via
  inspector eval, read whatever main-process state holds the
  parsed MCP server list. Anchor on a known path under
  `${configDir}/Claude/`.
- **T37 — `CLAUDE.md` memory loads.** Reframe: place a fixture
  `~/.claude/CLAUDE.md` (or under `CLAUDE_CONFIG_DIR/CLAUDE.md`
  with extraEnv override — see S19's pattern), then via inspector
  eval read the loaded memory state. Anchor needs to come from
  case-doc Code anchors.

#### Category C — asar fingerprint cleanups (3)

Each is Tier 1 / no launch:

- **T24 — Open in external editor (asar fingerprint).** The full
  click-chain T24 is Tier 3. The fingerprint half: assert `Mtt`
  registry is in `index.js` with the editor scheme strings
  (`vscode://`, `cursor://`, `zed://`, `windsurf://`).
- **T30 — Auto-archive on PR merge (cadence constants).** Static
  fingerprint of the sweep cadence — assert `300_000` (5 min)
  and `3_600_000` (1 h) appear near the auto-archive code.
- **T33 — Plugin browser (IPC handler registered).** Same shape
  as T38 — assert `listMarketplaces` IPC handler is registered.
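
The suffix-anchoring idea behind the T38-shaped probe (and so T33) can be sketched as a pure function over channel names. This is an illustrative sketch, not the shipped runner: the function name, the zeroed UUID, and the `Plugins` namespace segment in the sample data are invented, and it assumes the registry keys are plain channel strings in the `$eipc_message$_<UUID>_$_claude.web_$_<name>` framing described earlier in this prompt.

```typescript
// Hypothetical helper: find a registered channel by its `_$_<name>`
// suffix so the match survives a change of the UUID prefix.
function findChannelBySuffix(
	channels: Iterable<string>,
	name: string,
): string | undefined {
	const suffix = `_$_${name}`;
	for (const channel of channels) {
		if (channel.endsWith(suffix)) return channel;
	}
	return undefined;
}

// Invented sample keys; the UUID and the `Plugins` segment are
// placeholders, not values taken from a real build.
const prefix =
	'$eipc_message$_00000000-0000-0000-0000-000000000000_$_claude.web';
const sampleChannels = [
	`${prefix}_$_LocalSessions_$_openInEditor`,
	`${prefix}_$_Plugins_$_listMarketplaces`,
];

const hit = findChannelBySuffix(sampleChannels, 'listMarketplaces');
const miss = findChannelBySuffix(sampleChannels, 'doesNotExist');
```

In a runner, the matched channel string itself is worth attaching as a diagnostic, since the prefix is what changes if the framing ever does.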

#### Category D — primitive build

- **`lib/input.ts:focusOtherWindow()`.** xdotool on X11
  (`xdotool search --name '<test-marker>' windowfocus`); on
  Wayland skip cleanly (no portable focus injection). Used by
  S11 / S14. Don't build unless those are in scope this session.
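
One possible shape for the primitive above is to separate the decision (run xdotool or skip) from the process spawn, so the decision can be checked without a display server. The sketch below is an assumption about how that could look, not the planned `lib/input.ts` API; `planFocusOtherWindow` and the `FocusPlan` type are hypothetical names. The xdotool arguments are the ones quoted in the bullet.

```typescript
// Hypothetical planning half of a focus-shifter helper.
type FocusPlan =
	| { kind: 'skip'; reason: string }
	| { kind: 'run'; command: string; args: string[] };

function planFocusOtherWindow(
	sessionType: 'x11' | 'wayland',
	windowName: string,
): FocusPlan {
	if (sessionType === 'wayland') {
		// No portable focus injection on Wayland, so the caller skips.
		return { kind: 'skip', reason: 'no portable focus injection on Wayland' };
	}
	return {
		kind: 'run',
		command: 'xdotool',
		args: ['search', '--name', windowName, 'windowfocus'],
	};
}

const x11Plan = planFocusOtherWindow('x11', 'harness-focus-target');
const waylandPlan = planFocusOtherWindow('wayland', 'harness-focus-target');
```

A caller would hand a `run` plan to something like Node's `execFile(plan.command, plan.args)` and turn a `skip` plan into a test skip with the given reason.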

#### Category E — mock-then-call extension

- **T24 (mock form, alternative to Category C).** Mock
  `shell.openExternal` via a new `installOpenExternalMock` in
  `lib/claudeai.ts` (mirror of `installShowItemInFolderMock`),
  then `inspector.evalInMain` calls
  `shell.openExternal('vscode://file/tmp/test')` and assert
  the recorded call list contains the URL. Strictly stronger
  than the Category C fingerprint form. **Pick C OR E for T24
  — not both.**
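
The mock-then-call shape described above can be shown in isolation with a plain object standing in for Electron's `shell`. This is a sketch under stated assumptions: the real helper would live in `lib/claudeai.ts` and presumably patch the main process through the inspector, and its actual signature is not given in this prompt. `ShellLike`, `OpenExternalMock`, and `fakeShell` are hypothetical names.

```typescript
// Stand-in for the one `shell` method being mocked.
interface ShellLike {
	openExternal(url: string): Promise<void>;
}

interface OpenExternalMock {
	calls: string[];
	restore(): void;
}

// Replace the method with a recorder, and return a way to undo it.
function installOpenExternalMock(shell: ShellLike): OpenExternalMock {
	const calls: string[] = [];
	const original = shell.openExternal;
	shell.openExternal = (url: string) => {
		calls.push(url);
		return Promise.resolve();
	};
	return {
		calls,
		restore: () => {
			shell.openExternal = original;
		},
	};
}

// The "real" implementation counts launches so the demo can show that
// the mock prevented the side effect.
let realLaunches = 0;
const fakeShell: ShellLike = {
	openExternal: (_url: string) => {
		realLaunches += 1;
		return Promise.resolve();
	},
};

const mock = installOpenExternalMock(fakeShell);
void fakeShell.openExternal('vscode://file/tmp/test');
mock.restore();
```

The two assertions a runner would make follow directly: the recorded call list contains the URL verbatim, and the real egress was never reached.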

### Why this iteration

The harness is at 50/76 coverage and every release tag now
exercises the smoke-set + a chunk of critical surfaces
automatically. Remaining work clusters in three pockets:

- **Code-tab cluster (T15-T39, mostly login-walled).** Session 2
  unblocked the *render-only* half via `seedFromHost`; session 3
  should push into the *open-a-session* half (T31, T32) and the
  read-only-Tier-3-reframes (T22, T35, T37).
- **Wayland-specific tests (S06).** Need a Wayland row + the
  harness's `CLAUDE_HARNESS_USE_WAYLAND=1` switch.
- **Focus-shift-dependent tests (S11, S14).** Need
  `lib/input.ts:focusOtherWindow()` built first.

After this session, future sessions can focus on the genuinely
heavy Tier 3 work (destructive-write login tests; multi-launch
state) with a clearer cost model.

### Known mechanism-recipe table (session 1 + session 2)

| Pattern | Use when | Worked example |
|---|---|---|
| `createIsolation({ seedFromHost: true })` | spec needs a signed-in renderer; read-only | T07, T16, T26 |
| `isolation: null` + pre-launch `killHostClaude()` | spec needs SingletonLock collision (delivery probes) | T05 |
| Default isolation | most other tests | T01, T03, T04, S29 |
| `isolation: <handle>` (shared across launches) | multi-launch persistent state | S35, S25 |
| `MainWindow.setState('close')` | exercise the wrapper close-interceptor | T08 |
| Mock-then-call for `shell.*` / `dialog.*` | Tier 2 reframe of side-effecting API | T17, T25 |
| `ipcMain._invokeHandlers` registry probe | "is this IPC handler wired up" | T38 |
| `dbus-monitor` subprocess | observe DBus method calls TO a destination | T23 |
| Asar single-regex multi-string-OR fingerprint | classifier-style code that combines several strings | S28 |
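
The last row of the table, the single-regex multi-string-OR fingerprint, can be sketched against a hand-written snippet of minified-looking code. The strings are the ones the plan doc quotes for S28; the regex construction, the 200-character gap limit, and the sample bundle text are assumptions for illustration and are not copied from the shipped runner.

```typescript
// One regex that requires all classifier strings to appear in order,
// each within a bounded distance of the previous one.
const CLASSIFIER_FINGERPRINT = new RegExp(
	[
		'Permission denied',
		'Access is denied',
		'could not lock config file',
		'permission-denied',
	].join('[\\s\\S]{0,200}?'),
);

// Invented stand-in for a slice of the bundled main-process source.
const bundled =
	'function c(e){return e.includes("Permission denied")' +
	'||e.includes("Access is denied")' +
	'||e.includes("could not lock config file")' +
	'?"permission-denied":"other"}';

// Simulated upstream drift: one of the strings changes.
const drifted = bundled.replace('Access is denied', 'Access refused');

const matchesBundled = CLASSIFIER_FINGERPRINT.test(bundled);
const matchesDrifted = CLASSIFIER_FINGERPRINT.test(drifted);
```

Bounding the gaps keeps the match tied to one expression instead of letting the strings match anywhere in a multi-megabyte bundle; the right bound for a real bundle is a tuning decision.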

### Constraints to respect (don't violate)

These are unchanged from sessions 1 and 2 and still load-bearing:

- **Default isolation** unless the spec needs otherwise. Never
  write to `~/.config/Claude` without explicit gating
  (`CLAUDE_TEST_USE_HOST_CONFIG=1` opt-out, OR `seedFromHost: true`
  with read-only-then-discard semantics, OR an explicit comment
  documenting why).
- **CDP auth gate is alive** — runtime SIGUSR1 attach via
  `app.attachInspector()`, never Playwright's `_electron.launch()`
  or `chromium.connectOverCDP()`.
- **BrowserWindow Proxy gotcha** — use `webContents.getAllWebContents()`
  not `BrowserWindow.getAllWindows()`. Constructor-level wraps
  don't work; use prototype-method hooks.
- **`skipUnlessRow()` always first.** First line of every `test()`
  body when the test is row-gated.
- **No fixed sleeps.** `retryUntil` from `lib/retry.ts`, or
  Playwright auto-wait. Fixed `sleep(N)` is a smell.
- **Diagnostics on every run.** `testInfo.attach()` the artefacts
  (launcher log, --doctor output, frame extents, click attempts,
  AX-tree snapshot). Captured as JSON dumps for multi-state tests.
- **Tag with annotations.** `severity:` and `surface:` on every
  test so JUnit carries them through to matrix-regen.
- **Tabs in TS, ~80-char wrap as the existing files do.** Match
  surrounding style.
- **Don't break existing runners.** `npm run typecheck` must stay
  clean. H01-H05 are the canaries; `npm test` must still pass them
  after every commit.
- **For mock-then-call: leading comment must document why mock
  beats invoke** (T25's leading comment is the worked example —
  three short paragraphs).
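
For readers unfamiliar with the "no fixed sleeps" constraint in the list above, the general polling technique looks like the sketch below. It is deliberately named `pollUntil` and is not the `retryUntil` exported by `lib/retry.ts`, whose signature is not shown in this prompt; the option names and the convention of returning `undefined` for "not yet" are assumptions.

```typescript
// Hypothetical polling helper: re-run a probe until it yields a value
// or the deadline passes, instead of sleeping for a fixed duration.
async function pollUntil<T>(
	probe: () => T | undefined | Promise<T | undefined>,
	opts: { timeoutMs: number; intervalMs: number },
): Promise<T> {
	const deadline = Date.now() + opts.timeoutMs;
	for (;;) {
		const value = await probe();
		if (value !== undefined) return value;
		if (Date.now() >= deadline) {
			throw new Error(`condition not met within ${opts.timeoutMs}ms`);
		}
		await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs));
	}
}

// Demo probe: reports "not yet" twice, then succeeds on attempt three.
let attempts = 0;
const polled = pollUntil(
	() => (++attempts >= 3 ? attempts : undefined),
	{ timeoutMs: 1000, intervalMs: 10 },
);
```

The test returns as soon as the condition holds and fails with a clear message when it never does, which is the property a fixed `sleep(N)` lacks.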

### Phases

#### Phase 0 — calibration

1. `cd tools/test-harness && npm run typecheck` — should pass.
2. Read the plan doc's "Status (post-execution)" session 2 section,
   then read T25's leading comment (the mock-then-call pattern's
   worked-example doc) and T16/T26 (the seedFromHost-then-AX-nav
   pattern). Confirm you understand both.
3. Pick one Category B candidate (suggest T22 — read-only half) and
   sketch the runner shape mentally. Don't write it yet — confirm
   you can plan from the spec.

If Phase 0 surfaces a problem (typecheck failing, primitives
unclear, patterns not understood), stop and report. Don't fan out.

#### Phase 1 — fan-out batch

Spawn parallel subagents (cap at 6 in flight) for the
highest-confidence candidates first.

**Suggested initial batch (4-5 specs):**

- **B / T22 — PR monitoring read-only half.** seedFromHost +
  `ipcMain._invokeHandlers` for `getPrChecks` + asar fingerprint
  for the missing-`gh` warning string.
- **C / T24 (asar fingerprint OR mock form, pick one).** If you
  want stronger coverage, go mock form (Category E shape — mirror
  of T25's `installOpenExternalMock` helper). If you want a quick
  Tier 1, go asar fingerprint (`Mtt` registry + scheme strings).
- **C / T30 — Auto-archive cadence constants.** Pure asar probe.
- **C / T33 — Plugin browser handler registered.** T38 pattern.
- **B / T35 — MCP server config picked up.** Fixture under
  isolation configDir + inspector eval.

If those land cleanly, dispatch a second batch:

- **A / T31 — Side chat opens (handler-registered shape).**
  seedFromHost + `ipcMain._invokeHandlers` for `startSideChat`
  / `sendSideChatMessage` / `stopSideChat`. The lighter probe.
- **A / T32 — Slash command menu (asar fingerprint).** The full
  AX-tree form needs a Code-tab session open AND server-side
  rendered menu — heavy. The fingerprint form: `getSupportedCommands`
  + `slashCommands` schema present in `index.js`.
- **A / S11 / S14 (only if Category D primitive is built first).**
  Build `lib/input.ts:focusOtherWindow()` in PR 1; ship S11/S14
  in PR 2.
- **B / T37 — CLAUDE.md memory loads.** Fixture file + inspector
  eval against the loaded memory state.

#### Per-subagent prompt shape

```
You're implementing ONE test-harness runner for <TEST-ID> in
docs/testing/cases/<FILE>.md.

Read in order:
- docs/testing/cases/<FILE>.md (focus on <TEST-ID>'s Code anchors)
- tools/test-harness/README.md (conventions; status section names
  the most-recent-template that fits)
- tools/test-harness/src/runners/<closest-template>.spec.ts
- tools/test-harness/src/lib/ (the primitives you'll reuse)
- CLAUDE.md (project conventions)

Write tools/test-harness/src/runners/<TEST-ID>_short_name.spec.ts.

[per-test specifics: pattern (seedFromHost / mock-then-call /
ipcMain._invokeHandlers / asar fingerprint / shared isolation),
assertion shape, skip rules, key constraint warnings]

Constraints:
- Tabs, ~80-char wrap.
- Use lib/* primitives; don't reinvent.
- testInfo.attach() the diagnostics from the spec's "Diagnostics
  on failure" block.
- Tag with severity + surface annotations.
- No fixed sleeps. retryUntil or Playwright auto-wait.
- npm run typecheck must stay clean after your edits.
- Don't commit. The user reviews and commits.

If the test isn't reasonable to implement (anchors don't resolve
to anything assertable, the test depends on state you can't
construct, the existing primitives don't cover the surface), DO
NOT write a stub. Report under Open questions and stop. Sessions
1 and 2 had cumulative ~6 "stop and report" outcomes that were
the right call (S20 deferral, T05 reshape, T07 needs seedFromHost,
T08 needs setState('close'), S28 reclassification, T38 framing).

Report shape (~150 words):
## <TEST-ID> runner

- File written: tools/test-harness/src/runners/<filename>.spec.ts
- Layer: file probe | argv probe | L1 | L2 (xprop) | L2 (DBus) | pgrep
- Assertion shape: <one sentence>
- Skip rules: <which rows + why>
- Verification path: <typecheck + run result>
- Open questions: <caveats>
```

#### Phase 2 — synthesis

After fan-out returns:

1. `cd tools/test-harness && npm run typecheck` — must stay clean.
2. Run the new runners against KDE-W (the dev box) — but flag the
   user first if any are destructive (seedFromHost kills running
   Claude; T31/T32 require an open Code-tab session that may
   accumulate state). Capture pass/skip/fail per spec for the
   matrix.
3. Update [`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
   "Status (post-execution)" section to reflect newly-shipped
   specs and any reclassifications discovered mid-flight.
4. Update [`tools/test-harness/README.md`](../../tools/test-harness/README.md)
   inventory table.
5. Write a final report listing:
   - Specs landed (pass / skip / needs-tuning per row)
   - Specs deferred (with the per-test rationale)
   - Specs reclassified (Tier 3 → Tier 2, Tier 2 → blocked, etc.)
   - Updated coverage stat (was 50/76 = 66%, now N/76 = M%)
6. Don't commit. The user reviews and commits.
7. Rotate this prompt: rewrite
   `docs/testing/runner-implementation-followup-prompt.md` for the
   NEXT session's deferred items.
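
The coverage stat in step 5 is plain rounding of covered specs over the 76-test total. A small helper like the one below (the function name is an assumption) reproduces the figures quoted in this prompt and in the plan doc, and can be reused for the `N/76 = M%` update.

```typescript
// Format a coverage figure the way this document writes it,
// e.g. "50/76 (66%)". Rounds to the nearest whole percent.
function formatCoverage(covered: number, total: number): string {
	const pct = Math.round((covered / total) * 100);
	return `${covered}/${total} (${pct}%)`;
}

const session1Start = formatCoverage(15, 76);
const session2Start = formatCoverage(40, 76);
const session3Start = formatCoverage(50, 76);
```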

### Self-correction loop

Same as sessions 1 and 2:

1. Subagent typecheck failure → re-spawn with explicit fix
   instruction.
2. Subagent claims a runner exists but `git status` shows no new
   file → re-spawn with explicit "use the Write tool" instruction.
3. Two subagents wrote runners that share a primitive but with
   different shapes → factor into `lib/<topic>.ts` BEFORE shipping.

Cap re-spawns at 2 per file. Past that, mark as needing human
review and move on.

### Termination conditions

Stop and write the final report when one of:

1. **All Category A + B + C target specs landed and typecheck-clean.**
   Write coverage update, stop.
2. **Hit re-spawn cap on 3+ runners.** Stop, write up which are
   blocked.
3. **Discovered a primitive gap that breaks 5+ Tier 2/Tier 3
   tests.** Stop, propose where the new primitive should live in
   `lib/`. Future session adds the primitive first, then resumes.
4. **Session budget hits ~7 new specs.** Stop, synthesize, leave
   the rest for the next session.

### What you should NOT do

- **Don't try to land Category A + B + C in one batch.** That's
  ~9-11 specs. Pick the highest-confidence subset for the first
  batch and decide whether to do more based on what came back.
- **Don't ship stubs.** If a runner can't actually assert what the
  spec says, mark it as Tier 3 / blocked / primitive-gap and don't
  write a placeholder. The cumulative six "stop and report"
  outcomes from sessions 1+2 were the right call — every one
  revealed a real constraint.
- **Don't break existing runners.** H01-H05 are the canaries.
- **Don't pre-extract `lib/electron-mocks.ts`.** The
  `installShowItemInFolderMock` + `installOpenDialogMock` pair
  doesn't yet justify a new file; if T24 ships as Category E
  (mock form), THAT's the third — extract then.
- **Don't restructure `lib/`** beyond targeted additions.
  Premature abstractions are wrong abstractions.
- **Don't run destructive Tier 3 tests** that write to the user's
  real claude.ai account (T22 PR write, T27 scheduling, T29
  worktree creation, T34 OAuth, T36 hooks). Only the *read-only
  reframes* of those are in scope this session.
- **Don't implement the #569 power-inhibit patch in this session.**
  That's a separate workstream. The S20 spec follows the patch,
  not the other way around.
- **Don't commit.** The user reviews and commits.

### Final report format

```markdown
## Runner implementation summary (session 3)

- Category A landed: N / 5
- Category B landed: N / 3
- Category C landed: N / 3
- Reclassified mid-flight: N (with reasons)
- Coverage: was 50/76 (66%), now <NEW>/76 (<PCT>%)
- Typecheck: clean | <errors>
- KDE-W test run: <pass/skip/fail counts>

## Per-spec breakdown

| Cat | Test ID | File | Assertion shape | Status |
|---|---|---|---|---|
| B | T22 | T22_pr_monitoring_handler.spec.ts | seedFromHost + IPC handler probe + asar fingerprint | ✓ pass |
| ... |

## Notable findings
- ...

## Open questions
- ...

## Files touched
git status output (only tools/test-harness/src/runners/*.spec.ts +
maybe lib/* primitives if extraction was needed; possibly plan-doc /
README updates).

## Diff summary
git diff --stat
```

### Operational notes

- Subagents are launched in parallel via a single message with
  multiple Agent tool calls. Don't serialise.
- Each subagent's Write calls land directly in the working tree.
- The grounding probe (`tools/test-harness/grounding-probe.ts`) can
  help when implementing a runner that asserts runtime API state —
  capture once with `npm run grounding-probe -- --launch
  --include-synthetic`, grep the output for the IPC channel /
  accelerator / API your runner needs to assert against.
- For seedFromHost specs, the host MUST have a signed-in Claude
  Desktop. The primitive throws with a clear message if not.
  Document the prerequisite in your runner's leading comment if
  it's the first one to add seedFromHost coverage to a new surface.
- For tests that touch the AX tree, `claudeai.ts` page-objects are
  the right substrate — see `T17_folder_picker.spec.ts` for the
  end-to-end example. Don't query DOM by CSS selector unless
  `claudeai.ts` doesn't already cover the surface.
- For mock-then-call: see T25 for the canonical pattern. Mock
  installation is in `lib/claudeai.ts` alongside the dialog-mock;
  add a sibling export, don't pre-extract a new file.

Begin with Phase 0. Don't fan out until calibration succeeds.

@@ -18,11 +18,60 @@ work begins.

## Status (post-execution)

**Shipped session 2 (10 new specs):** T10, T16, T23, T25, T26, T38, S10,
S19, S25, S28. Coverage moved from 40/76 (53%) to 50/76 (66%).

Session 2 reclassifications:

- **S28** reclassified from **Tier 2 → Tier 1**. The plan called for
  inspector-eval against `Sbn()` permission classifier with a synthetic
  error string, but `Sbn` is a closure-local in the bundled main process
  — not reachable from `globalThis` and no IPC surface exposes it. The
  shipped runner is a single-regex asar fingerprint that pins the same
  classifier expression (the `"Permission denied" || "Access is denied"
  || "could not lock config file" → "permission-denied"` chain) plus the
  `Failed to create git worktree:` log line. Same drift signal, no
  launch needed.
- **T38** shipped as a **handler-registered probe**, not a no-throw
  invocation. Calling `LocalSessions.openInEditor` directly would
  terminate at `shell.openExternal('vscode://...')` (real editor launch
  + side effect on host) and would also be blocked by the channel's
  origin validation against non-claude.ai senders. Instead, the runner
  inspects `ipcMain._invokeHandlers` for the channel ending in
  `LocalSessions_$_openInEditor`. Documents the channel's
  `$eipc_message$_<UUID>_$_claude.web_$_<name>` framing (UUID is
  build-stable: `c0eed8c9-...`) so a future framing change is visible.
- **T23 tool choice — dbus-monitor, not gdbus monitor.** The plan
  suggested `gdbus monitor --session --dest=org.freedesktop.Notifications`
  but `gdbus monitor --dest <name>` only sees signals OWNED BY that
  destination, not method calls TO it. `Notify` is a method call FROM
  Electron TO the daemon, so `gdbus` cannot observe it. Switched to
  `dbus-monitor` (eavesdrop match rule). Pre-launch checks gate on
  `dbus-monitor` presence + notification-daemon ownership of
  `org.freedesktop.Notifications`.
- **S19 honest-stub note.** The Tier 2 slice is env propagation
  (`extraEnv: { CLAUDE_CONFIG_DIR }` → main-process `process.env`).
  Half 1 (env propagation) is the load-bearing assertion. Half 2
  (resolver-shape echo) is a synthetic re-implementation of `cE()` /
  `Tce()` because those minified symbols are closure-locals — leading
  comment is explicit about the re-implementation status.
- **T25 / T38 / T23 host side effects documented.** T25 may briefly
  open a file manager on the host; T38 deliberately doesn't invoke;
  T23 fires a real notification. Each runner's leading comment
  documents the side effect.
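
The buffer-scan half of the T23 approach can be sketched as a pure function over captured `dbus-monitor` text. The sample lines below are hand-written to resemble `dbus-monitor`'s default text output and are not captured from a real session; the function name is an assumption and the shipped runner's matching logic may differ.

```typescript
// Return true when the captured monitor output contains a method call
// to org.freedesktop.Notifications.Notify (as opposed to a signal).
function sawNotifyMethodCall(buffer: string): boolean {
	return buffer.split('\n').some(
		(line) =>
			line.startsWith('method call') &&
			line.includes('interface=org.freedesktop.Notifications') &&
			line.includes('member=Notify'),
	);
}

// Hand-written sample resembling a Notify method call header.
const withCall = [
	'method call time=1700000000.100 sender=:1.42 -> ' +
		'destination=org.freedesktop.Notifications serial=7 ' +
		'path=/org/freedesktop/Notifications; ' +
		'interface=org.freedesktop.Notifications; member=Notify',
	'   string "Claude"',
].join('\n');

// Hand-written sample containing only a signal from the daemon.
const signalOnly =
	'signal time=1700000000.200 sender=:1.5 -> ' +
	'destination=(null destination) serial=9 ' +
	'path=/org/freedesktop/Notifications; ' +
	'interface=org.freedesktop.Notifications; member=NotificationClosed';

const callSeen = sawNotifyMethodCall(withCall);
const signalSeen = sawNotifyMethodCall(signalOnly);
```

Requiring the `method call` prefix is what distinguishes this from the signal-only view that `gdbus monitor --dest` was found to give.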

Tier 2 → Tier 2 candidates remaining for a future session: **T31, T32**
(side chat, slash menu — both need a Code-tab session OPEN, not just
the Code tab loaded). **S11, S14** (focus-shifter primitive still
unbuilt — flagged in plan and unchanged).

---

**Shipped session 1 (25 new specs):** T02, T05, T06, T07, T08, T09,
T11, T12, T13, T14a, T14b, S01, S02, S03, S04, S05, S07, S08, S15, S16,
S17, S21, S22, S26, S27. Coverage moved from 15/76 (20%) to 40/76 (53%).

Session 1 reclassifications:

- **T05** shipped as a **Tier 3 delivery probe** (xdg-open →
  `app.on('second-instance', ...)`), not the originally-planned Tier 2

@@ -1,376 +0,0 @@
# test-harness runner implementation — implementation prompt

This file is meant to be **copied verbatim into a fresh Claude Code
session** as the initial user message. Don't paraphrase it; the
orchestration depends on the exact directives below.

---

## Prompt to paste

You're picking up after the cases-grounding sweep, the
`grounding-probe.ts` runtime probe, and the `CLAUDE_HARNESS_USE_WAYLAND`
flag all landed. The case docs are now anchored to upstream code +
wrapper scripts and the harness can swap backends with one env var.
The next workstream is **wiring runners for the 61 of 76 tests that
still don't have one**.

Today the harness has 15 specs covering 4 of 39 cross-env tests (T01,
T03, T04, T17) and 11 of 37 env-specific tests (S09, S12, S29-S37
Quick Entry sweep) — plus 5 H-prefix harness self-tests. Everything
else is human-execution-only. The smoke set in
[`docs/testing/README.md`](docs/testing/README.md#smoke-set) lists T01,
T03, T04, T05, T07, T08, T11, T15, T16, T17 as the release gate; only
4 of those are wired. The rest of the gate goes through a manual
clicker.

This is too many to land in one session — don't try. Triage first,
land the cheap probes, get the single-launch wins, defer the rest
with explicit reasons. Optimise for **broad coverage of cheap
checks** over deep coverage of one renderer-heavy surface.

### Authoritative reference

Read these in order before fanning out:

- `docs/testing/cases/README.md` — case-doc structure and the four
  anchor scopes (upstream / wrapper / SPA / CLI). Each test you wire
  needs to either consume an existing anchor or surface a new one.
- `tools/test-harness/README.md` — runner conventions, the 5 distinct
  shapes of TS code already in `lib/` (xprop / dbus-next / inspector
  attach / `app.asar` reads / `/proc/$pid/cmdline` reads / pgrep), the
  isolation defaults, the CDP-gate workaround.
- `tools/test-harness/src/lib/` — the existing primitives. Most new
  runners are recombinations:
  - `electron.ts` — `launchClaude()`, `app.attachInspector()`,
    `app.waitForReady('window'|'mainVisible'|'claudeAi'|'userLoaded')`
  - `inspector.ts` — `evalInMain()`, `evalInRenderer()`,
    `getAccessibleTree()`, `clickByBackendNodeId()`
  - `claudeai.ts` — page-objects against the AX-tree substrate
  - `quickentry.ts` — shared QE primitives (`installInterceptor`,
    `openAndWaitReady`)
  - `sni.ts` — DBus StatusNotifierWatcher attribution by pid
  - `wm.ts` — xprop wrappers; `findX11WindowByPid`,
    `getNetFrameExtents`
  - `argv.ts` — `/proc/$pid/cmdline` flag check
  - `asar.ts` — in-place `app.asar` content reads (no temp extract)
  - `row.ts` — `skipUnlessRow()` / `skipOnRow()`
  - `diagnostics.ts` — launcher log + `--doctor` capture
- `tools/test-harness/src/runners/` — every existing spec is a
  template. Match the layer (L1 / L2 / file / argv / pid) to the
  test's assertion shape. Don't reinvent — `T17_folder_picker.spec.ts`
  for L1 click chains, `T03_tray_icon_present.spec.ts` for DBus,
  `T04_window_decorations.spec.ts` for xprop, `S33_electron_version_capture.spec.ts`
  for asar reads, `S12_global_shortcuts_portal_flag.spec.ts` for argv.
- `docs/testing/cases/*.md` — the spec each runner asserts. The
  **Code anchors:** field tells you exactly where upstream implements
  the feature.
- `CLAUDE.md` (project root) — code style, attribution, commit format.
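
As an illustration of the argv-probe shape listed for `argv.ts`, here is a minimal sketch of a `/proc/$pid/cmdline` flag check. It is not the contents of `lib/argv.ts`; the helper names `parseCmdline` and `hasFlag` are hypothetical. The one fact it relies on is that Linux stores a process's argv in that file as NUL-separated strings.

```typescript
// Hypothetical helpers, not the real lib/argv.ts.
// /proc/<pid>/cmdline holds argv as NUL-separated strings.
function parseCmdline(raw: string): string[] {
	return raw.split('\0').filter((arg) => arg.length > 0);
}

// True when `flag` appears as its own argv entry or as `flag=value`.
function hasFlag(argv: string[], flag: string): boolean {
	return argv.some((arg) => arg === flag || arg.startsWith(`${flag}=`));
}

// Sample cmdline content standing in for a real /proc read.
const sample = 'electron\0--ozone-platform=wayland\0--no-sandbox\0';
const argv = parseCmdline(sample);
console.log(hasFlag(argv, '--ozone-platform'));
```

In a runner the `raw` string would come from reading `/proc/<pid>/cmdline` for the launched process rather than from a literal.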

### Tests in scope

The 61 missing runners, by source file:

| File | Missing |
|---|---|
| `launch.md` | T02, T13, T14 |
| `tray-and-window-chrome.md` | T07, T08, S08, S13 |
| `shortcuts-and-input.md` | T05, T06, S06, S07, S10, S11, S14 |
| `code-tab-foundations.md` | T15, T16, T18, T19, T20 |
| `code-tab-workflow.md` | T21, T22, T29, T30, T31, T32 |
| `code-tab-handoff.md` | T23, T24, T25, T34, T38, T39 |
| `routines.md` | T26, T27, T28, S19, S20, S21 |
| `extensibility.md` | T11, T33, T35, T36, T37, S27, S28 |
| `distribution.md` | S01, S02, S03, S04, S05, S15, S16, S26 |
| `platform-integration.md` | T09, T10, T12, S17, S18, S22, S23, S24, S25 |

### Why this iteration

The matrix today is mostly hand-driven. Each cell update requires a
human running through the case-doc steps on a VM and writing a status
into `matrix.md`. That's expensive enough that sweeps happen
infrequently and regressions sit longer than they should. A wider
runner net means more cells get refreshed automatically per release,
the human time goes to the renderer-heavy tests that genuinely need
it, and the Smoke + Critical gate stops being a manual-clicker
bottleneck on every tag.

This isn't one-session work — it's the start of a series. Optimise
for **landing what's tractable in this session and clearly handing
off the rest** with concrete next-steps per deferred test.

### Constraints to respect (don't violate)

- **Default isolation.** Every `launchClaude()` gets a fresh
  `XDG_CONFIG_HOME` sandbox, cleaned on `close()`. Tests that need a
  signed-in claude.ai must opt out via `launchClaude({ isolation: null })`
  AND gate themselves on `CLAUDE_TEST_USE_HOST_CONFIG=1`. Never write
  to the user's real config without that gate.
- **CDP auth gate is alive.** `_electron.launch()` and
  `chromium.connectOverCDP()` both inject `--remote-debugging-port`
  which exits the app. The harness uses `SIGUSR1` runtime-attach via
  `app.attachInspector()` — same code path as Developer → Enable Main
  Process Debugger. Don't try to use Playwright's electron launcher.
- **BrowserWindow Proxy gotcha.** `frame-fix-wrapper.js` returns the
  electron module wrapped in a Proxy. `electron.BrowserWindow.getAllWindows()`
  returns 0. Use `webContents.getAllWebContents()` instead.
  Constructor-level wraps don't take — use prototype-method hooks (see
  `docs/learnings/test-harness-electron-hooks.md`).
- **`skipUnlessRow()` always first.** First line of every `test()`
  body. JUnit `<skipped>` → matrix `-`, never `✗` for an
  inapplicable row.
- **No fixed sleeps.** `retryUntil` from `lib/retry.ts`, or
  Playwright's auto-wait. Fixed `sleep(N)` is a smell.
- **Diagnostics on every run** (Decision 7), not just failures.
  `testInfo.attach()` the launcher log, `--doctor` output, frame
  extents, click attempts, etc. Captured as JSON dumps for
  multi-state tests (S31 pattern).
- **Tag with annotations.** `severity:` and `surface:` annotations on
  every test so JUnit carries them through to matrix-regen.
- **Tabs in TS, ~80-char wrap as the existing files do.** Match the
  surrounding style.
- **Don't break existing runners.** `npm run typecheck` must stay
  clean. Run the full `npm test` against KDE-W if you have access;
  if not, document the verification path in your report.
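
To make the "no fixed sleeps" constraint concrete, here is a minimal polling sketch. It is a hypothetical stand-in: the real `retryUntil` in `lib/retry.ts` may have a different signature, and the option names `timeoutMs` and `intervalMs` are assumptions made for this example.

```typescript
// Hypothetical stand-in for a polling helper; the real lib/retry.ts
// signature may differ.
async function retryUntil<T>(
	probe: () => T | undefined | Promise<T | undefined>,
	opts: { timeoutMs: number; intervalMs: number },
): Promise<T> {
	const deadline = Date.now() + opts.timeoutMs;
	for (;;) {
		const value = await probe();
		if (value !== undefined) return value;
		if (Date.now() >= deadline) {
			throw new Error(`condition not met within ${opts.timeoutMs}ms`);
		}
		// Short poll interval; total wait is bounded by the deadline.
		await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs));
	}
}

// Example: a probe that only succeeds on its third call.
let calls = 0;
const done = retryUntil(
	() => (++calls >= 3 ? calls : undefined),
	{ timeoutMs: 1000, intervalMs: 10 },
);
done.then((n) => console.log(`ready after ${n} calls`));
```

The difference from a fixed `sleep(N)` is that the wait ends as soon as the probe succeeds and fails loudly at the deadline instead of silently proceeding.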

### Phases

#### Phase 0 — calibration

1. `cd tools/test-harness && npm run typecheck` — should pass; if
   not, stop and report.
2. Read `tools/test-harness/README.md` end-to-end and one full runner
   (suggest `T04_window_decorations.spec.ts` — small, file probe
   adjacent, easy to follow). Confirm you understand the spec
   contract before fanning out.
3. Pick T02 (doctor exit code) as a calibration runner. Read
   `docs/testing/cases/launch.md` T02, find the existing anchors in
   `scripts/doctor.sh`, sketch a runner in your head: spawn
   `claude-desktop --doctor`, assert `exit code === 0`, attach the
   stdout. ~30 lines. Don't write it yet — just confirm you can plan
   the shape from the spec.
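
For orientation only, the spawn-and-check shape that step 3 describes might look roughly like the sketch below. It is not a runner to write at this phase; the helper name `runToCompletion` is hypothetical, and the Node binary stands in for `claude-desktop --doctor` so the sketch runs on any machine.

```typescript
import { spawnSync } from 'node:child_process';

// Hypothetical helper: run a command to completion and capture its
// exit code and stdout for later attachment to the test report.
function runToCompletion(
	command: string,
	args: string[],
): { code: number | null; stdout: string } {
	const result = spawnSync(command, args, { encoding: 'utf8' });
	return { code: result.status, stdout: result.stdout ?? '' };
}

// In a runner this would be runToCompletion('claude-desktop', ['--doctor']).
// The Node binary is used here purely as a stand-in.
const probe = runToCompletion(process.execPath, ['-e', 'console.log("ok")']);
console.log(probe.code === 0 ? 'exit code 0' : `exit code ${probe.code}`);
```

A real runner would wrap this in a `test()` body, call `skipUnlessRow()` first, and attach `probe.stdout` through `testInfo.attach()`.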

If Phase 0 surfaces a problem (typecheck failing, primitives unclear,
spec contract not understood), stop and report. Don't fan out
subagents against an unverified workflow.

#### Phase 1 — triage

Spawn ONE subagent (`subagent_type: 'general-purpose'`) that reads
every case file in scope and produces a **tiered runner-implementation
plan**. Plan output goes to
`docs/testing/runner-implementation-plan.md` (new file) with this
shape:

```markdown
## Tier 1 — File / spawn / argv probes (~30min-1hr each)
No app launch needed (or single short-lived spawn). Existing
primitives suffice.
- T02 — `claude-desktop --doctor` exit-code probe. Reuse: lib/diagnostics.ts.
- T13 — same as T02 but parse the package-format line.
- S01 — DEB control file `Depends:` field empty (read post-build).
- S02 — distro substring matching in launcher (asar/script grep).
- ...

## Tier 2 — Single-launch probes (~2-3hrs each)
One launchClaude() + inspector attach. Existing primitives suffice.
- T05 — URL handler `claude://` registered. Verify via xdg-mime + spawn.
- T06 — Quick Entry shortcut registered. globalShortcut.isRegistered().
- ...

## Tier 3 — Multi-step or login-required (~4-8hrs each)
Need claude.ai signed in (CLAUDE_TEST_USE_HOST_CONFIG=1) or
multi-launch state. Defer to follow-up sessions.
- T15 — sign-in flow. Needs OAuth provider mock or live login.
- T16 — Code tab loads. Needs login.
- ...

## Tier 4 — Out of scope or blocked
- T39 — /desktop is the CLI `claude` binary, not the Electron asar.
- S26 — gated on #567 (autoUpdater no-op patch).
- T11, T33 — plugin install needs real plugin + network.
- ...
```

The triage subagent's job is **only** to classify each test into a
tier and record reasoning. No runner code yet.

#### Phase 2 — Tier 1 fan-out

Once the plan is in `docs/testing/runner-implementation-plan.md`, spawn
**one subagent per Tier 1 test** in parallel. Per-subagent prompt:

```
You're implementing ONE test-harness runner for <TEST-ID> in
docs/testing/cases/<FILE>.md.

Read in order:
- docs/testing/cases/<FILE>.md (the spec — focus on the <TEST-ID>
section, including its Code anchors)
- tools/test-harness/README.md (conventions)
- tools/test-harness/src/runners/<closest-existing-template>.spec.ts
- CLAUDE.md (project conventions)

Write tools/test-harness/src/runners/<TEST-ID>_short_name.spec.ts.
Match the closest template. First line of the test body:
skipUnlessRow(testInfo, ['<rows-this-applies-to>']) per the spec's
Applies to field.

Constraints:
- Tabs, ~80-char wrap.
- Use lib/* primitives; don't reinvent.
- testInfo.attach() the diagnostics from the spec's Diagnostics on
failure block.
- Tag with severity + surface annotations.
- No fixed sleeps. retryUntil or Playwright auto-wait.
- npm run typecheck must stay clean after your edits.
- Don't commit. The user reviews and commits.

If the test isn't reasonable to implement (the case anchors don't
resolve to anything assertable, the test depends on state you can't
construct, the existing primitives don't cover the surface), DO NOT
write a stub. Report under Open questions and stop.

Report shape (~150 words):
## <TEST-ID> runner

- File written: tools/test-harness/src/runners/<filename>.spec.ts
- Layer: file probe | argv probe | L1 | L2 (xprop) | L2 (DBus) | pgrep
- Assertion shape: <what the test asserts in one sentence>
- Skip rules: <which rows are skipped + why>
- Verification path: <how the user verifies this runs cleanly>
- Open questions: <any caveats>
```

Send Tier 1 subagents in parallel — they're independent, each owns
one runner file. Cap at ~10 subagents at a time to keep the
fan-out manageable.

#### Phase 3 — Tier 2 fan-out

Same shape as Phase 2 but for Tier 2 tests. Single-launch probes are
heavier (each subagent does a full launchClaude() + assertion design)
so cap at ~6 subagents in parallel.

If a Tier 2 subagent hits a blocker that pushes the test into Tier 3
mid-implementation, stop the subagent — don't ship a stub. The
synthesis step will reclassify.

#### Phase 4 — synthesis

Once Tier 1 + Tier 2 land:

1. `cd tools/test-harness && npm run typecheck` — must be clean.
2. Run the new runners against KDE-W if you have access:
   `ROW=KDE-W npx playwright test src/runners/T02_*.spec.ts ...` —
   capture which pass cleanly and which need selector tuning.
3. Write a final report at the end of the session listing:
   - Runners landed (pass / skip / needs-tuning per row)
   - Tier 3 deferred (with the per-test rationale from the plan)
   - Tier 4 out-of-scope (with reasons)
   - Updated coverage stat (was 15/76 = 20%, now N/76 = M%)
4. Don't commit. The user reviews and commits.
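
The coverage stat in step 3 is simple rounding arithmetic. A small sketch, with the hypothetical helper name `coverageLine`, reproduces the figures quoted in this document (15/76 as 20%, 40/76 as 53%):

```typescript
// Hypothetical helper for the coverage line in the final report.
function coverageLine(wired: number, total: number): string {
	const pct = Math.round((wired / total) * 100);
	return `${wired}/${total} (${pct}%)`;
}

// prints "Coverage: was 15/76 (20%), now 40/76 (53%)"
console.log(
	`Coverage: was ${coverageLine(15, 76)}, now ${coverageLine(40, 76)}`,
);
```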

### Self-correction loop

After Phase 2 / Phase 3 returns:

1. If a subagent's runner fails typecheck, re-spawn with explicit
   instruction to read the typecheck error and fix it (often a missing
   import or stale type from a renamed primitive).
2. If a subagent claimed a runner exists but `git status` shows no
   new file, the subagent silently dropped the write — re-spawn with
   explicit "use the Write tool" instruction.
3. If two subagents wrote runners that share a primitive but with
   slightly different shapes, refactor the duplication into
   `lib/<topic>.ts` BEFORE shipping — duplication compounds across the
   next 50 runners.

Cap re-spawns at 2 per file. Past that, mark the runner as needing
human review in the final report and move on.

### Termination conditions

Stop and write the final report when one of these holds:

1. **Tier 1 + Tier 2 all landed and typecheck-clean.** Write the
   coverage update, stop. Future sessions handle Tier 3.
2. **Hit re-spawn cap on 3+ runners.** Stop, write up which runners
   are blocked and what each blocker looks like.
3. **Discovered a primitive gap that breaks 5+ Tier 2 tests.** Stop,
   write up the gap, propose where the new primitive should live in
   `lib/`. Future session adds the primitive first, then resumes
   Tier 2.

### What you should NOT do

- **Don't try to land Tier 3 / Tier 4 in this session.** They're
  deferred for documented reasons. If you find one that turns out
  cheaper than the plan estimated, note it and pull it forward to
  Tier 2 — but don't open a new can of worms when there are still
  Tier 1 wins on the table.
- **Don't ship stubs.** If a runner can't actually assert what the
  spec says, mark it as Tier 3 in the plan and don't write a placeholder.
- **Don't break existing runners.** `H01-H05` are the canaries.
  `npm test` must still pass H01-H05 after every Tier 1/2 commit.
- **Don't restructure lib/.** Add primitives only when 2+ runners
  need them. Premature shared abstractions will be wrong.
- **Don't run the host Claude Desktop in destructive ways.** Tests
  that need login should gate on `CLAUDE_TEST_USE_HOST_CONFIG=1`.
- **Don't commit.** The user reviews and commits.

### Final report format

```markdown
## Runner implementation summary

- Tier 1 landed: N / M
- Tier 2 landed: N / M
- Tier 3 deferred: N (see plan)
- Tier 4 out-of-scope: N (see plan)
- Coverage: was 15/76 (20%), now <NEW>/76 (<PCT>%)
- Typecheck: clean | <errors>
- KDE-W test run: <pass/skip/fail counts>

## Per-tier breakdown

| Tier | Test ID | File | Assertion shape | Status |
|---|---|---|---|---|
| 1 | T02 | T02_doctor_exit_code.spec.ts | spawn + exit code | ✓ pass |
| 1 | S01 | S01_deb_control_no_depends.spec.ts | file probe | ✓ pass |
| 2 | T05 | T05_url_handler_claude_scheme.spec.ts | xdg-mime + spawn | ✓ pass |
| 2 | T06 | T06_quick_entry_shortcut.spec.ts | globalShortcut | ⏳ needs ydotool |
| ...

## Notable findings
- ...

## Open questions
- ...

## Files touched
git status output (only tools/test-harness/src/runners/*.spec.ts and
docs/testing/runner-implementation-plan.md should appear; possibly
new lib/ primitives if extraction was needed).

## Diff summary
git diff --stat
```

### Operational notes

- Subagents are launched in parallel via a single message with
  multiple Agent tool calls. Don't serialise.
- Each subagent's Write calls land directly in the working tree.
  Each owns one runner file — no merge conflicts.
- The grounding probe (`tools/test-harness/grounding-probe.ts`) can
  help when implementing a runner that asserts runtime API state —
  capture once with `npm run grounding-probe -- --launch
  --include-synthetic`, grep the output for the IPC channel /
  accelerator / API your runner needs to assert against.
- For tests that touch the AX tree, `claudeai.ts` page-objects are
  the right substrate — see `T17_folder_picker.spec.ts` for an
  end-to-end example. Don't query DOM by CSS selector unless `claudeai.ts`
  doesn't already cover the surface.

Begin with Phase 0. Don't fan out until calibration succeeds.
@@ -7,7 +7,7 @@ architecture, decisions, and rationale.

## Status

Forty specs wired (8 cross-env T-tests, 27 env-specific S-tests, 5
Fifty specs wired (14 cross-env T-tests, 31 env-specific S-tests, 5
H-prefix harness self-tests). See
[`docs/testing/runner-implementation-plan.md`](../../docs/testing/runner-implementation-plan.md)
for the tiered triage of remaining tests and the per-spec rationale
@@ -24,12 +24,18 @@ behind tier classification.
| [T07](../../docs/testing/cases/tray-and-window-chrome.md#t07--in-app-topbar) | Five topbar buttons render with non-zero rects (uses `seedFromHost` for hermetic auth) | L1 + DOM |
| [T08](../../docs/testing/cases/tray-and-window-chrome.md#t08--close-x-hides-to-tray) | `win.close()` fires the wrapper interceptor; window hidden, proc alive | L1 |
| [T09](../../docs/testing/cases/platform-integration.md#t09--autostart-via-xdg) | `setLoginItemSettings({ openAtLogin })` writes/removes `$XDG_CONFIG_HOME/autostart/claude-desktop.desktop` | L1 + filesystem |
| [T10](../../docs/testing/cases/platform-integration.md#t10--cowork-integration) | After H04-style spawn detection, `kill -9` the daemon and confirm a *different* pid respawns within ~20s (Patch 6 cooldown + retry) | pgrep delta + spawn delta |
| [T11](../../docs/testing/cases/extensibility.md#t11--plugin-install) | Plugin-install code path fingerprints present in bundled `index.js` | file probe |
| [T12](../../docs/testing/cases/platform-integration.md#t12--webgl-warn-only) | `app.getGPUFeatureStatus()` returns a populated object; renderer reached visible | L1 |
| [T13](../../docs/testing/cases/launch.md#t13--doctor-reports-correct-package-format) | `--doctor` does not false-flag rpm/deb installs as missing-dpkg AppImage | spawn + stdout grep |
| [T14a](../../docs/testing/cases/launch.md#t14--multi-instance-behavior) | `requestSingleInstanceLock` + `'second-instance'` strings in bundled `index.js` (file probe) | file probe |
| [T14b](../../docs/testing/cases/launch.md#t14--multi-instance-behavior) | Second invocation under same isolation exits cleanly; primary pid stays alive (runtime probe) | spawn delta + pgrep |
| [T16](../../docs/testing/cases/code-tab-foundations.md#t16--code-tab-loads) | After `seedFromHost` + `userLoaded`, `CodeTab.activate()` resolves and ≥1 compact pill renders (env pill = Code-body mounted) | L1 + AX-tree |
| [T17](../../docs/testing/cases/code-tab-foundations.md#t17--folder-picker-opens) | Code df-pill → env pill → Local → Select folder → Open folder triggers `dialog.showOpenDialog` (requires `CLAUDE_TEST_USE_HOST_CONFIG=1`) | L1 |
| [T23](../../docs/testing/cases/code-tab-handoff.md#t23--desktop-notifications-fire) | Firing `new Notification({title})` from main reaches the session bus's `org.freedesktop.Notifications.Notify` (observed via `dbus-monitor`) | L1 + DBus subprocess |
| [T25](../../docs/testing/cases/code-tab-handoff.md#t25--show-in-files--file-manager) | After `installShowItemInFolderMock` mirroring T17's dialog-mock pattern, `evalInMain` calls `shell.showItemInFolder(<synthetic path>)`; mock records the call verbatim, no throw — no host side effect | L1 (mocked egress) |
| [T26](../../docs/testing/cases/routines.md#t26--routines-page-renders) | After `seedFromHost` + `userLoaded`, click "Routines" sidebar AX button; assert "New routine" / "All" / "Calendar" anchor renders | L1 + AX-tree |
| [T38](../../docs/testing/cases/code-tab-handoff.md#t38--continue-in-ide) | `ipcMain._invokeHandlers` registry contains a channel ending in `LocalSessions_$_openInEditor` (handler-registered probe) | L1 (IPC introspection) |
| H01 | CDP auth gate exits with code 1 when spawned with `--remote-debugging-port` and no `CLAUDE_CDP_AUTH` token | spawn probe |
| H02 | `frame-fix-wrapper.js` + `frame-fix-entry.js` injected into `app.asar` (Proxy + main-field reference) | file probe |
| H03 | Build-pipeline patch fingerprints all present in `app.asar` (KDE gate, frame-fix inject, tray, cowork, claude-code) | file probe |
@@ -43,14 +49,18 @@ behind tier classification.
| [S07](../../docs/testing/cases/shortcuts-and-input.md#s07--claude_use_waylandvar) | Under `CLAUDE_HARNESS_USE_WAYLAND=1`, spawned Electron has `--ozone-platform=wayland` on argv | argv probe |
| [S08](../../docs/testing/cases/tray-and-window-chrome.md#s08--tray-icon-doesnt-duplicate-after-nativetheme-update) | `setImage`-based in-place fast-path injected by `tray.sh` (KDE-only, file probe) | file probe |
| [S09](../../docs/testing/cases/shortcuts-and-input.md#s09--quick-window-patch-runs-only-on-kde-post-406-gate) | KDE-gate string present in bundled `index.js` (patch ran at build) | file probe |
| [S10](../../docs/testing/cases/shortcuts-and-input.md#s10--quick-entry-popup-is-transparent-no-opaque-square-frame) | KDE-W only — popup runtime `getBackgroundColor() === '#00000000'` after Quick Entry opens (regression-detector against electron#50213 if bundled Electron in 41.0.4-bisect-window) | L1 + ydotool |
| S12 | `--enable-features=GlobalShortcutsPortal` in Electron argv (GNOME-W only — currently a known-failing regression detector) | argv probe |
| [S15](../../docs/testing/cases/distribution.md#s15--appimage-extraction---appimage-extract-works-as-documented-fallback) | `--appimage-extract` exits 0; `squashfs-root/AppRun --version` runs without FUSE error | spawn + filesystem |
| [S16](../../docs/testing/cases/distribution.md#s16--appimage-mount-cleans-up-on-app-exit) | `mount(8)` shows new `.mount_claude` while app is up; gone within 10s of close | mount delta |
| [S17](../../docs/testing/cases/platform-integration.md#s17--app-launched-from-desktop-inherits-shell-path) | Shell-path-worker overlays user's login-shell PATH onto a deliberately-scrubbed env | L1 + utilityProcess |
| [S19](../../docs/testing/cases/routines.md#s19--claude_config_dir-redirects-scheduled-task-storage) | `extraEnv: { CLAUDE_CONFIG_DIR }` reaches main-process `process.env`; `cE()`-equivalent resolves under the override path | L1 + extraEnv |
| [S21](../../docs/testing/cases/routines.md#s21--lid-close-still-suspends-per-os-policy) | No `handle-lid-switch` / `HandleLidSwitch` strings in bundle (lid policy deferred to OS) | asar absence probe |
| [S22](../../docs/testing/cases/platform-integration.md#s22--computer-use-toggle-absent-or-visibly-disabled-on-linux) | `new Set(["darwin","win32"])` platform gate present; no 2-element Set pairing linux (file-probe form) | asar regex |
| [S25](../../docs/testing/cases/platform-integration.md#s25--mobile-pairing-survives-linux-session-restart) | `safeStorage.encryptString → file → app restart → file → safeStorage.decryptString` round-trips the same plaintext (skips when `isEncryptionAvailable === false`) | L1 + shared isolation handle |
| [S26](../../docs/testing/cases/distribution.md#s26--auto-update-is-disabled-when-installed-via-aptdnf) | `setFeedURL` present + project suppression marker present (currently fails — gated on #567) | asar fingerprint |
| [S27](../../docs/testing/cases/extensibility.md#s27--plugins-install-per-user) | `installed_plugins.json` + homedir resolver present; no `*/plugins` system paths in bundle | asar fingerprint |
| [S28](../../docs/testing/cases/extensibility.md#s28--worktree-creation-surfaces-clear-error-on-read-only-mounts) | Bundled `index.js` contains the worktree permission classifier expression (`"Permission denied" \|\| "Access is denied" \|\| "could not lock config file" → "permission-denied"`) plus the `Failed to create git worktree:` log line | asar fingerprint |
| [S29](../../docs/testing/cases/shortcuts-and-input.md#s29--quick-entry-popup-is-created-lazily-on-first-shortcut-press-closed-to-tray-sanity) | Popup opens when main is hidden-to-tray (lazy-create sanity) | L1 |
| [S30](../../docs/testing/cases/shortcuts-and-input.md#s30--quick-entry-shortcut-becomes-a-no-op-after-full-app-exit) | No new claude-desktop pid spawns after post-exit shortcut press | pgrep delta + ydotool |
| [S31](../../docs/testing/cases/shortcuts-and-input.md#s31--quick-entry-submit-makes-the-new-chat-reachable-from-any-main-window-state) | Submit reaches new chat from visible / minimized / hidden-to-tray (QE-7/8/9) | L1 + ydotool |
@@ -62,15 +72,19 @@ behind tier classification.
| S37 | Main-window destroy unreachable on Linux per close-to-tray override — documented skip | — |

These specs exercise the substrate primitives in `lib/`: `xprop`
shell-outs (T01, T04), `dbus-next` (T03), Node-inspector runtime-attach
(T07/T17/S29-S35/T05-T14b L1 specs), `app.asar` content reads
(S08/S09/S21/S22/S26/S27/T11/T14a/H02/H03/S33), `/proc/$pid/cmdline`
reads (S07/S12), pgrep-based pid deltas (T14b/H04/S16/S30), `mount(8)`
parsing (S16), source-tree probes against `scripts/launcher-common.sh`
(S02), `dpkg-query` / `rpm -qR` / `rpm -qf` calls (S03/S04/S05/T13),
and the new `createIsolation({ seedFromHost: true })` primitive that
lets login-required tests run hermetically against a copy of the
host's signed-in auth state (T07).
shell-outs (T01, T04), `dbus-next` (T03), `dbus-monitor` subprocess
eavesdrop (T23), Node-inspector runtime-attach
(T07/T16/T17/T26/S10/S29-S35/T05-T14b L1 specs), `app.asar` content reads
(S08/S09/S21/S22/S26/S27/S28/T11/T14a/H02/H03/S33), `/proc/$pid/cmdline`
reads (S07/S12), pgrep-based pid deltas (T10/T14b/H04/S16/S30),
`mount(8)` parsing (S16), source-tree probes against
`scripts/launcher-common.sh` (S02), `dpkg-query` / `rpm -qR` / `rpm -qf`
calls (S03/S04/S05/T13), `safeStorage.encryptString` round-trip across
two launches (S25), `extraEnv` precedence over isolation env (S19),
`ipcMain._invokeHandlers` registry introspection (T38), and the
`createIsolation({ seedFromHost: true })` primitive that lets
login-required tests run hermetically against a copy of the host's
signed-in auth state (T07, T16, T26).

Per-row pass/skip counts depend on which sweep runs against the row;
see `runner-implementation-plan.md` for tier classification and