docs(testing): session 2 plan/inventory + rotate session 3 prompt

- runner-implementation-plan.md: new "Status (post-execution)" sub-
  section for session 2 listing the 10 new specs and the four
  reclassification notes (S28 → Tier 1, T38 framing, T23 tool choice,
  S19 honest-stub note). Session 1 sub-section preserved verbatim
  below for comparison.
- README.md: 50-spec inventory (was 40), new T-rows (T10, T16, T23,
  T25, T26, T38) and S-rows (S10, S19, S25, S28) interleaved into
  the existing tables. Substrate-primitives paragraph extended with
  dbus-monitor, mock-then-call, ipcMain registry introspection,
  safeStorage round-trip, extraEnv precedence.
- runner-implementation-followup-prompt.md: rewritten for session 3
  — deferred items (T31, T32, S06, S11, S14), Tier 3 → Tier 2
  reframes (T22, T35, T37), asar fingerprint cleanups (T24, T30,
  T33), the focus-shifter primitive build, and the mock-then-call
  extension for T24 as an alternative to its asar form. Includes
  the "known mechanism-recipe table" cumulating sessions 1+2.
- runner-implementation-prompt.md: deleted (session 1's prompt,
  superseded by the followup that's been the rolling document
  since session 1 ended).

Co-Authored-By: Claude <claude@anthropic.com>
This commit is contained in:
aaddrick
2026-05-03 17:01:55 -04:00
parent fb5189fe45
commit 86385848d0
4 changed files with 596 additions and 388 deletions

View File

@@ -0,0 +1,521 @@
# test-harness runner implementation — session 3 prompt
This file is meant to be **copied verbatim into a fresh Claude Code
session** as the initial user message. Don't paraphrase it; the
orchestration depends on the exact directives below.
You're picking up after a runner-implementation session that landed 10
new specs (5 Tier 2 + 4 Tier 2-reframes + 1 Tier 1 reclass), lifting
harness coverage from 40/76 (53%) to 50/76 (66%). Three commits on
`docs/compat-matrix`:
- `XXX``test(harness): session 2 runners + lib/claudeai mock helper`
(10 new spec files; `installShowItemInFolderMock` added to
`lib/claudeai.ts` mirroring the `installOpenDialogMock` pattern;
README inventory + plan-doc status section updated).
(Substitute the actual SHA after committing — the user reviews and
commits at the end of every session.)
The plan doc at
[`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
captures the tier classification and execution-time reclassifications.
Its "Status (post-execution)" section is the source of truth for what's
done and what's deferred — read **session 2** then **session 1**
sub-sections.
This session is a continuation, not a restart. Start by reading the
plan doc's status sections.
### Big new findings from session 2
1. **Mock-then-call beats invoke-then-cleanup for Tier 2 reframes of
side-effecting Electron APIs.** T17's existing
`installOpenDialogMock` pattern was extended to T25 via a new
`installShowItemInFolderMock` in `lib/claudeai.ts`. Net: no host
file-manager pop-up during the run, AND the assertion strengthens
from "didn't throw" to "the egress was reached + the path arg
flowed through verbatim". Apply this pattern to any future
`shell.*` / `dialog.*` Tier 2 reframes (T24 `shell.openExternal`
would mock cleanly the same way).
2. **`gdbus monitor --dest <name>` only sees signals OWNED BY that
destination, not method calls TO it.** T23 had to switch from the
plan's gdbus suggestion to `dbus-monitor` (eavesdrop match rule)
to observe `org.freedesktop.Notifications.Notify` calls from
Electron. If T27 / T22 / S24 ever ship Tier 2 reframes that need
to observe method calls on a service, use `dbus-monitor`.
3. **`ipcMain._invokeHandlers` channel naming carries a build-stable
UUID prefix:** `$eipc_message$_<UUID>_$_claude.web_$_<name>`. T38
anchors on the `_$_<name>` suffix to survive UUID rotation; the
prefix is captured as diagnostic. Useful precedent for any future
IPC-introspection probes.
4. **Closure-local minified helpers are NOT reachable from
globalThis.** S28's plan called for inspector-eval against
`Sbn()`, but `Sbn` is a closure-local — couldn't be invoked. S28
reclassified to Tier 1 (asar fingerprint of the classifier
expression). For any future "Tier 2 reframe via inspector-eval
against minified helper X" entry: confirm reachability before
classifying.
5. **`safeStorage` on Linux uses random IVs.** Only decrypted
plaintext is comparable across encrypt calls; ciphertext bytes
are not deterministic. S25 compares plaintexts.
6. **`extraEnv` precedence.** `lib/electron.ts:317-323` spreads in
order: `process.env`, `LAUNCHER_INJECTED_ENV`, `isolation?.env`,
`waylandEnv`, then `opts.extraEnv`, then `CI: '1'`. Override
wins. S19 leans on this; load-bearing for future tests that
need to override isolation defaults.
### Authoritative reference
Read these in order before fanning out:
- [`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
— tier classification + status section. Read both **session 2**
and **session 1** "Status (post-execution)" sub-sections. The
Tier-3 list (line ~342) is the candidate pool for further
reframes.
- [`tools/test-harness/README.md`](../../tools/test-harness/README.md)
— runner conventions, the now-50-spec inventory, primitives in
`lib/`, isolation defaults, the CDP-gate workaround, the
`seedFromHost` reference.
- [`docs/testing/cases/README.md`](cases/README.md) — case-doc
structure and the four anchor scopes.
- [`tools/test-harness/src/lib/`](../../tools/test-harness/src/lib/)
— the existing primitives. Notable additions since session 2:
- `claudeai.ts``installShowItemInFolderMock` /
`getShowItemInFolderCalls` (mirrors `installOpenDialogMock`).
If 3+ tests start using mock-then-call, consider extracting to
`lib/electron-mocks.ts` — but don't pre-extract.
- [`tools/test-harness/src/runners/`](../../tools/test-harness/src/runners/)
— every existing spec is a template. Notable session 2 templates:
- `T16_code_tab_loads.spec.ts` / `T26_routines_page_renders.spec.ts`
— seedFromHost + post-login renderer-side AX nav. T16 uses the
existing `CodeTab.activate()`; T26 inlines a similar AX walker
for the sidebar. Pattern for any further "click an AX-tree
button after login" test.
- `T25_show_item_in_folder_no_throw.spec.ts` — mock-then-call
pattern (mirrors T17). Use this shape for any future Tier 2
reframe of a side-effecting Electron API.
- `T38_open_in_editor_handler_registered.spec.ts` — IPC handler
registry introspection via `ipcMain._invokeHandlers`. Pattern
for any "is this handler wired up" check.
- `T23_notification_reaches_dbus.spec.ts` — dbus-monitor
subprocess + inspector-fired notification + buffer scan.
- `T10_cowork_daemon_respawn.spec.ts` — H04 extension: spawn,
SIGKILL, poll for new pid. Pattern for any "service auto-respawn
contract" test.
- `S25_safestorage_token_persists.spec.ts` — two-launch with
shared isolation handle + safeStorage round-trip via tmpfile.
- `S28_worktree_permission_classifier.spec.ts` — single-regex
asar fingerprint for a multi-string-OR classifier expression.
- [`docs/testing/cases/*.md`](cases/) — the spec each runner
asserts. The **Code anchors:** field tells you exactly where
upstream implements the feature.
### Tests in scope this session
Five categories, in priority order:
| # | Tests | Source files | Notes |
|---|---|---|---|
| **A** Deferred from session 2 | T31, T32, S06, S11, S14 | `code-tab-workflow.md` (T31, T32), `shortcuts-and-input.md` (S06, S11, S14) | T31/T32 need a Code-tab session OPEN; S06 needs Wayland row; S11/S14 need a new focus-shifter primitive |
| **B** Tier 3 → Tier 2 reframes (read-only) | T22, T35, T37 | `code-tab-workflow.md` (T22), `extensibility.md` (T35, T37) | Each can ship as a *reads-from-disk-or-IPC-registry* probe without writing to the user's account |
| **C** Asar fingerprint cleanups | T24, T30, T33 | `code-tab-handoff.md` (T24), `code-tab-workflow.md` (T30), `extensibility.md` (T33) | Each has a load-bearing string set in `index.js` that pins the wiring without needing a launch |
| **D** New primitive — focus-shifter | (unblocks A's S11/S14) | `lib/input.ts` | xdotool / ydotool focus-stealing helper. Build only if S11/S14 worth shipping this session |
| **E** Mock-then-call extension | T24 (mock form) | `code-tab-handoff.md` (T24) | Mirror of T25's pattern but for `shell.openExternal` — handler reaches the egress with the right URL |
Realistic ceiling: **~6-8 new specs** this session. Don't try all
13 — Categories A and B are heavier than session 2's mix because
they need either a Code-tab session opened OR a new primitive built
first.
### Detailed scope per category
#### Category A — deferred items (5)
- **T31 — Side chat opens.** Needs: `seedFromHost` + Code-tab
session OPEN (env pill → Local → choose folder → wait for session
load). After session loads, send `Ctrl+;` via ydotool OR find the
IPC handler `startSideChat` in `ipcMain._invokeHandlers` (T38
pattern) and assert it's registered + invokable. The lighter form
is the IPC-registry probe; the heavier form is full
click-chain-into-side-chat.
- **T32 — Slash command menu.** Needs: same Code-tab session OPEN
preamble as T31. Then trigger `/` in the prompt textarea and
assert the slash menu renders (AX-tree query for menuitem* nodes
in the prompt area). Heavier than T31 because the slash menu is
rendered server-side by claude.ai's bundle.
- **S06 — URL handler segfault on native Wayland.** Needs:
`CLAUDE_HARNESS_USE_WAYLAND=1` row + `coredumpctl info
claude-desktop` observation after firing
`xdg-open 'claude://chat/new'`. Skip cleanly if not on a
Wayland row.
- **S11 / S14 — focus-shifter delivery.** Needs: `lib/input.ts`
with `focusOtherWindow()` (xdotool on X11; skip on Wayland or
use compositor-specific). Then S11 / S14 launch app, focus
another window, fire shortcut, assert popup appears. Build the
primitive in one PR (Category D), then both runners in a second
PR.
#### Category B — Tier 3 → Tier 2 reframes (3)
These each have a slice that doesn't write to the user's real
account:
- **T22 — PR monitoring (read-only half).** The Tier 3 form opens a
PR; the Tier 2 reframe is "after `seedFromHost`, IPC handler
`getPrChecks` is registered + the `gh CLI not found in PATH`
string is in the bundle". The handler-registered probe is the
shippable form; the missing-`gh` warning string is a static
fingerprint. Both ship as one runner.
- **T35 — MCP server config picked up.** Reframe: place a fixture
`claude_desktop_config.json` under the isolation's configDir
(no host config touch needed — fresh isolation), then via
inspector eval, read whatever main-process state holds the
parsed MCP server list. Anchor on a known path under
`${configDir}/Claude/`.
- **T37 — `CLAUDE.md` memory loads.** Reframe: place a fixture
`~/.claude/CLAUDE.md` (or under `CLAUDE_CONFIG_DIR/CLAUDE.md`
with extraEnv override — see S19's pattern), then via inspector
eval read the loaded memory state. Anchor needs to come from
case-doc Code anchors.
#### Category C — asar fingerprint cleanups (3)
Each is Tier 1 / no launch:
- **T24 — Open in external editor (asar fingerprint).** The full
click-chain T24 is Tier 3. The fingerprint half: assert `Mtt`
registry is in `index.js` with the editor scheme strings
(`vscode://`, `cursor://`, `zed://`, `windsurf://`).
- **T30 — Auto-archive on PR merge (cadence constants).** Static
fingerprint of the sweep cadence — assert `300_000` (5 min)
and `3_600_000` (1 h) appear near the auto-archive code.
- **T33 — Plugin browser (IPC handler registered).** Same shape
as T38 — assert `listMarketplaces` IPC handler is registered.
#### Category D — primitive build
- **`lib/input.ts:focusOtherWindow()`.** xdotool on X11
(`xdotool search --name '<test-marker>' windowfocus`); on
Wayland skip cleanly (no portable focus injection). Used by
S11 / S14. Don't build unless those are in scope this session.
#### Category E — mock-then-call extension
- **T24 (mock form, alternative to Category C).** Mock
`shell.openExternal` via a new `installOpenExternalMock` in
`lib/claudeai.ts` (mirror of `installShowItemInFolderMock`),
then `inspector.evalInMain` calls
`shell.openExternal('vscode://file/tmp/test')` and assert
the recorded call list contains the URL. Strictly stronger
than the Category C fingerprint form. **Pick C OR E for T24
— not both.**
### Why this iteration
The harness is at 50/76 coverage and every release tag now
exercises the smoke-set + a chunk of critical surfaces
automatically. Remaining work clusters in three pockets:
- **Code-tab cluster (T15-T39, mostly login-walled).** Session 2
unblocked the *render-only* half via `seedFromHost`; session 3
should push into the *open-a-session* half (T31, T32) and the
read-only-Tier-3-reframes (T22, T35, T37).
- **Wayland-specific tests (S06).** Need a Wayland row + the
harness's `CLAUDE_HARNESS_USE_WAYLAND=1` switch.
- **Focus-shift-dependent tests (S11, S14).** Need
`lib/input.ts:focusOtherWindow()` built first.
After this session, future sessions can focus on the genuinely
heavy Tier 3 work (destructive-write login tests; multi-launch
state) with a clearer cost model.
### Known mechanism-recipe table (session 1 + session 2)
| Pattern | Use when | Worked example |
|---|---|---|
| `createIsolation({ seedFromHost: true })` | spec needs a signed-in renderer; read-only | T07, T16, T26 |
| `isolation: null` + pre-launch `killHostClaude()` | spec needs SingletonLock collision (delivery probes) | T05 |
| Default isolation | most other tests | T01, T03, T04, S29 |
| `isolation: <handle>` (shared across launches) | multi-launch persistent state | S35, S25 |
| `MainWindow.setState('close')` | exercise the wrapper close-interceptor | T08 |
| Mock-then-call for `shell.*` / `dialog.*` | Tier 2 reframe of side-effecting API | T17, T25 |
| `ipcMain._invokeHandlers` registry probe | "is this IPC handler wired up" | T38 |
| `dbus-monitor` subprocess | observe DBus method calls TO a destination | T23 |
| Asar single-regex multi-string-OR fingerprint | classifier-style code that combines several strings | S28 |
### Constraints to respect (don't violate)
These are unchanged from sessions 1 and 2 and still load-bearing:
- **Default isolation** unless the spec needs otherwise. Never
write to `~/.config/Claude` without explicit gating
(`CLAUDE_TEST_USE_HOST_CONFIG=1` opt-out, OR `seedFromHost: true`
with read-only-then-discard semantics, OR an explicit comment
documenting why).
- **CDP auth gate is alive** — runtime SIGUSR1 attach via
`app.attachInspector()`, never Playwright's `_electron.launch()`
or `chromium.connectOverCDP()`.
- **BrowserWindow Proxy gotcha** — use `webContents.getAllWebContents()`
not `BrowserWindow.getAllWindows()`. Constructor-level wraps
don't work; use prototype-method hooks.
- **`skipUnlessRow()` always first.** First line of every `test()`
body when the test is row-gated.
- **No fixed sleeps.** `retryUntil` from `lib/retry.ts`, or
Playwright auto-wait. Fixed `sleep(N)` is a smell.
- **Diagnostics on every run.** `testInfo.attach()` the artefacts
(launcher log, --doctor output, frame extents, click attempts,
AX-tree snapshot). Captured as JSON dumps for multi-state tests.
- **Tag with annotations.** `severity:` and `surface:` on every
test so JUnit carries them through to matrix-regen.
- **Tabs in TS, ~80-char wrap as the existing files do.** Match
surrounding style.
- **Don't break existing runners.** `npm run typecheck` must stay
clean. H01-H05 are the canaries; `npm test` must still pass them
after every commit.
- **For mock-then-call: leading comment must document why mock
beats invoke** (T25's leading comment is the worked example —
three short paragraphs).
### Phases
#### Phase 0 — calibration
1. `cd tools/test-harness && npm run typecheck` — should pass.
2. Read the plan doc's "Status (post-execution)" session 2 section,
then read T25's leading comment (the mock-then-call pattern's
worked-example doc) and T16/T26 (the seedFromHost-then-AX-nav
pattern). Confirm you understand both.
3. Pick one Category B candidate (suggest T22 — read-only half) and
sketch the runner shape mentally. Don't write it yet — confirm
you can plan from the spec.
If Phase 0 surfaces a problem (typecheck failing, primitives
unclear, patterns not understood), stop and report. Don't fan out.
#### Phase 1 — fan-out batch
Spawn parallel subagents (cap at 6 in flight) for the highest-
confidence candidates first.
**Suggested initial batch (4-5 specs):**
- **B / T22 — PR monitoring read-only half.** seedFromHost +
`ipcMain._invokeHandlers` for `getPrChecks` + asar fingerprint
for the missing-`gh` warning string.
- **C / T24 (asar fingerprint OR mock form, pick one).** If you
want stronger coverage, go mock form (Category E shape — mirror
of T25's `installOpenExternalMock` helper). If you want a quick
Tier 1, go asar fingerprint (`Mtt` registry + scheme strings).
- **C / T30 — Auto-archive cadence constants.** Pure asar probe.
- **C / T33 — Plugin browser handler registered.** T38 pattern.
- **B / T35 — MCP server config picked up.** Fixture under
isolation configDir + inspector eval.
If those land cleanly, dispatch a second batch:
- **A / T31 — Side chat opens (handler-registered shape).**
seedFromHost + `ipcMain._invokeHandlers` for `startSideChat`
/ `sendSideChatMessage` / `stopSideChat`. The lighter probe.
- **A / T32 — Slash command menu (asar fingerprint).** The full
AX-tree form needs a Code-tab session open AND server-side
rendered menu — heavy. The fingerprint form: `getSupportedCommands`
+ `slashCommands` schema present in `index.js`.
- **A / S11 / S14 (only if Category D primitive is built first).**
Build `lib/input.ts:focusOtherWindow()` in PR 1; ship S11/S14
in PR 2.
- **B / T37 — CLAUDE.md memory loads.** Fixture file + inspector
eval against the loaded memory state.
#### Per-subagent prompt shape
```
You're implementing ONE test-harness runner for <TEST-ID> in
docs/testing/cases/<FILE>.md.
Read in order:
- docs/testing/cases/<FILE>.md (focus on <TEST-ID>'s Code anchors)
- tools/test-harness/README.md (conventions; status section names
the most-recent-template that fits)
- tools/test-harness/src/runners/<closest-template>.spec.ts
- tools/test-harness/src/lib/ (the primitives you'll reuse)
- CLAUDE.md (project conventions)
Write tools/test-harness/src/runners/<TEST-ID>_short_name.spec.ts.
[per-test specifics: pattern (seedFromHost / mock-then-call /
ipcMain._invokeHandlers / asar fingerprint / shared isolation),
assertion shape, skip rules, key constraint warnings]
Constraints:
- Tabs, ~80-char wrap.
- Use lib/* primitives; don't reinvent.
- testInfo.attach() the diagnostics from the spec's "Diagnostics
on failure" block.
- Tag with severity + surface annotations.
- No fixed sleeps. retryUntil or Playwright auto-wait.
- npm run typecheck must stay clean after your edits.
- Don't commit. The user reviews and commits.
If the test isn't reasonable to implement (anchors don't resolve
to anything assertable, the test depends on state you can't
construct, the existing primitives don't cover the surface), DO
NOT write a stub. Report under Open questions and stop. Sessions
1 and 2 had cumulative ~6 "stop and report" outcomes that were
the right call (S20 deferral, T05 reshape, T07 needs seedFromHost,
T08 needs setState('close'), S28 reclassification, T38 framing).
Report shape (~150 words):
## <TEST-ID> runner
- File written: tools/test-harness/src/runners/<filename>.spec.ts
- Layer: file probe | argv probe | L1 | L2 (xprop) | L2 (DBus) | pgrep
- Assertion shape: <one sentence>
- Skip rules: <which rows + why>
- Verification path: <typecheck + run result>
- Open questions: <caveats>
```
#### Phase 2 — synthesis
After fan-out returns:
1. `cd tools/test-harness && npm run typecheck` — must stay clean.
2. Run the new runners against KDE-W (the dev box) — but flag the
user first if any are destructive (seedFromHost kills running
Claude; T31/T32 require an open Code-tab session that may
accumulate state). Capture pass/skip/fail per spec for the
matrix.
3. Update [`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
"Status (post-execution)" section to reflect newly-shipped
specs and any reclassifications discovered mid-flight.
4. Update [`tools/test-harness/README.md`](../../tools/test-harness/README.md)
inventory table.
5. Write a final report listing:
- Specs landed (pass / skip / needs-tuning per row)
- Specs deferred (with the per-test rationale)
- Specs reclassified (Tier 3 → Tier 2, Tier 2 → blocked, etc.)
- Updated coverage stat (was 50/76 = 66%, now N/76 = M%)
6. Don't commit. The user reviews and commits.
7. Rotate this prompt: rewrite
`docs/testing/runner-implementation-followup-prompt.md` for the
NEXT session's deferred items.
### Self-correction loop
Same as sessions 1 and 2:
1. Subagent typecheck failure → re-spawn with explicit fix
instruction.
2. Subagent claims a runner exists but `git status` shows no new
file → re-spawn with explicit "use the Write tool" instruction.
3. Two subagents wrote runners that share a primitive but with
different shapes → factor into `lib/<topic>.ts` BEFORE shipping.
Cap re-spawns at 2 per file. Past that, mark as needing human
review and move on.
### Termination conditions
Stop and write the final report when one of:
1. **All Category A + B + C target specs landed and typecheck-clean.**
Write coverage update, stop.
2. **Hit re-spawn cap on 3+ runners.** Stop, write up which are
blocked.
3. **Discovered a primitive gap that breaks 5+ Tier 2/Tier 3
tests.** Stop, propose where the new primitive should live in
`lib/`. Future session adds the primitive first, then resumes.
4. **Session budget hits ~7 new specs.** Stop, synthesize, leave
the rest for the next session.
### What you should NOT do
- **Don't try to land Category A + B + C in one batch.** That's
~9-11 specs. Pick the highest-confidence subset for the first
batch and decide whether to do more based on what came back.
- **Don't ship stubs.** If a runner can't actually assert what the
spec says, mark it as Tier 3 / blocked / primitive-gap and don't
write a placeholder. The cumulative six "stop and report"
outcomes from sessions 1+2 were the right call — every one
revealed a real constraint.
- **Don't break existing runners.** H01-H05 are the canaries.
- **Don't pre-extract `lib/electron-mocks.ts`.** The
`installShowItemInFolderMock` + `installOpenDialogMock` pair
doesn't yet justify a new file; if T24 ships as Category E
(mock form), THAT's the third — extract then.
- **Don't restructure `lib/`** beyond targeted additions.
Premature abstractions are wrong abstractions.
- **Don't run destructive Tier 3 tests** that write to the user's
real claude.ai account (T22 PR write, T27 scheduling, T29
worktree creation, T34 OAuth, T36 hooks). Only the *read-only
reframes* of those are in scope this session.
- **Don't implement the #569 power-inhibit patch in this session.**
That's a separate workstream. The S20 spec follows the patch,
not the other way around.
- **Don't commit.** The user reviews and commits.
### Final report format
```markdown
## Runner implementation summary (session 3)
- Category A landed: N / 5
- Category B landed: N / 3
- Category C landed: N / 3
- Reclassified mid-flight: N (with reasons)
- Coverage: was 50/76 (66%), now <NEW>/76 (<PCT>%)
- Typecheck: clean | <errors>
- KDE-W test run: <pass/skip/fail counts>
## Per-spec breakdown
| Cat | Test ID | File | Assertion shape | Status |
|---|---|---|---|---|
| B | T22 | T22_pr_monitoring_handler.spec.ts | seedFromHost + IPC handler probe + asar fingerprint | ✓ pass |
| ... |
## Notable findings
- ...
## Open questions
- ...
## Files touched
git status output (only tools/test-harness/src/runners/*.spec.ts +
maybe lib/* primitives if extraction was needed; possibly plan-doc /
README updates).
## Diff summary
git diff --stat
```
### Operational notes
- Subagents are launched in parallel via a single message with
multiple Agent tool calls. Don't serialise.
- Each subagent's Write calls land directly in the working tree.
- The grounding probe (`tools/test-harness/grounding-probe.ts`) can
help when implementing a runner that asserts runtime API state —
capture once with `npm run grounding-probe -- --launch
--include-synthetic`, grep the output for the IPC channel /
accelerator / API your runner needs to assert against.
- For seedFromHost specs, the host MUST have a signed-in Claude
Desktop. The primitive throws with a clear message if not.
Document the prerequisite in your runner's leading comment if
it's the first one to add seedFromHost coverage to a new surface.
- For tests that touch the AX tree, `claudeai.ts` page-objects are
the right substrate — see `T17_folder_picker.spec.ts` for the
end-to-end example. Don't query DOM by CSS selector unless
`claudeai.ts` doesn't already cover the surface.
- For mock-then-call: see T25 for the canonical pattern. Mock
installation is in `lib/claudeai.ts` alongside the dialog-mock;
add a sibling export, don't pre-extract a new file.
Begin with Phase 0. Don't fan out until calibration succeeds.

View File

@@ -18,11 +18,60 @@ work begins.
## Status (post-execution)
**Shipped this session (25 new specs):** T02, T05, T06, T07, T08, T09,
**Shipped session 2 (10 new specs):** T10, T16, T23, T25, T26, T38, S10,
S19, S25, S28. Coverage moved from 40/76 (53%) to 50/76 (66%).
Session 2 reclassifications:
- **S28** reclassified from **Tier 2 → Tier 1**. The plan called for
inspector-eval against `Sbn()` permission classifier with a synthetic
error string, but `Sbn` is a closure-local in the bundled main process
— not reachable from `globalThis` and no IPC surface exposes it. The
shipped runner is a single-regex asar fingerprint that pins the same
classifier expression (the `"Permission denied" || "Access is denied"
|| "could not lock config file" → "permission-denied"` chain) plus the
`Failed to create git worktree:` log line. Same drift signal, no
launch needed.
- **T38** shipped as a **handler-registered probe**, not a no-throw
invocation. Calling `LocalSessions.openInEditor` directly would
terminate at `shell.openExternal('vscode://...')` (real editor launch
+ side effect on host) and would also be blocked by the channel's
origin validation against non-claude.ai senders. Instead, the runner
inspects `ipcMain._invokeHandlers` for the channel ending in
`LocalSessions_$_openInEditor`. Documents the channel's
`$eipc_message$_<UUID>_$_claude.web_$_<name>` framing (UUID is
build-stable: `c0eed8c9-...`) so a future framing change is visible.
- **T23 tool choice — dbus-monitor, not gdbus monitor.** The plan
suggested `gdbus monitor --session --dest=org.freedesktop.Notifications`
but `gdbus monitor --dest <name>` only sees signals OWNED BY that
destination, not method calls TO it. `Notify` is a method call FROM
Electron TO the daemon, so `gdbus` cannot observe it. Switched to
`dbus-monitor` (eavesdrop match rule). Pre-launch checks gate on
`dbus-monitor` presence + notification-daemon ownership of
`org.freedesktop.Notifications`.
- **S19 honest-stub note.** The Tier 2 slice is env propagation
(`extraEnv: { CLAUDE_CONFIG_DIR }` → main-process `process.env`).
Half 1 (env propagation) is the load-bearing assertion. Half 2
(resolver-shape echo) is a synthetic re-implementation of `cE()` /
`Tce()` because those minified symbols are closure-locals — leading
comment is explicit about the re-implementation status.
- **T25 / T38 / T23 host side effects documented.** T25 may briefly
open a file manager on the host; T38 deliberately doesn't invoke;
T23 fires a real notification. Each runner's leading comment
documents the side effect.
Tier 2 → Tier 2 candidates remaining for a future session: **T31, T32**
(side chat, slash menu — both need a Code-tab session OPEN, not just
the Code tab loaded). **S11, S14** (focus-shifter primitive still
unbuilt — flagged in plan and unchanged).
---
**Shipped session 1 (25 new specs):** T02, T05, T06, T07, T08, T09,
T11, T12, T13, T14a, T14b, S01, S02, S03, S04, S05, S07, S08, S15, S16,
S17, S21, S22, S26, S27. Coverage moved from 15/76 (20%) to 40/76 (53%).
Reclassifications discovered during execution:
Session 1 reclassifications:
- **T05** shipped as a **Tier 3 delivery probe** (xdg-open →
`app.on('second-instance', ...)`), not the originally-planned Tier 2

View File

@@ -1,376 +0,0 @@
# test-harness runner implementation — implementation prompt
This file is meant to be **copied verbatim into a fresh Claude Code
session** as the initial user message. Don't paraphrase it; the
orchestration depends on the exact directives below.
---
## Prompt to paste
You're picking up after the cases-grounding sweep, the
`grounding-probe.ts` runtime probe, and the `CLAUDE_HARNESS_USE_WAYLAND`
flag all landed. The case docs are now anchored to upstream code +
wrapper scripts and the harness can swap backends with one env var.
The next workstream is **wiring runners for the 61 of 76 tests that
still don't have one**.
Today the harness has 15 specs covering 4 of 39 cross-env tests (T01,
T03, T04, T17) and 11 of 37 env-specific tests (S09, S12, S29-S37
Quick Entry sweep) — plus 5 H-prefix harness self-tests. Everything
else is human-execution-only. The smoke set in
[`docs/testing/README.md`](docs/testing/README.md#smoke-set) lists T01,
T03, T04, T05, T07, T08, T11, T15, T16, T17 as the release gate; only
4 of those are wired. The rest of the gate goes through a manual
clicker.
This is too many to land in one session — don't try. Triage first,
land the cheap probes, get the single-launch wins, defer the rest
with explicit reasons. Optimise for **broad coverage of cheap
checks** over deep coverage of one renderer-heavy surface.
### Authoritative reference
Read these in order before fanning out:
- `docs/testing/cases/README.md` — case-doc structure and the four
anchor scopes (upstream / wrapper / SPA / CLI). Each test you wire
needs to either consume an existing anchor or surface a new one.
- `tools/test-harness/README.md` — runner conventions, the 5 distinct
shapes of TS code already in `lib/` (xprop / dbus-next / inspector
attach / `app.asar` reads / `/proc/$pid/cmdline` reads / pgrep), the
isolation defaults, the CDP-gate workaround.
- `tools/test-harness/src/lib/` — the existing primitives. Most new
runners are recombinations:
- `electron.ts``launchClaude()`, `app.attachInspector()`,
`app.waitForReady('window'|'mainVisible'|'claudeAi'|'userLoaded')`
- `inspector.ts``evalInMain()`, `evalInRenderer()`,
`getAccessibleTree()`, `clickByBackendNodeId()`
- `claudeai.ts` — page-objects against the AX-tree substrate
- `quickentry.ts` — shared QE primitives (`installInterceptor`,
`openAndWaitReady`)
- `sni.ts` — DBus StatusNotifierWatcher attribution by pid
- `wm.ts` — xprop wrappers; `findX11WindowByPid`,
`getNetFrameExtents`
- `argv.ts``/proc/$pid/cmdline` flag check
- `asar.ts` — in-place `app.asar` content reads (no temp extract)
- `row.ts``skipUnlessRow()` / `skipOnRow()`
- `diagnostics.ts` — launcher log + `--doctor` capture
- `tools/test-harness/src/runners/` — every existing spec is a
template. Match the layer (L1 / L2 / file / argv / pid) to the
test's assertion shape. Don't reinvent — `T17_folder_picker.spec.ts`
for L1 click chains, `T03_tray_icon_present.spec.ts` for DBus,
`T04_window_decorations.spec.ts` for xprop, `S33_electron_version_capture.spec.ts`
for asar reads, `S12_global_shortcuts_portal_flag.spec.ts` for argv.
- `docs/testing/cases/*.md` — the spec each runner asserts. The
**Code anchors:** field tells you exactly where upstream implements
the feature.
- `CLAUDE.md` (project root) — code style, attribution, commit format.
### Tests in scope
The 61 missing runners, by source file:
| File | Missing |
|---|---|
| `launch.md` | T02, T13, T14 |
| `tray-and-window-chrome.md` | T07, T08, S08, S13 |
| `shortcuts-and-input.md` | T05, T06, S06, S07, S10, S11, S14 |
| `code-tab-foundations.md` | T15, T16, T18, T19, T20 |
| `code-tab-workflow.md` | T21, T22, T29, T30, T31, T32 |
| `code-tab-handoff.md` | T23, T24, T25, T34, T38, T39 |
| `routines.md` | T26, T27, T28, S19, S20, S21 |
| `extensibility.md` | T11, T33, T35, T36, T37, S27, S28 |
| `distribution.md` | S01, S02, S03, S04, S05, S15, S16, S26 |
| `platform-integration.md` | T09, T10, T12, S17, S18, S22, S23, S24, S25 |
### Why this iteration
The matrix today is mostly hand-driven. Each cell update requires a
human running through the case-doc steps on a VM and writing a status
into `matrix.md`. That's expensive enough that sweeps happen
infrequently and regressions sit longer than they should. A wider
runner net means more cells get refreshed automatically per release,
the human time goes to the renderer-heavy tests that genuinely need
it, and the Smoke + Critical gate stops being a manual-clicker
bottleneck on every tag.
This isn't one-session work — it's the start of a series. Optimise
for **landing what's tractable in this session and clearly handing
off the rest** with concrete next-steps per deferred test.
### Constraints to respect (don't violate)
- **Default isolation.** Every `launchClaude()` gets a fresh
`XDG_CONFIG_HOME` sandbox, cleaned on `close()`. Tests that need a
signed-in claude.ai must opt out via `launchClaude({ isolation: null })`
AND gate themselves on `CLAUDE_TEST_USE_HOST_CONFIG=1`. Never write
to the user's real config without that gate.
- **CDP auth gate is alive.** `_electron.launch()` and
`chromium.connectOverCDP()` both inject `--remote-debugging-port`
which exits the app. The harness uses `SIGUSR1` runtime-attach via
`app.attachInspector()` — same code path as Developer → Enable Main
Process Debugger. Don't try to use Playwright's electron launcher.
- **BrowserWindow Proxy gotcha.** `frame-fix-wrapper.js` returns the
electron module wrapped in a Proxy. `electron.BrowserWindow.getAllWindows()`
returns 0. Use `webContents.getAllWebContents()` instead. Constructor-
level wraps don't take — use prototype-method hooks (see
`docs/learnings/test-harness-electron-hooks.md`).
- **`skipUnlessRow()` always first.** First line of every `test()`
body. JUnit `<skipped>` → matrix `-`, never `✗` for an
inapplicable row.
- **No fixed sleeps.** `retryUntil` from `lib/retry.ts`, or
Playwright's auto-wait. Fixed `sleep(N)` is a smell.
- **Diagnostics on every run** (Decision 7), not just failures.
`testInfo.attach()` the launcher log, `--doctor` output, frame
extents, click attempts, etc. Captured as JSON dumps for
multi-state tests (S31 pattern).
- **Tag with annotations.** `severity:` and `surface:` annotations on
every test so JUnit carries them through to matrix-regen.
- **Tabs in TS, ~80-char wrap as the existing files do.** Match the
surrounding style.
- **Don't break existing runners.** `npm run typecheck` must stay
clean. Run the full `npm test` against KDE-W if you have access;
if not, document the verification path in your report.
### Phases
#### Phase 0 — calibration
1. `cd tools/test-harness && npm run typecheck` — should pass; if
not, stop and report.
2. Read `tools/test-harness/README.md` end-to-end and one full runner
(suggest `T04_window_decorations.spec.ts` — small, file probe
adjacent, easy to follow). Confirm you understand the spec
contract before fanning out.
3. Pick T02 (doctor exit code) as a calibration runner. Read
`docs/testing/cases/launch.md` T02, find the existing anchors in
`scripts/doctor.sh`, sketch a runner in your head: spawn
`claude-desktop --doctor`, assert `exit code === 0`, attach the
stdout. ~30 lines. Don't write it yet — just confirm you can plan
the shape from the spec.
If Phase 0 surfaces a problem (typecheck failing, primitives unclear,
spec contract not understood), stop and report. Don't fan out
subagents against an unverified workflow.
#### Phase 1 — triage
Spawn ONE subagent (`subagent_type: 'general-purpose'`) that reads
every case file in scope and produces a **tiered runner-implementation
plan**. Plan output goes to
`docs/testing/runner-implementation-plan.md` (new file) with this
shape:
```markdown
## Tier 1 — File / spawn / argv probes (~30min-1hr each)
No app launch needed (or single short-lived spawn). Existing
primitives suffice.
- T02 — `claude-desktop --doctor` exit-code probe. Reuse: lib/diagnostics.ts.
- T13 — same as T02 but parse the package-format line.
- S01 — DEB control file `Depends:` field empty (read post-build).
- S02 — distro substring matching in launcher (asar/script grep).
- ...
## Tier 2 — Single-launch probes (~2-3hrs each)
One launchClaude() + inspector attach. Existing primitives suffice.
- T05 — URL handler `claude://` registered. Verify via xdg-mime + spawn.
- T06 — Quick Entry shortcut registered. globalShortcut.isRegistered().
- ...
## Tier 3 — Multi-step or login-required (~4-8hrs each)
Need claude.ai signed in (CLAUDE_TEST_USE_HOST_CONFIG=1) or multi-
launch state. Defer to follow-up sessions.
- T15 — sign-in flow. Needs OAuth provider mock or live login.
- T16 — Code tab loads. Needs login.
- ...
## Tier 4 — Out of scope or blocked
- T39 — /desktop is the CLI `claude` binary, not the Electron asar.
- S26 — gated on #567 (autoUpdater no-op patch).
- T11, T33 — plugin install needs real plugin + network.
- ...
```
The triage subagent's job is **only** to classify each test into a
tier and record reasoning. No runner code yet.
#### Phase 2 — Tier 1 fan-out
Once the plan is in `docs/testing/runner-implementation-plan.md`, spawn
**one subagent per Tier 1 test** in parallel. Per-subagent prompt:
```
You're implementing ONE test-harness runner for <TEST-ID> in
docs/testing/cases/<FILE>.md.
Read in order:
- docs/testing/cases/<FILE>.md (the spec — focus on the <TEST-ID>
section, including its Code anchors)
- tools/test-harness/README.md (conventions)
- tools/test-harness/src/runners/<closest-existing-template>.spec.ts
- CLAUDE.md (project conventions)
Write tools/test-harness/src/runners/<TEST-ID>_short_name.spec.ts.
Match the closest template. First line of the test body:
skipUnlessRow(testInfo, ['<rows-this-applies-to>']) per the spec's
Applies to field.
Constraints:
- Tabs, ~80-char wrap.
- Use lib/* primitives; don't reinvent.
- testInfo.attach() the diagnostics from the spec's Diagnostics on
failure block.
- Tag with severity + surface annotations.
- No fixed sleeps. retryUntil or Playwright auto-wait.
- npm run typecheck must stay clean after your edits.
- Don't commit. The user reviews and commits.
If the test isn't reasonable to implement (the case anchors don't
resolve to anything assertable, the test depends on state you can't
construct, the existing primitives don't cover the surface), DO NOT
write a stub. Report under Open questions and stop.
Report shape (~150 words):
## <TEST-ID> runner
- File written: tools/test-harness/src/runners/<filename>.spec.ts
- Layer: file probe | argv probe | L1 | L2 (xprop) | L2 (DBus) | pgrep
- Assertion shape: <what the test asserts in one sentence>
- Skip rules: <which rows are skipped + why>
- Verification path: <how the user verifies this runs cleanly>
- Open questions: <any caveats>
```
Send Tier 1 subagents in parallel — they're independent, each owns
one runner file. Cap at ~10 subagents at a time to keep the
fan-out manageable.
#### Phase 3 — Tier 2 fan-out
Same shape as Phase 2 but for Tier 2 tests. Single-launch probes are
heavier (each subagent does a full launchClaude() + assertion design)
so cap at ~6 subagents in parallel.
If a Tier 2 subagent hits a blocker that pushes the test into Tier 3
mid-implementation, stop the subagent — don't ship a stub. The
synthesis step will reclassify.
#### Phase 4 — synthesis
Once Tier 1 + Tier 2 land:
1. `cd tools/test-harness && npm run typecheck` — must be clean.
2. Run the new runners against KDE-W if you have access:
`ROW=KDE-W npx playwright test src/runners/T02_*.spec.ts ...`
capture which pass cleanly and which need selector tuning.
3. Write a final report at the end of the session listing:
- Runners landed (pass / skip / needs-tuning per row)
- Tier 3 deferred (with the per-test rationale from the plan)
- Tier 4 out-of-scope (with reasons)
- Updated coverage stat (was 15/76 = 20%, now N/76 = M%)
4. Don't commit. The user reviews and commits.
### Self-correction loop
After Phase 2 / Phase 3 returns:
1. If a subagent's runner fails typecheck, re-spawn with explicit
instruction to read the typecheck error and fix it (often a missing
import or stale type from a renamed primitive).
2. If a subagent claimed a runner exists but `git status` shows no
new file, the subagent silently dropped the write — re-spawn with
explicit "use the Write tool" instruction.
3. If two subagents wrote runners that share a primitive but with
slightly different shapes, refactor the duplication into
`lib/<topic>.ts` BEFORE shipping — duplication compounds across the
next 50 runners.
Cap re-spawns at 2 per file. Past that, mark the runner as needing
human review in the final report and move on.
### Termination conditions
Stop and write the final report when one of:
1. **Tier 1 + Tier 2 all landed and typecheck-clean.** Write the
coverage update, stop. Future sessions handle Tier 3.
2. **Hit re-spawn cap on 3+ runners.** Stop, write up which runners
are blocked and what each blocker looks like.
3. **Discovered a primitive gap that breaks 5+ Tier 2 tests.** Stop,
write up the gap, propose where the new primitive should live in
`lib/`. Future session adds the primitive first, then resumes
Tier 2.
### What you should NOT do
- **Don't try to land Tier 3 / Tier 4 in this session.** They're
deferred for documented reasons. If you find one that turns out
cheaper than the plan estimated, note it and pull it forward to
Tier 2 — but don't open new can of worms when there are still
Tier 1 wins on the table.
- **Don't ship stubs.** If a runner can't actually assert what the
spec says, mark it as Tier 3 in the plan and don't write a placeholder.
- **Don't break existing runners.** `H01-H05` are the canaries.
`npm test` must still pass H01-H05 after every Tier 1/2 commit.
- **Don't restructure lib/.** Add primitives only when 2+ runners
need them. Premature shared abstractions will be wrong.
- **Don't run the host Claude Desktop in destructive ways.** Tests
that need login should gate on `CLAUDE_TEST_USE_HOST_CONFIG=1`.
- **Don't commit.** The user reviews and commits.
### Final report format
```markdown
## Runner implementation summary
- Tier 1 landed: N / M
- Tier 2 landed: N / M
- Tier 3 deferred: N (see plan)
- Tier 4 out-of-scope: N (see plan)
- Coverage: was 15/76 (20%), now <NEW>/76 (<PCT>%)
- Typecheck: clean | <errors>
- KDE-W test run: <pass/skip/fail counts>
## Per-tier breakdown
| Tier | Test ID | File | Assertion shape | Status |
|---|---|---|---|---|
| 1 | T02 | T02_doctor_exit_code.spec.ts | spawn + exit code | ✓ pass |
| 1 | S01 | S01_deb_control_no_depends.spec.ts | file probe | ✓ pass |
| 2 | T05 | T05_url_handler_claude_scheme.spec.ts | xdg-mime + spawn | ✓ pass |
| 2 | T06 | T06_quick_entry_shortcut.spec.ts | globalShortcut | ⏳ needs ydotool |
| ...
## Notable findings
- ...
## Open questions
- ...
## Files touched
git status output (only tools/test-harness/src/runners/*.spec.ts and
docs/testing/runner-implementation-plan.md should appear; possibly
new lib/ primitives if extraction was needed).
## Diff summary
git diff --stat
```
### Operational notes
- Subagents are launched in parallel via a single message with
multiple Agent tool calls. Don't serialise.
- Each subagent's Write calls land directly in the working tree.
Each owns one runner file — no merge conflicts.
- The grounding probe (`tools/test-harness/grounding-probe.ts`) can
help when implementing a runner that asserts runtime API state —
capture once with `npm run grounding-probe -- --launch
--include-synthetic`, grep the output for the IPC channel /
accelerator / API your runner needs to assert against.
- For tests that touch the AX tree, `claudeai.ts` page-objects are
the right substrate — see `T17_folder_picker.spec.ts` for an end-
to-end example. Don't query DOM by CSS selector unless `claudeai.ts`
doesn't already cover the surface.
Begin with Phase 0. Don't fan out until calibration succeeds.

View File

@@ -7,7 +7,7 @@ architecture, decisions, and rationale.
## Status
Forty specs wired (8 cross-env T-tests, 27 env-specific S-tests, 5
Fifty specs wired (14 cross-env T-tests, 31 env-specific S-tests, 5
H-prefix harness self-tests). See
[`docs/testing/runner-implementation-plan.md`](../../docs/testing/runner-implementation-plan.md)
for the tiered triage of remaining tests and the per-spec rationale
@@ -24,12 +24,18 @@ behind tier classification.
| [T07](../../docs/testing/cases/tray-and-window-chrome.md#t07--in-app-topbar) | Five topbar buttons render with non-zero rects (uses `seedFromHost` for hermetic auth) | L1 + DOM |
| [T08](../../docs/testing/cases/tray-and-window-chrome.md#t08--close-x-hides-to-tray) | `win.close()` fires the wrapper interceptor; window hidden, proc alive | L1 |
| [T09](../../docs/testing/cases/platform-integration.md#t09--autostart-via-xdg) | `setLoginItemSettings({ openAtLogin })` writes/removes `$XDG_CONFIG_HOME/autostart/claude-desktop.desktop` | L1 + filesystem |
| [T10](../../docs/testing/cases/platform-integration.md#t10--cowork-integration) | After H04-style spawn detection, `kill -9` the daemon and confirm a *different* pid respawns within ~20s (Patch 6 cooldown + retry) | pgrep delta + spawn delta |
| [T11](../../docs/testing/cases/extensibility.md#t11--plugin-install) | Plugin-install code path fingerprints present in bundled `index.js` | file probe |
| [T12](../../docs/testing/cases/platform-integration.md#t12--webgl-warn-only) | `app.getGPUFeatureStatus()` returns a populated object; renderer reached visible | L1 |
| [T13](../../docs/testing/cases/launch.md#t13--doctor-reports-correct-package-format) | `--doctor` does not false-flag rpm/deb installs as missing-dpkg AppImage | spawn + stdout grep |
| [T14a](../../docs/testing/cases/launch.md#t14--multi-instance-behavior) | `requestSingleInstanceLock` + `'second-instance'` strings in bundled `index.js` (file probe) | file probe |
| [T14b](../../docs/testing/cases/launch.md#t14--multi-instance-behavior) | Second invocation under same isolation exits cleanly; primary pid stays alive (runtime probe) | spawn delta + pgrep |
| [T16](../../docs/testing/cases/code-tab-foundations.md#t16--code-tab-loads) | After `seedFromHost` + `userLoaded`, `CodeTab.activate()` resolves and ≥1 compact pill renders (env pill = Code-body mounted) | L1 + AX-tree |
| [T17](../../docs/testing/cases/code-tab-foundations.md#t17--folder-picker-opens) | Code df-pill → env pill → Local → Select folder → Open folder triggers `dialog.showOpenDialog` (requires `CLAUDE_TEST_USE_HOST_CONFIG=1`) | L1 |
| [T23](../../docs/testing/cases/code-tab-handoff.md#t23--desktop-notifications-fire) | Firing `new Notification({title})` from main reaches the session bus's `org.freedesktop.Notifications.Notify` (observed via `dbus-monitor`) | L1 + DBus subprocess |
| [T25](../../docs/testing/cases/code-tab-handoff.md#t25--show-in-files--file-manager) | After `installShowItemInFolderMock` mirroring T17's dialog-mock pattern, `evalInMain` calls `shell.showItemInFolder(<synthetic path>)`; mock records the call verbatim, no throw — no host side effect | L1 (mocked egress) |
| [T26](../../docs/testing/cases/routines.md#t26--routines-page-renders) | After `seedFromHost` + `userLoaded`, click "Routines" sidebar AX button; assert "New routine" / "All" / "Calendar" anchor renders | L1 + AX-tree |
| [T38](../../docs/testing/cases/code-tab-handoff.md#t38--continue-in-ide) | `ipcMain._invokeHandlers` registry contains a channel ending in `LocalSessions_$_openInEditor` (handler-registered probe) | L1 (IPC introspection) |
| H01 | CDP auth gate exits with code 1 when spawned with `--remote-debugging-port` and no `CLAUDE_CDP_AUTH` token | spawn probe |
| H02 | `frame-fix-wrapper.js` + `frame-fix-entry.js` injected into `app.asar` (Proxy + main-field reference) | file probe |
| H03 | Build-pipeline patch fingerprints all present in `app.asar` (KDE gate, frame-fix inject, tray, cowork, claude-code) | file probe |
@@ -43,14 +49,18 @@ behind tier classification.
| [S07](../../docs/testing/cases/shortcuts-and-input.md#s07--claude_use_waylandvar) | Under `CLAUDE_HARNESS_USE_WAYLAND=1`, spawned Electron has `--ozone-platform=wayland` on argv | argv probe |
| [S08](../../docs/testing/cases/tray-and-window-chrome.md#s08--tray-icon-doesnt-duplicate-after-nativetheme-update) | `setImage`-based in-place fast-path injected by `tray.sh` (KDE-only, file probe) | file probe |
| [S09](../../docs/testing/cases/shortcuts-and-input.md#s09--quick-window-patch-runs-only-on-kde-post-406-gate) | KDE-gate string present in bundled `index.js` (patch ran at build) | file probe |
| [S10](../../docs/testing/cases/shortcuts-and-input.md#s10--quick-entry-popup-is-transparent-no-opaque-square-frame) | KDE-W only — popup runtime `getBackgroundColor() === '#00000000'` after Quick Entry opens (regression-detector against electron#50213 if bundled Electron in 41.0.4-bisect-window) | L1 + ydotool |
| S12 | `--enable-features=GlobalShortcutsPortal` in Electron argv (GNOME-W only — currently a known-failing regression detector) | argv probe |
| [S15](../../docs/testing/cases/distribution.md#s15--appimage-extraction---appimage-extract-works-as-documented-fallback) | `--appimage-extract` exits 0; `squashfs-root/AppRun --version` runs without FUSE error | spawn + filesystem |
| [S16](../../docs/testing/cases/distribution.md#s16--appimage-mount-cleans-up-on-app-exit) | `mount(8)` shows new `.mount_claude` while app is up; gone within 10s of close | mount delta |
| [S17](../../docs/testing/cases/platform-integration.md#s17--app-launched-from-desktop-inherits-shell-path) | Shell-path-worker overlays user's login-shell PATH onto a deliberately-scrubbed env | L1 + utilityProcess |
| [S19](../../docs/testing/cases/routines.md#s19--claude_config_dir-redirects-scheduled-task-storage) | `extraEnv: { CLAUDE_CONFIG_DIR }` reaches main-process `process.env`; `cE()`-equivalent resolves under the override path | L1 + extraEnv |
| [S21](../../docs/testing/cases/routines.md#s21--lid-close-still-suspends-per-os-policy) | No `handle-lid-switch` / `HandleLidSwitch` strings in bundle (lid policy deferred to OS) | asar absence probe |
| [S22](../../docs/testing/cases/platform-integration.md#s22--computer-use-toggle-absent-or-visibly-disabled-on-linux) | `new Set(["darwin","win32"])` platform gate present; no 2-element Set pairing linux (file-probe form) | asar regex |
| [S25](../../docs/testing/cases/platform-integration.md#s25--mobile-pairing-survives-linux-session-restart) | `safeStorage.encryptString → file → app restart → file → safeStorage.decryptString` round-trips the same plaintext (skips when `isEncryptionAvailable === false`) | L1 + shared isolation handle |
| [S26](../../docs/testing/cases/distribution.md#s26--auto-update-is-disabled-when-installed-via-aptdnf) | `setFeedURL` present + project suppression marker present (currently fails — gated on #567) | asar fingerprint |
| [S27](../../docs/testing/cases/extensibility.md#s27--plugins-install-per-user) | `installed_plugins.json` + homedir resolver present; no `*/plugins` system paths in bundle | asar fingerprint |
| [S28](../../docs/testing/cases/extensibility.md#s28--worktree-creation-surfaces-clear-error-on-read-only-mounts) | Bundled `index.js` contains the worktree permission classifier expression (`"Permission denied" \|\| "Access is denied" \|\| "could not lock config file" → "permission-denied"`) plus the `Failed to create git worktree:` log line | asar fingerprint |
| [S29](../../docs/testing/cases/shortcuts-and-input.md#s29--quick-entry-popup-is-created-lazily-on-first-shortcut-press-closed-to-tray-sanity) | Popup opens when main is hidden-to-tray (lazy-create sanity) | L1 |
| [S30](../../docs/testing/cases/shortcuts-and-input.md#s30--quick-entry-shortcut-becomes-a-no-op-after-full-app-exit) | No new claude-desktop pid spawns after post-exit shortcut press | pgrep delta + ydotool |
| [S31](../../docs/testing/cases/shortcuts-and-input.md#s31--quick-entry-submit-makes-the-new-chat-reachable-from-any-main-window-state) | Submit reaches new chat from visible / minimized / hidden-to-tray (QE-7/8/9) | L1 + ydotool |
@@ -62,15 +72,19 @@ behind tier classification.
| S37 | Main-window destroy unreachable on Linux per close-to-tray override — documented skip | — |
These specs exercise the substrate primitives in `lib/`: `xprop`
shell-outs (T01, T04), `dbus-next` (T03), Node-inspector runtime-attach
(T07/T17/S29-S35/T05-T14b L1 specs), `app.asar` content reads
(S08/S09/S21/S22/S26/S27/T11/T14a/H02/H03/S33), `/proc/$pid/cmdline`
reads (S07/S12), pgrep-based pid deltas (T14b/H04/S16/S30), `mount(8)`
parsing (S16), source-tree probes against `scripts/launcher-common.sh`
(S02), `dpkg-query` / `rpm -qR` / `rpm -qf` calls (S03/S04/S05/T13),
and the new `createIsolation({ seedFromHost: true })` primitive that
lets login-required tests run hermetically against a copy of the
host's signed-in auth state (T07).
shell-outs (T01, T04), `dbus-next` (T03), `dbus-monitor` subprocess
eavesdrop (T23), Node-inspector runtime-attach
(T07/T16/T17/T26/S10/S29-S35/T05-T14b L1 specs), `app.asar` content reads
(S08/S09/S21/S22/S26/S27/S28/T11/T14a/H02/H03/S33), `/proc/$pid/cmdline`
reads (S07/S12), pgrep-based pid deltas (T10/T14b/H04/S16/S30),
`mount(8)` parsing (S16), source-tree probes against
`scripts/launcher-common.sh` (S02), `dpkg-query` / `rpm -qR` / `rpm -qf`
calls (S03/S04/S05/T13), `safeStorage.encryptString` round-trip across
two launches (S25), `extraEnv` precedence over isolation env (S19),
`ipcMain._invokeHandlers` registry introspection (T38), and the
`createIsolation({ seedFromHost: true })` primitive that lets
login-required tests run hermetically against a copy of the host's
signed-in auth state (T07, T16, T26).
Per-row pass/skip counts depend on which sweep runs against the row;
see `runner-implementation-plan.md` for tier classification and