docs(testing): session 2 plan/inventory + rotate session 3 prompt

- runner-implementation-plan.md: new "Status (post-execution)" sub- section for session 2 listing the 10 new specs and the four reclassification notes (S28 → Tier 1, T38 framing, T23 tool choice, S19 honest-stub note). Session 1 sub-section preserved verbatim below for comparison. - README.md: 50-spec inventory (was 40), new T-rows (T10, T16, T23, T25, T26, T38) and S-rows (S10, S19, S25, S28) interleaved into the existing tables. Substrate-primitives paragraph extended with dbus-monitor, mock-then-call, ipcMain registry introspection, safeStorage round-trip, extraEnv precedence. - runner-implementation-followup-prompt.md: rewritten for session 3 — deferred items (T31, T32, S06, S11, S14), Tier 3 → Tier 2 reframes (T22, T35, T37), asar fingerprint cleanups (T24, T30, T33), the focus-shifter primitive build, and the mock-then-call extension for T24 as an alternative to its asar form. Includes the "known mechanism-recipe table" cumulating sessions 1+2. - runner-implementation-prompt.md: deleted (session 1's prompt, superseded by the followup that's been the rolling document since session 1 ended). Co-Authored-By: Claude <claude@anthropic.com>
2026-05-17 00:26:21 +03:00 · 2026-05-03 17:01:55 -04:00
parent fb5189fe45
commit 86385848d0
4 changed files with 596 additions and 388 deletions
--- a/docs/testing/runner-implementation-followup-prompt.md
+++ b/docs/testing/runner-implementation-followup-prompt.md
@@ -0,0 +1,521 @@
+# test-harness runner implementation — session 3 prompt
+
+This file is meant to be **copied verbatim into a fresh Claude Code
+session** as the initial user message. Don't paraphrase it; the
+orchestration depends on the exact directives below.
+
+You're picking up after a runner-implementation session that landed 10
+new specs (5 Tier 2 + 4 Tier 2-reframes + 1 Tier 1 reclass), lifting
+harness coverage from 40/76 (53%) to 50/76 (66%). Three commits on
+`docs/compat-matrix`:
+
+- `XXX` — `test(harness): session 2 runners + lib/claudeai mock helper`
+  (10 new spec files; `installShowItemInFolderMock` added to
+  `lib/claudeai.ts` mirroring the `installOpenDialogMock` pattern;
+  README inventory + plan-doc status section updated).
+
+(Substitute the actual SHA after committing — the user reviews and
+commits at the end of every session.)
+
+The plan doc at
+[`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
+captures the tier classification and execution-time reclassifications.
+Its "Status (post-execution)" section is the source of truth for what's
+done and what's deferred — read **session 2** then **session 1**
+sub-sections.
+
+This session is a continuation, not a restart. Start by reading the
+plan doc's status sections.
+
+### Big new findings from session 2
+
+1. **Mock-then-call beats invoke-then-cleanup for Tier 2 reframes of
+   side-effecting Electron APIs.** T17's existing
+   `installOpenDialogMock` pattern was extended to T25 via a new
+   `installShowItemInFolderMock` in `lib/claudeai.ts`. Net: no host
+   file-manager pop-up during the run, AND the assertion strengthens
+   from "didn't throw" to "the egress was reached + the path arg
+   flowed through verbatim". Apply this pattern to any future
+   `shell.*` / `dialog.*` Tier 2 reframes (T24 `shell.openExternal`
+   would mock cleanly the same way).
+
+2. **`gdbus monitor --dest <name>` only sees signals OWNED BY that
+   destination, not method calls TO it.** T23 had to switch from the
+   plan's gdbus suggestion to `dbus-monitor` (eavesdrop match rule)
+   to observe `org.freedesktop.Notifications.Notify` calls from
+   Electron. If T27 / T22 / S24 ever ship Tier 2 reframes that need
+   to observe method calls on a service, use `dbus-monitor`.
+
+3. **`ipcMain._invokeHandlers` channel naming carries a build-stable
+   UUID prefix:** `$eipc_message$_<UUID>_$_claude.web_$_<name>`. T38
+   anchors on the `_$_<name>` suffix to survive UUID rotation; the
+   prefix is captured as diagnostic. Useful precedent for any future
+   IPC-introspection probes.
+
+4. **Closure-local minified helpers are NOT reachable from
+   globalThis.** S28's plan called for inspector-eval against
+   `Sbn()`, but `Sbn` is a closure-local — couldn't be invoked. S28
+   reclassified to Tier 1 (asar fingerprint of the classifier
+   expression). For any future "Tier 2 reframe via inspector-eval
+   against minified helper X" entry: confirm reachability before
+   classifying.
+
+5. **`safeStorage` on Linux uses random IVs.** Only decrypted
+   plaintext is comparable across encrypt calls; ciphertext bytes
+   are not deterministic. S25 compares plaintexts.
+
+6. **`extraEnv` precedence.** `lib/electron.ts:317-323` spreads in
+   order: `process.env`, `LAUNCHER_INJECTED_ENV`, `isolation?.env`,
+   `waylandEnv`, then `opts.extraEnv`, then `CI: '1'`. Override
+   wins. S19 leans on this; load-bearing for future tests that
+   need to override isolation defaults.
+
+### Authoritative reference
+
+Read these in order before fanning out:
+
+- [`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
+  — tier classification + status section. Read both **session 2**
+  and **session 1** "Status (post-execution)" sub-sections. The
+  Tier-3 list (line ~342) is the candidate pool for further
+  reframes.
+- [`tools/test-harness/README.md`](../../tools/test-harness/README.md)
+  — runner conventions, the now-50-spec inventory, primitives in
+  `lib/`, isolation defaults, the CDP-gate workaround, the
+  `seedFromHost` reference.
+- [`docs/testing/cases/README.md`](cases/README.md) — case-doc
+  structure and the four anchor scopes.
+- [`tools/test-harness/src/lib/`](../../tools/test-harness/src/lib/)
+  — the existing primitives. Notable additions since session 2:
+  - `claudeai.ts` — `installShowItemInFolderMock` /
+    `getShowItemInFolderCalls` (mirrors `installOpenDialogMock`).
+    If 3+ tests start using mock-then-call, consider extracting to
+    `lib/electron-mocks.ts` — but don't pre-extract.
+- [`tools/test-harness/src/runners/`](../../tools/test-harness/src/runners/)
+  — every existing spec is a template. Notable session 2 templates:
+  - `T16_code_tab_loads.spec.ts` / `T26_routines_page_renders.spec.ts`
+    — seedFromHost + post-login renderer-side AX nav. T16 uses the
+    existing `CodeTab.activate()`; T26 inlines a similar AX walker
+    for the sidebar. Pattern for any further "click an AX-tree
+    button after login" test.
+  - `T25_show_item_in_folder_no_throw.spec.ts` — mock-then-call
+    pattern (mirrors T17). Use this shape for any future Tier 2
+    reframe of a side-effecting Electron API.
+  - `T38_open_in_editor_handler_registered.spec.ts` — IPC handler
+    registry introspection via `ipcMain._invokeHandlers`. Pattern
+    for any "is this handler wired up" check.
+  - `T23_notification_reaches_dbus.spec.ts` — dbus-monitor
+    subprocess + inspector-fired notification + buffer scan.
+  - `T10_cowork_daemon_respawn.spec.ts` — H04 extension: spawn,
+    SIGKILL, poll for new pid. Pattern for any "service auto-respawn
+    contract" test.
+  - `S25_safestorage_token_persists.spec.ts` — two-launch with
+    shared isolation handle + safeStorage round-trip via tmpfile.
+  - `S28_worktree_permission_classifier.spec.ts` — single-regex
+    asar fingerprint for a multi-string-OR classifier expression.
+- [`docs/testing/cases/*.md`](cases/) — the spec each runner
+  asserts. The **Code anchors:** field tells you exactly where
+  upstream implements the feature.
+
+### Tests in scope this session
+
+Five categories, in priority order:
+
+| # | Tests | Source files | Notes |
+|---|---|---|---|
+| **A** Deferred from session 2 | T31, T32, S06, S11, S14 | `code-tab-workflow.md` (T31, T32), `shortcuts-and-input.md` (S06, S11, S14) | T31/T32 need a Code-tab session OPEN; S06 needs Wayland row; S11/S14 need a new focus-shifter primitive |
+| **B** Tier 3 → Tier 2 reframes (read-only) | T22, T35, T37 | `code-tab-workflow.md` (T22), `extensibility.md` (T35, T37) | Each can ship as a *reads-from-disk-or-IPC-registry* probe without writing to the user's account |
+| **C** Asar fingerprint cleanups | T24, T30, T33 | `code-tab-handoff.md` (T24), `code-tab-workflow.md` (T30), `extensibility.md` (T33) | Each has a load-bearing string set in `index.js` that pins the wiring without needing a launch |
+| **D** New primitive — focus-shifter | (unblocks A's S11/S14) | `lib/input.ts` | xdotool / ydotool focus-stealing helper. Build only if S11/S14 worth shipping this session |
+| **E** Mock-then-call extension | T24 (mock form) | `code-tab-handoff.md` (T24) | Mirror of T25's pattern but for `shell.openExternal` — handler reaches the egress with the right URL |
+
+Realistic ceiling: **~6-8 new specs** this session. Don't try all
+13 — Categories A and B are heavier than session 2's mix because
+they need either a Code-tab session opened OR a new primitive built
+first.
+
+### Detailed scope per category
+
+#### Category A — deferred items (5)
+
+- **T31 — Side chat opens.** Needs: `seedFromHost` + Code-tab
+  session OPEN (env pill → Local → choose folder → wait for session
+  load). After session loads, send `Ctrl+;` via ydotool OR find the
+  IPC handler `startSideChat` in `ipcMain._invokeHandlers` (T38
+  pattern) and assert it's registered + invokable. The lighter form
+  is the IPC-registry probe; the heavier form is full
+  click-chain-into-side-chat.
+- **T32 — Slash command menu.** Needs: same Code-tab session OPEN
+  preamble as T31. Then trigger `/` in the prompt textarea and
+  assert the slash menu renders (AX-tree query for menuitem* nodes
+  in the prompt area). Heavier than T31 because the slash menu is
+  rendered server-side by claude.ai's bundle.
+- **S06 — URL handler segfault on native Wayland.** Needs:
+  `CLAUDE_HARNESS_USE_WAYLAND=1` row + `coredumpctl info
+  claude-desktop` observation after firing
+  `xdg-open 'claude://chat/new'`. Skip cleanly if not on a
+  Wayland row.
+- **S11 / S14 — focus-shifter delivery.** Needs: `lib/input.ts`
+  with `focusOtherWindow()` (xdotool on X11; skip on Wayland or
+  use compositor-specific). Then S11 / S14 launch app, focus
+  another window, fire shortcut, assert popup appears. Build the
+  primitive in one PR (Category D), then both runners in a second
+  PR.
+
+#### Category B — Tier 3 → Tier 2 reframes (3)
+
+These each have a slice that doesn't write to the user's real
+account:
+
+- **T22 — PR monitoring (read-only half).** The Tier 3 form opens a
+  PR; the Tier 2 reframe is "after `seedFromHost`, IPC handler
+  `getPrChecks` is registered + the `gh CLI not found in PATH`
+  string is in the bundle". The handler-registered probe is the
+  shippable form; the missing-`gh` warning string is a static
+  fingerprint. Both ship as one runner.
+- **T35 — MCP server config picked up.** Reframe: place a fixture
+  `claude_desktop_config.json` under the isolation's configDir
+  (no host config touch needed — fresh isolation), then via
+  inspector eval, read whatever main-process state holds the
+  parsed MCP server list. Anchor on a known path under
+  `${configDir}/Claude/`.
+- **T37 — `CLAUDE.md` memory loads.** Reframe: place a fixture
+  `~/.claude/CLAUDE.md` (or under `CLAUDE_CONFIG_DIR/CLAUDE.md`
+  with extraEnv override — see S19's pattern), then via inspector
+  eval read the loaded memory state. Anchor needs to come from
+  case-doc Code anchors.
+
+#### Category C — asar fingerprint cleanups (3)
+
+Each is Tier 1 / no launch:
+
+- **T24 — Open in external editor (asar fingerprint).** The full
+  click-chain T24 is Tier 3. The fingerprint half: assert `Mtt`
+  registry is in `index.js` with the editor scheme strings
+  (`vscode://`, `cursor://`, `zed://`, `windsurf://`).
+- **T30 — Auto-archive on PR merge (cadence constants).** Static
+  fingerprint of the sweep cadence — assert `300_000` (5 min)
+  and `3_600_000` (1 h) appear near the auto-archive code.
+- **T33 — Plugin browser (IPC handler registered).** Same shape
+  as T38 — assert `listMarketplaces` IPC handler is registered.
+
+#### Category D — primitive build
+
+- **`lib/input.ts:focusOtherWindow()`.** xdotool on X11
+  (`xdotool search --name '<test-marker>' windowfocus`); on
+  Wayland skip cleanly (no portable focus injection). Used by
+  S11 / S14. Don't build unless those are in scope this session.
+
+#### Category E — mock-then-call extension
+
+- **T24 (mock form, alternative to Category C).** Mock
+  `shell.openExternal` via a new `installOpenExternalMock` in
+  `lib/claudeai.ts` (mirror of `installShowItemInFolderMock`),
+  then `inspector.evalInMain` calls
+  `shell.openExternal('vscode://file/tmp/test')` and assert
+  the recorded call list contains the URL. Strictly stronger
+  than the Category C fingerprint form. **Pick C OR E for T24
+  — not both.**
+
+### Why this iteration
+
+The harness is at 50/76 coverage and every release tag now
+exercises the smoke-set + a chunk of critical surfaces
+automatically. Remaining work clusters in three pockets:
+
+- **Code-tab cluster (T15-T39, mostly login-walled).** Session 2
+  unblocked the *render-only* half via `seedFromHost`; session 3
+  should push into the *open-a-session* half (T31, T32) and the
+  read-only-Tier-3-reframes (T22, T35, T37).
+- **Wayland-specific tests (S06).** Need a Wayland row + the
+  harness's `CLAUDE_HARNESS_USE_WAYLAND=1` switch.
+- **Focus-shift-dependent tests (S11, S14).** Need
+  `lib/input.ts:focusOtherWindow()` built first.
+
+After this session, future sessions can focus on the genuinely
+heavy Tier 3 work (destructive-write login tests; multi-launch
+state) with a clearer cost model.
+
+### Known mechanism-recipe table (session 1 + session 2)
+
+| Pattern | Use when | Worked example |
+|---|---|---|
+| `createIsolation({ seedFromHost: true })` | spec needs a signed-in renderer; read-only | T07, T16, T26 |
+| `isolation: null` + pre-launch `killHostClaude()` | spec needs SingletonLock collision (delivery probes) | T05 |
+| Default isolation | most other tests | T01, T03, T04, S29 |
+| `isolation: <handle>` (shared across launches) | multi-launch persistent state | S35, S25 |
+| `MainWindow.setState('close')` | exercise the wrapper close-interceptor | T08 |
+| Mock-then-call for `shell.*` / `dialog.*` | Tier 2 reframe of side-effecting API | T17, T25 |
+| `ipcMain._invokeHandlers` registry probe | "is this IPC handler wired up" | T38 |
+| `dbus-monitor` subprocess | observe DBus method calls TO a destination | T23 |
+| Asar single-regex multi-string-OR fingerprint | classifier-style code that combines several strings | S28 |
+
+### Constraints to respect (don't violate)
+
+These are unchanged from sessions 1 and 2 and still load-bearing:
+
+- **Default isolation** unless the spec needs otherwise. Never
+  write to `~/.config/Claude` without explicit gating
+  (`CLAUDE_TEST_USE_HOST_CONFIG=1` opt-out, OR `seedFromHost: true`
+  with read-only-then-discard semantics, OR an explicit comment
+  documenting why).
+- **CDP auth gate is alive** — runtime SIGUSR1 attach via
+  `app.attachInspector()`, never Playwright's `_electron.launch()`
+  or `chromium.connectOverCDP()`.
+- **BrowserWindow Proxy gotcha** — use `webContents.getAllWebContents()`
+  not `BrowserWindow.getAllWindows()`. Constructor-level wraps
+  don't work; use prototype-method hooks.
+- **`skipUnlessRow()` always first.** First line of every `test()`
+  body when the test is row-gated.
+- **No fixed sleeps.** `retryUntil` from `lib/retry.ts`, or
+  Playwright auto-wait. Fixed `sleep(N)` is a smell.
+- **Diagnostics on every run.** `testInfo.attach()` the artefacts
+  (launcher log, --doctor output, frame extents, click attempts,
+  AX-tree snapshot). Captured as JSON dumps for multi-state tests.
+- **Tag with annotations.** `severity:` and `surface:` on every
+  test so JUnit carries them through to matrix-regen.
+- **Tabs in TS, ~80-char wrap as the existing files do.** Match
+  surrounding style.
+- **Don't break existing runners.** `npm run typecheck` must stay
+  clean. H01-H05 are the canaries; `npm test` must still pass them
+  after every commit.
+- **For mock-then-call: leading comment must document why mock
+  beats invoke** (T25's leading comment is the worked example —
+  three short paragraphs).
+
+### Phases
+
+#### Phase 0 — calibration
+
+1. `cd tools/test-harness && npm run typecheck` — should pass.
+2. Read the plan doc's "Status (post-execution)" session 2 section,
+   then read T25's leading comment (the mock-then-call pattern's
+   worked-example doc) and T16/T26 (the seedFromHost-then-AX-nav
+   pattern). Confirm you understand both.
+3. Pick one Category B candidate (suggest T22 — read-only half) and
+   sketch the runner shape mentally. Don't write it yet — confirm
+   you can plan from the spec.
+
+If Phase 0 surfaces a problem (typecheck failing, primitives
+unclear, patterns not understood), stop and report. Don't fan out.
+
+#### Phase 1 — fan-out batch
+
+Spawn parallel subagents (cap at 6 in flight) for the highest-
+confidence candidates first.
+
+**Suggested initial batch (4-5 specs):**
+
+- **B / T22 — PR monitoring read-only half.** seedFromHost +
+  `ipcMain._invokeHandlers` for `getPrChecks` + asar fingerprint
+  for the missing-`gh` warning string.
+- **C / T24 (asar fingerprint OR mock form, pick one).** If you
+  want stronger coverage, go mock form (Category E shape — mirror
+  of T25's `installOpenExternalMock` helper). If you want a quick
+  Tier 1, go asar fingerprint (`Mtt` registry + scheme strings).
+- **C / T30 — Auto-archive cadence constants.** Pure asar probe.
+- **C / T33 — Plugin browser handler registered.** T38 pattern.
+- **B / T35 — MCP server config picked up.** Fixture under
+  isolation configDir + inspector eval.
+
+If those land cleanly, dispatch a second batch:
+
+- **A / T31 — Side chat opens (handler-registered shape).**
+  seedFromHost + `ipcMain._invokeHandlers` for `startSideChat`
+  / `sendSideChatMessage` / `stopSideChat`. The lighter probe.
+- **A / T32 — Slash command menu (asar fingerprint).** The full
+  AX-tree form needs a Code-tab session open AND server-side
+  rendered menu — heavy. The fingerprint form: `getSupportedCommands`
+  + `slashCommands` schema present in `index.js`.
+- **A / S11 / S14 (only if Category D primitive is built first).**
+  Build `lib/input.ts:focusOtherWindow()` in PR 1; ship S11/S14
+  in PR 2.
+- **B / T37 — CLAUDE.md memory loads.** Fixture file + inspector
+  eval against the loaded memory state.
+
+#### Per-subagent prompt shape
+
+```
+You're implementing ONE test-harness runner for <TEST-ID> in
+docs/testing/cases/<FILE>.md.
+
+Read in order:
+- docs/testing/cases/<FILE>.md (focus on <TEST-ID>'s Code anchors)
+- tools/test-harness/README.md (conventions; status section names
+  the most-recent-template that fits)
+- tools/test-harness/src/runners/<closest-template>.spec.ts
+- tools/test-harness/src/lib/ (the primitives you'll reuse)
+- CLAUDE.md (project conventions)
+
+Write tools/test-harness/src/runners/<TEST-ID>_short_name.spec.ts.
+
+[per-test specifics: pattern (seedFromHost / mock-then-call /
+ipcMain._invokeHandlers / asar fingerprint / shared isolation),
+assertion shape, skip rules, key constraint warnings]
+
+Constraints:
+- Tabs, ~80-char wrap.
+- Use lib/* primitives; don't reinvent.
+- testInfo.attach() the diagnostics from the spec's "Diagnostics
+  on failure" block.
+- Tag with severity + surface annotations.
+- No fixed sleeps. retryUntil or Playwright auto-wait.
+- npm run typecheck must stay clean after your edits.
+- Don't commit. The user reviews and commits.
+
+If the test isn't reasonable to implement (anchors don't resolve
+to anything assertable, the test depends on state you can't
+construct, the existing primitives don't cover the surface), DO
+NOT write a stub. Report under Open questions and stop. Sessions
+1 and 2 had cumulative ~6 "stop and report" outcomes that were
+the right call (S20 deferral, T05 reshape, T07 needs seedFromHost,
+T08 needs setState('close'), S28 reclassification, T38 framing).
+
+Report shape (~150 words):
+## <TEST-ID> runner
+
+- File written: tools/test-harness/src/runners/<filename>.spec.ts
+- Layer: file probe | argv probe | L1 | L2 (xprop) | L2 (DBus) | pgrep
+- Assertion shape: <one sentence>
+- Skip rules: <which rows + why>
+- Verification path: <typecheck + run result>
+- Open questions: <caveats>
+```
+
+#### Phase 2 — synthesis
+
+After fan-out returns:
+
+1. `cd tools/test-harness && npm run typecheck` — must stay clean.
+2. Run the new runners against KDE-W (the dev box) — but flag the
+   user first if any are destructive (seedFromHost kills running
+   Claude; T31/T32 require an open Code-tab session that may
+   accumulate state). Capture pass/skip/fail per spec for the
+   matrix.
+3. Update [`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
+   "Status (post-execution)" section to reflect newly-shipped
+   specs and any reclassifications discovered mid-flight.
+4. Update [`tools/test-harness/README.md`](../../tools/test-harness/README.md)
+   inventory table.
+5. Write a final report listing:
+   - Specs landed (pass / skip / needs-tuning per row)
+   - Specs deferred (with the per-test rationale)
+   - Specs reclassified (Tier 3 → Tier 2, Tier 2 → blocked, etc.)
+   - Updated coverage stat (was 50/76 = 66%, now N/76 = M%)
+6. Don't commit. The user reviews and commits.
+7. Rotate this prompt: rewrite
+   `docs/testing/runner-implementation-followup-prompt.md` for the
+   NEXT session's deferred items.
+
+### Self-correction loop
+
+Same as sessions 1 and 2:
+
+1. Subagent typecheck failure → re-spawn with explicit fix
+   instruction.
+2. Subagent claims a runner exists but `git status` shows no new
+   file → re-spawn with explicit "use the Write tool" instruction.
+3. Two subagents wrote runners that share a primitive but with
+   different shapes → factor into `lib/<topic>.ts` BEFORE shipping.
+
+Cap re-spawns at 2 per file. Past that, mark as needing human
+review and move on.
+
+### Termination conditions
+
+Stop and write the final report when one of:
+
+1. **All Category A + B + C target specs landed and typecheck-clean.**
+   Write coverage update, stop.
+2. **Hit re-spawn cap on 3+ runners.** Stop, write up which are
+   blocked.
+3. **Discovered a primitive gap that breaks 5+ Tier 2/Tier 3
+   tests.** Stop, propose where the new primitive should live in
+   `lib/`. Future session adds the primitive first, then resumes.
+4. **Session budget hits ~7 new specs.** Stop, synthesize, leave
+   the rest for the next session.
+
+### What you should NOT do
+
+- **Don't try to land Category A + B + C in one batch.** That's
+  ~9-11 specs. Pick the highest-confidence subset for the first
+  batch and decide whether to do more based on what came back.
+- **Don't ship stubs.** If a runner can't actually assert what the
+  spec says, mark it as Tier 3 / blocked / primitive-gap and don't
+  write a placeholder. The cumulative six "stop and report"
+  outcomes from sessions 1+2 were the right call — every one
+  revealed a real constraint.
+- **Don't break existing runners.** H01-H05 are the canaries.
+- **Don't pre-extract `lib/electron-mocks.ts`.** The
+  `installShowItemInFolderMock` + `installOpenDialogMock` pair
+  doesn't yet justify a new file; if T24 ships as Category E
+  (mock form), THAT's the third — extract then.
+- **Don't restructure `lib/`** beyond targeted additions.
+  Premature abstractions are wrong abstractions.
+- **Don't run destructive Tier 3 tests** that write to the user's
+  real claude.ai account (T22 PR write, T27 scheduling, T29
+  worktree creation, T34 OAuth, T36 hooks). Only the *read-only
+  reframes* of those are in scope this session.
+- **Don't implement the #569 power-inhibit patch in this session.**
+  That's a separate workstream. The S20 spec follows the patch,
+  not the other way around.
+- **Don't commit.** The user reviews and commits.
+
+### Final report format
+
+```markdown
+## Runner implementation summary (session 3)
+
+- Category A landed: N / 5
+- Category B landed: N / 3
+- Category C landed: N / 3
+- Reclassified mid-flight: N (with reasons)
+- Coverage: was 50/76 (66%), now <NEW>/76 (<PCT>%)
+- Typecheck: clean | <errors>
+- KDE-W test run: <pass/skip/fail counts>
+
+## Per-spec breakdown
+
+| Cat | Test ID | File | Assertion shape | Status |
+|---|---|---|---|---|
+| B | T22 | T22_pr_monitoring_handler.spec.ts | seedFromHost + IPC handler probe + asar fingerprint | ✓ pass |
+| ... |
+
+## Notable findings
+- ...
+
+## Open questions
+- ...
+
+## Files touched
+git status output (only tools/test-harness/src/runners/*.spec.ts +
+maybe lib/* primitives if extraction was needed; possibly plan-doc /
+README updates).
+
+## Diff summary
+git diff --stat
+```
+
+### Operational notes
+
+- Subagents are launched in parallel via a single message with
+  multiple Agent tool calls. Don't serialise.
+- Each subagent's Write calls land directly in the working tree.
+- The grounding probe (`tools/test-harness/grounding-probe.ts`) can
+  help when implementing a runner that asserts runtime API state —
+  capture once with `npm run grounding-probe -- --launch
+  --include-synthetic`, grep the output for the IPC channel /
+  accelerator / API your runner needs to assert against.
+- For seedFromHost specs, the host MUST have a signed-in Claude
+  Desktop. The primitive throws with a clear message if not.
+  Document the prerequisite in your runner's leading comment if
+  it's the first one to add seedFromHost coverage to a new surface.
+- For tests that touch the AX tree, `claudeai.ts` page-objects are
+  the right substrate — see `T17_folder_picker.spec.ts` for the
+  end-to-end example. Don't query DOM by CSS selector unless
+  `claudeai.ts` doesn't already cover the surface.
+- For mock-then-call: see T25 for the canonical pattern. Mock
+  installation is in `lib/claudeai.ts` alongside the dialog-mock;
+  add a sibling export, don't pre-extract a new file.
+
+Begin with Phase 0. Don't fan out until calibration succeeds.
--- a/docs/testing/runner-implementation-plan.md
+++ b/docs/testing/runner-implementation-plan.md
@@ -18,11 +18,60 @@ work begins.

 ## Status (post-execution)

-**Shipped this session (25 new specs):** T02, T05, T06, T07, T08, T09,
+**Shipped session 2 (10 new specs):** T10, T16, T23, T25, T26, T38, S10,
+S19, S25, S28. Coverage moved from 40/76 (53%) to 50/76 (66%).
+
+Session 2 reclassifications:
+
+- **S28** reclassified from **Tier 2 → Tier 1**. The plan called for
+  inspector-eval against `Sbn()` permission classifier with a synthetic
+  error string, but `Sbn` is a closure-local in the bundled main process
+  — not reachable from `globalThis` and no IPC surface exposes it. The
+  shipped runner is a single-regex asar fingerprint that pins the same
+  classifier expression (the `"Permission denied" || "Access is denied"
+  || "could not lock config file" → "permission-denied"` chain) plus the
+  `Failed to create git worktree:` log line. Same drift signal, no
+  launch needed.
+- **T38** shipped as a **handler-registered probe**, not a no-throw
+  invocation. Calling `LocalSessions.openInEditor` directly would
+  terminate at `shell.openExternal('vscode://...')` (real editor launch
+  + side effect on host) and would also be blocked by the channel's
+  origin validation against non-claude.ai senders. Instead, the runner
+  inspects `ipcMain._invokeHandlers` for the channel ending in
+  `LocalSessions_$_openInEditor`. Documents the channel's
+  `$eipc_message$_<UUID>_$_claude.web_$_<name>` framing (UUID is
+  build-stable: `c0eed8c9-...`) so a future framing change is visible.
+- **T23 tool choice — dbus-monitor, not gdbus monitor.** The plan
+  suggested `gdbus monitor --session --dest=org.freedesktop.Notifications`
+  but `gdbus monitor --dest <name>` only sees signals OWNED BY that
+  destination, not method calls TO it. `Notify` is a method call FROM
+  Electron TO the daemon, so `gdbus` cannot observe it. Switched to
+  `dbus-monitor` (eavesdrop match rule). Pre-launch checks gate on
+  `dbus-monitor` presence + notification-daemon ownership of
+  `org.freedesktop.Notifications`.
+- **S19 honest-stub note.** The Tier 2 slice is env propagation
+  (`extraEnv: { CLAUDE_CONFIG_DIR }` → main-process `process.env`).
+  Half 1 (env propagation) is the load-bearing assertion. Half 2
+  (resolver-shape echo) is a synthetic re-implementation of `cE()` /
+  `Tce()` because those minified symbols are closure-locals — leading
+  comment is explicit about the re-implementation status.
+- **T25 / T38 / T23 host side effects documented.** T25 may briefly
+  open a file manager on the host; T38 deliberately doesn't invoke;
+  T23 fires a real notification. Each runner's leading comment
+  documents the side effect.
+
+Tier 2 → Tier 2 candidates remaining for a future session: **T31, T32**
+(side chat, slash menu — both need a Code-tab session OPEN, not just
+the Code tab loaded). **S11, S14** (focus-shifter primitive still
+unbuilt — flagged in plan and unchanged).
+
+---
+
+**Shipped session 1 (25 new specs):** T02, T05, T06, T07, T08, T09,
 T11, T12, T13, T14a, T14b, S01, S02, S03, S04, S05, S07, S08, S15, S16,
 S17, S21, S22, S26, S27. Coverage moved from 15/76 (20%) to 40/76 (53%).

-Reclassifications discovered during execution:
+Session 1 reclassifications:

 - **T05** shipped as a **Tier 3 delivery probe** (xdg-open →
  `app.on('second-instance', ...)`), not the originally-planned Tier 2
--- a/docs/testing/runner-implementation-prompt.md
+++ b/docs/testing/runner-implementation-prompt.md
@@ -1,376 +0,0 @@
-# test-harness runner implementation — implementation prompt
-
-This file is meant to be **copied verbatim into a fresh Claude Code
-session** as the initial user message. Don't paraphrase it; the
-orchestration depends on the exact directives below.
-
---
-
-## Prompt to paste
-
-You're picking up after the cases-grounding sweep, the
-`grounding-probe.ts` runtime probe, and the `CLAUDE_HARNESS_USE_WAYLAND`
-flag all landed. The case docs are now anchored to upstream code +
-wrapper scripts and the harness can swap backends with one env var.
-The next workstream is **wiring runners for the 61 of 76 tests that
-still don't have one**.
-
-Today the harness has 15 specs covering 4 of 39 cross-env tests (T01,
-T03, T04, T17) and 11 of 37 env-specific tests (S09, S12, S29-S37
-Quick Entry sweep) — plus 5 H-prefix harness self-tests. Everything
-else is human-execution-only. The smoke set in
-[`docs/testing/README.md`](docs/testing/README.md#smoke-set) lists T01,
-T03, T04, T05, T07, T08, T11, T15, T16, T17 as the release gate; only
-4 of those are wired. The rest of the gate goes through a manual
-clicker.
-
-This is too many to land in one session — don't try. Triage first,
-land the cheap probes, get the single-launch wins, defer the rest
-with explicit reasons. Optimise for **broad coverage of cheap
-checks** over deep coverage of one renderer-heavy surface.
-
-### Authoritative reference
-
-Read these in order before fanning out:
-
- `docs/testing/cases/README.md` — case-doc structure and the four
-  anchor scopes (upstream / wrapper / SPA / CLI). Each test you wire
-  needs to either consume an existing anchor or surface a new one.
- `tools/test-harness/README.md` — runner conventions, the 5 distinct
-  shapes of TS code already in `lib/` (xprop / dbus-next / inspector
-  attach / `app.asar` reads / `/proc/$pid/cmdline` reads / pgrep), the
-  isolation defaults, the CDP-gate workaround.
- `tools/test-harness/src/lib/` — the existing primitives. Most new
-  runners are recombinations:
-  - `electron.ts` — `launchClaude()`, `app.attachInspector()`,
-    `app.waitForReady('window'|'mainVisible'|'claudeAi'|'userLoaded')`
-  - `inspector.ts` — `evalInMain()`, `evalInRenderer()`,
-    `getAccessibleTree()`, `clickByBackendNodeId()`
-  - `claudeai.ts` — page-objects against the AX-tree substrate
-  - `quickentry.ts` — shared QE primitives (`installInterceptor`,
-    `openAndWaitReady`)
-  - `sni.ts` — DBus StatusNotifierWatcher attribution by pid
-  - `wm.ts` — xprop wrappers; `findX11WindowByPid`,
-    `getNetFrameExtents`
-  - `argv.ts` — `/proc/$pid/cmdline` flag check
-  - `asar.ts` — in-place `app.asar` content reads (no temp extract)
-  - `row.ts` — `skipUnlessRow()` / `skipOnRow()`
-  - `diagnostics.ts` — launcher log + `--doctor` capture
- `tools/test-harness/src/runners/` — every existing spec is a
-  template. Match the layer (L1 / L2 / file / argv / pid) to the
-  test's assertion shape. Don't reinvent — `T17_folder_picker.spec.ts`
-  for L1 click chains, `T03_tray_icon_present.spec.ts` for DBus,
-  `T04_window_decorations.spec.ts` for xprop, `S33_electron_version_capture.spec.ts`
-  for asar reads, `S12_global_shortcuts_portal_flag.spec.ts` for argv.
- `docs/testing/cases/*.md` — the spec each runner asserts. The
-  **Code anchors:** field tells you exactly where upstream implements
-  the feature.
- `CLAUDE.md` (project root) — code style, attribution, commit format.
-
-### Tests in scope
-
-The 61 missing runners, by source file:
-
-| File | Missing |
-|---|---|
-| `launch.md` | T02, T13, T14 |
-| `tray-and-window-chrome.md` | T07, T08, S08, S13 |
-| `shortcuts-and-input.md` | T05, T06, S06, S07, S10, S11, S14 |
-| `code-tab-foundations.md` | T15, T16, T18, T19, T20 |
-| `code-tab-workflow.md` | T21, T22, T29, T30, T31, T32 |
-| `code-tab-handoff.md` | T23, T24, T25, T34, T38, T39 |
-| `routines.md` | T26, T27, T28, S19, S20, S21 |
-| `extensibility.md` | T11, T33, T35, T36, T37, S27, S28 |
-| `distribution.md` | S01, S02, S03, S04, S05, S15, S16, S26 |
-| `platform-integration.md` | T09, T10, T12, S17, S18, S22, S23, S24, S25 |
-
-### Why this iteration
-
-The matrix today is mostly hand-driven. Each cell update requires a
-human running through the case-doc steps on a VM and writing a status
-into `matrix.md`. That's expensive enough that sweeps happen
-infrequently and regressions sit longer than they should. A wider
-runner net means more cells get refreshed automatically per release,
-the human time goes to the renderer-heavy tests that genuinely need
-it, and the Smoke + Critical gate stops being a manual-clicker
-bottleneck on every tag.
-
-This isn't one-session work — it's the start of a series. Optimise
-for **landing what's tractable in this session and clearly handing
-off the rest** with concrete next-steps per deferred test.
-
-### Constraints to respect (don't violate)
-
- **Default isolation.** Every `launchClaude()` gets a fresh
-  `XDG_CONFIG_HOME` sandbox, cleaned on `close()`. Tests that need a
-  signed-in claude.ai must opt out via `launchClaude({ isolation: null })`
-  AND gate themselves on `CLAUDE_TEST_USE_HOST_CONFIG=1`. Never write
-  to the user's real config without that gate.
- **CDP auth gate is alive.** `_electron.launch()` and
-  `chromium.connectOverCDP()` both inject `--remote-debugging-port`
-  which exits the app. The harness uses `SIGUSR1` runtime-attach via
-  `app.attachInspector()` — same code path as Developer → Enable Main
-  Process Debugger. Don't try to use Playwright's electron launcher.
- **BrowserWindow Proxy gotcha.** `frame-fix-wrapper.js` returns the
-  electron module wrapped in a Proxy. `electron.BrowserWindow.getAllWindows()`
-  returns 0. Use `webContents.getAllWebContents()` instead. Constructor-
-  level wraps don't take — use prototype-method hooks (see
-  `docs/learnings/test-harness-electron-hooks.md`).
- **`skipUnlessRow()` always first.** First line of every `test()`
-  body. JUnit `<skipped>` → matrix `-`, never `✗` for an
-  inapplicable row.
- **No fixed sleeps.** `retryUntil` from `lib/retry.ts`, or
-  Playwright's auto-wait. Fixed `sleep(N)` is a smell.
- **Diagnostics on every run** (Decision 7), not just failures.
-  `testInfo.attach()` the launcher log, `--doctor` output, frame
-  extents, click attempts, etc. Captured as JSON dumps for
-  multi-state tests (S31 pattern).
- **Tag with annotations.** `severity:` and `surface:` annotations on
-  every test so JUnit carries them through to matrix-regen.
- **Tabs in TS, ~80-char wrap as the existing files do.** Match the
-  surrounding style.
- **Don't break existing runners.** `npm run typecheck` must stay
-  clean. Run the full `npm test` against KDE-W if you have access;
-  if not, document the verification path in your report.
-
-### Phases
-
-#### Phase 0 — calibration
-
-1. `cd tools/test-harness && npm run typecheck` — should pass; if
-   not, stop and report.
-2. Read `tools/test-harness/README.md` end-to-end and one full runner
-   (suggest `T04_window_decorations.spec.ts` — small, file probe
-   adjacent, easy to follow). Confirm you understand the spec
-   contract before fanning out.
-3. Pick T02 (doctor exit code) as a calibration runner. Read
-   `docs/testing/cases/launch.md` T02, find the existing anchors in
-   `scripts/doctor.sh`, sketch a runner in your head: spawn
-   `claude-desktop --doctor`, assert `exit code === 0`, attach the
-   stdout. ~30 lines. Don't write it yet — just confirm you can plan
-   the shape from the spec.
-
-If Phase 0 surfaces a problem (typecheck failing, primitives unclear,
-spec contract not understood), stop and report. Don't fan out
-subagents against an unverified workflow.
-
-#### Phase 1 — triage
-
-Spawn ONE subagent (`subagent_type: 'general-purpose'`) that reads
-every case file in scope and produces a **tiered runner-implementation
-plan**. Plan output goes to
-`docs/testing/runner-implementation-plan.md` (new file) with this
-shape:
-
-```markdown
-## Tier 1 — File / spawn / argv probes (~30min-1hr each)
-No app launch needed (or single short-lived spawn). Existing
-primitives suffice.
- T02 — `claude-desktop --doctor` exit-code probe. Reuse: lib/diagnostics.ts.
- T13 — same as T02 but parse the package-format line.
- S01 — DEB control file `Depends:` field empty (read post-build).
- S02 — distro substring matching in launcher (asar/script grep).
- ...
-
-## Tier 2 — Single-launch probes (~2-3hrs each)
-One launchClaude() + inspector attach. Existing primitives suffice.
- T05 — URL handler `claude://` registered. Verify via xdg-mime + spawn.
- T06 — Quick Entry shortcut registered. globalShortcut.isRegistered().
- ...
-
-## Tier 3 — Multi-step or login-required (~4-8hrs each)
-Need claude.ai signed in (CLAUDE_TEST_USE_HOST_CONFIG=1) or multi-
-launch state. Defer to follow-up sessions.
- T15 — sign-in flow. Needs OAuth provider mock or live login.
- T16 — Code tab loads. Needs login.
- ...
-
-## Tier 4 — Out of scope or blocked
- T39 — /desktop is the CLI `claude` binary, not the Electron asar.
- S26 — gated on #567 (autoUpdater no-op patch).
- T11, T33 — plugin install needs real plugin + network.
- ...
-```
-
-The triage subagent's job is **only** to classify each test into a
-tier and record reasoning. No runner code yet.
-
-#### Phase 2 — Tier 1 fan-out
-
-Once the plan is in `docs/testing/runner-implementation-plan.md`, spawn
-**one subagent per Tier 1 test** in parallel. Per-subagent prompt:
-
-```
-You're implementing ONE test-harness runner for <TEST-ID> in
-docs/testing/cases/<FILE>.md.
-
-Read in order:
- docs/testing/cases/<FILE>.md (the spec — focus on the <TEST-ID>
-  section, including its Code anchors)
- tools/test-harness/README.md (conventions)
- tools/test-harness/src/runners/<closest-existing-template>.spec.ts
- CLAUDE.md (project conventions)
-
-Write tools/test-harness/src/runners/<TEST-ID>_short_name.spec.ts.
-Match the closest template. First line of the test body:
-skipUnlessRow(testInfo, ['<rows-this-applies-to>']) per the spec's
-Applies to field.
-
-Constraints:
- Tabs, ~80-char wrap.
- Use lib/* primitives; don't reinvent.
- testInfo.attach() the diagnostics from the spec's Diagnostics on
-  failure block.
- Tag with severity + surface annotations.
- No fixed sleeps. retryUntil or Playwright auto-wait.
- npm run typecheck must stay clean after your edits.
- Don't commit. The user reviews and commits.
-
-If the test isn't reasonable to implement (the case anchors don't
-resolve to anything assertable, the test depends on state you can't
-construct, the existing primitives don't cover the surface), DO NOT
-write a stub. Report under Open questions and stop.
-
-Report shape (~150 words):
-## <TEST-ID> runner
-
- File written: tools/test-harness/src/runners/<filename>.spec.ts
- Layer: file probe | argv probe | L1 | L2 (xprop) | L2 (DBus) | pgrep
- Assertion shape: <what the test asserts in one sentence>
- Skip rules: <which rows are skipped + why>
- Verification path: <how the user verifies this runs cleanly>
- Open questions: <any caveats>
-```
-
-Send Tier 1 subagents in parallel — they're independent, each owns
-one runner file. Cap at ~10 subagents at a time to keep the
-fan-out manageable.
-
-#### Phase 3 — Tier 2 fan-out
-
-Same shape as Phase 2 but for Tier 2 tests. Single-launch probes are
-heavier (each subagent does a full launchClaude() + assertion design)
-so cap at ~6 subagents in parallel.
-
-If a Tier 2 subagent hits a blocker that pushes the test into Tier 3
-mid-implementation, stop the subagent — don't ship a stub. The
-synthesis step will reclassify.
-
-#### Phase 4 — synthesis
-
-Once Tier 1 + Tier 2 land:
-
-1. `cd tools/test-harness && npm run typecheck` — must be clean.
-2. Run the new runners against KDE-W if you have access:
-   `ROW=KDE-W npx playwright test src/runners/T02_*.spec.ts ...` —
-   capture which pass cleanly and which need selector tuning.
-3. Write a final report at the end of the session listing:
-   - Runners landed (pass / skip / needs-tuning per row)
-   - Tier 3 deferred (with the per-test rationale from the plan)
-   - Tier 4 out-of-scope (with reasons)
-   - Updated coverage stat (was 15/76 = 20%, now N/76 = M%)
-4. Don't commit. The user reviews and commits.
-
-### Self-correction loop
-
-After Phase 2 / Phase 3 returns:
-
-1. If a subagent's runner fails typecheck, re-spawn with explicit
-   instruction to read the typecheck error and fix it (often a missing
-   import or stale type from a renamed primitive).
-2. If a subagent claimed a runner exists but `git status` shows no
-   new file, the subagent silently dropped the write — re-spawn with
-   explicit "use the Write tool" instruction.
-3. If two subagents wrote runners that share a primitive but with
-   slightly different shapes, refactor the duplication into
-   `lib/<topic>.ts` BEFORE shipping — duplication compounds across the
-   next 50 runners.
-
-Cap re-spawns at 2 per file. Past that, mark the runner as needing
-human review in the final report and move on.
-
-### Termination conditions
-
-Stop and write the final report when one of:
-
-1. **Tier 1 + Tier 2 all landed and typecheck-clean.** Write the
-   coverage update, stop. Future sessions handle Tier 3.
-2. **Hit re-spawn cap on 3+ runners.** Stop, write up which runners
-   are blocked and what each blocker looks like.
-3. **Discovered a primitive gap that breaks 5+ Tier 2 tests.** Stop,
-   write up the gap, propose where the new primitive should live in
-   `lib/`. Future session adds the primitive first, then resumes
-   Tier 2.
-
-### What you should NOT do
-
- **Don't try to land Tier 3 / Tier 4 in this session.** They're
-  deferred for documented reasons. If you find one that turns out
-  cheaper than the plan estimated, note it and pull it forward to
-  Tier 2 — but don't open new can of worms when there are still
-  Tier 1 wins on the table.
- **Don't ship stubs.** If a runner can't actually assert what the
-  spec says, mark it as Tier 3 in the plan and don't write a placeholder.
- **Don't break existing runners.** `H01-H05` are the canaries.
-  `npm test` must still pass H01-H05 after every Tier 1/2 commit.
- **Don't restructure lib/.** Add primitives only when 2+ runners
-  need them. Premature shared abstractions will be wrong.
- **Don't run the host Claude Desktop in destructive ways.** Tests
-  that need login should gate on `CLAUDE_TEST_USE_HOST_CONFIG=1`.
- **Don't commit.** The user reviews and commits.
-
-### Final report format
-
-```markdown
-## Runner implementation summary
-
- Tier 1 landed: N / M
- Tier 2 landed: N / M
- Tier 3 deferred: N (see plan)
- Tier 4 out-of-scope: N (see plan)
- Coverage: was 15/76 (20%), now <NEW>/76 (<PCT>%)
- Typecheck: clean | <errors>
- KDE-W test run: <pass/skip/fail counts>
-
-## Per-tier breakdown
-
-| Tier | Test ID | File | Assertion shape | Status |
-|---|---|---|---|---|
-| 1 | T02 | T02_doctor_exit_code.spec.ts | spawn + exit code | ✓ pass |
-| 1 | S01 | S01_deb_control_no_depends.spec.ts | file probe | ✓ pass |
-| 2 | T05 | T05_url_handler_claude_scheme.spec.ts | xdg-mime + spawn | ✓ pass |
-| 2 | T06 | T06_quick_entry_shortcut.spec.ts | globalShortcut | ⏳ needs ydotool |
-| ...
-
-## Notable findings
- ...
-
-## Open questions
- ...
-
-## Files touched
-git status output (only tools/test-harness/src/runners/*.spec.ts and
-docs/testing/runner-implementation-plan.md should appear; possibly
-new lib/ primitives if extraction was needed).
-
-## Diff summary
-git diff --stat
-```
-
-### Operational notes
-
- Subagents are launched in parallel via a single message with
-  multiple Agent tool calls. Don't serialise.
- Each subagent's Write calls land directly in the working tree.
-  Each owns one runner file — no merge conflicts.
- The grounding probe (`tools/test-harness/grounding-probe.ts`) can
-  help when implementing a runner that asserts runtime API state —
-  capture once with `npm run grounding-probe -- --launch
-  --include-synthetic`, grep the output for the IPC channel /
-  accelerator / API your runner needs to assert against.
- For tests that touch the AX tree, `claudeai.ts` page-objects are
-  the right substrate — see `T17_folder_picker.spec.ts` for an end-
-  to-end example. Don't query DOM by CSS selector unless `claudeai.ts`
-  doesn't already cover the surface.
-
-Begin with Phase 0. Don't fan out until calibration succeeds.
--- a/tools/test-harness/README.md
+++ b/tools/test-harness/README.md
@@ -7,7 +7,7 @@ architecture, decisions, and rationale.

 ## Status

-Forty specs wired (8 cross-env T-tests, 27 env-specific S-tests, 5
+Fifty specs wired (14 cross-env T-tests, 31 env-specific S-tests, 5
 H-prefix harness self-tests). See
 [`docs/testing/runner-implementation-plan.md`](../../docs/testing/runner-implementation-plan.md)
 for the tiered triage of remaining tests and the per-spec rationale
@@ -24,12 +24,18 @@ behind tier classification.
 | [T07](../../docs/testing/cases/tray-and-window-chrome.md#t07--in-app-topbar) | Five topbar buttons render with non-zero rects (uses `seedFromHost` for hermetic auth) | L1 + DOM |
 | [T08](../../docs/testing/cases/tray-and-window-chrome.md#t08--close-x-hides-to-tray) | `win.close()` fires the wrapper interceptor; window hidden, proc alive | L1 |
 | [T09](../../docs/testing/cases/platform-integration.md#t09--autostart-via-xdg) | `setLoginItemSettings({ openAtLogin })` writes/removes `$XDG_CONFIG_HOME/autostart/claude-desktop.desktop` | L1 + filesystem |
+| [T10](../../docs/testing/cases/platform-integration.md#t10--cowork-integration) | After H04-style spawn detection, `kill -9` the daemon and confirm a *different* pid respawns within ~20s (Patch 6 cooldown + retry) | pgrep delta + spawn delta |
 | [T11](../../docs/testing/cases/extensibility.md#t11--plugin-install) | Plugin-install code path fingerprints present in bundled `index.js` | file probe |
 | [T12](../../docs/testing/cases/platform-integration.md#t12--webgl-warn-only) | `app.getGPUFeatureStatus()` returns a populated object; renderer reached visible | L1 |
 | [T13](../../docs/testing/cases/launch.md#t13--doctor-reports-correct-package-format) | `--doctor` does not false-flag rpm/deb installs as missing-dpkg AppImage | spawn + stdout grep |
 | [T14a](../../docs/testing/cases/launch.md#t14--multi-instance-behavior) | `requestSingleInstanceLock` + `'second-instance'` strings in bundled `index.js` (file probe) | file probe |
 | [T14b](../../docs/testing/cases/launch.md#t14--multi-instance-behavior) | Second invocation under same isolation exits cleanly; primary pid stays alive (runtime probe) | spawn delta + pgrep |
+| [T16](../../docs/testing/cases/code-tab-foundations.md#t16--code-tab-loads) | After `seedFromHost` + `userLoaded`, `CodeTab.activate()` resolves and ≥1 compact pill renders (env pill = Code-body mounted) | L1 + AX-tree |
 | [T17](../../docs/testing/cases/code-tab-foundations.md#t17--folder-picker-opens) | Code df-pill → env pill → Local → Select folder → Open folder triggers `dialog.showOpenDialog` (requires `CLAUDE_TEST_USE_HOST_CONFIG=1`) | L1 |
+| [T23](../../docs/testing/cases/code-tab-handoff.md#t23--desktop-notifications-fire) | Firing `new Notification({title})` from main reaches the session bus's `org.freedesktop.Notifications.Notify` (observed via `dbus-monitor`) | L1 + DBus subprocess |
+| [T25](../../docs/testing/cases/code-tab-handoff.md#t25--show-in-files--file-manager) | After `installShowItemInFolderMock` mirroring T17's dialog-mock pattern, `evalInMain` calls `shell.showItemInFolder(<synthetic path>)`; mock records the call verbatim, no throw — no host side effect | L1 (mocked egress) |
+| [T26](../../docs/testing/cases/routines.md#t26--routines-page-renders) | After `seedFromHost` + `userLoaded`, click "Routines" sidebar AX button; assert "New routine" / "All" / "Calendar" anchor renders | L1 + AX-tree |
+| [T38](../../docs/testing/cases/code-tab-handoff.md#t38--continue-in-ide) | `ipcMain._invokeHandlers` registry contains a channel ending in `LocalSessions_$_openInEditor` (handler-registered probe) | L1 (IPC introspection) |
 | H01 | CDP auth gate exits with code 1 when spawned with `--remote-debugging-port` and no `CLAUDE_CDP_AUTH` token | spawn probe |
 | H02 | `frame-fix-wrapper.js` + `frame-fix-entry.js` injected into `app.asar` (Proxy + main-field reference) | file probe |
 | H03 | Build-pipeline patch fingerprints all present in `app.asar` (KDE gate, frame-fix inject, tray, cowork, claude-code) | file probe |
@@ -43,14 +49,18 @@ behind tier classification.
 | [S07](../../docs/testing/cases/shortcuts-and-input.md#s07--claude_use_waylandvar) | Under `CLAUDE_HARNESS_USE_WAYLAND=1`, spawned Electron has `--ozone-platform=wayland` on argv | argv probe |
 | [S08](../../docs/testing/cases/tray-and-window-chrome.md#s08--tray-icon-doesnt-duplicate-after-nativetheme-update) | `setImage`-based in-place fast-path injected by `tray.sh` (KDE-only, file probe) | file probe |
 | [S09](../../docs/testing/cases/shortcuts-and-input.md#s09--quick-window-patch-runs-only-on-kde-post-406-gate) | KDE-gate string present in bundled `index.js` (patch ran at build) | file probe |
+| [S10](../../docs/testing/cases/shortcuts-and-input.md#s10--quick-entry-popup-is-transparent-no-opaque-square-frame) | KDE-W only — popup runtime `getBackgroundColor() === '#00000000'` after Quick Entry opens (regression-detector against electron#50213 if bundled Electron in 41.0.4-bisect-window) | L1 + ydotool |
 | S12 | `--enable-features=GlobalShortcutsPortal` in Electron argv (GNOME-W only — currently a known-failing regression detector) | argv probe |
 | [S15](../../docs/testing/cases/distribution.md#s15--appimage-extraction---appimage-extract-works-as-documented-fallback) | `--appimage-extract` exits 0; `squashfs-root/AppRun --version` runs without FUSE error | spawn + filesystem |
 | [S16](../../docs/testing/cases/distribution.md#s16--appimage-mount-cleans-up-on-app-exit) | `mount(8)` shows new `.mount_claude` while app is up; gone within 10s of close | mount delta |
 | [S17](../../docs/testing/cases/platform-integration.md#s17--app-launched-from-desktop-inherits-shell-path) | Shell-path-worker overlays user's login-shell PATH onto a deliberately-scrubbed env | L1 + utilityProcess |
+| [S19](../../docs/testing/cases/routines.md#s19--claude_config_dir-redirects-scheduled-task-storage) | `extraEnv: { CLAUDE_CONFIG_DIR }` reaches main-process `process.env`; `cE()`-equivalent resolves under the override path | L1 + extraEnv |
 | [S21](../../docs/testing/cases/routines.md#s21--lid-close-still-suspends-per-os-policy) | No `handle-lid-switch` / `HandleLidSwitch` strings in bundle (lid policy deferred to OS) | asar absence probe |
 | [S22](../../docs/testing/cases/platform-integration.md#s22--computer-use-toggle-absent-or-visibly-disabled-on-linux) | `new Set(["darwin","win32"])` platform gate present; no 2-element Set pairing linux (file-probe form) | asar regex |
+| [S25](../../docs/testing/cases/platform-integration.md#s25--mobile-pairing-survives-linux-session-restart) | `safeStorage.encryptString → file → app restart → file → safeStorage.decryptString` round-trips the same plaintext (skips when `isEncryptionAvailable === false`) | L1 + shared isolation handle |
 | [S26](../../docs/testing/cases/distribution.md#s26--auto-update-is-disabled-when-installed-via-aptdnf) | `setFeedURL` present + project suppression marker present (currently fails — gated on #567) | asar fingerprint |
 | [S27](../../docs/testing/cases/extensibility.md#s27--plugins-install-per-user) | `installed_plugins.json` + homedir resolver present; no `*/plugins` system paths in bundle | asar fingerprint |
+| [S28](../../docs/testing/cases/extensibility.md#s28--worktree-creation-surfaces-clear-error-on-read-only-mounts) | Bundled `index.js` contains the worktree permission classifier expression (`"Permission denied" \|\| "Access is denied" \|\| "could not lock config file" → "permission-denied"`) plus the `Failed to create git worktree:` log line | asar fingerprint |
 | [S29](../../docs/testing/cases/shortcuts-and-input.md#s29--quick-entry-popup-is-created-lazily-on-first-shortcut-press-closed-to-tray-sanity) | Popup opens when main is hidden-to-tray (lazy-create sanity) | L1 |
 | [S30](../../docs/testing/cases/shortcuts-and-input.md#s30--quick-entry-shortcut-becomes-a-no-op-after-full-app-exit) | No new claude-desktop pid spawns after post-exit shortcut press | pgrep delta + ydotool |
 | [S31](../../docs/testing/cases/shortcuts-and-input.md#s31--quick-entry-submit-makes-the-new-chat-reachable-from-any-main-window-state) | Submit reaches new chat from visible / minimized / hidden-to-tray (QE-7/8/9) | L1 + ydotool |
@@ -62,15 +72,19 @@ behind tier classification.
 | S37 | Main-window destroy unreachable on Linux per close-to-tray override — documented skip | — |

 These specs exercise the substrate primitives in `lib/`: `xprop`
-shell-outs (T01, T04), `dbus-next` (T03), Node-inspector runtime-attach
-(T07/T17/S29-S35/T05-T14b L1 specs), `app.asar` content reads
-(S08/S09/S21/S22/S26/S27/T11/T14a/H02/H03/S33), `/proc/$pid/cmdline`
-reads (S07/S12), pgrep-based pid deltas (T14b/H04/S16/S30), `mount(8)`
-parsing (S16), source-tree probes against `scripts/launcher-common.sh`
-(S02), `dpkg-query` / `rpm -qR` / `rpm -qf` calls (S03/S04/S05/T13),
-and the new `createIsolation({ seedFromHost: true })` primitive that
-lets login-required tests run hermetically against a copy of the
-host's signed-in auth state (T07).
+shell-outs (T01, T04), `dbus-next` (T03), `dbus-monitor` subprocess
+eavesdrop (T23), Node-inspector runtime-attach
+(T07/T16/T17/T26/S10/S29-S35/T05-T14b L1 specs), `app.asar` content reads
+(S08/S09/S21/S22/S26/S27/S28/T11/T14a/H02/H03/S33), `/proc/$pid/cmdline`
+reads (S07/S12), pgrep-based pid deltas (T10/T14b/H04/S16/S30),
+`mount(8)` parsing (S16), source-tree probes against
+`scripts/launcher-common.sh` (S02), `dpkg-query` / `rpm -qR` / `rpm -qf`
+calls (S03/S04/S05/T13), `safeStorage.encryptString` round-trip across
+two launches (S25), `extraEnv` precedence over isolation env (S19),
+`ipcMain._invokeHandlers` registry introspection (T38), and the
+`createIsolation({ seedFromHost: true })` primitive that lets
+login-required tests run hermetically against a copy of the host's
+signed-in auth state (T07, T16, T26).

 Per-row pass/skip counts depend on which sweep runs against the row;
 see `runner-implementation-plan.md` for tier classification and