docs(testing): session 6 plan/inventory + rotate session 7 prompt

Plan-doc "Status (post-execution)": new session 6 section at the top
covering S14, the lib/input-niri.ts ship, the reasoning for
per-compositor files over a Wayland dispatcher, and Category B
(eipc-registry exposer) carrying over to session 7 unattempted.

Untested-on-real-Niri caveats explicitly documented (Ok-wrapper
schema version, Claude app_id literal value, foot-on-PATH) so the
first Niri-row sweep knows what to confirm without re-deriving the
recon.
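For the first two caveats, here is a minimal sketch of the defensive readback. It assumes the two shapes the prompt describes for `niri msg --json focused-window`: `{Ok: {FocusedWindow: ...}}` on older niri and a "bare payload" on newer niri. The recon doesn't pin what "bare" means, so the sketch accepts both `{FocusedWindow: ...}` and the window object itself; that part is an assumption. `NiriWindow`, `unwrapFocusedWindow`, and `findMarkerId` are hypothetical names, not the real exports of lib/input-niri.ts. The `id` / `title` / `app_id` fields follow the recon notes.

```typescript
// Hypothetical sketch; shapes mirror the recon notes, not a verified
// niri schema.
interface NiriWindow {
	id: number;
	title: string | null;
	app_id: string | null;
}

function isNiriWindow(v: unknown): v is NiriWindow {
	return (
		typeof v === 'object' &&
		v !== null &&
		typeof (v as { id?: unknown }).id === 'number'
	);
}

function unwrapFocusedWindow(raw: unknown): NiriWindow | null {
	let payload: unknown = raw;
	// Older niri: peel the {Ok: ...} Result envelope if present.
	if (typeof payload === 'object' && payload !== null && 'Ok' in payload) {
		payload = (payload as { Ok: unknown }).Ok;
	}
	// Peel a FocusedWindow key if present (enveloped or not).
	if (
		typeof payload === 'object' &&
		payload !== null &&
		'FocusedWindow' in payload
	) {
		payload = (payload as { FocusedWindow: unknown }).FocusedWindow;
	}
	// Whatever remains must look like a window; unknown shapes become null.
	return isNiriWindow(payload) ? payload : null;
}

// Pick the marker by title, skipping Claude's own window. If Claude's
// real app_id isn't 'Claude', the guard is a no-op and the title decides.
function findMarkerId(
	windows: NiriWindow[],
	wantedTitle: string,
): number | null {
	const hit = windows.find(
		(w) => w.title === wantedTitle && w.app_id !== 'Claude',
	);
	return hit ? hit.id : null;
}
```

A third output shape degrades to `null` instead of throwing. The `app_id` guard can only narrow a title match, so a wrong literal never produces a false positive.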

README inventory updated to 62 specs (24 cross-env T-tests, 33
env-specific S-tests, 5 H-prefix harness self-tests). S14 row added;
lib/input-niri.ts entry added to the substrate-primitives layout
block and to the lib/ paragraph that lists each primitive's
consumer specs.

Followup prompt rewritten for session 7. Main bet now shifts to A,
with B paired to it and C as the fallback:

- A: eipc-registry exposer. Now the cleanest single-session win
  available: sessions 3-6 each punted because lower-risk work was on
  the table, and with the obvious focus-shifter / mock-then-call
  substrate work landed, Category A is the only path forward to
  proper Tier 2 runtime probes for T22/T31/T33/T38. It also unblocks
  T35 Phase 2 / T37 Phase 2. Three approaches documented for the
  inspector walk: module-level grep for registry exposers, hooking
  the eipc registration site, and patching in a dev-only exposer.
- B: T35 Phase 2 / T37 Phase 2 paired with Category A. Skip
  unless A lands first.
- C: Single-spec deferred items audit (S20 still open on #569;
  T34 OAuth round-trip; T36 Phase 2 reclassified out;
  cross-compositor S14 variants speculative without a consumer).
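The second approach for A (hooking the registration site) could look roughly like the sketch below. It is a generic method-wrapping sketch under stated assumptions. The real registration site is closure-local and not yet located, so `target` and `method` are placeholders for whatever the inspector walk finds. The regex assumes channel strings follow the `$eipc_message$_<UUID>_$_claude.web_$_<name>` framing recorded in the prompt's Category A section. `parseEipcChannel` and `hookChannelRegistration` are hypothetical names.

```typescript
// Hypothetical sketch of the hook-at-registration-site approach. The
// target object/method are placeholders; nothing here is a known API.
const EIPC_RE = /^\$eipc_message\$_([^_]+)_\$_claude\.web_\$_(.+)$/;

interface EipcChannel {
	uuid: string;
	name: string;
}

function parseEipcChannel(channel: string): EipcChannel | null {
	const m = EIPC_RE.exec(channel);
	return m ? { uuid: m[1], name: m[2] } : null;
}

// Wrap target[method] so every call whose first argument is an eipc
// channel string is recorded in `seen`, then delegate to the original.
// Returns an undo function so teardown can remove the hook.
function hookChannelRegistration(
	target: Record<string, unknown>,
	method: string,
	seen: Map<string, EipcChannel>,
): () => void {
	const original = target[method];
	if (typeof original !== 'function') {
		throw new TypeError(`${method} is not a function on the target`);
	}
	target[method] = function (this: unknown, ...args: unknown[]): unknown {
		const first = args[0];
		if (typeof first === 'string') {
			const parsed = parseEipcChannel(first);
			if (parsed !== null) seen.set(first, parsed);
		}
		return original.apply(this, args);
	};
	return () => {
		target[method] = original;
	};
}
```

Returning an undo function keeps the hook reversible. That fits the per-test isolation the prompt asks for, and the side-channel `seen` map is what a test would read back.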

New constraints from session 6 documented in the prompt:

- lib/input-niri.ts stays Niri-only by design — strict
  XDG_CURRENT_DESKTOP === 'niri' gate. Sway / Hyprland / River
  consumers must skip or live in their own per-compositor files.
- Don't speculate on a lib/input-wayland.ts dispatcher.
  Per-compositor files until a second Wayland consumer lands.
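The strict gate plus the clean `foot` skip path can be sketched without Playwright. `FootUnavailable` is the typed error the plan names, and `isNiriSession` appears in the session 5 recon. Its body here follows only the strict gate above. `requireFoot` and `niriSkipReason` are hypothetical helpers. The executable check is injected so the logic can be exercised off a Niri row.

```typescript
// Hypothetical sketch; only FootUnavailable / isNiriSession come from
// the plan, and neither body is lib/input-niri.ts's real code.
type Env = Record<string, string | undefined>;

class FootUnavailable extends Error {
	constructor(hint: string) {
		super(`foot not found on PATH; ${hint}`);
		this.name = 'FootUnavailable';
	}
}

// Strict gate: the exact value 'niri' only. Sway / Hyprland / River rows
// skip here instead of being handled by this Niri-only file.
function isNiriSession(env: Env): boolean {
	return env.XDG_CURRENT_DESKTOP === 'niri';
}

// Scan PATH (colon-separated, as on Linux) for a `foot` binary. The
// harness could pass a boolean wrapper around
// fs.accessSync(p, fs.constants.X_OK); tests pass a stub.
function requireFoot(
	env: Env,
	isExecutable: (path: string) => boolean,
): string {
	for (const dir of (env.PATH ?? '').split(':')) {
		if (dir === '') continue;
		const candidate = `${dir}/foot`;
		if (isExecutable(candidate)) return candidate;
	}
	throw new FootUnavailable('install the foot terminal to run this spec');
}

// Turn the gate and the typed error into a skip reason (null = run).
// A runner would hand a non-null reason to testInfo.skip().
function niriSkipReason(
	env: Env,
	isExecutable: (path: string) => boolean,
): string | null {
	if (!isNiriSession(env)) {
		return 'not a Niri session (XDG_CURRENT_DESKTOP is not "niri")';
	}
	try {
		requireFoot(env, isExecutable);
		return null;
	} catch (err) {
		if (err instanceof FootUnavailable) return err.message;
		throw err;
	}
}
```

Keeping the gate an exact string match is deliberate. A Sway or Hyprland row gets a skip reason instead of silently running Niri IPC.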

Cumulative "stop and report" outcome count bumped to ~13 across
sessions 1-6 (added: session-6 lib/input-niri.ts shipped untested-
on-niri).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Author: aaddrick
Date: 2026-05-03 19:19:45 -04:00
Parent: 34e9077dd2
Commit: e038768daa
3 changed files with 320 additions and 277 deletions

docs/testing/runner-implementation-followup-prompt.md

@@ -1,19 +1,20 @@
-# test-harness runner implementation — session 6 prompt
+# test-harness runner implementation — session 7 prompt
This file is meant to be **copied verbatim into a fresh Claude Code
session** as the initial user message. Don't paraphrase it; the
orchestration depends on the exact directives below.
You're picking up after a runner-implementation session that landed 1
-new spec (T18) and a load-bearing reclassification finding (T36
-Phase 2 is no longer a Tier 2 candidate). Coverage 60/76 (79%) →
-61/76 (80%). One commit on `docs/compat-matrix`:
+new spec (S14) and 1 new primitive (`lib/input-niri.ts`). Coverage
+61/76 (80%) → 62/76 (82%). One commit on `docs/compat-matrix`:
-- `XXX` — `test(harness): session 5 runner + SessionStart-fires-on-
-prompt finding` (T18 Tier 1 fingerprint pinning the drag-drop
-preload bridge in `mainView.js`; plan-doc updated with the
-SessionStart-hook trace + Code-tab AX anchor capture + S14 niri
-msg recon verdict).
+- `XXX` — `test(harness): session 6 runner + niri-native focus-shifter
+primitive` (S14 Tier 2 known-failing detector for the Niri portal
+`BindShortcuts` path, mirrored from S11's shape with imports swapped
+to the new `lib/input-niri.ts` primitive; primitive uses
+`niri msg --json windows` / `niri msg action focus-window` /
+`niri msg --json focused-window` chain plus `foot --title` for the
+marker window).
(Substitute the actual SHA after committing — the user reviews and
commits at the end of every session.)
@@ -22,227 +23,199 @@ The plan doc at
[`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
captures the tier classification and execution-time reclassifications.
Its "Status (post-execution)" section is the source of truth for
-what's done and what's deferred — read **session 5** first, then
-**session 4**, then **session 3**, then **session 2**, then **session
-1** sub-sections.
+what's done and what's deferred — read **session 6** first, then
+**session 5**, then **session 4**, then **session 3**, then **session
+2**, then **session 1** sub-sections.
This session is a continuation, not a restart. Start by reading the
plan doc's status sections.
-### Big new findings from session 5
+### Big new findings from session 6
-1. **SessionStart hook fires after first prompt submission, not on
-New-session click.** Trace through bundled `index.js`:
-`Ys.startSession` (`:454743` general, `:489371` CCD/Code-tab)
-requires `A.message`; the session record stores it as
-`initialMessage` (:489270); the agent SDK process is spawned via
-`DN({ prompt: k, options: v })` (`:489514`) only when there's a
-prompt stream to bind to. `createOrResumeSession` (`:489208`)
-creates the session record but doesn't spawn the agent. The
-SessionStart hook fires inside the agent SDK process once it
-boots — therefore only after a real prompt submission, which is
-a real-account write. **T36 Phase 2 reclassified Tier 2 →
-Tier 3/4**; unmockable without deep agent-SDK reverse-engineering.
-2. **Code-tab session-opener AX surface verified — anchors saved in
-plan-doc.** A one-shot AX-tree probe against the user's
-debugger-enabled running Claude (deleted after capture) confirmed:
-- **Top-tab Code button**: `button[name="Code"]` under
-`group[Mode]` under `complementary`. Disambiguator from the
-prompt-mode `tab[name="Code"]` in
-`tablist[name="Prompt categories"]` (which is what T16's
-existing `CodeTab.activate()` clicks).
-- **Sidebar entries**: `button[name="New session ⌘N"]`,
-`button[name="Routines"]`, `button[name="Customize"]`,
-`button[name="More navigation items"]`,
-`button[name="Pinned"]` / `button[name="Recents"]`.
-- **Recents items**: `button[name="<status> <title>"]` where
-status ∈ {Idle, Ready, Needs input, Awaiting input}. Main-pane
-Welcome surface uses `button[name="Open session <title>"]`.
-- **URL of Code-tab landing**: `/epitaxy`.
-No primitive shipped — these anchors live in the plan-doc until a
-consumer needs them. Premature abstraction is wrong abstraction.
-3. **niri msg IPC contract: `--json` shape is stable.** Wiki
-explicitly contracts the JSON output; plain text is unstable.
-`niri msg --json windows` returns `Vec<Window>` with `{id, title,
-app_id, pid, workspace_id, is_focused, ...}`; `niri msg action
-focus-window --id <u64>` injects focus; `niri msg --json
-focused-window` is the honest readback. `foot --title <T> -e
-sleep 600` is the Wayland-native marker (takes `--title` cleanly,
-ships in most niri setups). Niri 25.08+ has opt-in
-`xwayland-satellite` integration — existing X11 primitive *might*
-work on niri rows where it's running, but can't assume.
-4. **T18 Tier 1 fingerprint shipped against `mainView.js`, not
-`index.js`.** First runner to read a non-`index.js` source from
-the asar. `lib/asar.ts` already supports this via the existing
-`readAsarFile(filename, asarPath)` shape — no helper extraction
-needed. The case-doc anchor strings (`getPathForFile`, `webUtils`,
-`filePickers`, `claudeAppSettings`) are property names that
-survive minification verbatim — no minified-vs-beautified gotcha
-(unlike T35's `~/.claude.json` → `.claude.json`).
-5. **Tier 2/3 OS-level drag-drop is a primitive gap on BOTH
-backends.** X11 xdotool can simulate mouse motion but cannot put
-file URIs on the XDND selection (Chromium's drop handler would
-never see a file payload). Wayland needs per-compositor IPC +
-libei input injection. A real test needs either a custom XDND
-source app (X11) or a libei emitter (Wayland). The xdotool form
-the session-5 prompt suggested for T18 was a stub by this lens —
-pivot to Tier 1 was the right call.
+1. **`lib/input-niri.ts` shipped against session 5's recon — untested
+on real Niri.** The primitive landed against the recon notes
+without a live Niri row run. The first real Niri sweep will confirm:
+- The `Ok`-wrapper unwrap covers the niri version on the row. The
+primitive defensively handles both `{Ok: {FocusedWindow: ...}}`
+(older niri) and the bare-payload shape (newer niri); a third
+shape would fall through to `null` rather than crash.
+- Claude's `app_id` value on niri is literal `'Claude'`. The
+primitive's `app_id !== 'Claude'` guard becomes a no-op rather
+than wrong if the actual value differs (match still happens by
+title); tighten if needed.
+- `foot` is on the target row's PATH. Skip path is clean if not
+(`FootUnavailable` typed error → `testInfo.skip()` with install
+hint).
+Verified on KDE-W: the runner skips correctly via the row gate.
+2. **S14 is a known-failing detector by design.** Case-doc S14
+currently records `Failed to call BindShortcuts (error code 5)` on
+Niri. Same shape as S12's GNOME-W
+`--enable-features=GlobalShortcutsPortal` detector — the spec
+encodes the contract and will start passing on Niri rows once the
+upstream / Chromium-side portal issue resolves, without any spec
+edit.
+3. **Cross-compositor dispatcher deliberately not built.** Sway /
+Hyprland / River each have completely different IPCs (`swaymsg`,
+`hyprctl`, `riverctl`). Per-compositor files until a second
+consumer surfaces — a hypothetical `lib/input-wayland.ts` would
+just switch on `XDG_CURRENT_DESKTOP` and delegate. With only S14
+consuming `lib/input-niri.ts`, a dispatcher would be ceremony.
+Same anti-speculation rule that kept `lib/electron-mocks.ts`
+(session 3) and `lib/input.ts` (session 4) threshold-driven.
### Authoritative reference
Read these in order before fanning out:
- [`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
-— tier classification + status section. Read **session 5**,
-**session 4**, **session 3**, **session 2**, then **session 1**
-"Status (post-execution)" sub-sections. The Tier-3 list (around
-line 690 — search for "## Tier 3") is the candidate pool for
-further reframes; T18 has now landed (was Tier 3, shipped Tier 1)
-and T36 Phase 2 reclassified to Tier 3/4.
+— tier classification + status section. Read **session 6**,
+**session 5**, **session 4**, **session 3**, **session 2**, then
+**session 1** "Status (post-execution)" sub-sections. The Tier-3
+list (search for "## Tier 3") is the candidate pool for further
+reframes.
- [`tools/test-harness/README.md`](../../tools/test-harness/README.md)
-— runner conventions, the now-61-spec inventory, primitives in
+— runner conventions, the now-62-spec inventory, primitives in
`lib/`, isolation defaults, the CDP-gate workaround, the eipc
note.
- [`docs/testing/cases/README.md`](cases/README.md) — case-doc
structure and the four anchor scopes.
- [`tools/test-harness/src/lib/`](../../tools/test-harness/src/lib/)
-— the existing primitives. No new primitives in session 5.
-Notable: `input.ts` remains strict X11-only by design; do NOT bolt
-Wayland into it. If session 6 builds the niri-native sibling, put
-it in `lib/input-niri.ts` (per-compositor file, NOT a Wayland
-catch-all — sway/hyprland/river have totally different IPCs).
+— the existing primitives. Notable session 6 addition:
+`input-niri.ts` (Niri-only, `niri msg --json` IPC + `foot` marker;
+sibling of X11-only `input.ts`). DO NOT bolt other Wayland
+compositors into `input-niri.ts` per-compositor files only.
- [`tools/test-harness/src/runners/`](../../tools/test-harness/src/runners/)
-— every existing spec is a template. Notable session 5 templates:
-- `T18_drag_drop_files_into_prompt.spec.ts` — first runner to
-read a non-`index.js` source (`mainView.js`). Pattern for any
-future fingerprint that anchors on the preload bundle (e.g.
-bridge wiring, contextBridge exposes).
+— every existing spec is a template. Notable session 6 templates:
+- `S14_quick_entry_from_other_focus_niri.spec.ts` — first runner
+consuming `lib/input-niri.js`. Pattern for any future
+Niri-specific runner that needs Wayland-native focus injection.
- [`docs/testing/cases/*.md`](cases/) — the spec each runner
asserts. The **Code anchors:** field tells you exactly where
upstream implements the feature.
### Tests in scope this session
-**Realistic ceiling: ~3 new specs OR one new primitive landing.**
-Session 5 ran light (1 spec + 1 doc finding) because the runtime
-probe + bundled-source trace consumed half the budget. Session 6's
-clearest single-session win is **Category A — `lib/input-niri.ts`
-+ S14 runner** because:
+**Realistic ceiling: ~3 new specs OR one new primitive landing.** The
+session 6 work (1 spec + 1 primitive) was at the lower end of the
+ceiling because the primitive build was substantial. The obvious
+focus-shifter / mock-then-call substrate work is now done — the next
+session's main bets are narrower in shape.
-- The recon already sketched the primitive API (mirrors
-`lib/input.ts`'s shape, swaps xdotool/xprop for `niri msg`).
-- The niri IPC contract is stable in `--json` mode per the wiki.
-- S14 is the single consumer waiting on it.
-- The `lib/input.ts` extraction in session 4 is a direct template.
+**Category B (eipc-registry exposer) is now the cleanest single-
+session win available.** Sessions 3-6 each kept punting Category B
+because (a) other lower-risk work was on the table (focus-shifter,
+mock-then-call extraction, Tier 1 fingerprints) and (b) session 3's
+inspector walk came up empty. With the obvious work landed, Category
+B is the only path forward to proper Tier 2 runtime probes for
+T22/T31/T33/T38 (currently shipped as Tier 1 fingerprints) AND
+unblocks T35 Phase 2 / T37 Phase 2.
Three categories — pick ONE as the main bet, treat the others as
fallback if the main bet hits an early blocker:
| # | Tests | Source | Notes |
|---|---|---|---|
-| **A** `lib/input-niri.ts` + S14 runner | S14 | new `lib/input-niri.ts` + S14 runner | Recon-sketched; niri IPC contract is stable in `--json` mode. Cleanest single-session win. |
-| **B** eipc-registry exposer | unblocks T22/T31/T33/T38 Tier 2 reframes | `lib/electron.ts` or new `lib/eipc.ts` | High-risk-high-reward closure-local reverse-engineering. Same warning as sessions 4 / 5: session 3's inspector walk came up empty; needs a fresh approach. |
-| **C** Single-spec deferred items audit | various (T35 Phase 2 / T37 Phase 2 still blocked on closure-local readback; T36 Phase 2 NO LONGER A CANDIDATE) | — | Lower ceiling, higher confidence per spec. |
+| **A** eipc-registry exposer | unblocks T22/T31/T33/T38 Tier 2 reframes + T35 Phase 2 / T37 Phase 2 | new `lib/eipc.ts` (or extension to `lib/electron.ts`) | High-risk-high-reward closure-local reverse-engineering. Session 3's inspector walk via `globalThis` came up empty; sessions 4/5/6 each skipped for budget. **Now the cleanest single-session win** — needs a fresh approach. |
+| **B** T35 Phase 2 / T37 Phase 2 (paired with eipc-registry exposer) | T35 Phase 2, T37 Phase 2 | depends on Category A | Only viable if Category A lands first. Don't attempt without it. |
+| **C** Single-spec deferred items audit | various deferred items | — | Lower ceiling, higher confidence per spec. Best fallback if Category A turns up empty. |
-#### Category A — `lib/input-niri.ts` + S14 runner
+#### Category A — eipc-registry exposer
-The session 5 recon's TRACTABLE verdict gives the API sketch
-verbatim:
+The closure-local IPC registry near `:68820` (`le(i)` origin
+validation) and `:68816` (channel framing) is what T22/T31/T33/T38
+should be probing at runtime — instead they all ship as Tier 1
+asar fingerprints because session 3 confirmed the standard
+`ipcMain._invokeHandlers` map only carries three chat-tab MCP-bridge
+handlers, not the `LocalSessions_$_*` / `CustomPlugins_$_*` channels.
+The custom `$eipc_message$_<UUID>_$_claude.web_$_<name>` protocol
+uses a closure-local message-port registry that's not introspectable
+from main without reverse-engineering the eipc bootstrap.
-- `spawnMarkerWindow(title)` → `child_process.spawn('foot',
-['--title', title, '-e', 'sleep', '600'], {detached:true})`;
-teardown via PID + SIGTERM. Mirrors the X11 primitive's xterm
-pattern.
-- `focusOtherWindow(title)` → `niri msg --json windows`,
-`JSON.parse`, find row where `title === wantedTitle && app_id !==
-'Claude'`, then `niri msg action focus-window --id <id>`, then
-re-read `niri msg --json focused-window` and assert `id` matches.
-This gives the honest readback that S11's primitive needs.
-- `getFocusedWindowId()` → `niri msg --json focused-window` →
-`.Ok.FocusedWindow?.id ?? null`.
-- `isNiriSession()` → check `XDG_CURRENT_DESKTOP === 'niri'` OR
-`niri msg version` exits 0 (the latter is more honest because
-XDG_CURRENT_DESKTOP can be overridden — but adds a process-spawn
-cost on every call; cache the result).
+**Approaches that have NOT been tried (good starting points):**
-S14 runner shape: near-clone of `S11_quick_entry_from_other_focus.spec.ts`
-with the import swapped from `lib/input.js` to `lib/input-niri.js`
-and the row gate flipped from `['GNOME-X', 'Ubu-X']` to `['Niri']`.
-The X11-side "what this catches vs what it doesn't" leading
-comment from S11 has a Niri-side equivalent: this catches a
-regression in the Wayland path of the global shortcut on Niri (the
-load-bearing concern the case-doc carries forward from the S11
-mutter regression discussion).
+1. **Module-level grep for symbol references** — search the bundled
+`index.js` near `:68816` and `:68820` for any
+`Object.defineProperty` / `globalThis[`...`]` / `module.exports`
+call that exposes the registry to a reachable surface.
+2. **Hook the eipc message-port creation site** — instead of looking
+for a registry to inspect post-hoc, hook the registration site
+itself. If the channel-name string flows through a single
+function call, install a prototype-method hook at that site (see
+the hook pattern in
+[`docs/learnings/test-harness-electron-hooks.md`](../../learnings/test-harness-electron-hooks.md))
+and accumulate names into a side-channel map the test can read.
+3. **Patch in a dev-only registry exposer** — pre-launch, modify
+`index.js` (via the harness's `lib/asar.ts` write path) to add
+`globalThis.__eipcChannels = ...` near the registration site.
+Idempotent + reversible; the patched asar is per-test isolation
+so it doesn't leak.
-**Cross-compositor consideration (do NOT bolt in this session):**
-Sway / Hyprland / River each have totally different IPCs.
-Per-compositor files (`lib/input-sway.ts`, `lib/input-hypr.ts`,
-…) are cleaner than a unified abstraction. A `lib/input-wayland.ts`
-dispatcher would just be a switch on `XDG_CURRENT_DESKTOP` that
-delegates. Don't speculate on it this session — let the second
-consumer drive the dispatcher.
-**STOP AND REPORT** if: (a) `niri msg` output shape doesn't match
-the recon (the wiki contract is `--json` only, but the output
-schema may shift between niri versions even within the contract);
-(b) `foot` isn't on the target row's PATH (the primitive should
-fall back to `alacritty` / `kitty` / fail with a clear typed
-error matching `lib/input.ts`'s `XdotoolUnavailable` shape).
-#### Category B — eipc-registry exposer
-Same framing as session 4/5: closure-local reverse-engineering of
-the eipc bootstrap near `:68820` (`le(i)` origin validation) and
-`:68816` (channel framing). Session 3's inspector walk found
-nothing reachable via `globalThis`; the walk was repeated approach
-in sessions 4/5 implicitly (and skipped for budget reasons).
-If you take this as the main bet, treat as exploratory — Phase 1
-is the inspector walk only. STOP AND REPORT if 2-3 distinct
-approaches turn up empty. The cleanest "tried, here's what was
-unreachable" report converts the primitive-gap annotation in the
-plan-doc from "TODO" to "tried, unfixable without an upstream
-change." Don't ship a stub.
+If Category A turns up empty after 2-3 distinct approaches, STOP
+AND REPORT. Don't keep digging — document what was tried, ship a
+"H06 documentation runner" that captures the dead-end as a finding
+in JUnit, and pivot to Category C. The cleanest "tried, here's
+what was unreachable" report converts the primitive-gap annotation
+in the plan-doc from "TODO" to "tried, unfixable without an
+upstream change."
If a stable handle is found, expose it via `lib/eipc.ts`
(`getEipcChannels`, `invokeEipcChannel`); upgrade T22 / T31 /
-T33 / T38 from Tier 1 fingerprints to Tier 2 runtime probes.
+T33 / T38 from Tier 1 fingerprints to Tier 2 runtime probes. Cap
+at ~3 spec upgrades — don't try to land all four if the first one
+surfaces an unexpected issue.
+#### Category B — T35 / T37 Phase 2 (paired with Category A)
+Both currently ship as Tier 1 fingerprints because the parsed-state
+readback target is a closure-local minified symbol — the same
+gotcha as S28 from session 2 and S19's `cE()`/`Tce()`
+re-implementation note. Without Category A landing first, the
+fixture form of these specs would assert "the spec didn't crash"
+and nothing more.
+Skip this category unless Category A lands a stable handle.
#### Category C — single-spec deferred items audit
-Walk through session 1/2/3/4/5 deferrals and identify any that are
-now tractable. Specifically:
+Walk through session 1-6 deferrals and identify any that are now
+tractable. Specifically:
- **S20** — `powerSaveBlocker` Inhibit. Issue #569 still open;
-not this session.
+this is a separate workstream, not for this session.
-- **T18** — drag-drop OS-level form. Tier 1 fingerprint shipped
-session 5; OS-level (Tier 2/3) requires a custom XDND source
-(X11) or libei emitter (Wayland) — both are heavy primitive
-builds that don't fit this session's ceiling.
- **T34** — OAuth round-trip. Hard to mock; not this session
unless you have a clever idea.
-- **T35 Phase 2 / T37 Phase 2** — fixture-readback. Same
-closure-local target as T37b. Need either Category B
-(eipc-registry exposer) to land first, or a different readback
-path. Skip unless paired with Category B.
-- **T36 Phase 2** — NO LONGER A CANDIDATE. Session 5's
-SessionStart-hook trace showed the hook fires only after first
-prompt submission, which is a real-account write. Reclassified
+- **T35 Phase 2 / T37 Phase 2** — see Category B above. Need
+Category A first.
+- **T36 Phase 2** — NOT a candidate. Session 5's SessionStart-
+hook trace showed the hook fires only after first prompt
+submission, which is a real-account write. Reclassified
Tier 2 → Tier 3/4. Don't try to ship it.
-- **S14 Wayland variant** — see Category A. Session 5 recon says
-TRACTABLE.
+- **S14 cross-compositor variants (Sway / Hyprland / River)** —
+no current case-doc consumer demands them. Don't speculate.
#### Code-tab session-opener primitive (NOT recommended this session)
+If Category A turns up empty, Category C's most-reachable target
+is **investigate Tier 3 reframes for issues opened against the
+project since session 6.** Check `gh issue list --state open
+--label test-coverage-gap` (if the label exists) or just walk
+recent open issues for ones that suggest a Tier 1 fingerprint is
+now possible (a regression that produces a stable string in the
+bundle, etc.).
Session 5 verified the AX surface (anchors in plan-doc), but the
single biggest consumer (T36 Phase 2) was just reclassified out of
Tier 2. Without a load-bearing consumer, building
`CodeTab.activateTopTab()` / `startNewSession()` would be a
speculative primitive. Wait until a real consumer surfaces.
+#### Cross-compositor focus-shifter expansion (NOT recommended this session)
+Building `lib/input-sway.ts` / `lib/input-hypr.ts` would mirror
+`lib/input-niri.ts`'s shape but no consumer is asking for them.
+Sway / Hyprland / River specs aren't on the case-doc radar.
+Premature abstractions are wrong abstractions. Wait for a real
+consumer.
### Constraints to respect (don't violate)
-These are unchanged from sessions 1/2/3/4/5 and still load-bearing:
+These are unchanged from sessions 1/2/3/4/5/6 and still load-bearing:
- **Default isolation** unless the spec needs otherwise. Use
`seedFromHost: true` for any test that depends on authenticated
@@ -252,19 +225,21 @@ These are unchanged from sessions 1/2/3/4/5 and still load-bearing:
channels.** Session 3 confirmed those use a custom eipc protocol
not in the standard registry. T22/T31/T33/T38 are Tier 1
fingerprints. If you build the eipc-registry exposer (Category
-B), update the plan-doc and this prompt accordingly.
+A), update the plan-doc and this prompt accordingly.
- **`lib/input.ts` is X11-only.** Strict `XDG_SESSION_TYPE ===
'x11'` gate. Wayland consumers must skip — don't try to bolt
-Wayland into the file. Session 6's Category A puts the niri
-variant in `lib/input-niri.ts` (sibling), NOT `lib/input.ts`.
+Wayland into the file.
+- **`lib/input-niri.ts` is Niri-only.** Strict
+`XDG_CURRENT_DESKTOP === 'niri'` gate. Sway / Hyprland / River
+consumers must skip or live in their own per-compositor files.
- **Don't speculate on `lib/input-wayland.ts` dispatcher.**
-Per-compositor files until a second consumer (Sway / Hyprland /
-River row) lands. Premature abstractions are wrong abstractions.
+Per-compositor files until a second Wayland consumer (Sway /
+Hyprland / River) lands. With only S14 on Niri, a dispatcher
+is ceremony.
- **Code-tab AX anchors stay in plan-doc until a consumer needs
them.** Don't preemptively add `CodeTab.activateTopTab()` to
-`claudeai.ts` — T36 Phase 2 was the only consumer and it's now
-Tier 3/4. Session 5's anchors block out the work for whenever
-a future consumer surfaces.
+`claudeai.ts` — session 5's anchors block out the work for
+whenever a future consumer surfaces.
- **CDP auth gate is alive** — runtime SIGUSR1 attach via
`app.attachInspector()`, never Playwright's `_electron.launch()`
or `chromium.connectOverCDP()`.
@@ -277,10 +252,10 @@ These are unchanged from sessions 1/2/3/4/5 and still load-bearing:
- **No fixed sleeps.** `retryUntil` from `lib/retry.ts`, or
Playwright auto-wait. Fixed `sleep(N)` is a smell. (Exception:
short sleeps inside hand-rolled retry loops that catch typed
-errors and short-circuit; see S11 for the pattern.)
+errors and short-circuit; see S11 / S14 for the pattern.)
- **Diagnostics on every run.** `testInfo.attach()` the artefacts.
-Single-shot JSON dumps for multi-state tests (S11, S31 pattern)
-are cleaner than 5+ separate attachments.
+Single-shot JSON dumps for multi-state tests (S11, S14, S31
+pattern) are cleaner than 5+ separate attachments.
- **Tag with annotations.** `severity:` and `surface:` on every
test so JUnit carries them through to matrix-regen.
- **Tabs in TS, ~80-char wrap as the existing files do.** Match
@@ -298,26 +273,22 @@ These are unchanged from sessions 1/2/3/4/5 and still load-bearing:
`lib/electron-mocks.ts`,** not `lib/claudeai.ts`. Documented in
T24/T25's leading comments.
- **Marker windows / sacrificial host processes always die in
-`finally`.** S11 is the template — `marker.kill()` runs before
-`app.close()` so the kill happens even if the spec throws. The
-niri sibling's `foot` marker should follow the same pattern.
+`finally`.** S11 / S14 are the templates — `marker.kill()` runs
+before `app.close()` so the kill happens even if the spec throws.
### Phases
#### Phase 0 — calibration
1. `cd tools/test-harness && npm run typecheck` — should pass.
-2. Read the plan doc's "Status (post-execution)" session 5 section,
-then read S11's leading comment + `lib/input.ts`'s leading
-comment (the X11-only-row-gate reasoning still applies; the
-niri sibling will mirror its shape but with niri-specific
-honest-readback discussion). Confirm you understand both.
-3. Pick ONE Category as the main bet. Don't write it yet — confirm
-you can plan from the spec. For Category A, verify niri's IPC
-doc is still consistent with the session 5 recon (the wiki
-page may have changed; re-fetch). For Category B, confirm the
-closure-local landscape hasn't shifted (re-run the session 3
-inspector walk's premise).
+2. Read the plan doc's "Status (post-execution)" session 6 section,
+then read S14's leading comment + `lib/input-niri.ts`'s leading
+comment. Confirm you understand the niri-only gate reasoning.
+3. Pick ONE Category as the main bet. For Category A, plan the
+approach: (a) module-level grep for registry exposers, (b) hook
+the eipc registration site, (c) patch in a dev-only exposer.
+List which approaches you'll try in what order, with the cap at
+2-3 distinct approaches before STOP AND REPORT.
If Phase 0 surfaces a problem (typecheck failing, primitives
unclear, the chosen Category's prerequisites don't hold), stop and
@@ -325,41 +296,30 @@ report. Don't fan out.
#### Phase 1 — fan-out batch
-For Category A (`lib/input-niri.ts` + S14):
-- Spawn ONE subagent for `lib/input-niri.ts` against the
-recon-sketched API (mirror `lib/input.ts` style — leading
-comment with the `--json`-stability rationale and the
-honest-readback reasoning, sibling typed errors
-`NiriIpcUnavailable` / `FootUnavailable`, exports matching
-`focusOtherWindow` / `spawnMarkerWindow` / `getFocusedWindowId`
-/ `isNiriSession` / `MarkerWindow` interface).
-- Spawn ONE subagent in parallel for the S14 runner (near-clone
-of S11 with imports swapped + row gate `['Niri']`).
-- After both return: typecheck, ensure the two files agree on the
-primitive's exported shape.
-For Category B (eipc-registry exposer):
-- Spawn ONE subagent doing the inspector walk — looking for
-module-level Maps / dispatch functions / `globalThis` writes
-near `:68816`-`:68820`. Treat as exploratory; report findings
-before committing to a primitive shape.
+For Category A (eipc-registry exposer):
+- Spawn ONE subagent per approach — module-level grep, hook-at-
+registration-site, dev-only patch-in. Treat as exploratory;
+report findings before committing to a primitive shape.
- Cap re-spawns at 2-3 distinct approaches; if all empty, STOP
-AND REPORT.
+AND REPORT. Ship an `H06_eipc_registry_finding.spec.ts`
+documentation runner if useful state surfaces during the
+investigation.
- If a stable handle is found, second batch: build `lib/eipc.ts`
-+ ship `H06_eipc_registry_finding.spec.ts`. Third batch:
-upgrade T22 / T31 / T33 / T38.
++ ship the H06 finding runner. Third batch: upgrade T22 / T31 /
+T33 / T38.
- Cap at ~3 specs total upgrade — don't try to land all four if
the first one surfaces an unexpected issue.
For Category C (single-spec audit):
-- Pick 1-2 deferred items per the table above. Standard fan-out
-per `runners/<closest-template>.spec.ts`.
+- Walk recent open issues + the deferred-items list. Pick 1-2
+that are now tractable. Standard fan-out per
+`runners/<closest-template>.spec.ts`.
#### Per-subagent prompt shape
```
-You're implementing ONE [test-harness runner | primitive] for
-<TARGET>.
+You're implementing ONE [test-harness runner | primitive |
+investigation] for <TARGET>.
Read in order:
- docs/testing/cases/<FILE>.md (focus on <TARGET>'s Code anchors)
@@ -373,8 +333,9 @@ Write tools/test-harness/src/runners/<TARGET>_short_name.spec.ts
[ AND/OR tools/test-harness/src/lib/<NEW-PRIMITIVE>.ts ].
[per-task specifics: pattern (seedFromHost / mock-then-call /
-asar fingerprint / shared isolation / new-primitive-build),
-assertion shape, skip rules, key constraint warnings]
+asar fingerprint / shared isolation / new-primitive-build /
+investigation), assertion shape, skip rules, key constraint
+warnings]
Constraints:
- Tabs, ~80-char wrap.
@@ -390,20 +351,21 @@ If the target isn't reasonable to implement (anchors don't resolve
to anything assertable, the test depends on state you can't
construct, the existing primitives don't cover the surface), DO
NOT write a stub. Report under Open questions and stop. Sessions
-1-5 had cumulative ~12 "stop and report" outcomes that were the
+1-6 had cumulative ~13 "stop and report" outcomes that were the
right call (S20 deferral, T05 reshape, T07 needs seedFromHost,
T08 needs setState('close'), S28 reclassification, T38 framing,
session-3 eipc-registry finding, T37 fixture-readback deferral,
-S14 primitive-gap, T35/T36 Phase 2 deferrals, T18 Tier 1 reframe,
-T36 Phase 2 reclassification to Tier 3/4).
+S14 primitive-gap then primitive-build, T35/T36 Phase 2 deferrals,
+T18 Tier 1 reframe, T36 Phase 2 reclassification to Tier 3/4,
+session-6 lib/input-niri.ts shipped untested-on-niri).
Report shape (~150 words):
## <TARGET> [runner | primitive | investigation]
- File written: tools/test-harness/src/runners/<filename>.spec.ts
[or lib/<newfile>.ts]
- Layer: file probe | argv probe | L1 | L2 (xprop) | L2 (DBus) |
pgrep | new-primitive | investigation
- Assertion shape: <one sentence>
- Skip rules: <which rows + why>
- Verification path: <typecheck + run result>
@@ -417,10 +379,7 @@ After fan-out returns:
1. `cd tools/test-harness && npm run typecheck` — must stay clean.
2. Run the new runners against KDE-W (the dev box) — but flag the
user first if any are destructive (seedFromHost kills running
Claude). Capture pass/skip/fail per spec for the matrix.
3. Update [`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
"Status (post-execution)" section to reflect newly-shipped
specs and any reclassifications discovered mid-flight.
@@ -431,7 +390,7 @@ After fan-out returns:
- Primitives landed (with API shape)
- Specs deferred (with the per-test rationale)
- Specs reclassified (Tier 3 → Tier 2, Tier 2 → Tier 1, etc.)
- Updated coverage stat (was 62/76 = 82%, now N/76 = M%)
6. Don't commit. The user reviews and commits.
7. Rotate this prompt: rewrite
`docs/testing/runner-implementation-followup-prompt.md` for
@@ -439,7 +398,7 @@ After fan-out returns:
### Self-correction loop
Same as sessions 1-6:
1. Subagent typecheck failure → re-spawn with explicit fix
instruction.
@@ -455,16 +414,18 @@ Same as sessions 1-5:
finding came from finding only 3 handlers in the registry —
the lesson is to verify the assertion is meaningful, not just
that it passes.
5. **Carry-over from session 5/6:** If pursuing Category A and 2-3
distinct approaches (inspector walk / hook / patch) all turn up
empty, STOP. Don't keep digging: document what was tried, ship the
H06 documentation runner if it surfaces useful state, and move to
Category C.
6. **NEW for session 7:** If Category A's hook approach lands a
handle but T22 / T31 / T33 / T38 upgrades reveal the channels
route through different code paths than the bundle strings
suggest (i.e. the runtime registry's contents don't match the
case-doc Code anchors), re-examine the case-doc anchors before
shipping the upgrade — the assertion shape might need
adjustment, not the test target.
Cap re-spawns at 2 per file. Past that, mark as needing human
review and move on.
@@ -483,39 +444,41 @@ Stop and write the final report when one of:
4. **Session budget hits ~3 new specs OR one new primitive
landing.** Stop, synthesize, leave the rest for the next
session.
5. **Category A approaches all turn up empty after 2-3 distinct
attempts.** Document the dead-end as a finding, ship H06 if
useful, pivot to Category C if budget remains.
### What you should NOT do
- **Don't try to land Category A + Category C in one batch.** Pick
ONE as the main bet.
- **Don't ship stubs.** If a runner can't actually assert what the
spec says, mark it as Tier 3 / blocked / primitive-gap and
don't write a placeholder. The cumulative thirteen "stop and
report" outcomes from sessions 1-6 were the right call — every
one revealed a real constraint.
- **Don't break existing runners.** H01-H05 are the canaries.
- **Don't restructure `lib/`** beyond targeted additions.
Premature abstractions are wrong abstractions.
`electron-mocks.ts` (session 3), `input.ts` (session 4), and
`input-niri.ts` (session 6) were threshold-driven extractions,
not speculative.
- **Don't run destructive Tier 3 tests** that write to the user's
real claude.ai account (T22 PR write, T27 scheduling, T29
worktree creation, T34 OAuth, T36 hooks-fire-on-prompt-submit).
Only the *read-only reframes* of those are in scope.
- **Don't introspect `ipcMain._invokeHandlers` for `claude.web`
eipc channels.** Confirmed broken in session 3. Category A is
the ONLY appropriate path to runtime IPC verification for those
channels.
- **Don't bolt other compositors into `lib/input-niri.ts`.**
Sway / Hyprland / River each get their own per-compositor file
if a consumer surfaces. With S14 the only consumer, no
expansion is justified yet.
- **Don't bolt Wayland into `lib/input.ts`.** X11-strict gate is
load-bearing.
- **Don't speculate on a `lib/input-wayland.ts` dispatcher.**
Per-compositor files until a second Wayland consumer lands.
- **Don't preemptively build `CodeTab.activateTopTab()` /
`startNewSession()`.** Session 5 captured the AX anchors but
T36 Phase 2 (the only known consumer) was reclassified out.
@@ -527,13 +490,13 @@ Stop and write the final report when one of:
### Final report format
```markdown
## Runner implementation summary (session 7)
- Main-bet category: A | B | C
- Specs landed: N
- Primitives landed: N
- Reclassified mid-flight: N (with reasons)
- Coverage: was 62/76 (82%), now <NEW>/76 (<PCT>%)
- Typecheck: clean | <errors>
- KDE-W test run: <pass/skip/fail counts>
@@ -541,7 +504,7 @@ Stop and write the final report when one of:
| Cat | Test ID | File | Assertion shape | Status |
|---|---|---|---|---|
| A | T22 | T22_pr_monitoring_handler.spec.ts | … | ✓ pass / skip / fail |
| ... |
## Notable findings
@@ -585,14 +548,13 @@ git diff --stat
`Promise<boolean>` variant + T25's for the void variant.
- For focus-shifting (X11 only): `lib/input.ts` exports
`focusOtherWindow` + `spawnMarkerWindow`. See S11 for the
end-to-end consumer pattern.
- For Wayland-native focus-shifting (Niri only): `lib/input-niri.ts`
exports the same shape with `niri msg --json` IPC + `foot`
marker. See S14 for the end-to-end consumer pattern. The
primitive is untested-on-real-Niri as of session 6 — the
first real Niri sweep run will confirm the schema assumptions
documented in its leading comment.
- **For asar fingerprints: ALWAYS grep the installed asar
first.** Build-reference is beautified; the bundle is
minified. Case-doc text may be the user-facing form, not the

@@ -18,6 +18,82 @@ work begins.
## Status (post-execution)
**Shipped session 6 (1 new spec + 1 new primitive):** S14 (Tier 2 — Niri-
only, currently known-failing detector). New primitive `lib/input-niri.ts`
(Wayland-native focus-shifter sibling of `lib/input.ts`:
`focusOtherWindow` / `spawnMarkerWindow` / `getFocusedWindowId` /
`isNiriSession` plus `NiriIpcUnavailable` / `FootUnavailable` typed
errors). Coverage moved from 61/76 (80%) to 62/76 (82%).
Session 6 findings + reclassifications:
- **S14 shipped as Tier 2 known-failing detector.** Near-clone of S11's
shape with imports swapped from `lib/input.js` to `lib/input-niri.js`
and the row gate flipped from `['GNOME-X', 'Ubu-X']` to `['Niri']`.
Same five-phase shape: setup → ready → marker spawn → focus loop with
sticky-error short-circuits → press shortcut + assert popup visible.
Diagnostic record fields parallel S11's `s11-diagnostics`
(`activeWidBeforeFocus` / `activeWidAfterFocus` typed `number | null`
for niri u64 IDs vs the X11 hex strings). Currently a known-failing
detector per case-doc S14 (`Failed to call BindShortcuts (error code
5)`); same shape as S12's GNOME-W `--enable-features=GlobalShortcutsPortal`
detector. The spec encodes the contract, so it will start passing on
Niri rows, with no spec edit, once the upstream / Chromium-side portal
issue is resolved.
- **`lib/input-niri.ts` extracted as the niri-side focus-shifter
substrate.** Niri-only by design — strict
`XDG_CURRENT_DESKTOP === 'niri'` gate via `isNiriSession()`. Exports:
`focusOtherWindow(title)` (`niri msg --json windows`, filtered on
`app_id !== 'Claude'` + title match → `niri msg action
focus-window --id <u64>` → honest readback via `getFocusedWindowId()`
using `retryUntil`), `spawnMarkerWindow(title)` (backgrounded
`foot --title <T> -e sleep 600` with kill-with-grace, mirroring the
X11 primitive's xterm pattern), `getFocusedWindowId()` (parses
`niri msg --json focused-window` to `number | null`), `isNiriSession()`,
`MarkerWindow` interface, `NiriIpcUnavailable` / `FootUnavailable`
typed errors. The primitive verifies the focus shift took (niri's
`focus-window` action exits 0 even when the compositor refuses
activation — only `focused-window` readback is the honest answer).
Defensive `unwrapOk` helper handles both the older
`{Ok: {FocusedWindow: ...}}` Result-style JSON envelope and newer
bare-payload responses; if niri ships a third shape, the parser
falls through to `null` rather than crashing.
- **Cross-compositor dispatcher NOT speculated.** Sway / Hyprland /
River each have totally different IPCs (`swaymsg`, `hyprctl`,
`riverctl`); the long-term cross-compositor answer is libei but
isn't widely deployed. Per-compositor files until a second consumer
surfaces — a hypothetical `lib/input-wayland.ts` dispatcher would
just switch on `XDG_CURRENT_DESKTOP` and delegate. With only S14
consuming `lib/input-niri.ts`, a dispatcher would be ceremony.
- **Category B (eipc-registry exposer) NOT attempted.** Same reasoning
as sessions 4/5: session 3 already established the registry is
closure-local, the inspector walk came up empty, and the early-exit
cap on retries makes Category B a poor main bet without a new
approach. Stays available for a future session that takes the
closure-local reverse-engineering as its main work.
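A minimal sketch of the verify-by-readback rule from the `lib/input-niri.ts` bullet above, in TypeScript to match the harness. It is illustrative only, not the shipped code: `focusAndVerify`, `pollUntil`, and the injected `focus` / `readFocused` callbacks are hypothetical stand-ins for the `niri msg action focus-window --id <u64>` call, `lib/retry.ts`'s `retryUntil`, and `getFocusedWindowId()`, whose real signatures may differ. Injecting the two IPC calls keeps the logic testable off-Niri.

```typescript
// Hypothetical callback types: `focus` stands in for
// `niri msg action focus-window --id <id>`, `readFocused` for a
// getFocusedWindowId()-style readback (focused id, or null).
export type FocusAction = (id: number) => Promise<void>;
export type FocusReadback = () => Promise<number | null>;

// Hypothetical stand-in for lib/retry.ts: re-run `check` until it
// returns true or the deadline passes.
async function pollUntil(
	check: () => Promise<boolean>,
	timeoutMs: number,
	intervalMs: number,
): Promise<boolean> {
	const deadline = Date.now() + timeoutMs;
	for (;;) {
		if (await check()) return true;
		if (Date.now() >= deadline) return false;
		await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
	}
}

// Ask the compositor to focus `id`, then decide success ONLY from the
// readback. The action's exit status is not trusted, because niri
// exits 0 even when it refuses activation.
export async function focusAndVerify(
	id: number,
	focus: FocusAction,
	readFocused: FocusReadback,
	timeoutMs = 2000,
	intervalMs = 100,
): Promise<boolean> {
	await focus(id);
	return pollUntil(
		async () => (await readFocused()) === id,
		timeoutMs,
		intervalMs,
	);
}
```

Passing a `readFocused` that never changes models niri refusing activation: `focusAndVerify` then returns `false` after `timeoutMs` even though `focus` resolved normally.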
Tier 2 → Tier 2 candidates remaining for next session: **T35 Phase 2**
and **T37 Phase 2** (still need closure-local readback or the
eipc-registry exposer; unchanged from sessions 4/5). **eipc-registry
exposer** (closure-local in main; reverse-engineering remains
unattempted — now the cleanest single-session win available, with all
the obvious focus-shifter / mock-then-call work already landed). The
primitive surface itself isn't growing quickly — `lib/electron-mocks.ts`
(session 3), `lib/input.ts` (session 4), and `lib/input-niri.ts`
(session 6) are all threshold-driven extractions, not speculative.
Session 6 untested-on-Niri caveats: the `lib/input-niri.ts` primitive
landed against session 5's recon notes, not a live niri session. First
real Niri sweep run will confirm: (a) the `Ok`-wrapper unwrap covers
the niri version on the row; (b) Claude's `app_id` value on niri is
literal `'Claude'` (the primitive's `app_id !== 'Claude'` guard
becomes a no-op rather than wrong if the actual value differs — match
still happens by title; tighten if needed); (c) `foot` is on the
target row's PATH (skip path is clean if not). Verified on KDE-W: the
runner skips correctly via the row gate.
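Caveats (a) and (b) both come down to how the `--json` output is parsed. The sketch below assumes the two envelope shapes described above (Result-style `{Ok: {<Variant>: ...}}` and bare payload) and shows one way to keep the parsing total, so unknown shapes degrade to `null` instead of throwing. `parseFocusedWindowId`, `pickMarkerWindowId`, and `toWindow` are hypothetical names, this `unwrapOk` is not the shipped helper, and the `Windows` variant key for a wrapped `windows` reply is an assumption (only `FocusedWindow` is documented here).

```typescript
// Minimal window shape: only the fields used below. niri ids are u64,
// modelled as number to match the primitive's `number | null` typing.
interface NiriWindow {
	id: number;
	title: string | null;
	app_id: string | null;
}

function asRecord(value: unknown): Record<string, unknown> | null {
	return value !== null && typeof value === 'object' && !Array.isArray(value)
		? (value as Record<string, unknown>)
		: null;
}

// Older replies wrap the payload as {"Ok": {"<Variant>": payload}};
// newer ones print the bare payload. Anything else passes through
// unchanged and is rejected by the shape checks below.
function unwrapOk(raw: unknown, variant: string): unknown {
	const outer = asRecord(raw);
	if (outer === null || !('Ok' in outer)) return raw;
	const inner = asRecord(outer.Ok);
	return inner !== null && variant in inner ? inner[variant] : outer.Ok;
}

function parseJson(stdout: string): unknown {
	try {
		return JSON.parse(stdout);
	} catch {
		return undefined;
	}
}

function toWindow(value: unknown): NiriWindow | null {
	const rec = asRecord(value);
	if (rec === null) return null;
	const id = rec.id;
	if (typeof id !== 'number') return null;
	return {
		id,
		title: typeof rec.title === 'string' ? rec.title : null,
		app_id: typeof rec.app_id === 'string' ? rec.app_id : null,
	};
}

// `niri msg --json focused-window` stdout -> id, or null both for
// "nothing focused" and for any unrecognised shape (never throws).
export function parseFocusedWindowId(stdout: string): number | null {
	const win = toWindow(unwrapOk(parseJson(stdout), 'FocusedWindow'));
	return win === null ? null : win.id;
}

// `niri msg --json windows` stdout -> id of the first window whose
// title matches and whose app_id is not Claude's. If Claude's real
// app_id differs from `claudeAppId`, the guard excludes nothing in
// practice and matching falls back to the title alone.
export function pickMarkerWindowId(
	stdout: string,
	title: string,
	claudeAppId = 'Claude',
): number | null {
	const list = unwrapOk(parseJson(stdout), 'Windows');
	if (!Array.isArray(list)) return null;
	for (const item of list) {
		const win = toWindow(item);
		if (win !== null && win.app_id !== claudeAppId && win.title === title) {
			return win.id;
		}
	}
	return null;
}
```

Passing a different `claudeAppId` shows caveat (b) in action: the guard stops excluding anything and the first title match wins, so a unique marker title is what keeps the match correct.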
---
**Shipped session 5 (1 new spec):** T18 (Tier 1 fingerprint). No new
primitives. Coverage moved from 60/76 (79%) to 61/76 (80%).

@@ -7,7 +7,7 @@ architecture, decisions, and rationale.
## Status
Sixty-two specs wired (24 cross-env T-tests, 33 env-specific S-tests,
5 H-prefix harness self-tests). See
[`docs/testing/runner-implementation-plan.md`](../../docs/testing/runner-implementation-plan.md)
for the tiered triage of remaining tests and the per-spec rationale
@@ -62,6 +62,7 @@ behind tier classification.
| [S10](../../docs/testing/cases/shortcuts-and-input.md#s10--quick-entry-popup-is-transparent-no-opaque-square-frame) | KDE-W only — popup runtime `getBackgroundColor() === '#00000000'` after Quick Entry opens (regression-detector against electron#50213 if bundled Electron in 41.0.4-bisect-window) | L1 + ydotool |
| [S11](../../docs/testing/cases/shortcuts-and-input.md#s11--quick-entry-shortcut-fires-from-any-focus-on-wayland-mutter-xwayland-key-grab) | GNOME-X / Ubu-X only (X11-side regression detector) — spawn xterm marker, `xdotool windowfocus` to it, verify `_NET_ACTIVE_WINDOW` shifted, fire `Ctrl+Alt+Space` via ydotool, assert popup visible. Wayland-side mutter regression (#404) is a primitive gap — needs Wayland-native focus injection (libei) | L1 + xdotool focus + ydotool shortcut |
| S12 | `--enable-features=GlobalShortcutsPortal` in Electron argv (GNOME-W only — currently a known-failing regression detector) | argv probe |
| [S14](../../docs/testing/cases/shortcuts-and-input.md#s14--global-shortcuts-via-xdg-portal-work-on-niri) | Niri only — spawn `foot` marker, `niri msg action focus-window` to it, verify `niri msg --json focused-window` shifted, fire `Ctrl+Alt+Space` via ydotool, assert popup visible. Currently known-failing detector for the Niri portal `BindShortcuts` path (parallels S12's GNOME-W detector) | L1 + niri msg focus + ydotool shortcut |
| [S15](../../docs/testing/cases/distribution.md#s15--appimage-extraction---appimage-extract-works-as-documented-fallback) | `--appimage-extract` exits 0; `squashfs-root/AppRun --version` runs without FUSE error | spawn + filesystem |
| [S16](../../docs/testing/cases/distribution.md#s16--appimage-mount-cleans-up-on-app-exit) | `mount(8)` shows new `.mount_claude` while app is up; gone within 10s of close | mount delta |
| [S17](../../docs/testing/cases/platform-integration.md#s17--app-launched-from-desktop-inherits-shell-path) | Shell-path-worker overlays user's login-shell PATH onto a deliberately-scrubbed env | L1 + utilityProcess |
@@ -96,8 +97,11 @@ isolation env (S19), the `lib/electron-mocks.ts` mock-then-call
helpers — `installOpenDialogMock` (T17), `installShowItemInFolderMock`
(T25), `installOpenExternalMock` (T24) — the `lib/input.ts`
focus-shifter (`focusOtherWindow` + `spawnMarkerWindow` for S11; X11
only — `WaylandFocusUnavailable` thrown on native Wayland) and its
Niri-native sibling `lib/input-niri.ts` (`niri msg --json` for the
focus-injection + readback chain, `foot --title` for the marker
window; `NiriIpcUnavailable` thrown off-Niri; consumed by S14) — and
the `createIsolation({ seedFromHost: true })` primitive that lets
login-required tests run hermetically against a copy of the host's
signed-in auth state (T07, T16, T26).
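The `foot --title` marker window with kill-with-grace cleanup could look roughly like the sketch below. It is written under stated assumptions and is not the shipped `lib/input-niri.ts`: the `MarkerWindow` fields and the `cmd` / `args` / `graceMs` options are hypothetical (the options exist only so the sketch can run without `foot`), the `isNiriSession()` gate is omitted, and it relies on the child-process `'spawn'` event (Node 15.1 or later).

```typescript
import { spawn, type ChildProcess } from 'node:child_process';

// Hypothetical shapes for illustration; the shipped MarkerWindow
// interface may expose different fields.
export interface MarkerWindow {
	title: string;
	pid: number | undefined;
	kill(): Promise<void>;
}

export class FootUnavailable extends Error {
	constructor(message: string) {
		super(message);
		this.name = 'FootUnavailable';
		// Keep `instanceof` working even when compiled to ES5.
		Object.setPrototypeOf(this, FootUnavailable.prototype);
	}
}

function hasExited(child: ChildProcess): boolean {
	return child.exitCode !== null || child.signalCode !== null;
}

// Resolve true once `child` exits, or false if `ms` elapses first.
function waitForExit(child: ChildProcess, ms: number): Promise<boolean> {
	if (hasExited(child)) return Promise.resolve(true);
	return new Promise((resolve) => {
		const onExit = (): void => {
			clearTimeout(timer);
			resolve(true);
		};
		const timer = setTimeout(() => {
			child.off('exit', onExit);
			resolve(false);
		}, ms);
		child.once('exit', onExit);
	});
}

// Spawn the marker terminal in the background. A missing binary
// surfaces as FootUnavailable so the caller can skip cleanly. kill()
// sends SIGTERM, waits `graceMs`, then escalates to SIGKILL.
export function spawnMarkerWindow(
	title: string,
	opts: { cmd?: string; args?: string[]; graceMs?: number } = {},
): Promise<MarkerWindow> {
	const cmd = opts.cmd ?? 'foot';
	const args = opts.args ?? ['--title', title, '-e', 'sleep', '600'];
	const graceMs = opts.graceMs ?? 2000;
	return new Promise((resolve, reject) => {
		const child = spawn(cmd, args, { stdio: 'ignore' });
		child.once('error', (err) =>
			reject(new FootUnavailable(`${cmd}: ${err.message}`)),
		);
		child.once('spawn', () =>
			resolve({
				title,
				pid: child.pid,
				async kill() {
					if (hasExited(child)) return;
					child.kill('SIGTERM');
					if (await waitForExit(child, graceMs)) return;
					child.kill('SIGKILL');
					await waitForExit(child, graceMs);
				},
			}),
		);
	});
}
```

Mapping the spawn `ENOENT` to `FootUnavailable` lets a consumer such as S14 skip cleanly when `foot` is not on the row's PATH, and calling `kill()` from a `finally` block keeps the marker from outliving a failed assertion.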
@@ -250,6 +254,7 @@ tools/test-harness/
│ │ ├── claudeai.ts # claude.ai renderer UI domain (CodeTab, dialog mock, atoms)
│ │ ├── electron-mocks.ts # mock-then-call helpers (dialog/showItemInFolder/openExternal)
│ │ ├── input.ts # focus-shifter primitive (X11 only — xdotool + xprop verify; spawnMarkerWindow xterm)
│ │ ├── input-niri.ts # focus-shifter primitive (Niri only — niri msg --json verify; spawnMarkerWindow foot)
│ │ ├── retry.ts # poll-until-true with timeout
│ │ └── diagnostics.ts # launcher log, --doctor, session env
│ └── runners/ # one .spec.ts per test ID