docs(testing): session 15 plan/inventory + rotate session 16 prompt

Plan-doc Status (post-execution): adds session 15 entry capturing the
T17 structural fix (legacy `CLAUDE_TEST_USE_HOST_CONFIG=1` →
`seedFromHost: true`), the RawElement import prune, the
debugger-attached-to-leaked-test-isolation discovery, the `openPill` /
`clickMenuItem` migration park decision, and the "productivity signal
is dimming: 3 consecutive sessions without coverage gain" note for the
orchestrator.

Followup prompt rotation: rewrites for session 16 with the new PRIORITY
(run T17 to verify the seedFromHost migration), the upgraded Phase 0
calibration check (port-9229 attachment quality, not just port status;
must distinguish auth-bearing Claude from leaked /login isolations via
`evalInMain` webContents probe), the narrowed category list (D-verify +
C + STOP recommendation), and the explicit STOP termination criterion
if both D-verify and C turn up empty.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-17 00:26:21 +03:00 · 2026-05-04 00:23:16 -04:00
parent af8a60bdb1
commit 14ccb61596
2 changed files with 446 additions and 373 deletions
--- a/docs/testing/runner-implementation-followup-prompt.md
+++ b/docs/testing/runner-implementation-followup-prompt.md
@@ -1,118 +1,139 @@
-# test-harness runner implementation — session 15 prompt
+# test-harness runner implementation — session 16 prompt

 This file is meant to be **copied verbatim into a fresh Claude Code
 session** as the initial user message. Don't paraphrase it; the
 orchestration depends on the exact directives below.

 You're picking up after a runner-implementation session that landed 1
-call-site migration (no new spec, no new primitive). Session 14 was
-a flake-reduction session: Phase 0 calibration found the debugger
-detached on the dev box (port 9229 not listening — Claude was not
-running, or running but Developer → Enable Main Process Debugger had
-not been clicked), which blocked Categories A (operon-mode
-navigation probe), B (Tier 3 read-only reframes), and C (schema-rev
-for `listRemotePluginsPage` / `listSkillFiles`) — all needing runtime
-probing against debugger-attached Claude. Session 14 pivoted to the
-PRIORITY Category D (call-site migration to `waitForAxNode`), which
-was tractable without the debugger because the migration is pure
-shape-only refactor against existing `lib/ax.ts` substrate. Coverage
-unchanged at 74/76 (97%) — migration sessions don't move the spec
-count, but T16's pre-existing failure mode (`no AX-tree button with
-accessibleName="Code" found`) is fixed by the migration. Two commits
-on `docs/compat-matrix` expected (autonomous orchestration commits +
-pushes — the user reviews after the session):
+structural fix (T17 migrated from legacy `CLAUDE_TEST_USE_HOST_CONFIG=1`
+auth path to `seedFromHost: true`, no new spec, no AX migration).
+Session 15 was an investigation session: Phase 0 calibration found
+port 9229 listening BUT the attached process was a leaked test
+isolation at `claude.ai/login` rather than the user's auth-bearing
+Claude — every webContents URL on that process was either `find_in_page`,
+`/login`, or `main_window/index.html`, and the user-data-dir was
+`/tmp/claude-test-*`. That made Categories A (operon-mode probe) / B
+(Tier 3 read-only reframes) / C (schema-rev) all soft-blocked: the
+debugger was technically attached, but to the wrong process for any
+auth-required investigation. Session 15 pivoted to investigating T17's
+pre-existing flake (the PRIORITY directive) and discovered the failure
+was structural rather than AX-polling-related — the spec was using the
+legacy `CLAUDE_TEST_USE_HOST_CONFIG=1` / `isolation: null` shape, and
+when run without that env var fell through to a fresh isolation with no
+auth, where `waitForUserLoaded`'s 90s default budget gets preempted by
+Playwright's 60s spec timeout. Coverage unchanged at 74/76 (97%) —
+structural fixes don't move the spec count, but T17 should now succeed
+when host is signed in (rather than auto-failing with a bare 60s
+timeout). Two commits on `docs/compat-matrix` expected (autonomous
+orchestration commits + pushes — the user reviews after the session):

- TBD — `test(harness): session 14 migrate activateTab to
-  waitForAxNode (no spec, coverage unchanged at 97%)`
-  (migrates `activateTab` from one-shot snapshot to `waitForAxNode`
-  with a configurable pre-click timeout; migrates
-  `CodeTab.activate`'s post-click `retryUntil`-around-
-  `findCompactPills` loop to `waitForAxNodes`; T16 passes 3/3 on
-  KDE-W against the migrated form, was pre-existing-flaky on the
-  baseline; T26 passes; T17 still pre-existing-flaky — verified by
-  stash + retry).
+- TBD — `test(harness): session 15 migrate T17 to seedFromHost +
+  prune unused RawElement import (no spec, coverage unchanged at 97%)`
+  (T17 spec rewrite swapping the `CLAUDE_TEST_USE_HOST_CONFIG=1` +
+  `isolation: null` branch for the canonical `seedFromHost: true`
+  pattern; prunes unused `RawElement` re-export import in
+  `lib/claudeai.ts` per session 14's leftover hint; typecheck clean;
+  T17 not actually run this session — see below).

 The plan doc at
 [`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
 captures the tier classification and execution-time reclassifications.
 Its "Status (post-execution)" section is the source of truth for
-what's done and what's deferred — read **session 14** first, then
-**session 13**, then **session 12**, then **session 11**, then
-**session 10**, then **session 9**, then **session 8**, then **session
-7**, then **session 6**, then **session 5**, then **session 4**, then
-**session 3**, then **session 2**, then **session 1** sub-sections.
+what's done and what's deferred — read **session 15** first, then
+**session 14**, then **session 13**, then **session 12**, then
+**session 11**, then **session 10**, then **session 9**, then **session
+8**, then **session 7**, then **session 6**, then **session 5**, then
+**session 4**, then **session 3**, then **session 2**, then **session
+1** sub-sections.

 This session is a continuation, not a restart. Start by reading the
 plan doc's status sections.

-### Big new findings from session 14
+### Big new findings from session 15

-1. **`activateTab` no-retry was the T16 failure mode.** Verified by
-   stashing the migration and re-running T16 against the baseline —
-   same `CodeTab.activate: no AX-tree button with accessibleName="Code"
-   found` failure. The migration converts the pre-click snapshot from
-   one-shot to a `waitForAxNode` poll, with the existing T16 budget
-   (15s through `CodeTab.activate({ timeout })`) covering both the
-   pre-click click-budget and the post-click pill poll. T16 passed
-   3/3 in succession against the migrated form. Strong signal that
-   "convert one-shot AX snapshots to `waitForAxNode` polling" is a
-   high-leverage flake-reduction shape — this is the first migration
-   that demonstrably fixed an existing failure.
-2. **T17 stays pre-existing-flaky.** T17 exercises the env-pill →
-   Local → Select-folder → Open-folder chain via `openEnvPill` /
-   `selectLocal` / `openFolderPicker`, which use `openPill` and
-   `clickMenuItem` internally. Those weren't migrated this session
-   (their post-click stability gates plus per-spec sleep budgets
-   carry tuning the prompt explicitly cautioned against changing).
-   T17's flake mode is unchanged-by-migration; future sessions can
-   take it if budget tuning data warrants. The `openPill` while-loop
-   on a successful menu render takes 100ms-per-poll-iteration; if the
-   menu hasn't rendered within 5s, it returns `{ opened: false,
-   items: [] }`. Migrating to `waitForAxNode` would flatten the loop
-   shape but doesn't obviously change the outcome, so the migration
-   wasn't worth the budget-tuning risk this session.
-3. **The debugger-attachment precondition is still binding.**
-   Sessions 9-12 did extensive runtime probing of the per-wc IPC
-   registry against the user's debugger-attached Claude. Without
-   that probing, Categories A / B / C in this prompt are blocked at
-   the smoke-test phase. If the user hasn't clicked Developer →
-   Enable Main Process Debugger before the session starts, port 9229
-   is closed and the categories pivot to either documentation work
-   or further call-site migration. Phase 0 must check `ss -tln |
-   grep ':9229'` (or `curl --max-time 2 http://127.0.0.1:9229/json`)
-   before fanning out.
-4. **The reframe pool remains essentially exhausted.** Same status
-   as sessions 12-13 — every Tier 1 fingerprint with a tractable
-   runtime sibling has been promoted. The remaining options are now:
-   (a) further call-site migration to `waitForAxNode` for flake
-   reduction (`openPill` / `clickMenuItem` / T26's pre-click
-   `retryUntil` — though T26's needs a `context-was-destroyed`
-   exception swallow), (b) operon-mode navigation probe (still needs
-   debugger), (c) schema-rev for `listRemotePluginsPage` /
-   `listSkillFiles` (still needs debugger), (d) Tier 3 read-only
-   reframes (most need user-account state). Session 14 demonstrated
-   migration can deliver a measurable bug-fix outcome; that
-   continues to be the highest-leverage shape when the debugger is
-   closed.
+1. **T17 flake was structural, not AX-polling.** The trace showed a
+   bare 60s Playwright timeout with NO `renderer-url` attachment, so
+   the test never reached line 49's attach call and therefore never
+   resolved `waitForReady('userLoaded')` at line 40.
+   Root cause: T17 was the last spec on the legacy
+   `CLAUDE_TEST_USE_HOST_CONFIG=1` / `isolation: null` shape — every
+   other auth-required spec (T07, T16, T19, T20, T21, T22b, T26,
+   T27, T31b, T33b/c, T35b, T37b, T38b) had moved to `seedFromHost:
+   true`. Without that env var (which CI / orchestration didn't
+   set), T17 fell through to a fresh isolation with no auth, hit
+   `/login`, and `waitForUserLoaded`'s 90s budget got preempted by
+   the 60s spec timeout. **Session 14's hypothesis was wrong** —
+   the AX click chain in `openPill` / `clickMenuItem` was never
+   reached, so migrating those wouldn't have fixed anything.
+2. **`openPill` / `clickMenuItem` migration parked.** With T17's
+   actual flake explained by the auth-path mismatch, no flake
+   evidence remains to justify the AX migration that sessions 14-15
+   considered. `openPill`'s while-loop and `clickMenuItem`'s
+   while-loop work fine when the auth path is correct. Don't migrate
+   speculatively; wait for a third consumer to surface with
+   budget-tuning evidence.
+3. **Phase 0 must distinguish "port open" from "port attached to
+   user's signed-in Claude".** Session 14 saw port 9229 closed and
+   correctly classified it as debugger-detached. Session 15 saw port
+   9229 OPEN but attached to a leaked test isolation at /login, so
+   Categories A/B/C were still soft-blocked. The right Phase 0 probe:
+   `evalInMain` listing webContents and checking that AT LEAST one
+   URL is `https://claude.ai/<not /login>`. If every webContents is
+   `/login` or `find_in_page` or `main_window`, treat it the same
+   as port-closed for auth-required investigations. Session 15's
+   one-off probe shape (kept inline in the report, deleted after):
+
+   ```ts
+   const wcs = await client.evalInMain(`
+     const { webContents } = process.mainModule.require('electron');
+     return webContents.getAllWebContents().map((w) => ({
+       id: w.id, url: w.getURL(), title: w.getTitle(),
+     }));
+   `);
+   ```
+
+4. **Leaked `/tmp/claude-test-*` dirs accumulating on dev box.**
+   Multiple test isolations from prior sessions have leaked their
+   tmpdirs and (in some cases) their Electron child processes.
+   `ls /tmp/ | grep claude-test` showed several. The session 15
+   T17 spec wasn't run because killing those leaked Electron
+   processes might also kill the user's real running Claude (PID
+   ambiguity from `ps`). A future session can either (a) verify
+   no real Claude is running before invoking T17, or (b) just
+   accept the seedFromHost kill side effect and let the user
+   re-launch Claude after the session.
+5. **Productivity signal is dimming.** Sessions 13-15 collectively
+   produced one new primitive (`lib/ax.ts`), one substantive AX
+   migration (`activateTab` + `CodeTab.activate`), and one
+   structural fix (T17 seedFromHost). NO coverage gain in those
+   three sessions. The remaining categories without an
+   auth-bearing debugger-attached Claude are mostly exhausted.
+   Next session should prioritise (a) running T17 to verify the
+   seedFromHost fix actually resolves the timeout, and (b) checking
+   whether a Category C schema-rev probe against the leaked /login
+   isolation is tractable (validators don't need auth, only
+   invocation does — worth a 15-min investigation). If both turn
+   up empty, the orchestrator should seriously consider stopping —
+   at 97% coverage with no clear high-leverage shapes left,
+   further sessions are likely to produce documentation-only or
+   marginal-improvement deliverables.

 ### Authoritative reference

 Read these in order before fanning out:

 - [`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
-  — tier classification + status section. Read **session 14**, then
-  **session 13**, **session 12**, **session 11**, **session 10**,
-  **session 9**, **session 8**, **session 7**, **session 6**,
-  **session 5**, **session 4**, **session 3**, **session 2**, then
-  **session 1** "Status (post-execution)" sub-sections. The Tier-3
-  list (search for "## Tier 3") is the candidate pool for any further
-  reframes.
+  — tier classification + status section. Read **session 15**, then
+  **session 14**, **session 13**, **session 12**, **session 11**,
+  **session 10**, **session 9**, **session 8**, **session 7**,
+  **session 6**, **session 5**, **session 4**, **session 3**,
+  **session 2**, then **session 1** "Status (post-execution)"
+  sub-sections. The Tier-3 list (search for "## Tier 3") is the
+  candidate pool for any further reframes.
 - [`tools/test-harness/README.md`](../../tools/test-harness/README.md)
  — runner conventions, the now-74-spec inventory, primitives in
-  `lib/`, isolation defaults, the CDP-gate workaround, the eipc
-  note, and `lib/ax.ts` substrate (session 13 addition; session 14
-  migrated `activateTab` + `CodeTab.activate`'s post-click pill
-  poll to use it).
+  `lib/`, isolation defaults (T17 now seedFromHost per session 15),
+  the CDP-gate workaround, the eipc note, and `lib/ax.ts` substrate.
 - [`docs/testing/cases/README.md`](cases/README.md) — case-doc
  structure and the four anchor scopes.
 - [`tools/test-harness/src/lib/`](../../tools/test-harness/src/lib/)
@@ -123,297 +144,236 @@ Read these in order before fanning out:
  `waitForEipcChannels` / `invokeEipcChannel` on `lib/eipc.ts`) is
  unchanged.
 - [`tools/test-harness/eipc-registry-probe.ts`](../../tools/test-harness/eipc-registry-probe.ts)
-  — the session 7 read-only registry probe. Re-run against a
-  debugger-attached Claude (`Developer → Enable Main Process
-  Debugger` from the menu) to capture the current registry shape.
-  Sessions 11 / 12 used small one-off smoke-tests in the test-
-  harness dir that clone the InspectorClient connection pattern
-  and run N candidate read-sides through M arg shapes; deleted
-  after.
+  — the session 7 read-only registry probe. Re-run against an
+  auth-bearing debugger-attached Claude (`Developer → Enable Main
+  Process Debugger` from the menu, signed-in) to capture the
+  current registry shape.
 - [`tools/test-harness/src/runners/`](../../tools/test-harness/src/runners/)
-  — every existing spec is a template. Notable session 14
+  — every existing spec is a template. Notable session 15
  candidates for follow-up:
-  - `T17_folder_picker.spec.ts` — the next test that would benefit
-    from `openPill` / `clickMenuItem` migration. Pre-existing
-    flake; current failure is a 60s timeout in the
-    openEnvPill/selectLocal/openFolderPicker chain.
-  - `T26_routines_page_renders.spec.ts` — has a pre-click
-    `retryUntil` block with `context-was-destroyed` exception
-    handling that could become a `waitForAxNode` call once the
-    primitive grows error-class options.
+  - `T17_folder_picker.spec.ts` — newly migrated to seedFromHost.
+    Run to verify the 60s timeout is gone. If T17 now passes, the
+    structural fix shipped session 15 is verified.
+  - Schema-rev for `listRemotePluginsPage` / `listSkillFiles` —
+    rejection literals can be bundle-grepped without auth, and the
+    validator runs auth-independent if /login state lets us
+    invoke through the renderer-side wrapper. Session 12 found
+    `listRemotePluginsPage` needs `limit: number` at position 0
+    and `listSkillFiles` needs both `pluginId` and `skillName`.
 - [`docs/testing/cases/*.md`](cases/) — the spec each runner
  asserts. The **Code anchors:** field tells you exactly where
  upstream implements the feature.

 ### Tests in scope this session

-**Realistic ceiling: ~1 new spec OR one substantive flake-reduction
-deliverable OR one investigation.** Sessions 9-12 each landed 1-2
-specs; session 13 landed only a primitive (debugger blocked); session
-14 landed only a migration (debugger blocked). Coverage at 74/76
-means the test budget naturally shifts toward either (a) further flake
-reduction by extending the migration shape, (b) investigation that
-requires the debugger and was deferred from sessions 12-14, or (c)
-Tier 3 read-only reframes that the harness can construct from
-existing `seedFromHost` state.
+**Realistic ceiling: ~1 verification run OR ~1 schema-rev investigation
+OR a "stop the orchestration" recommendation.** Sessions 9-12 each
+landed 1-2 specs; session 13 landed only a primitive (debugger
+blocked); session 14 landed only a migration (debugger blocked);
+session 15 landed only a structural fix (debugger soft-blocked).
+Coverage at 74/76 means the test budget naturally shifts toward
+verification, low-stakes investigation, or the orchestration
+termination decision.

-**Phase 0 MUST check the debugger BEFORE picking a category.** Run
-`ss -tln 2>/dev/null | grep ':9229'` (or
-`curl --max-time 2 http://127.0.0.1:9229/json`). If port 9229 is not
-listening, Categories A and C are hard-blocked. Pivot to D or B.
+**Phase 0 MUST check the debugger-attachment quality, not just port
+status.** Run `ss -tln 2>/dev/null | grep ':9229'` for port. If open,
+also run an `evalInMain` probe to enumerate webContents URLs — if no
+URL is `https://claude.ai/<not /login>`, treat as soft-blocked for
+auth-required categories. Probe shape (kept inline; delete after):

-#### **PRIORITY: Investigate why T17 stays flaky and decide on a
-migration-or-fix path.** Session 14's migration fixed T16's pre-
-existing failure mode. T17 is the next-clearest pre-existing-flaky
-spec on KDE-W; it shares plumbing with T16 (`CodeTab` → AX-driven
-clicks) but goes deeper through `openEnvPill` / `selectLocal` /
-`openFolderPicker`. The session 14 migration does NOT reach into
-those (they use `openPill` + `clickMenuItem`, both of which carry
-post-click stability gates and per-iteration sleep loops). The
-investigation: (1) read T17's failure trace from the most recent
-session-14 stashed run (under `tools/test-harness/results/local/
-test-output/T17_folder_picker-T17-—-Folder-picker-opens/`), (2)
-classify the failure (env-pill probe? Local item? Select-folder
-pill? Open-folder click?), (3) decide if (a) `openPill` migration
-to `waitForAxNode` would reach it, or (b) the budget defaults need
-tuning, or (c) the failure is from something orthogonal to AX
-polling. If (a), ship the migration. If (b), document the budget
-mismatch in plan-doc. If (c), defer to a future session with a
-clearer signal. **If this is what session 15 ships, that's a
-strictly higher-impact outcome than another Tier 2 / Tier 3 reframe
-— flake reduction touches every existing AX-using spec.** Doesn't
-need the debugger.
+```ts
+import { InspectorClient } from './src/lib/inspector.js';
+
+// Attach to the main-process debugger on port 9229.
+const client = await InspectorClient.connect(9229);
+// List every webContents with its URL so leaked /login isolations stand out.
+const wcs = await client.evalInMain<unknown>(`
+  const { webContents } = process.mainModule.require('electron');
+  return webContents.getAllWebContents().map((w) => ({
+    id: w.id, url: w.getURL(), title: w.getTitle(),
+  }));
+`);
+console.log(wcs);
+client.close();
+```

-Three categories — pick ONE as the main bet, treat the others as
-fallback if the main bet hits an early blocker:
+If every URL is `/login` or `find_in_page` or `main_window/index.html`,
+the debugger is attached to a leaked test isolation, not the user's
+Claude. Categories A and most of B are blocked. Category C may still
+be tractable since validators run auth-independent — try the schema-
+rev probe against the /login wrapper.
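
A minimal sketch of how the probe output could be classified, assuming the `{ id, url, title }` shape the probe above returns. `WebContentsInfo`, `isAuthBearingUrl`, and `classifyAttachment` are hypothetical names, not harness APIs, and the only rule encoded is the one stated here: at least one `https://claude.ai/` URL that is not `/login`.

```typescript
// Hypothetical shape mirroring the probe's map() callback.
interface WebContentsInfo {
  id: number;
  url: string;
  title: string;
}

type Attachment = 'auth-bearing' | 'soft-blocked';

// True only for https://claude.ai/<path> where <path> is not /login.
// Anything else (file:// pages, other hosts) counts as not signed in.
function isAuthBearingUrl(url: string): boolean {
  const match = /^https:\/\/claude\.ai(\/[^?#]*)?(?:[?#].*)?$/.exec(url);
  if (match === null) return false;
  const path = match[1] ?? '/';
  return path !== '/login' && !path.startsWith('/login/');
}

// Soft-blocked unless at least one webContents sits on a signed-in page.
function classifyAttachment(wcs: WebContentsInfo[]): Attachment {
  return wcs.some((w) => isAuthBearingUrl(w.url)) ? 'auth-bearing' : 'soft-blocked';
}
```

A `soft-blocked` result would be handled the same way as a closed port for auth-required categories; an empty list is also treated as soft-blocked.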
+
+#### **PRIORITY: Verify T17's session 15 seedFromHost migration
+actually resolves the 60s timeout.** Session 15 didn't run T17 because
+the dev box had ambiguous Electron processes (some leaked test
+isolations, possibly the user's real Claude — `ps` couldn't
+disambiguate cleanly). Session 16's first action:
+
+1. Check `pgrep -af "ozone-platform=x11.*app.asar"` and inspect each
+   match's `--user-data-dir` argument (e.g. `ps -o pid,args -p <pid>`)
+   to identify whether any real-Claude process is running (real Claude
+   has a non-`/tmp/claude-test-*` user-data-dir, typically nothing or
+   `~/.config/Claude`).
+2. If only test cruft is running, run T17 (`npx playwright test
+   T17 --reporter=list`). The test will kill those leaked
+   processes via `seedFromHost`'s host-Claude-kill semantics —
+   that's actually a desirable cleanup side effect.
+3. If a real Claude IS running, **flag clearly in the report
+   before running**, then run T17. The user accepted the
+   `seedFromHost` kill side effect when authorising autonomous
+   orchestration; just be transparent about it.
+4. Capture pass/skip/fail. Update the matrix coverage doc if
+   T17 now passes.
+5. If T17 still fails, classify the new failure mode (is it now
+   AX-polling? Folder picker chain? Mock not installing?) and
+   decide whether to fix or defer.
+
+This is **strictly higher-impact than session 14/15's
+spec-implementation work** because it produces a concrete
+pass/fail data point that resolves a 2-session-old hypothesis.
+Doesn't need the debugger.
+
+Three categories — pick the verification run as the main bet, treat
+the others as fallback if the main bet hits an early blocker:

 | # | Tests | Source | Notes |
 |---|---|---|---|
-| **D** further call-site migration / T17 investigation | T17 / `claudeai.ts` `openPill` + `clickMenuItem` | `lib/ax.ts` (session 13 primitive) | The PRIORITY shape this session. Read T17's failure trace, decide if `openPill` migration would fix it, ship the migration if so. Same shape-only refactor risk as session 14: keep the per-spec retry budgets matching the existing tuning. Doesn't need the debugger. **Risk:** `openPill` and `clickMenuItem` carry post-click stability gates that `waitForAxNode` already covers via `stabilityGate: true`, so the migration shape should slot in cleanly — but each spec's overall budget needs verification. |
-| **A** operon-mode navigation probe | n/a (investigation) + maybe small Tier 2 reframe | new probe + bundle grep for operon URL routes | Session 10 confirmed `OperonBootstrap.ensure` registers eagerly but the other 21 wrapper-exposed operon interfaces remain registry-unconfirmed. Outputs: either an operon-mode URL form recovered from the bundle (search for `operon`-keyed routes in `claude.ai/...` paths) plus a registry re-probe after navigation, OR a deferral note explaining why operon scope can't be reached without an operon-mode entry. **Needs debugger-attached Claude on port 9229.** |
-| **B** Tier 3 read-only reframes | Pick from the Tier 3 list | T33c / T35b / T37b template + bundle grep | The Tier 3 list is full of login-required flows; some have read-only entry points that the harness CAN construct. Candidates: T22's `getPrChecks` read-side might accept a non-existent PR number / dry-run mode; T15's OAuth surface has read-only state queries. Most need the user-account-scoped state to fail-fast with a clean error rather than a real network roundtrip — investigate first. **Needs debugger for smoke-test verification.** |
-| **C** Schema-rev for `listRemotePluginsPage` / `listSkillFiles` | Bundle grep | session 9 schema-rev pattern | Both methods rejected every smoke-tested arg shape during session 12's investigation. `listRemotePluginsPage` needs `limit: number` at position 0 (rejection: `Argument "limit" at position 0 ...`); `listSkillFiles` needs both `pluginId` and `skillName` (rejection: `Argument "skillName" at position 1 ...`). Bundle-grep on the rejection literals → resolve the schema → ship a narrowly-scoped Tier 2 invocation if it unblocks a case-doc claim. **Needs debugger to verify the recovered schema.** |
+| **D-verify** T17 verification run (PRIORITY) | T17 | session 15 migration | Run T17 against the dev box. If pass, log it. If fail, classify the new failure mode. **Side effect: kills any running Claude (the user's, or leaked test cruft). Flag in the report.** Doesn't need the debugger. |
+| **C** Schema-rev for `listRemotePluginsPage` / `listSkillFiles` | Bundle grep | session 9 schema-rev pattern | Both methods rejected every smoke-tested arg shape during session 12's investigation. `listRemotePluginsPage` needs `limit: number` at position 0 (rejection: `Argument "limit" at position 0 ...`); `listSkillFiles` needs both `pluginId` and `skillName` (rejection: `Argument "skillName" at position 1 ...`). Bundle-grep on the rejection literals → resolve the schema → ship a narrowly-scoped Tier 2 invocation if it unblocks a case-doc claim. **Tractable against a /login isolation since validators run auth-independent.** |
+| **STOP** Orchestrator stop recommendation | n/a | session 15 productivity signal | Coverage at 97%, three consecutive non-coverage sessions, remaining categories soft- or hard-blocked. If D-verify and C both produce nothing tractable, formally recommend the orchestrator stop. Documentation-only sessions are still acceptable per the followup termination criteria, but consecutive ones with no improvement signal are noise. |

-If port 9229 is closed, only D is fully tractable. A documentation-
-only session that audits the existing AX call-sites and proposes a
-migration plan (without shipping) is also acceptable — pre-work for
-a future session that DOES land the migration.
+#### Category D-verify — T17 verification run

-#### Category D — further call-site migration / T17 investigation
+The plan: run the post-session-15 T17 against the dev box and capture
+the result. Pass = the structural fix landed correctly. Fail = the
+hypothesis was incomplete; classify and decide.

-The plan: investigate T17's pre-existing flake, decide on a fix path,
-ship if a `waitForAxNode`-shaped migration of `openPill` /
-`clickMenuItem` would reach it.
-
-1. **Read T17's most recent failure trace.** Either the session-14
-   stashed-baseline trace (under `tools/test-harness/results/local/
-   test-output/T17_folder_picker-T17-—-Folder-picker-opens/`) or run
-   T17 fresh against the post-session-14 form. Classify the failure:
-   - openEnvPill timeout? (would suggest `openPill` migration)
-   - selectLocal timeout? (would suggest `clickMenuItem` migration)
-   - openFolderPicker chain timeout? (suggests deeper issue)
-   - Some other failure?
-2. **If `openPill` migration would reach the failure**, migrate it.
-   The shape: replace the post-click while-loop with
-   `waitForAxNodes` filtered to MENU_ITEM_ROLES, with the existing
-   `timeout` parameter as `timeoutMs`. Keep the upfront
-   `waitForAxTreeStable` gate or pass `stabilityGate: true` to
-   `waitForAxNodes`. Verify with T17 (or the originally-affected
-   spec).
-3. **If `clickMenuItem` migration would reach the failure**, same
-   shape. Replace the while-loop with `waitForAxNode` filtered on
-   role + textPattern, with the existing `timeout` as `timeoutMs`.
-4. **If the failure is orthogonal to AX polling** (e.g. environmental,
-   timing race outside the AX surface, dialog mock not installing),
-   document and defer.
+1. **Disambiguate running Claude processes.** `pgrep -af
+   "ozone-platform=x11.*app.asar"`; for each, `cat
+   /proc/<pid>/cmdline | tr '\0' '\n' | grep user-data-dir` (or
+   inspect via `ps` cmdline). If only `/tmp/claude-test-*`
+   user-data-dirs, no real Claude is running.
+2. **Run T17.** `cd tools/test-harness && npx playwright test
+   T17_folder_picker --reporter=list 2>&1 | tee
+   /tmp/t17-session16.log`.
+3. **Classify.**
+   - Pass: structural fix verified. Update plan-doc / matrix.
+   - Skip with "seedFromHost unavailable": means host has no
+     `~/.config/Claude/Local State`. Should be rare on the dev
+     box but possible if config was wiped between sessions.
+   - Skip with "seeded auth did not reach post-login URL":
+     auth was seeded but stale. User needs to re-sign-in
+     manually. Don't try to reseed automatically.
+   - Fail with NEW failure mode: classify the failure (AX
+     click? openFolderPicker chain? dialog mock?). If it's
+     now in `openPill` / `clickMenuItem`, sessions 14/15's
+     speculation has finally hit; ship the AX migration.
+     Otherwise document and defer.
+4. **Don't restructure T17's body** unless step 3 surfaces a
+   real new bug. Keep changes scoped to whatever the verification
+   surfaces.
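
Step 1 can be sketched as a pure function over the NUL-separated contents of `/proc/<pid>/cmdline`. This is an illustrative sketch, not harness code: `userDataDirOf` and `classifyProcess` are hypothetical names, and it assumes the flag appears either as `--user-data-dir=<path>` or as `--user-data-dir <path>`.

```typescript
type ProcessKind = 'test-isolation' | 'possibly-real-claude';

// Extract the --user-data-dir value from a raw /proc/<pid>/cmdline string.
// Returns null when the flag is absent.
function userDataDirOf(rawCmdline: string): string | null {
  const prefix = '--user-data-dir=';
  const args = rawCmdline.split('\0').filter((a) => a.length > 0);
  let expectValue = false;
  for (const arg of args) {
    if (expectValue) return arg;
    if (arg.startsWith(prefix)) return arg.slice(prefix.length);
    expectValue = arg === '--user-data-dir';
  }
  return null;
}

// Only a /tmp/claude-test-* user-data-dir marks a process as test cruft;
// a missing flag or any other directory may be the user's real Claude.
function classifyProcess(rawCmdline: string): ProcessKind {
  const dir = userDataDirOf(rawCmdline);
  return dir !== null && dir.startsWith('/tmp/claude-test-')
    ? 'test-isolation'
    : 'possibly-real-claude';
}
```

The conservative default matters here: anything not provably a test isolation is reported as possibly real, so the kill side effect gets flagged before T17 runs.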

 Doesn't need the debugger.

-#### Category A — operon-mode navigation probe
-
-The plan: find an operon-mode URL form and verify whether the other
-21 operon interfaces register lazily.
-
-1. **Bundle grep for operon URL routes.** Search the bundled
-   `index.js` and `mainView.js` for `operon`-keyed paths (e.g.
-   `/operon/...`, `claude.ai/operon`, etc.). Compile a candidate URL
-   list.
-2. **Navigate the user's debugger-attached running Claude** to each
-   candidate URL via `inspector.evalInRenderer('claude.ai',
-   "window.location.href = '<URL>'")`. After each navigation, re-run
-   the registry probe and check the operon scope's interface count.
-3. **If any URL surfaces additional operon handlers**, ship a small
-   Tier 2 reframe spec.
-4. **If none of the candidate URLs surface additional handlers**,
-   document as "operon scope handlers register lazily on a navigation
-   we can't easily construct from the harness" and defer.
-
-**Needs debugger-attached Claude on port 9229.**
-
-#### Category B — Tier 3 read-only reframes
-
-The plan: identify a Tier 3 spec where a non-destructive read-side
-is invocable from a fresh `seedFromHost` isolation.
-
-1. **Read the Tier 3 list** in plan-doc and pick 1-2 candidates with
-   read-side anchors. Most Tier 3 specs are write-side flows (T15
-   OAuth, T22 PR write, T27 scheduling write, T29 worktree creation,
-   T34 OAuth, T36 hooks-fire-on-prompt-submit) — those are out of
-   scope. The exceptions are read-side anchors that just need
-   user-account-scoped data to assert against.
-2. **Smoke-test the candidate read-side** with various arg shapes.
-3. **Ship a Tier 2 reframe** if the read-side resolves cleanly.
-4. **Defer** if every candidate requires real account state to assert
-   meaningfully.
-
-**Needs debugger for smoke-test verification.**
-
 #### Category C — Schema-rev for rejecting read-sides

 The plan: resolve the validator schema for `listRemotePluginsPage` /
 `listSkillFiles` via bundle grep, ship invocations if either unblocks
-a case-doc claim.
+a case-doc claim. Tractable against a /login isolation since
+validators run auth-independent.

 1. **Grep on the rejection literal** in the bundled `index.js`.
   Validator block sits ~50-200 chars before the throw site (session
   9 finding). Read ~2KB around the hit to surface the full schema.
 2. **Smoke-test the recovered schema** against the user's debugger-
-   attached running Claude.
+   attached running Claude (or, if auth-soft-blocked as in session 15,
+   against the /login isolation — validators run regardless of auth).
 3. **Connect the resolved invocation to a case-doc claim.**
 4. **Ship a Tier 2 invocation** if a case-doc claim is unblocked.

-**Needs debugger to verify the recovered schema.**
+Auth-independent for the validator; auth-bearing for any handler that
+actually returns plugin / skill data. If the validator resolves but
+the handler fails on auth, document the schema in plan-doc as a
+deferred reframe and move on.

-#### Cross-compositor focus-shifter expansion (NOT recommended this session)
+#### STOP recommendation

-Building `lib/input-sway.ts` / `lib/input-hypr.ts` would mirror
-`lib/input-niri.ts`'s shape but no consumer is asking for them.
-Premature abstractions are wrong abstractions. Wait for a real
-consumer.
-
-#### Main-side `invokeEipcChannel` fallback (NOT recommended this session)
-
-Same status as sessions 8-14 — wait for a real consumer.
-
-#### Launch event-subscription primitive (NOT recommended this session)
-
-Same status as sessions 11-14 — wait for a real consumer.
-
-#### `waitForRenderedSurface` registry (NOT recommended this session)
-
-Session 13's `lib/ax.ts` deliberately did NOT ship a named-surface
-registry; promote when a third consumer crystallizes with a specific
-surface name in mind.
-
-#### CSS-querySelector primitive (NOT recommended this session)
-
-Session 13's `lib/ax.ts` covers AX-tree consumers only. T07's CSS-
-querySelector poll for the topbar is a different abstraction (DOM,
-not AX). Wait for a second consumer before extracting.
+If D-verify resolves cleanly (pass or stable skip) and C produces no
+shippable spec after the schema-rev investigation, further sessions
+can only yield documentation-only output with no clear next-step
+deliverable. The orchestrator should stop. State this plainly in the
+final report; don't keep cycling.

 ### Constraints to respect (don't violate)

-These are unchanged from sessions 1-14 and still load-bearing:
+These are unchanged from sessions 1-15 and still load-bearing:

 - **Default isolation** unless the spec needs otherwise. Use
  `seedFromHost: true` for any test that depends on authenticated
  renderer state — never assume default isolation gets past
  `/login`. T07/T11_runtime/T16/T17/T19/T20/T21/T26/T22b/T27/T31b/T33b/T33c/T35b/T37b/T38b
-  are the templates.
+  are the templates. **T17 was migrated to this shape in session 15.**
 - **eipc handlers register on `webContents.ipc._invokeHandlers`,
  NOT global `ipcMain._invokeHandlers`.** Session 7 finding. Use
-  `lib/eipc.ts` rather than rolling a new walker. The framing
-  prefix `$eipc_message$_<UUID>_$_` should stay opaque to consumers
-  (UUID has been stable but `lib/eipc.ts` doesn't pin it — match
-  by case-doc-anchored suffix).
+  `lib/eipc.ts` rather than rolling a new walker.
 - **eipc invocation goes through the renderer-side wrapper at
  `window['claude.<scope>'].<Iface>.<method>`.** Session 8 finding.
  Use `lib/eipc.ts`'s `invokeEipcChannel` rather than rolling
  main-side direct calls.
 - **For arg validator schema-rev: try smoke-test first, then grep
-  the rejection message literal.** Session 9 finding. Trivial
-  validators (`typeof === 'string'` / similar) resolve in one
-  round-trip. Elaborate validators get the bundle-grep treatment.
- **For session-scoped Tier 2 reframes: `LocalSessions/getAll` is
-  the foundational read-side surrogate.** Session 10 finding.
- **For Tier 2 reframes with case-doc-anchored read-side handlers:
-  invoke the case-doc-anchored handlers directly.** Session 11
-  finding. Mixed-shape dual invocation is fine.
- **For Tier 2 reframes spanning two interfaces: invoke a read-side
-  from each.** Session 12 finding (T11_runtime template).
+  the rejection message literal.** Session 9 finding.
 - **For AX-tree consumers: use `lib/ax.ts`.** Session 13 finding.
  `snapshotAx` for one-shot reads, `waitForAxNode` /
-  `waitForAxNodes` for predicate-based polling. Don't reach into
-  `explore/walker.ts` directly — re-exports go through `lib/ax.ts`.
-  Consumers in session 14: `lib/claudeai.ts`'s `activateTab` +
-  `CodeTab.activate` post-click pill poll (migrated from one-shot
-  / hand-rolled retryUntil), plus T26.
+  `waitForAxNodes` for predicate-based polling.
 - **For call-site migrations to `waitForAxNode`: keep the per-spec
  retry budgets matching the existing tuning.** Session 14
-  finding. The defaults in `lib/ax.ts` (`timeoutMs: 5000`,
-  `intervalMs: 200`) are reasonable starting values, but any
-  caller with a known per-spec budget should pass it through. The
-  one acceptable bug-fix during migration is when the existing
-  call-site had NO retry at all (e.g. `activateTab`'s pre-click
-  one-shot snapshot) — adding a budget is the fix the migration
-  delivers, and the prompt explicitly authorized it.
+  finding. Migration is shape-only EXCEPT when the call-site has
+  NO retry at all — adding a budget is the bug-fix the migration
+  delivers.
+- **For test specs that depend on host auth: use `seedFromHost:
+  true`.** Session 15 finding. The legacy `CLAUDE_TEST_USE_HOST_CONFIG=1`
+  / `isolation: null` shape collides with Playwright's 60s spec
+  timeout when the env var isn't set; `seedFromHost` gives a clean
+  skip-or-pass shape. T17 was the last spec on the legacy shape.
 - **`lib/input.ts` is X11-only.** Strict gate.
 - **`lib/input-niri.ts` is Niri-only.** Strict gate.
- **Don't speculate on `lib/input-wayland.ts` dispatcher.**
- **Code-tab AX anchors stay in plan-doc until a consumer needs
-  them.**
 - **CDP auth gate is alive** — runtime SIGUSR1 attach via
  `app.attachInspector()`, never Playwright's `_electron.launch()`
  or `chromium.connectOverCDP()`.
 - **BrowserWindow Proxy gotcha** — use
  `webContents.getAllWebContents()` not
-  `BrowserWindow.getAllWindows()`. Constructor-level wraps don't
-  work; use prototype-method hooks.
+  `BrowserWindow.getAllWindows()`.
 - **`skipUnlessRow()` always first.**
 - **No fixed sleeps.** `retryUntil` from `lib/retry.ts`, or
  Playwright auto-wait, or `waitForAxNode` from `lib/ax.ts`.
-  (Exception: short sleeps inside hand-rolled retry loops that
-  catch typed errors and short-circuit; see S11 / S14.)
 - **Diagnostics on every run.** `testInfo.attach()` the artefacts.
 - **Tag with annotations.** `severity:` and `surface:` on every
  test so JUnit carries them through to matrix-regen.
 - **Tabs in TS, ~80-char wrap as the existing files do.**
 - **Don't break existing runners.** `npm run typecheck` must stay
  clean. H01-H05 are the canaries; `npm test` must still pass them
-  after every commit. Note that T17/T07/S25/S29-S31/S04 etc.
-  are pre-existing-flaky on KDE-W per session 13's full-suite run
-  (T16 fixed by session 14) — they're NOT canaries; baseline
-  failures don't block work.
+  after every commit. Note that T07 / S25 / S29-S31 / S04 etc.
+  may be pre-existing-flaky on KDE-W — they're NOT canaries;
+  baseline failures don't block work.
 - **Always grep the installed asar** to verify a fingerprint
  string is present.
- **For mock-then-call: the helper goes in
-  `lib/electron-mocks.ts`.**
- **Marker windows / sacrificial host processes always die in
-  `finally`.**
- **Never log handler response BODIES into JUnit.**

 ### Phases

 #### Phase 0 — calibration

 1. `cd tools/test-harness && npm run typecheck` — should pass.
-2. **Check debugger:** `ss -tln 2>/dev/null | grep ':9229'` (or
-   `curl --max-time 2 http://127.0.0.1:9229/json`). If port 9229 is
-   open, A / B / C are tractable; if closed, pivot to D or
-   documentation-only.
-3. Read the plan doc's "Status (post-execution)" session 14 section,
-   then read `lib/ax.ts`'s API + `lib/claudeai.ts`'s post-session-14
-   migration shape. Confirm you understand the `waitForAxNode` /
-   `waitForAxNodes` consumer pattern.
-4. Pick ONE Category as the main bet:
-   - **D** (PRIORITY when debugger is closed): read T17's failure
-     trace; classify the failure; decide if `openPill` /
-     `clickMenuItem` migration would reach it.
-   - **A**: bundle grep + per-URL navigation + registry re-probe.
-   - **B**: pick a Tier 3 candidate, smoke-test the read-side, decide
-     ship or defer.
-   - **C**: bundle grep on rejection literals, schema-rev, smoke-test
-     the resolved shape, decide ship or defer.
+2. **Check debugger ATTACHMENT QUALITY (not just port).** First
+   `ss -tln 2>/dev/null | grep ':9229'`. If port open, also probe
+   webContents via `evalInMain` (see "Big new findings" §3 for
+   the probe shape). If every URL is `/login` /
+   `find_in_page` / `main_window`, treat as soft-blocked.
+3. **Disambiguate running Claude processes.** Required before any
+   `seedFromHost` spec. `pgrep -af "ozone-platform=x11.*app.asar"`
+   + cmdline inspection for user-data-dir.
+4. Read the plan doc's "Status (post-execution)" session 15 section,
+   then read T17's session-15 form and the seedFromHost convention.
+5. Pick the main bet:
+   - **D-verify** (PRIORITY): run T17, classify the result.
+   - **C**: bundle grep on rejection literals, schema-rev,
+     smoke-test the resolved shape against the /login isolation.
+   - **STOP**: if both above produce nothing tractable, recommend
+     stopping the orchestration.

 If Phase 0 surfaces a problem (typecheck failing, primitives unclear,
 the chosen Category's prerequisites don't hold), stop and report.
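
As a rough illustration of the attachment-quality check in Phase 0 step 2, the sketch below classifies a list of webContents URLs as soft-blocked or auth-bearing. It is not harness code: `classifyAttachment`, `AttachmentQuality`, `NON_AUTH_MARKERS`, the sample URLs, and the choice to treat an empty list as soft-blocked are all assumptions made for the example. How the URLs are fetched (for instance via `evalInMain`) is deliberately left out.

```typescript
// Hypothetical classifier, not part of the harness.
type AttachmentQuality = 'auth-bearing' | 'soft-blocked';

// URL fragments exposed by a leaked /login isolation (session 15).
const NON_AUTH_MARKERS = ['/login', 'find_in_page', 'main_window'];

function classifyAttachment(urls: string[]): AttachmentQuality {
	// Assumption: an empty list proves nothing about auth, so it is
	// treated the same as a blocked attachment.
	if (urls.length === 0) return 'soft-blocked';
	// Soft-blocked only when EVERY URL matches a non-auth marker.
	const allNonAuth = urls.every((url) =>
		NON_AUTH_MARKERS.some((marker) => url.includes(marker)),
	);
	return allNonAuth ? 'soft-blocked' : 'auth-bearing';
}

// Invented sample URLs shaped like the three session 15 observed.
console.log(
	classifyAttachment([
		'file:///app/find_in_page.html',
		'https://claude.ai/login',
		'file:///app/main_window/index.html',
	]),
);
```

A single URL outside the marker list is enough to flip the result to `auth-bearing`, which matches the "if every URL is ..." wording of the step.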
@@ -421,31 +381,24 @@ Don't fan out.

 #### Phase 1 — fan-out batch

-For Category D (further migration / T17 investigation):
- Single subagent reads T17's trace, classifies, ships the migration
-  if applicable. Verify by running T16 / T17 / T26 / H05.
-
-For Category A (operon investigation):
- Single subagent does bundle-grep for operon URL routes + per-URL
-  registry re-probe. Report findings; if a Tier 2 reframe is
-  tractable, ship one spec.
-
-For Category B (Tier 3 read-only reframes):
- Spawn ONE subagent for the candidate read-side investigation
-  (smoke-test + bundle-grep if needed).
+For Category D-verify (T17 run):
+- Single subagent (or do directly — it's a single-command run +
+  trace inspection) runs T17 and classifies. Verify by checking
+  pass/skip/fail and any new failure-mode trace.

 For Category C (schema-rev):
 - Single subagent does bundle-grep on the rejection literals,
  surfaces the validator schemas, smoke-tests the recovered shapes
-  against the user's debugger-attached running Claude.
+  against the user's debugger-attached running Claude (or /login
+  isolation if soft-blocked).

-Cap at ~1 spec OR ~1 primitive migration total — same scope as
-sessions 9-14.
+Cap at ~1 spec OR ~1 verification + 1 schema-rev — same scope as
+sessions 9-15.

 #### Per-subagent prompt shape

 ```
-You're implementing ONE [test-harness runner | primitive migration |
+You're implementing ONE [verification run | primitive migration |
 investigation] for <TARGET>.

 Read in order:
@@ -454,15 +407,11 @@ Read in order:
  the most-recent-template that fits)
 - tools/test-harness/src/runners/<closest-template>.spec.ts
 - tools/test-harness/src/lib/ (the primitives you'll reuse —
-  including session 13's `lib/ax.ts` and session 14's migration
-  examples in `lib/claudeai.ts`)
+  including session 13's `lib/ax.ts` and session 15's seedFromHost
+  T17 migration)
 - CLAUDE.md (project conventions)

-Write tools/test-harness/src/runners/<TARGET>_short_name.spec.ts
-[ AND/OR  tools/test-harness/src/lib/<NEW-PRIMITIVE>.ts
-  AND/OR  edits to tools/test-harness/src/lib/claudeai.ts ].
-
-[per-task specifics: pattern (seedFromHost / mock-then-call /
+[per-task specifics: pattern (verification run / mock-then-call /
 asar fingerprint / shared isolation / new-primitive-build /
 investigation / call-site migration), assertion shape, skip rules,
 key constraint warnings]
@@ -481,17 +430,15 @@ Constraints:
 If the target isn't reasonable to implement (anchors don't resolve
 to anything assertable, the test depends on state you can't
 construct, the existing primitives don't cover the surface), DO
-NOT write a stub. Report under Open questions and stop. Sessions
-1-14 had cumulative ~17 "stop and report" outcomes that were the
-right call.
+NOT write a stub. Report under Open questions and stop.

 Report shape (~150 words):
-## <TARGET> [runner | primitive | investigation | migration]
+## <TARGET> [verification | primitive | investigation | migration]

 - File written: tools/test-harness/src/runners/<filename>.spec.ts
  [or lib/<newfile>.ts or modified lib/<existing>.ts]
 - Layer: file probe | argv probe | L1 | L2 (xprop) | L2 (DBus) |
-  pgrep | new-primitive | investigation | migration
+  pgrep | new-primitive | investigation | migration | verification
 - Assertion shape (or migration shape): <one sentence>
 - Skip rules: <which rows + why>
 - Verification path: <typecheck + run result>
@@ -525,7 +472,7 @@ After fan-out returns:

 ### Self-correction loop

-Same as sessions 1-14:
+Same as sessions 1-15:

 1. Subagent typecheck failure → re-spawn with explicit fix
   instruction.
@@ -538,12 +485,11 @@ Same as sessions 1-14:
   examine the assertion shape.
 5. Migration breaks an existing spec → roll back the migration; the
   per-spec retry budget was load-bearing and the primitive
-   defaults didn't match. Document the budget mismatch in plan-doc.
-6. **Carry-over from session 5/6/7/8/9/10/11/12/13/14:** If the
-   chosen Category's investigation doesn't resolve / requires
-   schema-rev that exceeds budget after 2-3 approaches, STOP. Don't
-   keep digging — pivot to a fallback Category. Document what was
-   tried.
+   defaults didn't match.
+6. **Carry-over from sessions 5-15:** If the chosen Category's
+   investigation doesn't resolve / requires schema-rev that exceeds
+   budget after 2-3 approaches, STOP. Don't keep digging — pivot
+   to a fallback Category. Document what was tried.
 7. **Carry-over from session 10:** If a registration probe surfaces
   "registered but uninvocable", document and defer rather than
   building the main-side fallback speculatively.
@@ -562,29 +508,26 @@ Stop and write the final report when one of:
 3. **Discovered a primitive gap that breaks 5+ Tier 2/Tier 3
   tests.** Stop, propose where the new primitive should live in
   `lib/`. Future session adds the primitive first, then resumes.
-4. **Session budget hits ~1 new spec OR one new primitive
-   landing OR one substantive call-site migration.** Stop,
-   synthesize, leave the rest for the next session.
-5. **All categories blocked after 2-3 attempts each.** Document the
-   findings as plan-doc additions and stop — coverage is at 97%, a
-   no-spec session that surfaces deferral notes is fine.
+4. **Session budget hits ~1 verification + 1 schema-rev landing.**
+   Stop, synthesize, leave the rest for the next session.
+5. **All categories blocked / unproductive after 2-3 attempts
+   each.** Document the findings as plan-doc additions, **and
+   recommend the orchestrator stop the campaign** — coverage at
+   97%, three+ consecutive non-coverage sessions, dimming
+   productivity signal.

 ### What you should NOT do

- **Don't try to land Category D + A + B + C in one batch.** Pick
-  ONE as the main bet.
+- **Don't try to land D-verify + C in one batch.** Pick D-verify
+  first; if that resolves cleanly, take C as a stretch goal.
 - **Don't ship stubs.** If a runner can't actually assert what the
  spec says, mark it as Tier 3 / blocked / primitive-gap and
  don't write a placeholder.
 - **Don't break existing runners.** H01-H05 are the canaries.
-  T17 / T07 / S25 / S29-S31 are pre-existing-flaky on KDE-W
-  per session 13's full-suite run (T16 fixed by session 14) —
-  those are NOT canaries.
 - **Don't restructure `lib/`** beyond targeted additions.
  Premature abstractions are wrong abstractions.
 - **Don't run destructive Tier 3 tests** that write to the user's
-  real claude.ai account (T22 PR write, T27 scheduling write, T29
-  worktree creation, T34 OAuth, T36 hooks-fire-on-prompt-submit).
+  real claude.ai account.
 - **Don't introspect `ipcMain._invokeHandlers` for `claude.web`
  eipc channels.** Use `lib/eipc.ts`.
 - **Don't call `invokeEipcChannel` for write-side handlers.**
@@ -602,25 +545,28 @@ Stop and write the final report when one of:
 - **Don't add a `waitForRenderedSurface(client, surfaceKey)`
  registry to `lib/ax.ts`.** Session 13 deliberately deferred
  this — wait for a third consumer with a specific named surface.
- **Don't change the existing per-spec retry budgets when migrating
-  to `waitForAxNode`.** The budgets are tuned. Migration is shape-
-  only — except when the call-site has NO retry at all (the
-  session-14-authorized bug-fix shape).
+- **Don't migrate `openPill` / `clickMenuItem` to `waitForAxNode`
+  speculatively.** Session 15 confirmed T17's flake didn't need
+  it; without a third consumer signal, it's premature optimisation.
 - **Don't reach into `explore/walker.ts` for AX types/helpers.**
-  `lib/ax.ts` re-exports `RawElement` / `AxNode` /
-  `axTreeToSnapshot` / `waitForAxTreeStable` — use those.
+  `lib/ax.ts` re-exports — use those.
 - **Don't implement the #569 power-inhibit patch in this
  session.** That's a separate workstream.
+- **Don't keep cycling on documentation-only sessions.** If
+  D-verify and C both turn up empty, formally recommend the
+  orchestrator stop the campaign rather than burning another
+  session of compute on marginal output.

 ### Final report format

 ```markdown
-## Runner implementation summary (session 15)
+## Runner implementation summary (session 16)

- Main-bet category: D | A | B | C
+- Main-bet category: D-verify | C | STOP
 - Specs landed: N
 - Migrations completed: N
 - Primitives landed: N
+- Verifications run: N
 - Reclassified mid-flight: N (with reasons)
 - Coverage: was 74/76 (97%), now <NEW>/76 (<PCT>%)
 - Typecheck: clean | <errors>
@@ -630,7 +576,7 @@ Stop and write the final report when one of:

 | Cat | Test ID | File | Assertion shape | Status |
 |---|---|---|---|---|
-| D | <call-site> | <file>.ts | … | ✓ pass / skip / fail |
+| D-verify | T17 | T17_folder_picker.spec.ts | … | ✓ pass / skip / fail |
 | ... |

 ## Notable findings
@@ -639,6 +585,9 @@ Stop and write the final report when one of:
 ## Open questions
 - ...

+## Stop recommendation
+- Yes / no, with rationale.
+
 ## Files touched
 git status output.

@@ -659,12 +608,8 @@ git diff --stat
  Connects to a debugger-attached running Claude on port 9229.
 - For seedFromHost specs, the host MUST have a signed-in Claude
  Desktop. The primitive throws with a clear message if not.
- For tests that touch the AX tree, **`lib/ax.ts`** is the new
-  shared substrate. `claudeai.ts` page-objects are still the
-  right substrate for renderer-UI domain operations (CodeTab,
-  compact pills, menu items) — they consume `lib/ax.ts`
-  internally. Don't query DOM by CSS selector unless `claudeai.ts`
-  doesn't already cover the surface.
+- For tests that touch the AX tree, **`lib/ax.ts`** is the shared
+  substrate.
 - For mock-then-call: helpers live in `lib/electron-mocks.ts`.
 - For focus-shifting (X11 only): `lib/input.ts` exports
  `focusOtherWindow` + `spawnMarkerWindow`.
@@ -685,14 +630,13 @@ git diff --stat
  finding):** invoke a read-side from each impl object.
 - **For AX-tree polling (session 13 finding):** `lib/ax.ts`'s
  `waitForAxNode` / `waitForAxNodes` for predicate-based polling.
-  `snapshotAx` for one-shot reads. Re-exports keep
-  `explore/walker.ts` types accessible without crossing the
-  lib/explore boundary.
 - **For call-site migrations to `waitForAxNode` (session 14
  finding):** keep per-spec retry budgets matching the existing
-  tuning. Migration is shape-only EXCEPT when the call-site had
-  NO retry at all — adding a budget is the bug-fix the migration
-  delivers.
+  tuning.
+- **For auth-required spec migrations (session 15 finding):**
+  use `seedFromHost: true`, NOT `CLAUDE_TEST_USE_HOST_CONFIG=1` /
+  `isolation: null`. The legacy shape collides with Playwright's
+  60s spec timeout.
 - **For asar fingerprints: ALWAYS grep the installed asar
  first.** Build-reference is beautified; the bundle is
  minified.
--- a/docs/testing/runner-implementation-plan.md
+++ b/docs/testing/runner-implementation-plan.md
@@ -18,6 +18,135 @@ work begins.

 ## Status (post-execution)

+**Shipped session 15 (1 structural fix, no new spec, no AX migration):**
+T17 migrated from the legacy `CLAUDE_TEST_USE_HOST_CONFIG=1` /
+`isolation: null` auth path to the canonical `seedFromHost: true`
+pattern (mirroring T16 / T26). Phase 0 calibration found port 9229
+listening BUT the attached process was a leaked test isolation
+(claude.ai loaded at `/login`, NOT the user's auth-bearing Claude),
+which made Categories A (operon-mode probe) / B (Tier 3 read-only
+reframes) / C (schema-rev) all soft-blocked: the debugger was
+technically attached, but to the wrong process for any auth-required
+investigation. Session 15 pivoted to investigating T17's pre-existing
+flake (the PRIORITY directive in the followup) and discovered the
+failure was structural rather than AX-polling-related.
+
+**T17 flake root cause (session 15 finding):** The trace shows a
+bare 60s Playwright spec timeout with NO `renderer-url` attachment
+fired. That attachment lives at line 49 of the pre-migration spec,
+so the test never got past line 40's `waitForReady('userLoaded')`
+call. Session 14's hypothesis that T17's flake was an `openPill` /
+`clickMenuItem` issue was wrong: the failure is upstream of the AX
+click chain. The spec was running with `isolation: undefined` (the
+no-`CLAUDE_TEST_USE_HOST_CONFIG` branch), which produces a fresh
+isolation with no auth tokens. claude.ai then redirects to `/login`,
+and `waitForUserLoaded` polls for its full 90s budget, but
+Playwright's spec timeout is 60s (per `playwright.config.ts`). The
+spec timeout fires 30s before the poll can give up, which produces
+the bare "Test timeout of 60000ms exceeded" with no test-body trace
+events.
+The fix is to align T17 with T16 / T26's shape: `seedFromHost: true`
+copies the host's auth into the per-test isolation, hits a clean
+`postLoginUrl` resolution, and skips cleanly when no signed-in host is
+available (rather than hanging until the spec timeout preempts).
+
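The timeout arithmetic above can be sketched as follows. The two budgets come from the finding itself; the helper names `pollCanReportFailure` and `preemptionGapMs` are hypothetical and exist only for this illustration.

```typescript
// Budgets taken from the session 15 finding.
const SPEC_TIMEOUT_MS = 60_000; // Playwright spec timeout
const USER_LOADED_BUDGET_MS = 90_000; // readiness poll budget

// A poll can only surface its own failure if its budget runs out
// before the surrounding spec timeout fires.
function pollCanReportFailure(
	pollBudgetMs: number,
	specTimeoutMs: number,
): boolean {
	return pollBudgetMs < specTimeoutMs;
}

// How long before the poll's deadline the spec timeout preempts it.
function preemptionGapMs(
	pollBudgetMs: number,
	specTimeoutMs: number,
): number {
	return Math.max(0, pollBudgetMs - specTimeoutMs);
}

console.log(
	pollCanReportFailure(USER_LOADED_BUDGET_MS, SPEC_TIMEOUT_MS),
	preemptionGapMs(USER_LOADED_BUDGET_MS, SPEC_TIMEOUT_MS),
);
```

With these numbers the poll never gets to report a failure, so the only signal left is the spec-level timeout.
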
+Coverage stays at 74/76 (97%): structural fix, no spec landed. The
+coverage matrix doesn't reflect spec-shape migrations, but the gain
+is real: T17 should now pass when the host is signed in, rather than
+failing on the 60s timeout regardless.
+
+Two commits on `docs/compat-matrix` expected (the orchestration
+directive supersedes "the user reviews and commits" — autonomous
+commit + push at end of session):
+
+- TBD — `test(harness): session 15 migrate T17 to seedFromHost +
+  prune unused RawElement import (no spec, coverage unchanged at
+  97%)` (T17 spec rewrite swapping the `CLAUDE_TEST_USE_HOST_CONFIG`
+  + `isolation: null` branch for the canonical `seedFromHost: true`
+  pattern; prunes unused `RawElement` re-export import in
+  `lib/claudeai.ts` per session 14's leftover hint; typecheck clean;
+  T17 not run this session because the dev box's running processes
+  ambiguously include leaked test isolations and possibly the user's
+  real Claude; `seedFromHost` would kill both, so verification is
+  deferred to next session).
+- TBD — `docs(testing): session 15 plan/inventory + rotate session 16
+  prompt`.
+
+Session 15 findings + reclassifications:
+
+- **T17 flake reclassified from "AX-polling tuning" to "auth path
+  not seeded".** Session 14's followup hypothesised the flake lived
+  in `openPill` / `clickMenuItem` post-click loops; the trace
+  evidence rules that out. The Playwright spec timeout (60s) is
+  shorter than `waitForReady('userLoaded')`'s default budget (90s),
+  so any unauth'd test that polls userLoaded will fail with a bare
+  timeout regardless of what the AX code does. T17 was the last
+  spec on the legacy `CLAUDE_TEST_USE_HOST_CONFIG=1` / `isolation:
+  null` shape — every other auth-required spec (T07, T16, T19,
+  T20, T21, T22b, T26, T27, T31b, T33b/c, T35b, T37b, T38b) had
+  already moved to `seedFromHost: true`. T17 was an outlier, and
+  the outlier-ness was the flake.
+- **`openPill` / `clickMenuItem` migration NOT shipped.** Session
+  14's followup proposed migrating these to `waitForAxNode` /
+  `waitForAxNodes`. With T17's actual failure mode resolved by
+  the structural fix, there's no remaining flake-evidence pulling
+  for that migration. `openPill`'s while-loop and
+  `clickMenuItem`'s while-loop both work fine when the auth path
+  is correct; speculatively migrating them now would be premature
+  optimisation. Future sessions can take it if a third consumer
+  surfaces with budget-tuning evidence.
+- **Unused `RawElement` import pruned.** Session 14 left
+  `import type { RawElement }` in `lib/claudeai.ts`'s
+  destructured `./ax.js` import after the migration didn't end up
+  needing the type re-export. Pruned in session 15 alongside the
+  T17 migration (one commit, two related shape fixes).
+- **Debugger-attached process is a leaked test isolation.** The
+  port-9229 listener pointed at a process whose webContents listed
+  three URLs: `find_in_page.html`, `https://claude.ai/login`, and
+  `main_window/index.html`. NOT the user's signed-in Claude. The
+  user-data-dir on those processes was `/tmp/claude-test-*`,
+  confirming they're leaked from prior test runs. There are
+  multiple `/tmp/claude-test-*` dirs accumulating on the dev box
+  (visible via `ls /tmp/`). Future sessions: Phase 0 calibration
+  should distinguish "port 9229 is open" from "port 9229 is open
+  AND attached to the user's auth-bearing Claude". Probe via
+  `evalInMain` listing webContents — if every URL is `/login`,
+  the auth-required investigations (Categories A/B/C) are blocked
+  same as if the debugger were closed.
+- **No primitive change, no AX migration.** `lib/ax.ts` and the
+  session 14 migration shape are unchanged. The change was a
+  spec-level structural fix, not a substrate or page-object
+  change.
+
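A minimal sketch of how the leaked-isolation check could be automated from a process listing. It assumes `pgrep -af`-style lines (`<pid> <cmdline>`) and assumes the isolation directory is passed through Chromium's `--user-data-dir` switch. `parsePgrepLine` and `ClaudeProcess` are hypothetical names, and the sample command lines are invented.

```typescript
// Hypothetical shape for one parsed process line.
interface ClaudeProcess {
	pid: number;
	userDataDir: string | null;
	leaked: boolean;
}

// Prefix observed on leaked isolations in session 15.
const LEAKED_PREFIX = '/tmp/claude-test-';

function parsePgrepLine(line: string): ClaudeProcess | null {
	const match = /^(\d+)\s+(.*)$/.exec(line.trim());
	if (!match) return null;
	// Assumption: the switch appears as --user-data-dir=<path> or
	// --user-data-dir <path>, with no spaces inside the path.
	const dirMatch = /--user-data-dir[= ](\S+)/.exec(match[2]);
	const userDataDir = dirMatch ? dirMatch[1] : null;
	return {
		pid: Number(match[1]),
		userDataDir,
		leaked:
			userDataDir !== null &&
			userDataDir.startsWith(LEAKED_PREFIX),
	};
}

// Invented sample lines.
const lines = [
	'4242 electron app.asar --user-data-dir=/tmp/claude-test-abc',
	'5151 electron app.asar',
];
console.log(lines.map(parsePgrepLine));
```

A process with no explicit switch comes back as `leaked: false`, which only means "not provably leaked"; it still needs a manual look before any `seedFromHost` run.
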
+Tier 2 → Tier 2 candidates remaining for next session: same as
+sessions 12-14 — operon-mode navigation probe (still needs an
+auth-bearing debugger-attached Claude), schema-rev for
+`listRemotePluginsPage` / `listSkillFiles` (might be tractable
+against the leaked-isolation /login process since validators run
+auth-independent — investigate), Tier 3 read-only reframes
+(login-required). The `openPill` / `clickMenuItem` migration is
+parked: session 15 confirmed T17's flake didn't need it, and no
+other consumer is signalling for it. Coverage is at 74/76 (97%), and
+the test budget will keep cycling through low-impact deliverables
+unless a true coverage opportunity surfaces.
+
+**Productivity signal for next session.** Session 15 fixed a
+real T17 failure mode (structural). Sessions 13-15 collectively
+have produced one new primitive (`lib/ax.ts`), one substantive
+migration (`activateTab` + `CodeTab.activate`), and one structural
+fix (T17 seedFromHost), with NO coverage gain across those three
+sessions. The categories still workable without a debugger attached
+to the user's signed-in process are mostly exhausted. Next session
+should prioritise (a) running T17 to verify the seedFromHost fix
+actually resolves the 60s timeout, and (b) checking whether a
+Category C
+schema-rev probe against the leaked /login isolation is tractable
+(validators don't need auth, only invocation does — worth a 15-min
+investigation). If both turn up empty, the orchestrator should
+seriously consider stopping — at 97% coverage with no clear
+high-leverage shapes left, further sessions are likely to produce
+documentation-only or marginal-improvement deliverables.
+
+---
+
 **Shipped session 14 (1 call-site migration, no new spec):**
 `activateTab` and `CodeTab.activate` in `lib/claudeai.ts` migrated
 from hand-rolled retry loops to session 13's `lib/ax.ts` substrate.