docs(testing): session 14 plan/inventory + rotate session 15 prompt

Add session 14 status entry to runner-implementation-plan.md (call-site
migration + T16 fix verification + T17-stays-flaky verification).

Rotate the followup prompt for session 15: PRIORITY shape is T17
investigation + potential `openPill` / `clickMenuItem` migration if the
failure trace shows an AX-polling-reachable cause; A / B / C unchanged
from session 14 (still need debugger).

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-17 00:26:21 +03:00 · 2026-05-04 00:11:59 -04:00
parent 865c147916
commit 8b556f2997
2 changed files with 315 additions and 164 deletions
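
Reviewer note: the commit message credits the T16 fix to replacing a one-shot AX snapshot with predicate-based polling. The sketch below illustrates only that difference, as a synchronous simulation on a virtual clock. It is not the code in `lib/ax.ts`: `AxNodeSketch`, `findOnce`, `pollSnapshots`, and the "Chat" button are hypothetical names made up for illustration, and the real `waitForAxNode` is presumably asynchronous and reads a live tree through an inspector. The `timeoutMs: 5000` / `intervalMs: 200` values mirror the defaults the prompt quotes, and the "Code" button name comes from the T16 failure message.

```typescript
// Hypothetical, simplified stand-in for an accessibility-tree node.
interface AxNodeSketch {
	role: string;
	name: string;
}

type Predicate = (node: AxNodeSketch) => boolean;

interface PollOptions {
	timeoutMs: number;
	intervalMs: number;
}

// One-shot shape: inspect a single snapshot and give up if nothing matches.
function findOnce(snapshot: AxNodeSketch[], predicate: Predicate): AxNodeSketch | null {
	return snapshot.find(predicate) ?? null;
}

// Polling shape, simulated on a virtual clock: snapshots[i] is what the tree
// looks like at time i * intervalMs. Returns the first match, or null once
// the budget is exceeded or no snapshots remain.
function pollSnapshots(
	snapshots: AxNodeSketch[][],
	predicate: Predicate,
	opts: PollOptions,
): AxNodeSketch | null {
	let elapsedMs = 0;
	for (const snapshot of snapshots) {
		if (elapsedMs > opts.timeoutMs) {
			return null;
		}
		const match = findOnce(snapshot, predicate);
		if (match !== null) {
			return match;
		}
		elapsedMs += opts.intervalMs;
	}
	return null;
}

const isCodeButton: Predicate = (n) => n.role === "button" && n.name === "Code";

// The "Code" button only appears in the third snapshot (t = 400ms).
const timeline: AxNodeSketch[][] = [
	[],
	[{ role: "button", name: "Chat" }],
	[{ role: "button", name: "Chat" }, { role: "button", name: "Code" }],
];

// One-shot read at t = 0 misses the button; polling within budget finds it.
const oneShot = findOnce(timeline[0] ?? [], isCodeButton);
const polled = pollSnapshots(timeline, isCodeButton, { timeoutMs: 5000, intervalMs: 200 });
// A budget shorter than the render delay still fails, which is why the
// prompt insists on passing existing per-spec budgets through unchanged.
const tooShort = pollSnapshots(timeline, isCodeButton, { timeoutMs: 300, intervalMs: 200 });
```

In this simulation the one-shot read corresponds to the pre-migration `activateTab` behaviour described in the prompt, and the too-short budget case corresponds to the "migration breaks an existing spec" rollback rule.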
--- a/docs/testing/runner-implementation-followup-prompt.md
+++ b/docs/testing/runner-implementation-followup-prompt.md
@@ -1,127 +1,127 @@
-# test-harness runner implementation — session 14 prompt
+# test-harness runner implementation — session 15 prompt

 This file is meant to be **copied verbatim into a fresh Claude Code
 session** as the initial user message. Don't paraphrase it; the
 orchestration depends on the exact directives below.

 You're picking up after a runner-implementation session that landed 1
-new primitive (`lib/ax.ts`) and NO new spec. Session 13 was a pivot:
-Phase 0 calibration found the debugger detached on the dev box (port
-9229 not listening — Claude was running but Developer → Enable Main
-Process Debugger had not been clicked), which blocked Categories A
-(operon-mode navigation probe) and C (schema-rev for
-`listRemotePluginsPage` / `listSkillFiles`) — both need runtime
-probing against a debugger-attached running Claude. Category B (Tier
-3 read-only reframes) ALSO effectively needed the debugger for the
-smoke-test investigation phase. Session 13 pivoted to the
-PRIORITY-flagged DOM unification primitive, which was tractable
-without the debugger because both consumer signals existed
-statically: `claudeai.ts` had a private `snapshotAx`, T26 had a
-duplicate inline copy explicitly noted as "premature abstraction at 1
-consumer", plus the user reported recurring AX-query flake. Coverage
-unchanged at 74/76 (97%) — primitive-only sessions don't move the
-spec count. Two commits on `docs/compat-matrix` expected (SHAs
-inserted after the test-harness commit lands — the user reviews and
-commits at the end of every session):
+call-site migration (no new spec, no new primitive). Session 14 was
+a flake-reduction session: Phase 0 calibration found the debugger
+detached on the dev box (port 9229 not listening — Claude was not
+running, or running but Developer → Enable Main Process Debugger had
+not been clicked), which blocked Categories A (operon-mode
+navigation probe), B (Tier 3 read-only reframes), and C (schema-rev
+for `listRemotePluginsPage` / `listSkillFiles`) — all needing runtime
+probing against debugger-attached Claude. Session 14 pivoted to the
+PRIORITY Category D (call-site migration to `waitForAxNode`), which
+was tractable without the debugger because the migration is pure
+shape-only refactor against existing `lib/ax.ts` substrate. Coverage
+unchanged at 74/76 (97%) — migration sessions don't move the spec
+count, but T16's pre-existing failure mode (`no AX-tree button with
+accessibleName="Code" found`) is fixed by the migration. Two commits
+on `docs/compat-matrix` expected (autonomous orchestration commits +
+pushes — the user reviews after the session):

- TBD — `test(harness): session 13 lib/ax.ts AX substrate primitive`
-  (extracts `snapshotAx` from `claudeai.ts` private + T26 inlined
-  duplicate; adds `waitForAxNode` / `waitForAxNodes` predicate-based
-  polling helpers; re-exports `RawElement` / `AxNode` /
-  `axTreeToSnapshot` / `waitForAxTreeStable` from `explore/walker.ts`
-  so consumers stay inside `lib/`; refactors `claudeai.ts` and T26
-  to consume the shared substrate).
+- TBD — `test(harness): session 14 migrate activateTab to
+  waitForAxNode (no spec, coverage unchanged at 97%)`
+  (migrates `activateTab` from one-shot snapshot to `waitForAxNode`
+  with a configurable pre-click timeout; migrates
+  `CodeTab.activate`'s post-click `retryUntil`-around-
+  `findCompactPills` loop to `waitForAxNodes`; T16 passes 3/3 on
+  KDE-W against the migrated form, was pre-existing-flaky on the
+  baseline; T26 passes; T17 still pre-existing-flaky — verified by
+  stash + retry).

 The plan doc at
 [`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
 captures the tier classification and execution-time reclassifications.
 Its "Status (post-execution)" section is the source of truth for
-what's done and what's deferred — read **session 13** first, then
-**session 12**, then **session 11**, then **session 10**, then
-**session 9**, then **session 8**, then **session 7**, then **session
-6**, then **session 5**, then **session 4**, then **session 3**, then
-**session 2**, then **session 1** sub-sections.
+what's done and what's deferred — read **session 14** first, then
+**session 13**, then **session 12**, then **session 11**, then
+**session 10**, then **session 9**, then **session 8**, then **session
+7**, then **session 6**, then **session 5**, then **session 4**, then
+**session 3**, then **session 2**, then **session 1** sub-sections.

 This session is a continuation, not a restart. Start by reading the
 plan doc's status sections.

-### Big new findings from session 13
+### Big new findings from session 14

-1. **Pre-existing T16 / T17 / T07 / S25 / S29-S31 flake confirmed
-   on KDE-W against the unchanged baseline.** Running the full suite
-   surfaced 12 failures, including T16 (CodeTab.activate: no AX-tree
-   button with accessibleName="Code" found) and T17. Verified
-   pre-existing by stashing the session-13 changes and re-running
-   T16 — same failure. Session 13's primitive doesn't fix the existing
-   flake; it lays groundwork. Future sessions can build flake-
-   reduction patches against `lib/ax.ts`'s `waitForAxNode` (e.g.
-   promote `activateTab`'s one-shot snapshot to a proper retry, or
-   give T07's CSS-querySelector poll a more durable wait shape if
-   that abstraction emerges).
-2. **`lib/ax.ts` is the new shared AX-tree substrate.** Surface:
-   - `snapshotAx(inspector, opts)` — single AX read with the
-     stability gate. `opts.fast` skips the gate for inside-poll
-     callers (matches the existing `claudeai.ts`/T26 contract).
-   - `waitForAxNode(inspector, predicate, opts)` — repeatedly
-     snapshot the tree and return the first matching `RawElement`,
-     null on timeout. Gates on stability once at the start
-     (configurable), then iterates with `fast: true`. Built against
-     the inline polling loops in `CodeTab.activate`, `openPill`,
-     `clickMenuItem`, T26 pre/post-click anchor scans — but the
-     existing call-sites are NOT migrated this session (their per-
-     spec retry budgets are tuned and changing them speculatively
-     risks flake). Future call-site migrations are tractable.
-   - `waitForAxNodes(inspector, predicate, opts)` — same shape,
-     returns every match. For consumers that want to enumerate.
-   - Re-exports: `RawElement`, `AxNode`, `axTreeToSnapshot`,
-     `waitForAxTreeStable` — so consumers stay inside `lib/`
-     instead of reaching into `explore/walker.ts` directly.
-3. **The debugger-attachment precondition is binding.** Sessions 9
-   through 12 did extensive runtime probing of the per-wc IPC
+1. **`activateTab` no-retry was the T16 failure mode.** Verified by
+   stashing the migration and re-running T16 against the baseline —
+   same `CodeTab.activate: no AX-tree button with accessibleName="Code"
+   found` failure. The migration converts the pre-click snapshot from
+   one-shot to a `waitForAxNode` poll, with the existing T16 budget
+   (15s through `CodeTab.activate({ timeout })`) covering both the
+   pre-click click-budget and the post-click pill poll. T16 passed
+   3/3 in succession against the migrated form. Strong signal that
+   "convert one-shot AX snapshots to `waitForAxNode` polling" is a
+   high-leverage flake-reduction shape — this is the first migration
+   that demonstrably fixed an existing failure.
+2. **T17 stays pre-existing-flaky.** T17 exercises the env-pill →
+   Local → Select-folder → Open-folder chain via `openEnvPill` /
+   `selectLocal` / `openFolderPicker`, which use `openPill` and
+   `clickMenuItem` internally. Those weren't migrated this session
+   (their post-click stability gates plus per-spec sleep budgets
+   carry tuning the prompt explicitly cautioned against changing).
+   T17's flake mode is unchanged by the migration; future sessions can
+   take it if budget-tuning data warrants. The `openPill` while-loop
+   polls at 100ms per iteration until the menu renders; if the menu
+   hasn't rendered within 5s, it returns `{ opened: false,
+   items: [] }`. Migrating to `waitForAxNode` would flatten the loop
+   shape but doesn't obviously change the outcome, so the migration
+   wasn't worth the budget-tuning risk this session.
+3. **The debugger-attachment precondition is still binding.**
+   Sessions 9-12 did extensive runtime probing of the per-wc IPC
   registry against the user's debugger-attached Claude. Without
   that probing, Categories A / B / C in this prompt are blocked at
   the smoke-test phase. If the user hasn't clicked Developer →
   Enable Main Process Debugger before the session starts, port 9229
   is closed and the categories pivot to either documentation work
-   or the call-site-migration shape that doesn't need runtime
-   probing. Phase 0 must check `ss -tln | grep ':9229'` (or `curl
-   --max-time 2 http://127.0.0.1:9229/json`) before fanning out.
+   or further call-site migration. Phase 0 must check `ss -tln |
+   grep ':9229'` (or `curl --max-time 2 http://127.0.0.1:9229/json`)
+   before fanning out.
 4. **The reframe pool remains essentially exhausted.** Same status
-   as session 12 — every Tier 1 fingerprint with a tractable runtime
-   sibling has been promoted. The remaining options are now: (a)
-   call-site migration to `waitForAxNode` for flake reduction, (b)
-   operon-mode navigation probe (still needs debugger), (c) schema-
-   rev for `listRemotePluginsPage` / `listSkillFiles` (still needs
-   debugger), (d) Tier 3 read-only reframes (most need user-account
-   state). The natural next-session shape is (a) — flake reduction
-   builds on session 13's primitive and doesn't need the debugger.
+   as sessions 12-13 — every Tier 1 fingerprint with a tractable
+   runtime sibling has been promoted. The remaining options are now:
+   (a) further call-site migration to `waitForAxNode` for flake
+   reduction (`openPill` / `clickMenuItem` / T26's pre-click
+   `retryUntil` — though T26's needs a `context-was-destroyed`
+   exception swallow), (b) operon-mode navigation probe (still needs
+   debugger), (c) schema-rev for `listRemotePluginsPage` /
+   `listSkillFiles` (still needs debugger), (d) Tier 3 read-only
+   reframes (most need user-account state). Session 14 demonstrated
+   migration can deliver a measurable bug-fix outcome; that
+   continues to be the highest-leverage shape when the debugger is
+   closed.

 ### Authoritative reference

 Read these in order before fanning out:

 - [`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
-  — tier classification + status section. Read **session 13**, then
-  **session 12**, then **session 11**, **session 10**, **session 9**,
-  **session 8**, **session 7**, **session 6**, **session 5**, **session
-  4**, **session 3**, **session 2**, then **session 1** "Status (post-
-  execution)" sub-sections. The Tier-3 list (search for "## Tier 3")
-  is the candidate pool for any further reframes.
+  — tier classification + status section. Read **session 14**, then
+  **session 13**, **session 12**, **session 11**, **session 10**,
+  **session 9**, **session 8**, **session 7**, **session 6**,
+  **session 5**, **session 4**, **session 3**, **session 2**, then
+  **session 1** "Status (post-execution)" sub-sections. The Tier-3
+  list (search for "## Tier 3") is the candidate pool for any further
+  reframes.
 - [`tools/test-harness/README.md`](../../tools/test-harness/README.md)
  — runner conventions, the now-74-spec inventory, primitives in
  `lib/`, isolation defaults, the CDP-gate workaround, the eipc
-  note, and the new `lib/ax.ts` substrate (session 13 addition;
-  consumer list is `claudeai.ts` page-objects + T26).
+  note, and `lib/ax.ts` substrate (session 13 addition; session 14
+  migrated `activateTab` + `CodeTab.activate`'s post-click pill
+  poll to use it).
 - [`docs/testing/cases/README.md`](cases/README.md) — case-doc
  structure and the four anchor scopes.
 - [`tools/test-harness/src/lib/`](../../tools/test-harness/src/lib/)
-  — the existing primitives. Session 13 added `lib/ax.ts`; surface
-  is `snapshotAx` / `waitForAxNode` / `waitForAxNodes` plus re-
-  exports of `RawElement` / `AxNode` / `axTreeToSnapshot` /
-  `waitForAxTreeStable`. The session 8 eipc surface
-  (`getEipcChannels` / `findEipcChannel` / `findEipcChannels` /
-  `waitForEipcChannel` / `waitForEipcChannels` / `invokeEipcChannel`
-  on `lib/eipc.ts`) is unchanged.
+  — the existing primitives. `lib/ax.ts` surface is `snapshotAx` /
+  `waitForAxNode` / `waitForAxNodes` plus re-exports. The session 8
+  eipc surface (`getEipcChannels` / `findEipcChannel` /
+  `findEipcChannels` / `waitForEipcChannel` /
+  `waitForEipcChannels` / `invokeEipcChannel` on `lib/eipc.ts`) is
+  unchanged.
 - [`tools/test-harness/eipc-registry-probe.ts`](../../tools/test-harness/eipc-registry-probe.ts)
  — the session 7 read-only registry probe. Re-run against a
  debugger-attached Claude (`Developer → Enable Main Process
@@ -131,13 +131,16 @@ Read these in order before fanning out:
  and run N candidate read-sides through M arg shapes; deleted
  after.
 - [`tools/test-harness/src/runners/`](../../tools/test-harness/src/runners/)
-  — every existing spec is a template. Notable session 13
+  — every existing spec is a template. Notable session 14
  candidates for follow-up:
-  - `T26_routines_page_renders.spec.ts` — first consumer of
-    `lib/ax.ts`'s exported `snapshotAx` (refactored from inline).
-    Other AX-using specs (T16, T17, H05) still call through
-    `claudeai.ts` page-objects which use the shared substrate
-    transparently.
+  - `T17_folder_picker.spec.ts` — the next test that would benefit
+    from `openPill` / `clickMenuItem` migration. Pre-existing
+    flake; current failure is a 60s timeout in the
+    openEnvPill/selectLocal/openFolderPicker chain.
+  - `T26_routines_page_renders.spec.ts` — has a pre-click
+    `retryUntil` block with `context-was-destroyed` exception
+    handling that could become a `waitForAxNode` call once the
+    primitive grows error-class options.
 - [`docs/testing/cases/*.md`](cases/) — the spec each runner
  asserts. The **Code anchors:** field tells you exactly where
  upstream implements the feature.
@@ -146,40 +149,47 @@ Read these in order before fanning out:

 **Realistic ceiling: ~1 new spec OR one substantive flake-reduction
 deliverable OR one investigation.** Sessions 9-12 each landed 1-2
-specs; session 13 landed only a primitive (debugger blocked).
-Coverage at 74/76 means the test budget naturally shifts toward
-either (a) flake reduction against `lib/ax.ts`'s primitive, (b)
-investigation that requires the debugger and was deferred from
-sessions 12-13, or (c) Tier 3 read-only reframes that the harness
-can construct from existing `seedFromHost` state.
+specs; session 13 landed only a primitive (debugger blocked); session
+14 landed only a migration (debugger blocked). Coverage at 74/76
+means the test budget naturally shifts toward either (a) further flake
+reduction by extending the migration shape, (b) investigation that
+requires the debugger and was deferred from sessions 12-14, or (c)
+Tier 3 read-only reframes that the harness can construct from
+existing `seedFromHost` state.

 **Phase 0 MUST check the debugger BEFORE picking a category.** Run
 `ss -tln 2>/dev/null | grep ':9229'` (or
 `curl --max-time 2 http://127.0.0.1:9229/json`). If port 9229 is not
 listening, Categories A and C are hard-blocked. Pivot to D or B.

-#### **PRIORITY: Call-site migration to `lib/ax.ts`'s
-`waitForAxNode` for flake reduction.** Session 13 landed the
-substrate; this session can promote the inline retry loops in
-`claudeai.ts` (`activateTab` is the strongest candidate — it does a
-one-shot snapshot with no retry, which is exactly the failure mode
-T16 hits). Smaller-scope candidates: `findCompactPills` (one-shot
-snapshot, no retry — same shape as `activateTab`), `openPill`'s
-post-click while-loop, `clickMenuItem`'s while-loop. Each migration
-is a localized refactor; verify by running the affected specs
-(T16/T17/T26/H05) and checking pass rate. Don't speculatively
-change the budget defaults — match the existing per-spec retry
-budgets so the migration is shape-only. **If this is what session
-14 ships, that's a strictly higher-impact outcome than another Tier
-2 / Tier 3 reframe — flake reduction touches every existing AX-
-using spec.** Doesn't need the debugger.
+#### **PRIORITY: Investigate why T17 stays flaky and decide on a
+migration-or-fix path.** Session 14's migration fixed T16's pre-
+existing failure mode. T17 is the next-clearest pre-existing-flaky
+spec on KDE-W; it shares plumbing with T16 (`CodeTab` → AX-driven
+clicks) but goes deeper through `openEnvPill` / `selectLocal` /
+`openFolderPicker`. The session 14 migration does NOT reach into
+those (they use `openPill` + `clickMenuItem`, both of which carry
+post-click stability gates and per-iteration sleep loops). The
+investigation: (1) read T17's failure trace from the most recent
+session-14 stashed run (under `tools/test-harness/results/local/
+test-output/T17_folder_picker-T17-—-Folder-picker-opens/`), (2)
+classify the failure (env-pill probe? Local item? Select-folder
+pill? Open-folder click?), (3) decide if (a) `openPill` migration
+to `waitForAxNode` would reach it, or (b) the budget defaults need
+tuning, or (c) the failure is from something orthogonal to AX
+polling. If (a), ship the migration. If (b), document the budget
+mismatch in plan-doc. If (c), defer to a future session with a
+clearer signal. **If this is what session 15 ships, that's a
+strictly higher-impact outcome than another Tier 2 / Tier 3 reframe
+— flake reduction touches every existing AX-using spec.** Doesn't
+need the debugger.

 Three categories — pick ONE as the main bet, treat the others as
 fallback if the main bet hits an early blocker:

 | # | Tests | Source | Notes |
 |---|---|---|---|
-| **D** call-site migration to `waitForAxNode` | `claudeai.ts` page-objects + T26 + future Code-tab AX work | `lib/ax.ts` (session 13 primitive) | The PRIORITY shape this session. Promote `activateTab`'s one-shot snapshot to use `waitForAxNode`; same for `findCompactPills`. Validate by re-running T16 / T17 / T26 / H05 against the migrated form. Doesn't need the debugger. Risk: changing the retry shape can introduce new flake if the budget defaults don't match the existing per-spec tuning — keep migrations shape-only, no budget changes. |
+| **D** further call-site migration / T17 investigation | T17 / `claudeai.ts` `openPill` + `clickMenuItem` | `lib/ax.ts` (session 13 primitive) | The PRIORITY shape this session. Read T17's failure trace, decide if `openPill` migration would fix it, ship the migration if so. Same shape-only refactor risk as session 14: keep the per-spec retry budgets matching the existing tuning. Doesn't need the debugger. **Risk:** `openPill` and `clickMenuItem` carry post-click stability gates that `waitForAxNode` already covers via `stabilityGate: true`, so the migration shape should slot in cleanly — but each spec's overall budget needs verification. |
 | **A** operon-mode navigation probe | n/a (investigation) + maybe small Tier 2 reframe | new probe + bundle grep for operon URL routes | Session 10 confirmed `OperonBootstrap.ensure` registers eagerly but the other 21 wrapper-exposed operon interfaces remain registry-unconfirmed. Outputs: either an operon-mode URL form recovered from the bundle (search for `operon`-keyed routes in `claude.ai/...` paths) plus a registry re-probe after navigation, OR a deferral note explaining why operon scope can't be reached without an operon-mode entry. **Needs debugger-attached Claude on port 9229.** |
 | **B** Tier 3 read-only reframes | Pick from the Tier 3 list | T33c / T35b / T37b template + bundle grep | The Tier 3 list is full of login-required flows; some have read-only entry points that the harness CAN construct. Candidates: T22's `getPrChecks` read-side might accept a non-existent PR number / dry-run mode; T15's OAuth surface has read-only state queries. Most need the user-account-scoped state to fail-fast with a clean error rather than a real network roundtrip — investigate first. **Needs debugger for smoke-test verification.** |
 | **C** Schema-rev for `listRemotePluginsPage` / `listSkillFiles` | Bundle grep | session 9 schema-rev pattern | Both methods rejected every smoke-tested arg shape during session 12's investigation. `listRemotePluginsPage` needs `limit: number` at position 0 (rejection: `Argument "limit" at position 0 ...`); `listSkillFiles` needs both `pluginId` and `skillName` (rejection: `Argument "skillName" at position 1 ...`). Bundle-grep on the rejection literals → resolve the schema → ship a narrowly-scoped Tier 2 invocation if it unblocks a case-doc claim. **Needs debugger to verify the recovered schema.** |
@@ -189,27 +199,35 @@ only session that audits the existing AX call-sites and proposes a
 migration plan (without shipping) is also acceptable — pre-work for
 a future session that DOES land the migration.

-#### Category D — call-site migration to `waitForAxNode`
+#### Category D — further call-site migration / T17 investigation

-The plan: promote inline AX retry loops in `claudeai.ts` to use
-`waitForAxNode` from `lib/ax.ts`.
+The plan: investigate T17's pre-existing flake, decide on a fix path,
+ship if a `waitForAxNode`-shaped migration of `openPill` /
+`clickMenuItem` would reach it.

-1. **Audit the call-sites.** `activateTab` does one-shot snapshot,
-   no retry — direct candidate. `findCompactPills` same. `openPill`
-   post-click while-loop and `clickMenuItem` while-loop both do
-   snapshot+filter+sleep — convert to `waitForAxNode` /
-   `waitForAxNodes` with the existing budget. T26's pre/post-click
-   `retryUntil` blocks are also direct candidates.
-2. **Migrate one call-site at a time.** Run the affected specs after
-   each migration (T16 / T17 / T26 / H05). Don't migrate all at
-   once — one bad budget change can cascade across multiple specs.
-3. **Don't change the retry budgets.** The existing per-spec timeouts
-   are tuned (CodeTab.activate uses 5s default but T16 passes 15s);
-   match them when migrating.
-4. **Don't add new functionality.** This is a shape-only refactor.
-   If a migration reveals a budget that's clearly wrong (e.g.
-   `activateTab` has NO retry today, which is the T16 failure mode),
-   that's a small bug-fix the migration corrects — but document it.
+1. **Read T17's most recent failure trace.** Either the session-14
+   stashed-baseline trace (under `tools/test-harness/results/local/
+   test-output/T17_folder_picker-T17-—-Folder-picker-opens/`) or run
+   T17 fresh against the post-session-14 form. Classify the failure:
+   - openEnvPill timeout? (would suggest `openPill` migration)
+   - selectLocal timeout? (would suggest `clickMenuItem` migration)
+   - openFolderPicker chain timeout? (suggests deeper issue)
+   - Some other failure?
+2. **If `openPill` migration would reach the failure**, migrate it.
+   The shape: replace the post-click while-loop with
+   `waitForAxNodes` filtered to MENU_ITEM_ROLES, with the existing
+   `timeout` parameter as `timeoutMs`. Keep the upfront
+   `waitForAxTreeStable` gate or pass `stabilityGate: true` to
+   `waitForAxNodes`. Verify with T17 (or the originally-affected
+   spec).
+3. **If `clickMenuItem` migration would reach the failure**, same
+   shape. Replace the while-loop with `waitForAxNode` filtered on
+   role + textPattern, with the existing `timeout` as `timeoutMs`.
+4. **If the failure is orthogonal to AX polling** (e.g. environmental,
+   timing race outside the AX surface, dialog mock not installing),
+   document and defer.
+
+Doesn't need the debugger.

 #### Category A — operon-mode navigation probe

@@ -275,11 +293,11 @@ consumer.

 #### Main-side `invokeEipcChannel` fallback (NOT recommended this session)

-Same status as sessions 8-13 — wait for a real consumer.
+Same status as sessions 8-14 — wait for a real consumer.

 #### Launch event-subscription primitive (NOT recommended this session)

-Same status as sessions 11-13 — wait for a real consumer.
+Same status as sessions 11-14 — wait for a real consumer.

 #### `waitForRenderedSurface` registry (NOT recommended this session)

@@ -295,7 +313,7 @@ not AX). Wait for a second consumer before extracting.

 ### Constraints to respect (don't violate)

-These are unchanged from sessions 1-13 and still load-bearing:
+These are unchanged from sessions 1-14 and still load-bearing:

 - **Default isolation** unless the spec needs otherwise. Use
  `seedFromHost: true` for any test that depends on authenticated
@@ -327,7 +345,18 @@ These are unchanged from sessions 1-13 and still load-bearing:
  `snapshotAx` for one-shot reads, `waitForAxNode` /
  `waitForAxNodes` for predicate-based polling. Don't reach into
  `explore/walker.ts` directly — re-exports go through `lib/ax.ts`.
-  Consumers in session 13: `lib/claudeai.ts` page-objects + T26.
+  Consumers in session 14: `lib/claudeai.ts`'s `activateTab` +
+  `CodeTab.activate` post-click pill poll (migrated from one-shot
+  / hand-rolled retryUntil), plus T26.
+- **For call-site migrations to `waitForAxNode`: keep the per-spec
+  retry budgets matching the existing tuning.** Session 14
+  finding. The defaults in `lib/ax.ts` (`timeoutMs: 5000`,
+  `intervalMs: 200`) are reasonable starting values, but any
+  caller with a known per-spec budget should pass it through. The
+  one acceptable bug-fix during migration is when the existing
+  call-site had NO retry at all (e.g. `activateTab`'s pre-click
+  one-shot snapshot) — adding a budget is the fix the migration
+  delivers, and the prompt explicitly authorized it.
 - **`lib/input.ts` is X11-only.** Strict gate.
 - **`lib/input-niri.ts` is Niri-only.** Strict gate.
 - **Don't speculate on `lib/input-wayland.ts` dispatcher.**
@@ -351,9 +380,10 @@ These are unchanged from sessions 1-13 and still load-bearing:
 - **Tabs in TS, ~80-char wrap as the existing files do.**
 - **Don't break existing runners.** `npm run typecheck` must stay
  clean. H01-H05 are the canaries; `npm test` must still pass them
-  after every commit. Note that T16/T17/T07/S25/S29-S31/S04 etc.
+  after every commit. Note that T17/T07/S25/S29-S31/S04 etc.
  are pre-existing-flaky on KDE-W per session 13's full-suite run
-  — they're NOT canaries; baseline failures don't block work.
+  (T16 fixed by session 14) — they're NOT canaries; baseline
+  failures don't block work.
 - **Always grep the installed asar** to verify a fingerprint
  string is present.
 - **For mock-then-call: the helper goes in
@@ -371,13 +401,14 @@ These are unchanged from sessions 1-13 and still load-bearing:
   `curl --max-time 2 http://127.0.0.1:9229/json`). If port 9229 is
   open, A / B / C are tractable; if closed, pivot to D or
   documentation-only.
-3. Read the plan doc's "Status (post-execution)" session 13 section,
-   then read `lib/ax.ts`'s API + `T26` and `claudeai.ts`'s
-   integration. Confirm you understand the `snapshotAx` /
-   `waitForAxNode` / `waitForAxNodes` surface.
+3. Read the plan doc's "Status (post-execution)" session 14 section,
+   then read `lib/ax.ts`'s API + `lib/claudeai.ts`'s post-session-14
+   migration shape. Confirm you understand the `waitForAxNode` /
+   `waitForAxNodes` consumer pattern.
 4. Pick ONE Category as the main bet:
-   - **D** (PRIORITY when debugger is closed): pick 1-2 call-sites
-     in `claudeai.ts` to migrate, list which.
+   - **D** (PRIORITY when debugger is closed): read T17's failure
+     trace; classify the failure; decide if `openPill` /
+     `clickMenuItem` migration would reach it.
   - **A**: bundle grep + per-URL navigation + registry re-probe.
   - **B**: pick a Tier 3 candidate, smoke-test the read-side, decide
     ship or defer.
@@ -390,9 +421,9 @@ Don't fan out.

 #### Phase 1 — fan-out batch

-For Category D (call-site migration):
- Single subagent migrates 1-2 call-sites in `claudeai.ts` to use
-  `waitForAxNode`. Verify by running T16 / T17 / T26 / H05.
+For Category D (further migration / T17 investigation):
+- Single subagent reads T17's trace, classifies, ships the migration
+  if applicable. Verify by running T16 / T17 / T26 / H05.

 For Category A (operon investigation):
 - Single subagent does bundle-grep for operon URL routes + per-URL
@@ -409,7 +440,7 @@ For Category C (schema-rev):
  against the user's debugger-attached running Claude.

 Cap at ~1 spec OR ~1 primitive migration total — same scope as
-sessions 9-13.
+sessions 9-14.

 #### Per-subagent prompt shape

@@ -423,11 +454,13 @@ Read in order:
  the most-recent-template that fits)
 - tools/test-harness/src/runners/<closest-template>.spec.ts
 - tools/test-harness/src/lib/ (the primitives you'll reuse —
-  including session 13's `lib/ax.ts`)
+  including session 13's `lib/ax.ts` and session 14's migration
+  examples in `lib/claudeai.ts`)
 - CLAUDE.md (project conventions)

 Write tools/test-harness/src/runners/<TARGET>_short_name.spec.ts
-[ AND/OR  tools/test-harness/src/lib/<NEW-PRIMITIVE>.ts ].
+[ AND/OR  tools/test-harness/src/lib/<NEW-PRIMITIVE>.ts
+  AND/OR  edits to tools/test-harness/src/lib/claudeai.ts ].

 [per-task specifics: pattern (seedFromHost / mock-then-call /
 asar fingerprint / shared isolation / new-primitive-build /
@@ -449,7 +482,7 @@ If the target isn't reasonable to implement (anchors don't resolve
 to anything assertable, the test depends on state you can't
 construct, the existing primitives don't cover the surface), DO
 NOT write a stub. Report under Open questions and stop. Sessions
-1-13 had cumulative ~17 "stop and report" outcomes that were the
+1-14 had cumulative ~17 "stop and report" outcomes that were the
 right call.

 Report shape (~150 words):
@@ -492,7 +525,7 @@ After fan-out returns:

 ### Self-correction loop

-Same as sessions 1-13:
+Same as sessions 1-14:

 1. Subagent typecheck failure → re-spawn with explicit fix
   instruction.
@@ -506,10 +539,11 @@ Same as sessions 1-13:
 5. Migration breaks an existing spec → roll back the migration; the
   per-spec retry budget was load-bearing and the primitive
   defaults didn't match. Document the budget mismatch in plan-doc.
-6. **Carry-over from session 5/6/7/8/9/10/11/12/13:** If the chosen
-   Category's investigation doesn't resolve / requires schema-rev
-   that exceeds budget after 2-3 approaches, STOP. Don't keep
-   digging — pivot to a fallback Category. Document what was tried.
+6. **Carry-over from session 5/6/7/8/9/10/11/12/13/14:** If the
+   chosen Category's investigation doesn't resolve / requires
+   schema-rev that exceeds budget after 2-3 approaches, STOP. Don't
+   keep digging — pivot to a fallback Category. Document what was
+   tried.
 7. **Carry-over from session 10:** If a registration probe surfaces
   "registered but uninvocable", document and defer rather than
   building the main-side fallback speculatively.
@@ -543,8 +577,9 @@ Stop and write the final report when one of:
  spec says, mark it as Tier 3 / blocked / primitive-gap and
  don't write a placeholder.
 - **Don't break existing runners.** H01-H05 are the canaries.
-  T16 / T17 / T07 / S25 / S29-S31 are pre-existing-flaky on KDE-W
-  per session 13's full-suite run — those are NOT canaries.
+  T17 / T07 / S25 / S29-S31 are pre-existing-flaky on KDE-W
+  per session 13's full-suite run (T16 fixed by session 14) —
+  those are NOT canaries.
 - **Don't restructure `lib/`** beyond targeted additions.
  Premature abstractions are wrong abstractions.
 - **Don't run destructive Tier 3 tests** that write to the user's
@@ -569,7 +604,8 @@ Stop and write the final report when one of:
  this — wait for a third consumer with a specific named surface.
 - **Don't change the existing per-spec retry budgets when migrating
  to `waitForAxNode`.** The budgets are tuned. Migration is shape-
-  only.
+  only — except when the call-site has NO retry at all (the
+  session-14-authorized bug-fix shape).
 - **Don't reach into `explore/walker.ts` for AX types/helpers.**
  `lib/ax.ts` re-exports `RawElement` / `AxNode` /
  `axTreeToSnapshot` / `waitForAxTreeStable` — use those.
@@ -579,7 +615,7 @@ Stop and write the final report when one of:
 ### Final report format

 ```markdown
-## Runner implementation summary (session 14)
+## Runner implementation summary (session 15)

 - Main-bet category: D | A | B | C
 - Specs landed: N
@@ -652,6 +688,11 @@ git diff --stat
  `snapshotAx` for one-shot reads. Re-exports keep
  `explore/walker.ts` types accessible without crossing the
  lib/explore boundary.
+- **For call-site migrations to `waitForAxNode` (session 14
+  finding):** keep per-spec retry budgets matching the existing
+  tuning. Migration is shape-only EXCEPT when the call-site had
+  NO retry at all — adding a budget is the bug-fix the migration
+  delivers.
 - **For asar fingerprints: ALWAYS grep the installed asar
  first.** Build-reference is beautified; the bundle is
  minified.
--- a/docs/testing/runner-implementation-plan.md
+++ b/docs/testing/runner-implementation-plan.md
@@ -18,6 +18,116 @@ work begins.

 ## Status (post-execution)

+**Shipped session 14 (1 call-site migration, no new spec):**
+`activateTab` and `CodeTab.activate` in `lib/claudeai.ts` migrated
+from hand-rolled retry loops to session 13's `lib/ax.ts` substrate.
+This is a flake-reduction session — the priority shape called out in
+session 13's followup as the natural next step once the substrate
+landed. Phase 0 calibration found the debugger detached on the dev
+box (port 9229 not listening), which blocked Categories A / B / C
+(operon-mode navigation probe + Tier 3 read-only reframes + schema-
+rev for `listRemotePluginsPage` / `listSkillFiles` — all needing
+runtime probing against debugger-attached Claude). The PRIORITY
+Category D (call-site migration) was the highest-impact deliverable
+that didn't require the debugger.
+
+Coverage stays at 74/76 (97%) — migration session, no spec landed.
+The matrix coverage doesn't reflect call-site migrations; those show
+up as flake-reduction in existing specs (T16's pre-existing `no
+AX-tree button with accessibleName="Code" found` failure mode is
+fixed by session 14's migration).
+
+Two commits on `docs/compat-matrix` expected (the orchestration
+directive supersedes "the user reviews and commits" — autonomous
+commit + push at end of session):
+
+- TBD — `test(harness): session 14 migrate activateTab to
+  waitForAxNode (no spec, coverage unchanged at 97%)`
+  (migrates `activateTab` from one-shot snapshot to
+  `waitForAxNode` with a configurable pre-click timeout; migrates
+  `CodeTab.activate`'s post-click `retryUntil`-around-
+  `findCompactPills` loop to `waitForAxNodes`; T16 passes 3/3 on
+  KDE-W against the migrated form, was pre-existing-flaky on the
+  baseline; T26 still passes (regression check); T17 still pre-
+  existing-flaky (verified by stash + retry — failure shape
+  unchanged-by-migration)).
+
+Session 14 findings + reclassifications:
+
+- **T16 fix landed.** Session 13 documented T16 as pre-existing-
+  flaky on KDE-W with the failure mode `CodeTab.activate: no
+  AX-tree button with accessibleName="Code" found`. Verified by
+  stashing session 13's changes and re-running T16 against the
+  baseline — same failure. Session 14's migration converts the
+  pre-click `activateTab` from a one-shot AX snapshot into a
+  `waitForAxNode` poll. The Code button is now waited-for up to
+  the caller's budget (T16 passes 15s through `CodeTab.activate`)
+  rather than checked-once. T16 passed 3/3 in succession against
+  the migrated form.
+- **`activateTab` API change is additive.** New optional `opts:
+  { timeout?: number }` parameter; default 5000ms matches the
+  `lib/ax.ts` defaults. Existing callers (just `CodeTab.activate`)
+  pass through their own timeout. No breaking shape change to
+  return type or first/second positional args.
+- **`CodeTab.activate` post-click loop migrated.** The hand-rolled
+  `retryUntil(async () => { const pills = await
+  findCompactPills(...); return pills.length > 0 ? pills : null; },
+  { timeout, interval: 200 })` block is structurally identical to
+  `waitForAxNodes` with the compact-pill predicate inlined. The
+  predicate (role: 'button' + hasPopup: 'menu' + non-empty
+  accessibleName + not a `^More options for ` row trigger) is
+  copy-pasted from `findCompactPills` to keep the page-object
+  free-standing without changing observable shape. `waitForAxNodes`
+  carries the existing 200ms interval and overall budget through
+  via `intervalMs` / `timeoutMs`.
+- **`findCompactPills` not migrated.** It's used in three call-
+  sites: (a) inside `CodeTab.activate`'s formerly-hand-rolled
+  retry — migrated; (b) T16's diagnostic capture on failure
+  (line 91, expects fail-fast / wants whatever's currently on the
+  page); (c) T16's post-activate diagnostic (already-stable, one-
+  shot-by-design). Migrating `findCompactPills` itself would push
+  unwanted retry latency into the diagnostic path, so the helper
+  stays a one-shot snapshot — only the retry shape moved into
+  `CodeTab.activate`.
+- **`openPill` / `clickMenuItem` not migrated.** Both have
+  post-click stability gates + sleep-based polling loops that
+  could in principle be `waitForAxNode`-shaped, but each carries
+  per-spec budget tuning (T17 / openFolderPicker chain uses
+  `openPill { timeout: 1500 }` and `clickMenuItem { timeout:
+  1500 }` defaults) that the prompt explicitly cautions against
+  changing speculatively. The migration was scoped to the
+  highest-impact call-site (the T16 fix) plus the cleanest shape
+  match (`CodeTab.activate`'s post-click pill poll). Future
+  sessions can take `openPill` / `clickMenuItem` if a third
+  consumer signals.
+- **T17 unchanged-by-migration.** T17 was reported pre-existing-
+  flaky on KDE-W per session 13's full-suite run. Verified that
+  status by stashing the migration and re-running T17 — same
+  60s timeout. T17 exercises the env-pill → Local → Select-folder
+  → Open-folder chain via `openEnvPill` / `selectLocal` /
+  `openFolderPicker`, which use `openPill` and `clickMenuItem`
+  internally. Those weren't migrated this session (per above), so
+  T17's flake mode is unchanged and is pre-existing rather than
+  a session-14 regression.
+- **No primitive change.** `lib/ax.ts`'s `waitForAxNode` /
+  `waitForAxNodes` cover both migration sites unchanged. No new
+  `WaitForAxNodeOptions` flags needed.
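The migration shape described in the findings above (a one-shot snapshot turned into a budgeted poll, with the compact-pill predicate inlined) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `lib/ax.ts` or `lib/claudeai.ts`: `AxNodeSketch`, `WaitOptionsSketch`, `isCompactPill`, and `waitForNodesSketch` are hypothetical names, the node fields are inferred from the predicate description, and throwing on timeout is an assumption about how the real `waitForAxNodes` reports an exhausted budget.

```typescript
// Hypothetical AX node shape; the real `AxNode` in lib/ax.ts may differ.
interface AxNodeSketch {
  role: string;
  accessibleName: string;
  hasPopup?: string;
}

// Compact-pill predicate as described in the findings: a button with a menu
// popup, a non-empty accessible name, and not a "More options for ..." trigger.
function isCompactPill(node: AxNodeSketch): boolean {
  return (
    node.role === 'button' &&
    node.hasPopup === 'menu' &&
    node.accessibleName.length > 0 &&
    !/^More options for /.test(node.accessibleName)
  );
}

// Assumed option names, taken from the `intervalMs` / `timeoutMs` mention.
interface WaitOptionsSketch {
  timeoutMs?: number;
  intervalMs?: number;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls a one-shot snapshot function until at least one node matches the
// predicate or the budget runs out. Timeout behaviour (throwing) is assumed.
async function waitForNodesSketch(
  snapshot: () => Promise<AxNodeSketch[]>,
  predicate: (node: AxNodeSketch) => boolean,
  opts: WaitOptionsSketch = {},
): Promise<AxNodeSketch[]> {
  const timeoutMs = opts.timeoutMs ?? 5000; // default budget named in the findings
  const intervalMs = opts.intervalMs ?? 200; // interval of the migrated call-site
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const matches = (await snapshot()).filter(predicate);
    if (matches.length > 0) return matches;
    if (Date.now() >= deadline) {
      throw new Error(`no matching AX node within ${timeoutMs}ms`);
    }
    await sleep(intervalMs);
  }
}
```

Keeping the snapshot function one-shot and placing the retry in the caller mirrors the reasoning for leaving `findCompactPills` unmigrated: diagnostic paths can still call the snapshot directly and fail fast.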
+
+Tier 2 → Tier 2 candidates remaining for next session: same as
+session 12 / 13 — operon-mode navigation probe (still needs
+debugger), schema-rev for `listRemotePluginsPage` / `listSkillFiles`
+(still needs debugger), Tier 3 read-only reframes (login-required).
+The new shape unlocked this session: **further call-site migrations**
+in `lib/claudeai.ts` — `openPill`'s post-click while-loop and
+`clickMenuItem`'s while-loop are tractable when a follow-up signal
+warrants. Migrating T26's pre-click `retryUntil` is a further
+candidate, but it carries a `context-was-destroyed` retry and
+`waitForAxNode` doesn't currently swallow that exception class, so
+it would need a primitive extension or a wrapper. Coverage stays at
+74/76 (97%), with the test budget naturally shifting toward flake
+reduction now that the substrate exists.
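The wrapper option mentioned for T26 could look something like the sketch below. It is only one possible shape: `retryOnContextDestroyed` and `isContextDestroyed` are hypothetical helpers, and matching on an error message containing "context was destroyed" is an assumption about how that exception class surfaces. The remaining budget is passed to each attempt so the overall per-spec timeout stays unchanged.

```typescript
// Assumption: the failure surfaces as an Error whose message mentions the
// context being destroyed. The real exception class may look different.
function isContextDestroyed(err: unknown): boolean {
  return err instanceof Error && /context was destroyed/i.test(err.message);
}

const pause = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Re-runs `attempt` while it fails with a destroyed-context error, handing it
// the remaining budget each time. Any other error, or an exhausted budget,
// is rethrown to the caller unchanged.
async function retryOnContextDestroyed<T>(
  attempt: (remainingMs: number) => Promise<T>,
  timeoutMs: number,
  pauseMs = 50,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const remainingMs = Math.max(0, deadline - Date.now());
    try {
      return await attempt(remainingMs);
    } catch (err) {
      if (!isContextDestroyed(err) || Date.now() >= deadline) throw err;
      await pause(pauseMs); // avoid a hot loop while the page settles
    }
  }
}
```

A caller would wrap the wait itself, for example `retryOnContextDestroyed((ms) => waitForAxNode(/* existing arguments */, { timeoutMs: ms }), budget)`, where the argument shape of `waitForAxNode` is likewise assumed rather than confirmed.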
+
+---
+
 **Shipped session 13 (1 new primitive, no new spec):** `lib/ax.ts` —
 shared AX-tree loading + traversal substrate, threshold-driven
 extraction. The plan-doc had flagged "Unified DOM/AX loading +