docs(testing): session 13 plan/inventory + rotate session 14 prompt

- runner-implementation-plan.md: session 13 status section (lib/ax.ts primitive shipped, no new spec, coverage stays at 74/76 = 97% since primitive-only sessions don't move the spec count; Phase 0 found debugger detached on dev box which blocked Categories A/B/C; pivoted to the PRIORITY DOM unification primitive). Updated the "Primitive gaps to flag" entry — DOM/AX loading + traversal primitive moved from FLAGGED to LANDED with the consumer list and the deliberately-deferred shapes (waitForRenderedSurface registry, CSS-querySelector primitive).
- README.md: lib/ax.ts entry in the substrate-primitives note; session 13 consumer list (claudeai.ts page-objects + T26). Spec count unchanged at 74.
- runner-implementation-followup-prompt.md: rotated for session 14. Adds new Category D (call-site migration to waitForAxNode for flake reduction) as the PRIORITY shape — doesn't need the debugger, builds on session 13's primitive. Carries forward Categories A / B / C (still need debugger). Phase 0 must check port 9229 BEFORE picking a category. Reading order updated: session 13 first.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-17 00:26:21 +03:00 · 2026-05-03 23:57:00 -04:00
parent 3d47f33ccb
commit 113329f91f
3 changed files with 454 additions and 407 deletions
--- a/docs/testing/runner-implementation-followup-prompt.md
+++ b/docs/testing/runner-implementation-followup-prompt.md
@@ -1,178 +1,215 @@
-# test-harness runner implementation — session 13 prompt
+# test-harness runner implementation — session 14 prompt

 This file is meant to be **copied verbatim into a fresh Claude Code
 session** as the initial user message. Don't paraphrase it; the
 orchestration depends on the exact directives below.

 You're picking up after a runner-implementation session that landed 1
-new spec (T11_runtime) by way of registering five install-flow
-suffixes plus invoking BOTH case-doc-anchored read-side getters across
-TWO distinct impl objects (CustomPlugins + LocalPlugins). First cross-
-impl-object dual invocation. No primitive change. Coverage 73/76 (96%)
-→ 74/76 (97%). Two commits on `docs/compat-matrix` expected (SHAs
+new primitive (`lib/ax.ts`) and NO new spec. Session 13 was a pivot:
+Phase 0 calibration found the debugger detached on the dev box (port
+9229 not listening — Claude was running but Developer → Enable Main
+Process Debugger had not been clicked), which blocked Categories A
+(operon-mode navigation probe) and C (schema-rev for
+`listRemotePluginsPage` / `listSkillFiles`) — both need runtime
+probing against a debugger-attached running Claude. Category B (Tier
+3 read-only reframes) ALSO effectively needed the debugger for the
+smoke-test investigation phase. Session 13 pivoted to the
+PRIORITY-flagged DOM unification primitive, which was tractable
+without the debugger because both consumer signals existed
+statically: `claudeai.ts` had a private `snapshotAx`, T26 had a
+duplicate inline copy explicitly noted as "premature abstraction at 1
+consumer", plus the user reported recurring AX-query flake. Coverage
+unchanged at 74/76 (97%) — primitive-only sessions don't move the
+spec count. Two commits on `docs/compat-matrix` expected (SHAs
 inserted after the test-harness commit lands — the user reviews and
 commits at the end of every session):

- TBD — `test(harness): session 12 T11 plugin install runtime`
-  (Tier 2 reframe; multi-suffix `waitForEipcChannels` over the
-  install-flow suffixes — `CustomPlugins/installPlugin` (case-doc
-  :507181) / `uninstallPlugin` / `updatePlugin` /
-  `listInstalledPlugins` / `LocalPlugins/getPlugins` — plus dual
-  `invokeEipcChannel` across TWO impl objects:
-  `CustomPlugins_$_listInstalledPlugins` with `args = [[]]` (empty
-  `egressAllowedDomains`, T33c pattern) and `LocalPlugins_$_getPlugins`
-  with `args = []`; passes on KDE-W in 28.8s cold).
+- TBD — `test(harness): session 13 lib/ax.ts AX substrate primitive`
+  (extracts `snapshotAx` from `claudeai.ts` private + T26 inlined
+  duplicate; adds `waitForAxNode` / `waitForAxNodes` predicate-based
+  polling helpers; re-exports `RawElement` / `AxNode` /
+  `axTreeToSnapshot` / `waitForAxTreeStable` from `explore/walker.ts`
+  so consumers stay inside `lib/`; refactors `claudeai.ts` and T26
+  to consume the shared substrate).

 The plan doc at
 [`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
 captures the tier classification and execution-time reclassifications.
 Its "Status (post-execution)" section is the source of truth for
-what's done and what's deferred — read **session 12** first, then
-**session 11**, then **session 10**, then **session 9**, then **session
-8**, then **session 7**, then **session 6**, then **session 5**, then
-**session 4**, then **session 3**, then **session 2**, then **session
-1** sub-sections.
+what's done and what's deferred — read **session 13** first, then
+**session 12**, then **session 11**, then **session 10**, then
+**session 9**, then **session 8**, then **session 7**, then **session
+6**, then **session 5**, then **session 4**, then **session 3**, then
+**session 2**, then **session 1** sub-sections.

 This session is a continuation, not a restart. Start by reading the
 plan doc's status sections.

-### Big new findings from session 12
+### Big new findings from session 13

-1. **`LocalPlugins` registers 15 methods, `CustomPlugins` 16.**
-   Smoke-test against the user's debugger-attached running Claude
-   surfaced the full method list. Cleanly invocable read-sides:
-   `LocalPlugins.getPlugins()` → array (length 0 on dev box),
-   `LocalPlugins.getDownloadedRemotePlugins()` → array,
-   `CustomPlugins.listInstalledPlugins([[]])` → array,
-   `CustomPlugins.listMarketplaces([[]])` → array (also T33c),
-   `CustomPlugins.listAvailablePlugins([[]])` → array (also T33c),
-   `CustomPlugins.getCachedCommands()` → array,
-   `CustomPlugins.getInstallCounts()` → null,
-   `CustomPlugins.getAndClearMigrationIssues()` → null,
-   `CustomPlugins.listLocalOrgPlugins()` → array. Three methods need
-   pluginId at position 0 but accept any string (not just real plugin
-   IDs): `getPluginOAuthStatus`, `getPluginCliStatus`,
-   `getPluginShimOps`. **Two methods need extra args not derivable
-   from a fresh isolation:** `LocalPlugins.listSkillFiles` (positional
-   `pluginId` + `skillName` — `[]` rejects, `[cwd]` rejects too,
-   needs both); `CustomPlugins.listRemotePluginsPage` (positional
-   `limit: number` at 0 — every smoke-tested arg shape rejected;
-   schema-rev would resolve this via grep on the `Argument "limit" at
-   position 0` literal).
-2. **Cross-impl-object dual invocation is the strongest Tier 2
-   pattern** when the case-doc surface spans two interfaces. T11's
-   install flow involves both `CustomPlugins.*` (the API/marketplace
-   side that drives install) and `LocalPlugins.*` (the local-fs side
-   where plugins land). T11_runtime invokes one read-side from each
-   rather than picking one. Strictly stronger than single-interface
-   coverage — proves the install plumbing crosses both impls intact.
-   Mixed-arg-shape fine (one needs `[[]]`, another `[]`); same as
-   T21's mixed-shape (one returns array, another returns boolean).
-3. **The Tier 2 reframe pool is essentially exhausted.** Every Tier 1
-   fingerprint with a tractable runtime sibling has been promoted.
-   The remaining deferred items are Tier 3 (login-required write-side
-   flows), Tier 4 (out of scope), or schema-rev work to unblock the
-   still-rejecting read-sides surfaced this session
-   (`listRemotePluginsPage`, `listSkillFiles`).
+1. **Pre-existing T16 / T17 / T07 / S25 / S29-S31 flake confirmed
+   on KDE-W against the unchanged baseline.** Running the full suite
+   surfaced 12 failures, including T16 (CodeTab.activate: no AX-tree
+   button with accessibleName="Code" found) and T17. Verified
+   pre-existing by stashing the session-13 changes and re-running
+   T16 — same failure. Session 13's primitive doesn't fix the existing
+   flake; it lays groundwork. Future sessions can build flake-
+   reduction patches against `lib/ax.ts`'s `waitForAxNode` (e.g.
+   promote `activateTab`'s one-shot snapshot to a proper retry, or
+   give T07's CSS-querySelector poll a more durable wait shape if
+   that abstraction emerges).
+2. **`lib/ax.ts` is the new shared AX-tree substrate.** Surface:
+   - `snapshotAx(inspector, opts)` — single AX read with the
+     stability gate. `opts.fast` skips the gate for inside-poll
+     callers (matches the existing `claudeai.ts`/T26 contract).
+   - `waitForAxNode(inspector, predicate, opts)` — repeatedly
+     snapshot the tree and return the first matching `RawElement`,
+     null on timeout. Gates on stability once at the start
+     (configurable), then iterates with `fast: true`. Built against
+     the inline polling loops in `CodeTab.activate`, `openPill`,
+     `clickMenuItem`, T26 pre/post-click anchor scans — but the
+     existing call-sites are NOT migrated this session (their per-
+     spec retry budgets are tuned and changing them speculatively
+     risks flake). Future call-site migrations are tractable.
+   - `waitForAxNodes(inspector, predicate, opts)` — same shape,
+     returns every match. For consumers that want to enumerate.
+   - Re-exports: `RawElement`, `AxNode`, `axTreeToSnapshot`,
+     `waitForAxTreeStable` — so consumers stay inside `lib/`
+     instead of reaching into `explore/walker.ts` directly.
+3. **The debugger-attachment precondition is binding.** Sessions 9
+   through 12 did extensive runtime probing of the per-wc IPC
+   registry against the user's debugger-attached Claude. Without
+   that probing, Categories A / B / C in this prompt are blocked at
+   the smoke-test phase. If the user hasn't clicked Developer →
+   Enable Main Process Debugger before the session starts, port 9229
+   is closed and the categories pivot to either documentation work
+   or the call-site-migration shape that doesn't need runtime
+   probing. Phase 0 must check `ss -tln | grep ':9229'` (or `curl
+   --max-time 2 http://127.0.0.1:9229/json`) before fanning out.
+4. **The reframe pool remains essentially exhausted.** Same status
+   as session 12 — every Tier 1 fingerprint with a tractable runtime
+   sibling has been promoted. The remaining options are now: (a)
+   call-site migration to `waitForAxNode` for flake reduction, (b)
+   operon-mode navigation probe (still needs debugger), (c) schema-
+   rev for `listRemotePluginsPage` / `listSkillFiles` (still needs
+   debugger), (d) Tier 3 read-only reframes (most need user-account
+   state). The natural next-session shape is (a) — flake reduction
+   builds on session 13's primitive and doesn't need the debugger.

 ### Authoritative reference

 Read these in order before fanning out:

 - [`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
-  — tier classification + status section. Read **session 12**, then
-  **session 11**, **session 10**, **session 9**, **session 8**,
-  **session 7**, **session 6**, **session 5**, **session 4**, **session
-  3**, **session 2**, then **session 1** "Status (post-execution)"
-  sub-sections. The Tier-3 list (search for "## Tier 3") is the
-  candidate pool for any further reframes.
+  — tier classification + status section. Read **session 13**, then
+  **session 12**, then **session 11**, **session 10**, **session 9**,
+  **session 8**, **session 7**, **session 6**, **session 5**, **session
+  4**, **session 3**, **session 2**, then **session 1** "Status (post-
+  execution)" sub-sections. The Tier-3 list (search for "## Tier 3")
+  is the candidate pool for any further reframes.
 - [`tools/test-harness/README.md`](../../tools/test-harness/README.md)
  — runner conventions, the now-74-spec inventory, primitives in
  `lib/`, isolation defaults, the CDP-gate workaround, the eipc
-  note (covers registry walk, renderer-wrapper invocation, the
-  schema-rev pattern from session 9, the foundational-getAll
-  pattern from session 10, the dual-case-doc-anchored-read-side
-  pattern from session 11, and the cross-impl-object dual
-  invocation pattern from session 12).
+  note, and the new `lib/ax.ts` substrate (session 13 addition;
+  consumer list is `claudeai.ts` page-objects + T26).
 - [`docs/testing/cases/README.md`](cases/README.md) — case-doc
  structure and the four anchor scopes.
 - [`tools/test-harness/src/lib/`](../../tools/test-harness/src/lib/)
-  — the existing primitives. No session 12 additions; surface remains
-  the session 8 shape (`getEipcChannels` / `findEipcChannel` /
-  `findEipcChannels` / `waitForEipcChannel` / `waitForEipcChannels` /
-  `invokeEipcChannel` on `lib/eipc.ts`).
+  — the existing primitives. Session 13 added `lib/ax.ts`; surface
+  is `snapshotAx` / `waitForAxNode` / `waitForAxNodes` plus re-
+  exports of `RawElement` / `AxNode` / `axTreeToSnapshot` /
+  `waitForAxTreeStable`. The session 8 eipc surface
+  (`getEipcChannels` / `findEipcChannel` / `findEipcChannels` /
+  `waitForEipcChannel` / `waitForEipcChannels` / `invokeEipcChannel`
+  on `lib/eipc.ts`) is unchanged.
 - [`tools/test-harness/eipc-registry-probe.ts`](../../tools/test-harness/eipc-registry-probe.ts)
  — the session 7 read-only registry probe. Re-run against a
  debugger-attached Claude (`Developer → Enable Main Process
  Debugger` from the menu) to capture the current registry shape.
-  Session 12 used a small one-off smoke-test in the test-harness
-  dir (`localplugins-smoke.ts` — clones the InspectorClient
-  connection pattern from eipc-registry-probe.ts, dumps full
-  method lists for plugin-related interfaces, runs N candidate
-  read-sides through M arg shapes, reports `[OK]` / `[REJ]` per
-  probe; deleted after).
+  Sessions 11 / 12 used small one-off smoke-tests in the test-
+  harness dir that clone the InspectorClient connection pattern
+  and run N candidate read-sides through M arg shapes; deleted
+  after.
 - [`tools/test-harness/src/runners/`](../../tools/test-harness/src/runners/)
-  — every existing spec is a template. Notable session 12 templates:
-  - `T11_runtime.spec.ts` — multi-suffix `waitForEipcChannels` over
-    install-flow suffixes + dual `invokeEipcChannel` across TWO impl
-    objects (CustomPlugins + LocalPlugins). Pattern for any case-doc
-    test whose surface spans two interfaces — invoke a read-side from
-    each rather than picking one.
+  — every existing spec is a template. Notable session 13
+  candidates for follow-up:
+  - `T26_routines_page_renders.spec.ts` — first consumer of
+    `lib/ax.ts`'s exported `snapshotAx` (refactored from inline).
+    Other AX-using specs (T16, T17, H05) still call through
+    `claudeai.ts` page-objects which use the shared substrate
+    transparently.
 - [`docs/testing/cases/*.md`](cases/) — the spec each runner
  asserts. The **Code anchors:** field tells you exactly where
  upstream implements the feature.

 ### Tests in scope this session

-**Realistic ceiling: ~1 new spec OR one investigation + maybe a
-narrowly-scoped Tier 2 / schema-rev landing.** Sessions 9-12 each
-landed 1-2 specs. With coverage at 74/76, the test budget naturally
-shifts toward investigation, schema-rev for still-rejecting read-
-sides, or operon-mode probing. Session 13's main bet should aim for
-1 spec OR one substantive investigation deliverable.
+**Realistic ceiling: ~1 new spec OR one substantive flake-reduction
+deliverable OR one investigation.** Sessions 9-12 each landed 1-2
+specs; session 13 landed only a primitive (debugger blocked).
+Coverage at 74/76 means the test budget naturally shifts toward
+either (a) flake reduction against `lib/ax.ts`'s primitive, (b)
+investigation that requires the debugger and was deferred from
+sessions 12-13, or (c) Tier 3 read-only reframes that the harness
+can construct from existing `seedFromHost` state.

-#### **PRIORITY: Unify DOM loading + traversal primitives.** Take
-this on first if budget allows — the user is reporting a real,
-recurring flake: tests fail because they aren't waiting long enough
-for the DOM to render, AX-tree queries fire before the relevant
-subtree is mounted, and each spec picks its own `retryUntil` budget.
-Existing wait primitives are scattered: `electron.ts:waitForReady('userLoaded')`
-(post-login URL transition), `claudeai.ts` page-objects (each rolls
-its own `retryUntil` for AX lookups), `eipc.ts:waitForEipcChannel`
-(handler registration). No unified "wait for surface rendered"
-primitive exists. Proposed shape is **`lib/dom-ready.ts`** with
-`waitForAxNode` / `waitForAxTreeStable` / `waitForRenderedSurface`
-helpers — see plan-doc "Primitive gaps to flag" → "Unified DOM/AX
-loading + traversal primitive" for the full proposal. Pre-work:
-audit per-spec `retryUntil` budgets and AX-query sites in
-`claudeai.ts` + flaky test runners to identify the 3-5 most-flaky
-callsites; build the primitive against those specifically (not
-speculatively). Threshold-driven extraction, same way `eipc.ts` /
-`input.ts` / `electron-mocks.ts` came out of consumer pressure
-rather than design-up-front. **If this primitive is what session
-13 ships, that's a strictly higher-impact outcome than another
-Tier 2 / Tier 3 reframe — flake reduction touches every existing
-AX-using spec (T07, T16, T17, T26, H05) and unblocks future
-Code-tab AX work.**
+**Phase 0 MUST check the debugger BEFORE picking a category.** Run
+`ss -tln 2>/dev/null | grep ':9229'` (or
+`curl --max-time 2 http://127.0.0.1:9229/json`). If port 9229 is not
+listening, Categories A and C are hard-blocked and B stalls at its
+smoke-test phase. Pivot to D.

-**Category A (operon-mode navigation probe)** is the natural next
-step. The other 21 wrapper-exposed operon interfaces remain registry-
-unconfirmed; if any URL form recovered from the bundle surfaces
-additional handlers, that's a tractable Tier 2 reframe. **Category B
-(Tier 3 read-only reframes)** picks the lowest-hanging Tier 3 spec
-where a non-destructive read-side might be invocable from a fresh
-isolation. **Category C (schema-rev for the rejecting read-sides)**
-unblocks `listRemotePluginsPage` or `listSkillFiles` via grep on
-the rejection literal — small-scope, useful as a fallback.
+#### **PRIORITY: Call-site migration to `lib/ax.ts`'s
+`waitForAxNode` for flake reduction.** Session 13 landed the
+substrate; this session can migrate the AX call-sites in
+`claudeai.ts` onto it (`activateTab` is the strongest candidate: it
+does a one-shot snapshot with no retry, which is exactly the failure mode
+T16 hits). Smaller-scope candidates: `findCompactPills` (one-shot
+snapshot, no retry — same shape as `activateTab`), `openPill`'s
+post-click while-loop, `clickMenuItem`'s while-loop. Each migration
+is a localized refactor; verify by running the affected specs
+(T16/T17/T26/H05) and checking pass rate. Don't speculatively
+change the budget defaults — match the existing per-spec retry
+budgets so the migration is shape-only. **If this is what session
+14 ships, that's a strictly higher-impact outcome than another Tier
+2 / Tier 3 reframe — flake reduction touches every existing AX-
+using spec.** Doesn't need the debugger.

-Three categories — pick ONE as the main bet, treat the others as
+Four categories: pick ONE as the main bet, treat the others as
 fallback if the main bet hits an early blocker:

 | # | Tests | Source | Notes |
 |---|---|---|---|
-| **A** operon-mode navigation probe | n/a (investigation) + maybe small Tier 2 reframe | new probe + bundle grep for operon URL routes | Session 10 confirmed `OperonBootstrap.ensure` registers eagerly but the other 21 wrapper-exposed operon interfaces remain registry-unconfirmed. Outputs: either an operon-mode URL form recovered from the bundle (search for `operon`-keyed routes in `claude.ai/...` paths) plus a registry re-probe after navigation, OR a deferral note explaining why operon scope can't be reached without an operon-mode entry. |
-| **B** Tier 3 read-only reframes | Pick from the Tier 3 list | T33c / T35b / T37b template + bundle grep | The Tier 3 list is full of login-required flows; some have read-only entry points that the harness CAN construct. Candidates: T22's `getPrChecks` read-side might accept a non-existent PR number / dry-run mode; T15's OAuth surface has read-only state queries. Most need the user-account-scoped state to fail-fast with a clean error rather than a real network roundtrip — investigate first. |
-| **C** Schema-rev for `listRemotePluginsPage` / `listSkillFiles` | Bundle grep | session 9 schema-rev pattern | Both methods rejected every smoke-tested arg shape during session 12's investigation. `listRemotePluginsPage` needs `limit: number` at position 0 (rejection: `Argument "limit" at position 0 ...`); `listSkillFiles` needs both `pluginId` and `skillName` (rejection: `Argument "skillName" at position 1 ...`). Bundle-grep on the rejection literals → resolve the schema → ship a narrowly-scoped Tier 2 invocation if it unblocks a case-doc claim. Smaller scope than A or B; useful as a fallback. |
+| **D** call-site migration to `waitForAxNode` | `claudeai.ts` page-objects + T26 + future Code-tab AX work | `lib/ax.ts` (session 13 primitive) | The PRIORITY shape this session. Promote `activateTab`'s one-shot snapshot to use `waitForAxNode`; same for `findCompactPills`. Validate by re-running T16 / T17 / T26 / H05 against the migrated form. Doesn't need the debugger. Risk: changing the retry shape can introduce new flake if the budget defaults don't match the existing per-spec tuning — keep migrations shape-only, no budget changes. |
+| **A** operon-mode navigation probe | n/a (investigation) + maybe small Tier 2 reframe | new probe + bundle grep for operon URL routes | Session 10 confirmed `OperonBootstrap.ensure` registers eagerly but the other 21 wrapper-exposed operon interfaces remain registry-unconfirmed. Outputs: either an operon-mode URL form recovered from the bundle (search for `operon`-keyed routes in `claude.ai/...` paths) plus a registry re-probe after navigation, OR a deferral note explaining why operon scope can't be reached without an operon-mode entry. **Needs debugger-attached Claude on port 9229.** |
+| **B** Tier 3 read-only reframes | Pick from the Tier 3 list | T33c / T35b / T37b template + bundle grep | The Tier 3 list is full of login-required flows; some have read-only entry points that the harness CAN construct. Candidates: T22's `getPrChecks` read-side might accept a non-existent PR number / dry-run mode; T15's OAuth surface has read-only state queries. Most need the user-account-scoped state to fail-fast with a clean error rather than a real network roundtrip — investigate first. **Needs debugger for smoke-test verification.** |
+| **C** Schema-rev for `listRemotePluginsPage` / `listSkillFiles` | Bundle grep | session 9 schema-rev pattern | Both methods rejected every smoke-tested arg shape during session 12's investigation. `listRemotePluginsPage` needs `limit: number` at position 0 (rejection: `Argument "limit" at position 0 ...`); `listSkillFiles` needs both `pluginId` and `skillName` (rejection: `Argument "skillName" at position 1 ...`). Bundle-grep on the rejection literals → resolve the schema → ship a narrowly-scoped Tier 2 invocation if it unblocks a case-doc claim. **Needs debugger to verify the recovered schema.** |
+
+If port 9229 is closed, only D is fully tractable. A documentation-
+only session that audits the existing AX call-sites and proposes a
+migration plan (without shipping) is also acceptable — pre-work for
+a future session that DOES land the migration.
+
+#### Category D — call-site migration to `waitForAxNode`
+
+The plan: migrate the inline AX snapshots and retry loops in
+`claudeai.ts` to `waitForAxNode` / `waitForAxNodes` from `lib/ax.ts`.
+
+1. **Audit the call-sites.** `activateTab` does one-shot snapshot,
+   no retry — direct candidate. `findCompactPills` same. `openPill`
+   post-click while-loop and `clickMenuItem` while-loop both do
+   snapshot+filter+sleep — convert to `waitForAxNode` /
+   `waitForAxNodes` with the existing budget. T26's pre/post-click
+   `retryUntil` blocks are also direct candidates.
+2. **Migrate one call-site at a time.** Run the affected specs after
+   each migration (T16 / T17 / T26 / H05). Don't migrate all at
+   once — one bad budget change can cascade across multiple specs.
+3. **Don't change the retry budgets.** The existing per-spec timeouts
+   are tuned (CodeTab.activate uses 5s default but T16 passes 15s);
+   match them when migrating.
+4. **Don't add new functionality.** This is a shape-only refactor.
+   If a migration reveals a budget that's clearly wrong (e.g.
+   `activateTab` has NO retry today, which is the T16 failure mode),
+   that's a small bug-fix the migration corrects — but document it.

 #### Category A — operon-mode navigation probe

@@ -188,14 +225,12 @@ The plan: find an operon-mode URL form and verify whether the other
   "window.location.href = '<URL>'")`. After each navigation, re-run
   the registry probe and check the operon scope's interface count.
 3. **If any URL surfaces additional operon handlers**, ship a small
-   Tier 2 reframe spec (e.g. probe `OperonBootstrap.ensure` invocation
-   shape, or assert the lazy-registration count).
+   Tier 2 reframe spec.
 4. **If none of the candidate URLs surface additional handlers**,
   document as "operon scope handlers register lazily on a navigation
   we can't easily construct from the harness" and defer.

-This is the smaller-scope category — investigation + maybe one
-spec landing.
+**Needs debugger-attached Claude on port 9229.**

 #### Category B — Tier 3 read-only reframes

@@ -209,14 +244,12 @@ is invocable from a fresh `seedFromHost` isolation.
   scope. The exceptions are read-side anchors that just need
   user-account-scoped data to assert against.
 2. **Smoke-test the candidate read-side** with various arg shapes.
-   For example, T22's `LocalSessions.getPrChecks(prUrl)` might accept
-   a fake URL string and return an empty/error array shape that
-   asserts the impl is wired without making a real GitHub call —
-   investigate.
 3. **Ship a Tier 2 reframe** if the read-side resolves cleanly.
 4. **Defer** if every candidate requires real account state to assert
   meaningfully.

+**Needs debugger for smoke-test verification.**
+
 #### Category C — Schema-rev for rejecting read-sides

 The plan: resolve the validator schema for `listRemotePluginsPage` /
@@ -228,15 +261,10 @@ a case-doc claim.
   9 finding). Read ~2KB around the hit to surface the full schema.
 2. **Smoke-test the recovered schema** against the user's debugger-
   attached running Claude.
-3. **Connect the resolved invocation to a case-doc claim.** If
-   neither method connects to an existing case-doc test, the schema
-   knowledge is a finding for the plan-doc but not a spec to ship.
+3. **Connect the resolved invocation to a case-doc claim.**
 4. **Ship a Tier 2 invocation** if a case-doc claim is unblocked.
-   `listRemotePluginsPage` could potentially extend T33's plugin
-   browser coverage with a paginated listing assertion.

-This is the smallest-scope category — best fallback if A and B are
-blocked.
+**Needs debugger to verify the recovered schema.**

 #### Cross-compositor focus-shifter expansion (NOT recommended this session)

@@ -247,27 +275,32 @@ consumer.

 #### Main-side `invokeEipcChannel` fallback (NOT recommended this session)

-If a future spec needs to invoke a `claude.settings/*` handler that
-only registers on the find_in_page or main_window webContents (where
-the renderer is at `file://` and the wrapper isn't exposed), the
-main-side direct-call path is documented in session 8's Status
-section. Don't add it speculatively — wait for a real consumer.
+Same status as sessions 8-13 — wait for a real consumer.

 #### Launch event-subscription primitive (NOT recommended this session)

-Session 11 noted that `window['claude.web'].Launch` exposes 5 `on*`
-event subscribers + `activeServersStore` not visible in
-`_invokeHandlers`. No consumer asks for an event-probe primitive
-yet — wait for one.
+Same status as sessions 11-13 — wait for a real consumer.
+
+#### `waitForRenderedSurface` registry (NOT recommended this session)
+
+Session 13's `lib/ax.ts` deliberately did NOT ship a named-surface
+registry; promote when a third consumer crystallizes with a specific
+surface name in mind.
+
+#### CSS-querySelector primitive (NOT recommended this session)
+
+Session 13's `lib/ax.ts` covers AX-tree consumers only. T07's CSS-
+querySelector poll for the topbar is a different abstraction (DOM,
+not AX). Wait for a second consumer before extracting.

 ### Constraints to respect (don't violate)

-These are unchanged from sessions 1-12 and still load-bearing:
+These are unchanged from sessions 1-13 and still load-bearing:

 - **Default isolation** unless the spec needs otherwise. Use
  `seedFromHost: true` for any test that depends on authenticated
  renderer state — never assume default isolation gets past
-  `/login`. T11_runtime/T16/T19/T20/T21/T26/T22b/T27/T31b/T33b/T33c/T35b/T37b/T38b
+  `/login`. T07/T11_runtime/T16/T17/T19/T20/T21/T26/T22b/T27/T31b/T33b/T33c/T35b/T37b/T38b
  are the templates.
 - **eipc handlers register on `webContents.ipc._invokeHandlers`,
  NOT global `ipcMain._invokeHandlers`.** Session 7 finding. Use
@@ -278,57 +311,28 @@ These are unchanged from sessions 1-12 and still load-bearing:
 - **eipc invocation goes through the renderer-side wrapper at
  `window['claude.<scope>'].<Iface>.<method>`.** Session 8 finding.
  Use `lib/eipc.ts`'s `invokeEipcChannel` rather than rolling
-  main-side direct calls — the wrapper honors the per-handler origin
-  gate honestly. Main-side direct calls work but require spoofing
-  `senderFrame.url`; reserved as a fallback for non-claude.ai
-  webContents (no current consumer).
+  main-side direct calls.
 - **For arg validator schema-rev: try smoke-test first, then grep
-  the rejection message literal.** Session 9 finding. When
-  `invokeEipcChannel` rejects with `Argument "<name>" at position N
-  ... failed to pass validation`, that exact string lives inline in
-  the validator block. One grep on the literal resolves the
-  location; reading ~2KB around it surfaces the full schema. Cheaper
-  than runtime closure inspection in most cases. Session 11 finding:
-  for trivial `typeof === 'string'` validators, the smoke-test
-  resolves the shape in one round-trip — bundle-grep is unnecessary
-  overhead for simple validators. Session 12: most plugin-side
-  validators were resolvable by smoke-test alone (15-method
-  enumeration with 3-5 arg shapes per method costs ~5 minutes).
+  the rejection message literal.** Session 9 finding. Trivial
+  validators (`typeof === 'string'` / similar) resolve in one
+  round-trip. Elaborate validators get the bundle-grep treatment.
 - **For session-scoped Tier 2 reframes: `LocalSessions/getAll` is
-  the foundational read-side surrogate.** Session 10 finding. When
-  a case-doc test's anchors are write-side LocalSessions handlers
-  with no read-side equivalent, ship a registration probe over the
-  case-doc-anchored suffixes PLUS a single
-  `invokeEipcChannel('LocalSessions_$_getAll', [])` array-shape
-  assertion as the read-side surrogate.
+  the foundational read-side surrogate.** Session 10 finding.
 - **For Tier 2 reframes with case-doc-anchored read-side handlers:
  invoke the case-doc-anchored handlers directly.** Session 11
-  finding. When the case-doc has read-side anchors with resolvable
-  arg shapes (like T21's `getConfiguredServices(cwd)` /
-  `getAutoVerify(cwd)`), prefer invoking those over a foundational
-  surrogate. Mixed-shape dual invocation (one returns array, another
-  returns boolean) is fine — assert each shape independently.
+  finding. Mixed-shape dual invocation is fine.
 - **For Tier 2 reframes spanning two interfaces: invoke a read-side
-  from each.** Session 12 finding. When the case-doc surface spans
-  two impl objects (T11's CustomPlugins + LocalPlugins), invoke one
-  read-side from each rather than picking one. Cross-impl-object
-  dual invocation proves the plumbing crosses both impls intact —
-  strictly stronger than single-interface coverage. Mixed-arg-shape
-  fine (one needs `[[]]`, another `[]`).
- **`lib/input.ts` is X11-only.** Strict `XDG_SESSION_TYPE ===
-  'x11'` gate. Wayland consumers must skip — don't try to bolt
-  Wayland into the file.
- **`lib/input-niri.ts` is Niri-only.** Strict
-  `XDG_CURRENT_DESKTOP === 'niri'` gate. Sway / Hyprland / River
-  consumers must skip or live in their own per-compositor files.
+  from each.** Session 12 finding (T11_runtime template).
+- **For AX-tree consumers: use `lib/ax.ts`.** Session 13 finding.
+  `snapshotAx` for one-shot reads, `waitForAxNode` /
+  `waitForAxNodes` for predicate-based polling. Don't reach into
+  `explore/walker.ts` directly — re-exports go through `lib/ax.ts`.
+  Consumers in session 13: `lib/claudeai.ts` page-objects + T26.
+- **`lib/input.ts` is X11-only.** Strict gate.
+- **`lib/input-niri.ts` is Niri-only.** Strict gate.
 - **Don't speculate on `lib/input-wayland.ts` dispatcher.**
-  Per-compositor files until a second Wayland consumer (Sway /
-  Hyprland / River) lands. With only S14 on Niri, a dispatcher
-  is ceremony.
 - **Code-tab AX anchors stay in plan-doc until a consumer needs
-  them.** Don't preemptively add `CodeTab.activateTopTab()` to
-  `claudeai.ts` — session 5's anchors block out the work for
-  whenever a future consumer surfaces.
+  them.**
 - **CDP auth gate is alive** — runtime SIGUSR1 attach via
  `app.attachInspector()`, never Playwright's `_electron.launch()`
  or `chromium.connectOverCDP()`.
@@ -336,61 +340,49 @@ These are unchanged from sessions 1-12 and still load-bearing:
  `webContents.getAllWebContents()` not
  `BrowserWindow.getAllWindows()`. Constructor-level wraps don't
  work; use prototype-method hooks.
- **`skipUnlessRow()` always first.** First line of every `test()`
-  body when the test is row-gated.
+- **`skipUnlessRow()` always first.**
 - **No fixed sleeps.** `retryUntil` from `lib/retry.ts`, or
-  Playwright auto-wait. Fixed `sleep(N)` is a smell. (Exception:
-  short sleeps inside hand-rolled retry loops that catch typed
-  errors and short-circuit; see S11 / S14 for the pattern.)
+  Playwright auto-wait, or `waitForAxNode` from `lib/ax.ts`.
+  (Exception: short sleeps inside hand-rolled retry loops that
+  catch typed errors and short-circuit; see S11 / S14.)
 - **Diagnostics on every run.** `testInfo.attach()` the artefacts.
-  Single-shot JSON dumps for multi-state tests (S11, S14, S31,
-  T11_runtime, T19, T20, T21, T22b, T27, T31b, T33b, T33c, T35b,
-  T37b, T38b pattern) are cleaner than 5+ separate attachments.
 - **Tag with annotations.** `severity:` and `surface:` on every
  test so JUnit carries them through to matrix-regen.
- **Tabs in TS, ~80-char wrap as the existing files do.** Match
-  surrounding style.
+- **Tabs in TS, ~80-char wrap as the existing files do.**
 - **Don't break existing runners.** `npm run typecheck` must stay
  clean. H01-H05 are the canaries; `npm test` must still pass them
-  after every commit.
+  after every commit. T16 / T17 / T07 / S25 / S29-S31 / S04 etc.
+  were already flaky on KDE-W in session 13's full-suite run;
+  they are NOT canaries, and baseline failures don't block work.
 - **Always grep the installed asar** to verify a fingerprint
-  string is present (and how often) BEFORE shipping. Build-
-  reference is beautified — strings differ from the minified
-  bundle.
+  string is present.
 - **For mock-then-call: the helper goes in
-  `lib/electron-mocks.ts`,** not `lib/claudeai.ts`.
+  `lib/electron-mocks.ts`.**
 - **Marker windows / sacrificial host processes always die in
-  `finally`.** S11 / S14 are the templates — `marker.kill()` runs
-  before `app.close()` so the kill happens even if the spec throws.
- **Never log handler response BODIES into JUnit.** T37b's pattern
-  (response type + length only, never the body) is correct for any
-  invocation that returns user-account-scoped content. Memory bodies
-  may contain personal or sensitive content; MCP server tokens may
-  contain credentials; scheduled-task instructions may reference
-  internal projects; marketplace `pluginContext`-filtered listings
-  may surface internal-org marketplace pointers. T11_runtime's
-  defensive default extends the pattern: installed-plugin entries may
-  include workspace paths and plugin IDs that reveal org-internal
-  marketplace pointers when the user is in an org; configured dev
-  service entries (T21) may include workspace paths from auto-detect.
+  `finally`.**
+- **Never log handler response BODIES into JUnit.**

 ### Phases

 #### Phase 0 — calibration

 1. `cd tools/test-harness && npm run typecheck` — should pass.
-2. Read the plan doc's "Status (post-execution)" session 12 section,
-   then read `lib/eipc.ts`'s `invokeEipcChannel` API +
-   `T11_runtime.spec.ts` leading comments. Confirm you understand the
-   cross-impl-object dual invocation pattern.
-3. Pick ONE Category as the main bet. Each has a different shape:
+2. **Check debugger:** `ss -tln 2>/dev/null | grep ':9229'` (or
+   `curl --max-time 2 http://127.0.0.1:9229/json`). If port 9229 is
+   open, A / B / C are tractable; if closed, pivot to D or
+   documentation-only.
+3. Read the plan doc's "Status (post-execution)" session 13 section,
+   then read `lib/ax.ts`'s API and how T26 and `claudeai.ts`
+   consume it. Confirm you understand the `snapshotAx` /
+   `waitForAxNode` / `waitForAxNodes` surface.
+4. Pick ONE Category as the main bet:
+   - **D** (PRIORITY when debugger is closed): pick 1-2 call-sites
+     in `claudeai.ts` to migrate, list which.
   - **A**: bundle grep + per-URL navigation + registry re-probe.
   - **B**: pick a Tier 3 candidate, smoke-test the read-side, decide
     ship or defer.
   - **C**: bundle grep on rejection literals, schema-rev, smoke-test
     the resolved shape, decide ship or defer.
-   List which approaches you'll try in what order, with the cap at
-   2-3 distinct approaches before STOP AND REPORT.

 If Phase 0 surfaces a problem (typecheck failing, primitives unclear,
 the chosen Category's prerequisites don't hold), stop and report.
@@ -398,6 +390,10 @@ Don't fan out.

 #### Phase 1 — fan-out batch

+For Category D (call-site migration):
+- Single subagent migrates 1-2 call-sites in `claudeai.ts` to use
+  `waitForAxNode`. Verify by running T16 / T17 / T26 / H05.
+
 For Category A (operon investigation):
 - Single subagent does bundle-grep for operon URL routes + per-URL
  registry re-probe. Report findings; if a Tier 2 reframe is
@@ -405,23 +401,20 @@ For Category A (operon investigation):

 For Category B (Tier 3 read-only reframes):
 - Spawn ONE subagent for the candidate read-side investigation
-  (smoke-test + bundle-grep if needed). Treat as exploratory; report
-  findings before committing to a spec shape.
- Cap re-spawns at 2-3 distinct approaches; if no read-side resolves
-  cleanly, STOP AND REPORT.
+  (smoke-test + bundle-grep if needed).

 For Category C (schema-rev):
- Single subagent does bundle-grep on the rejection literals, surfaces
-  the validator schemas, smoke-tests the recovered shapes against the
-  user's debugger-attached running Claude. If a recovered schema
-  unblocks a case-doc claim, ship; otherwise document and defer.
+- Single subagent does bundle-grep on the rejection literals,
+  surfaces the validator schemas, smoke-tests the recovered shapes
+  against the user's debugger-attached running Claude.

-Cap at ~1 spec total — same scope as session 12's T11_runtime.
+Cap at ~1 spec OR ~1 primitive migration total — same scope as
+sessions 9-13.

 #### Per-subagent prompt shape

 ```
-You're implementing ONE [test-harness runner | primitive |
+You're implementing ONE [test-harness runner | primitive migration |
 investigation] for <TARGET>.

 Read in order:
@@ -429,7 +422,8 @@ Read in order:
 - tools/test-harness/README.md (conventions; status section names
  the most-recent-template that fits)
 - tools/test-harness/src/runners/<closest-template>.spec.ts
- tools/test-harness/src/lib/ (the primitives you'll reuse)
+- tools/test-harness/src/lib/ (the primitives you'll reuse —
+  including session 13's `lib/ax.ts`)
 - CLAUDE.md (project conventions)

 Write tools/test-harness/src/runners/<TARGET>_short_name.spec.ts
@@ -437,8 +431,8 @@ Write tools/test-harness/src/runners/<TARGET>_short_name.spec.ts

 [per-task specifics: pattern (seedFromHost / mock-then-call /
 asar fingerprint / shared isolation / new-primitive-build /
-investigation), assertion shape, skip rules, key constraint
-warnings]
+investigation / call-site migration), assertion shape, skip rules,
+key constraint warnings]

 Constraints:
 - Tabs, ~80-char wrap.
@@ -446,7 +440,8 @@ Constraints:
 - testInfo.attach() the diagnostics from the spec's "Diagnostics
  on failure" block.
 - Tag with severity + surface annotations.
- No fixed sleeps. retryUntil or Playwright auto-wait.
+- No fixed sleeps. retryUntil, Playwright auto-wait, or
+  waitForAxNode.
 - npm run typecheck must stay clean after your edits.
 - Don't commit. The user reviews and commits.

@@ -454,17 +449,17 @@ If the target isn't reasonable to implement (anchors don't resolve
 to anything assertable, the test depends on state you can't
 construct, the existing primitives don't cover the surface), DO
 NOT write a stub. Report under Open questions and stop. Sessions
-1-12 had cumulative ~17 "stop and report" outcomes that were the
+1-13 had cumulative ~17 "stop and report" outcomes that were the
 right call.

 Report shape (~150 words):
-## <TARGET> [runner | primitive | investigation]
+## <TARGET> [runner | primitive | investigation | migration]

 - File written: tools/test-harness/src/runners/<filename>.spec.ts
-  [or lib/<newfile>.ts]
+  [or lib/<newfile>.ts or modified lib/<existing>.ts]
 - Layer: file probe | argv probe | L1 | L2 (xprop) | L2 (DBus) |
-  pgrep | new-primitive | investigation
- Assertion shape: <one sentence>
+  pgrep | new-primitive | investigation | migration
+- Assertion shape (or migration shape): <one sentence>
 - Skip rules: <which rows + why>
 - Verification path: <typecheck + run result>
 - Open questions: <caveats>
@@ -475,49 +470,49 @@ Report shape (~150 words):
 After fan-out returns:

 1. `cd tools/test-harness && npm run typecheck` — must stay clean.
-2. Run the new runners against KDE-W (the dev box) — but flag the
-   user first if any are destructive (seedFromHost kills running
-   Claude). Capture pass/skip/fail per spec for the matrix.
+2. Run the new / migrated runners against KDE-W (the dev box) — but
+   flag the user first if any are destructive (seedFromHost kills
+   running Claude). Capture pass/skip/fail per spec for the matrix.
 3. Update [`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
   "Status (post-execution)" section to reflect newly-shipped
-   specs and any reclassifications discovered mid-flight.
+   specs / primitive migrations and any reclassifications.
 4. Update [`tools/test-harness/README.md`](../../tools/test-harness/README.md)
   inventory table.
 5. Write a final report listing:
-   - Specs landed (pass / skip / needs-tuning per row)
+   - Specs landed / migrations completed (pass / skip / needs-tuning per row)
   - Primitives landed (with API shape)
   - Specs deferred (with the per-test rationale)
   - Specs reclassified (Tier 3 → Tier 2, Tier 2 → Tier 1, etc.)
   - Updated coverage stat (was 74/76 = 97%, now N/76 = M%)
-6. Don't commit. The user reviews and commits.
+6. Commit and push to `docs/compat-matrix` (the orchestration
+   directive at the top of the followup supersedes "don't commit").
 7. Rotate this prompt: rewrite
   `docs/testing/runner-implementation-followup-prompt.md` for
   the NEXT session's deferred items.

 ### Self-correction loop

-Same as sessions 1-12:
+Same as sessions 1-13:

 1. Subagent typecheck failure → re-spawn with explicit fix
   instruction.
-2. Subagent claims a runner exists but `git status` shows no new
-   file → re-spawn with explicit "use the Write tool" instruction.
+2. Subagent claims a runner / migration exists but `git status`
+   shows no new or modified file → re-spawn with explicit "use the
+   Write tool" instruction.
 3. Two subagents wrote runners that share a primitive but with
-   different shapes → factor into `lib/<topic>.ts` BEFORE
-   shipping.
-4. Spec passes locally but the assertion is actually trivial (e.g.
-   an unauthenticated launch where the handler check vacuously
-   passes because no handlers are registered) → re-examine the
-   assertion shape.
-5. **Carry-over from session 5/6/7/8/9/10/11/12:** If the chosen
+   different shapes → factor into `lib/<topic>.ts` BEFORE shipping.
+4. Spec passes locally but the assertion is actually trivial →
+   re-examine the assertion shape.
+5. Migration breaks an existing spec → roll back the migration; the
+   per-spec retry budget was load-bearing and the primitive
+   defaults didn't match. Document the budget mismatch in plan-doc.
+6. **Carry-over from session 5/6/7/8/9/10/11/12/13:** If the chosen
   Category's investigation doesn't resolve / requires schema-rev
   that exceeds budget after 2-3 approaches, STOP. Don't keep
   digging — pivot to a fallback Category. Document what was tried.
-6. **Carry-over from session 10:** If a registration probe surfaces
-   "registered but uninvocable" (handler is on the registry but the
-   renderer-side wrapper isn't exposed for the relevant scope or the
-   validator rejects every smoke-test arg shape), document and
-   defer rather than building the main-side fallback speculatively.
+7. **Carry-over from session 10:** If a registration probe surfaces
+   "registered but uninvocable", document and defer rather than
+   building the main-side fallback speculatively.

 Cap re-spawns at 2 per file. Past that, mark as needing human
 review and move on.
@@ -534,76 +529,61 @@ Stop and write the final report when one of:
   tests.** Stop, propose where the new primitive should live in
   `lib/`. Future session adds the primitive first, then resumes.
 4. **Session budget hits ~1 new spec OR one new primitive
-   landing.** Stop, synthesize, leave the rest for the next
-   session.
+   landing OR one substantive call-site migration.** Stop,
+   synthesize, leave the rest for the next session.
 5. **All categories blocked after 2-3 attempts each.** Document the
   findings as plan-doc additions and stop — coverage is at 97%, a
   no-spec session that surfaces deferral notes is fine.

 ### What you should NOT do

- **Don't try to land Category A + B + C in one batch.** Pick
+- **Don't try to land Category D + A + B + C in one batch.** Pick
  ONE as the main bet.
 - **Don't ship stubs.** If a runner can't actually assert what the
  spec says, mark it as Tier 3 / blocked / primitive-gap and
-  don't write a placeholder. The cumulative seventeen "stop and
-  report" outcomes from sessions 1-12 were the right call — every
-  one revealed a real constraint.
+  don't write a placeholder.
 - **Don't break existing runners.** H01-H05 are the canaries.
+  T16 / T17 / T07 / S25 / S29-S31 were already flaky on KDE-W in
+  session 13's full-suite run; those are NOT canaries.
 - **Don't restructure `lib/`** beyond targeted additions.
  Premature abstractions are wrong abstractions.
-  `electron-mocks.ts` (session 3), `input.ts` (session 4),
-  `input-niri.ts` (session 6), and `eipc.ts` registry walker
-  (session 7) + invocation surface (session 8) were threshold-
-  driven extractions, not speculative.
 - **Don't run destructive Tier 3 tests** that write to the user's
  real claude.ai account (T22 PR write, T27 scheduling write, T29
  worktree creation, T34 OAuth, T36 hooks-fire-on-prompt-submit).
-  Only the *read-only reframes* of those are in scope.
 - **Don't introspect `ipcMain._invokeHandlers` for `claude.web`
-  eipc channels.** Session 7 confirmed those use the per-wc IPC
-  scope. Use `lib/eipc.ts`'s primitive (which targets the per-wc
-  scope) instead.
- **Don't call `invokeEipcChannel` for write-side handlers** —
-  `start*`, `set*`, `write*`, `run*`, `openIn*`, `delete*`,
-  `cancel*`, `reset*`, `installPlugin`, `uninstallPlugin`,
-  `updatePlugin`, `enablePlugin`, `uploadPlugin`, `syncRemotePlugins`.
-  The primitive doesn't enforce a read-only allowlist; the safety
-  property is that case-doc-anchored suffixes are read-side OR
-  case-doc-anchored write-side suffixes are tested via REGISTRATION
-  ONLY (`waitForEipcChannels`), never invoked. T11_runtime / T19 /
-  T20 / T21 ship registration probes over write-side suffixes — that's
-  the safe pattern.
+  eipc channels.** Use `lib/eipc.ts`.
+- **Don't call `invokeEipcChannel` for write-side handlers.**
 - **Don't bolt other compositors into `lib/input-niri.ts`.**
-  Sway / Hyprland / River each get their own per-compositor file
-  if a consumer surfaces.
- **Don't bolt Wayland into `lib/input.ts`.** X11-strict gate is
-  load-bearing.
+- **Don't bolt Wayland into `lib/input.ts`.**
 - **Don't speculate on a `lib/input-wayland.ts` dispatcher.**
-  Per-compositor files until a second Wayland consumer lands.
 - **Don't preemptively build `CodeTab.activateTopTab()` /
-  `startNewSession()`.** Session 5 captured the AX anchors but
-  T36 Phase 2 (the only known consumer) was reclassified out.
+  `startNewSession()`.**
 - **Don't add a main-side `invokeEipcChannel` fallback
-  speculatively.** Build it only if a concrete consumer needs to
-  invoke through a non-claude.ai webContents. Premature primitives
-  leak design debt.
+  speculatively.**
 - **Don't speculate on a Launch event-subscription primitive.**
-  Session 11 noted that `window['claude.web'].Launch` exposes 5
-  `on*` event subscribers + `activeServersStore` not visible in
-  `_invokeHandlers`. No consumer asks for an event-probe primitive
-  yet. Wait for one.
+- **Don't extract T07's CSS-querySelector poll into `lib/ax.ts`.**
+  That's a different abstraction (DOM, not AX). Wait for a second
+  CSS-poll consumer before extracting.
+- **Don't add a `waitForRenderedSurface(client, surfaceKey)`
+  registry to `lib/ax.ts`.** Session 13 deliberately deferred
+  this — wait for a third consumer with a specific named surface.
+- **Don't change the existing per-spec retry budgets when migrating
+  to `waitForAxNode`.** The budgets are tuned per spec. Migration
+  is shape-only.
+- **Don't reach into `explore/walker.ts` for AX types/helpers.**
+  `lib/ax.ts` re-exports `RawElement` / `AxNode` /
+  `axTreeToSnapshot` / `waitForAxTreeStable` — use those.
 - **Don't implement the #569 power-inhibit patch in this
  session.** That's a separate workstream.
- **Don't commit.** The user reviews and commits.

 ### Final report format

 ```markdown
-## Runner implementation summary (session 13)
+## Runner implementation summary (session 14)

- Main-bet category: A | B | C
+- Main-bet category: D | A | B | C
 - Specs landed: N
+- Migrations completed: N
 - Primitives landed: N
 - Reclassified mid-flight: N (with reasons)
 - Coverage: was 74/76 (97%), now <NEW>/76 (<PCT>%)
@@ -614,7 +594,7 @@ Stop and write the final report when one of:

 | Cat | Test ID | File | Assertion shape | Status |
 |---|---|---|---|---|
-| A | <test_id> | <file>.spec.ts | … | ✓ pass / skip / fail |
+| D | <call-site> | <file>.ts | … | ✓ pass / skip / fail |
 | ... |

 ## Notable findings
@@ -624,9 +604,7 @@ Stop and write the final report when one of:
 - ...

 ## Files touched
-git status output (tools/test-harness/src/runners/*.spec.ts +
-maybe lib/* primitives if extraction was needed; possibly plan-doc /
-README updates).
+git status output.

 ## Diff summary
 git diff --stat
@@ -639,79 +617,44 @@ git diff --stat
 - Each subagent's Write calls land directly in the working tree.
 - The grounding probe (`tools/test-harness/grounding-probe.ts`)
  can help when implementing a runner that asserts runtime API
-  state — capture once with `npm run grounding-probe -- --launch
-  --include-synthetic`, grep the output for the IPC channel /
-  accelerator / API your runner needs to assert against.
+  state.
 - The eipc-registry probe (`tools/test-harness/eipc-registry-probe.ts`)
  is the dedicated tool for inspecting per-wc IPC handler state.
-  Useful when designing new probes or auditing for upstream drift.
  Connects to a debugger-attached running Claude on port 9229.
 - For seedFromHost specs, the host MUST have a signed-in Claude
  Desktop. The primitive throws with a clear message if not.
-  Document the prerequisite in your runner's leading comment if
-  it's the first one to add seedFromHost coverage to a new
-  surface.
- For tests that touch the AX tree, `claudeai.ts` page-objects
-  are the right substrate — see `T17_folder_picker.spec.ts` for
-  the end-to-end example. Don't query DOM by CSS selector unless
-  `claudeai.ts` doesn't already cover the surface. Code-tab
-  session-opener anchors are documented in plan-doc session 5;
-  don't add them to `claudeai.ts` unless a consumer surfaces.
- For mock-then-call: helpers live in `lib/electron-mocks.ts`
-  (extracted in session 3). See T24's leading comment for the
-  `Promise<boolean>` variant + T25's for the void variant.
+- For tests that touch the AX tree, **`lib/ax.ts`** is the new
+  shared substrate. `claudeai.ts` page-objects are still the
+  right substrate for renderer-UI domain operations (CodeTab,
+  compact pills, menu items) — they consume `lib/ax.ts`
+  internally. Don't query DOM by CSS selector unless `claudeai.ts`
+  doesn't already cover the surface.
+- For mock-then-call: helpers live in `lib/electron-mocks.ts`.
 - For focus-shifting (X11 only): `lib/input.ts` exports
-  `focusOtherWindow` + `spawnMarkerWindow`. See S11 for the
-  end-to-end consumer pattern.
- For Wayland-native focus-shifting (Niri only): `lib/input-niri.ts`
-  exports the same shape with `niri msg --json` IPC + `foot`
-  marker. See S14 for the end-to-end consumer pattern.
+  `focusOtherWindow` + `spawnMarkerWindow`.
+- For Wayland-native focus-shifting (Niri only): `lib/input-niri.ts`.
 - For eipc registry walking: `lib/eipc.ts` exports
  `getEipcChannels` / `findEipcChannel` / `findEipcChannels` /
-  `waitForEipcChannel` / `waitForEipcChannels` against
-  `webContents.ipc._invokeHandlers`. See T11_runtime / T19 / T20 /
-  T21 / T22b / T31b / T33b / T38b for end-to-end consumer patterns.
- For eipc invocation: `lib/eipc.ts` exports `invokeEipcChannel`
-  (renderer-side wrapper at
-  `window['claude.<scope>'].<Iface>.<method>`). See T11_runtime / T19 /
-  T20 / T21 / T27 / T33c / T35b / T37b for end-to-end consumer patterns.
+  `waitForEipcChannel` / `waitForEipcChannels`.
+- For eipc invocation: `lib/eipc.ts` exports `invokeEipcChannel`.
  Only call read-side suffixes; the primitive doesn't enforce a
-  read-only allowlist. Cross-impl-object dual invocation pattern is
-  T11_runtime; single-interface dual is T21 / T33c.
+  read-only allowlist.
 - **For arg validator schema-rev (sessions 9 / 11 / 12 findings):**
-  when invocation rejects with `Argument "<name>" at position N ...
-  failed to pass validation`, FIRST try smoke-testing common arg
-  shapes against the user's debugger-attached Claude (session 11's
-  `launch-cwd-smoke.ts` / session 12's `localplugins-smoke.ts`
-  pattern — clone the InspectorClient connection, iterate over arg
-  shape candidates, report `[OK]` / `[REJ]` per shape). For trivial
-  validators (`typeof === 'string'` / similar), this resolves the
-  schema in one round-trip and avoids needing bundle-grep. For more
-  elaborate validators, fall back to grep on the bundled `index.js`
-  for the literal rejection string; validator block sits ~50-200
-  chars before the throw site. See plan-doc session 9 status section
-  for the byte offsets of the two CustomPlugins validators (5013601
-  / 5018821) as worked examples.
+  smoke-test first, bundle-grep on rejection literal as fallback.
 - **For session-scoped Tier 2 reframes (session 10 finding):**
-  `LocalSessions/getAll` is the foundational read-side surrogate
-  when case-doc anchors are write-side. Pattern: `args = []`,
-  returns `Array<Session>`. T19 and T20 are the templates.
+  `LocalSessions/getAll` is the foundational read-side surrogate.
 - **For Tier 2 reframes with case-doc-anchored read-side handlers
-  (session 11 finding):** invoke the case-doc-anchored handlers
-  directly rather than using a foundational surrogate. Mixed-shape
-  dual invocation is fine. T21 is the template (one returns array,
-  another returns boolean — assert each shape independently).
+  (session 11 finding):** invoke directly. Mixed-shape OK.
 - **For Tier 2 reframes spanning two interfaces (session 12
-  finding):** invoke a read-side from each impl object. T11_runtime
-  is the template (CustomPlugins/listInstalledPlugins array +
-  LocalPlugins/getPlugins array — proves the install plumbing
-  crosses both impls intact). Mixed-arg-shape fine.
+  finding):** invoke a read-side from each impl object.
+- **For AX-tree polling (session 13 finding):** `lib/ax.ts`'s
+  `waitForAxNode` / `waitForAxNodes` for predicate-based polling.
+  `snapshotAx` for one-shot reads. Re-exports keep
+  `explore/walker.ts` types accessible without crossing the
+  lib/explore boundary.
 - **For asar fingerprints: ALWAYS grep the installed asar
  first.** Build-reference is beautified; the bundle is
-  minified. Case-doc text may be the user-facing form, not the
-  bundle form (e.g. `~/.claude.json` vs `.claude.json`). T18
-  reads `mainView.js`, not `index.js` — `lib/asar.ts`'s
-  `readAsarFile(filename, asarPath)` already handles this.
+  minified.
  ```bash
  cd tools/test-harness && node -e "
    const {extractFile} = require('@electron/asar');
--- a/docs/testing/runner-implementation-plan.md
+++ b/docs/testing/runner-implementation-plan.md
@@ -18,6 +18,116 @@ work begins.

 ## Status (post-execution)

+**Shipped session 13 (1 new primitive, no new spec):** `lib/ax.ts` —
+shared AX-tree loading + traversal substrate, threshold-driven
+extraction. The plan-doc had flagged "Unified DOM/AX loading +
+traversal primitive" in session 12 as the natural priority for
+session 13 if the operon / Tier 3 / schema-rev categories were
+blocked. Phase 0 of session 13 found the debugger detached on the
+dev box (port 9229 not listening), which blocked Categories A and C
+(operon-mode navigation probe + schema-rev for `listRemotePluginsPage`
+/ `listSkillFiles` — both need runtime probing against the user's
+debugger-attached running Claude). Category B (Tier 3 read-only
+reframes) ALSO effectively required the debugger for the smoke-test
+investigation phase. The PRIORITY (DOM unification) primitive
+landed as the strongly-supported alternative, backed by two
+threshold-driven extraction signals (T26 had duplicated `snapshotAx`
+from claudeai.ts, plus user-reported flake in AX-tree queries).
+
+Coverage stays at 74/76 (97%) — primitive-only session, no spec
+landed. The matrix coverage doesn't reflect primitive landings;
+those show up in the `lib/` surface and are picked up by future
+spec consumers.
+
+Two commits on `docs/compat-matrix` expected (SHAs inserted after
+the test-harness commit lands — the user reviews and commits at the
+end of every session):
+
+- TBD — `test(harness): session 13 lib/ax.ts AX substrate primitive`
+  (extracts `snapshotAx` + adds `waitForAxNode` / `waitForAxNodes`;
+  refactors `claudeai.ts` and `T26_routines_page_renders.spec.ts` to
+  consume the shared substrate instead of carrying duplicate
+  implementations; passes typecheck + H01-H03 canaries + T26 +
+  T11_runtime spot-checks on KDE-W).
+
+Session 13 findings + reclassifications:
+
+- **`lib/ax.ts` primitive surface.** Threshold-driven extraction
+  hitting 2 consumers (the formerly-private `snapshotAx` in
+  `claudeai.ts` + the explicit duplicate in T26 noted as
+  "premature abstraction at 1 consumer"). Surface:
+  - `snapshotAx(inspector, opts)` — single AX read with a stability
+    gate. `opts.fast` skips the gate for inside-poll callers
+    (matches the existing internal contract).
+  - `waitForAxNode(inspector, predicate, opts)` — repeatedly
+    snapshot the tree and return the first matching `RawElement`,
+    or null on timeout. Gates on stability once at the start
+    (configurable), then iterates with `fast: true`. Built against
+    the inline polling loops in `CodeTab.activate`, `openPill`,
+    `clickMenuItem`, T26 pre/post-click anchor scans.
+  - `waitForAxNodes(inspector, predicate, opts)` — same shape,
+    returns every match. For consumers that want to enumerate.
+  - Re-exports: `RawElement`, `AxNode`, `axTreeToSnapshot`,
+    `waitForAxTreeStable` — so consumers don't have to reach into
+    `explore/walker.ts` themselves. Walker stays the source of
+    truth for AX-snapshot construction; this file is the
+    runner-facing alias.
+- **Refactor scope was minimal.** `claudeai.ts` swaps its private
+  `snapshotAx` for the shared one (5-line import change). T26
+  drops its inlined helper and imports from `lib/ax.ts`. No
+  call-site rewrites — the predicate-based polling in
+  `CodeTab.activate` / `openPill` / `clickMenuItem` is unchanged
+  this session. Future sessions can opportunistically migrate
+  hand-rolled retry loops to `waitForAxNode` when re-touching
+  those code paths; not forced this session because the call-site
+  retry patterns each carry per-spec budget tuning that the
+  primitive's defaults need to validate against real flake data.
+- **Why no spec landed.** Phase 0 calibration found port 9229
+  closed (Claude was running but the debugger wasn't attached via
+  Developer → Enable Main Process Debugger). Categories A and C
+  strictly need runtime probing against the debugger; Category B
+  needs the debugger for the smoke-test verification phase (per
+  session-12 pattern). The PRIORITY primitive build was the
+  highest-impact deliverable that didn't require the debugger —
+  pure static-analysis-driven extraction with two existing
+  consumers as the threshold signal. Primitive-only sessions are
+  in scope per the followup prompt's termination criteria
+  ("Session budget hits ~1 new spec OR one new primitive
+  landing").
+- **What's NOT in `lib/ax.ts`.** Did NOT add a
+  `waitForRenderedSurface(client, surfaceKey)` registry — the
+  plan-doc flag mentioned it but no consumer asks for a named
+  surface anchor today; promote when a third consumer crystallizes
+  with a specific surface name in mind. Did NOT extract T07's
+  CSS-querySelector poll loop — that's a different abstraction
+  (DOM, not AX) with no second consumer signal yet. Did NOT
+  rewrite call-site retry budgets in `claudeai.ts` — the budgets
+  are tuned per-spec and changing them speculatively risks
+  introducing flake rather than removing it.
+- **Pre-existing T16 / T17 flake confirmed unchanged.** Running
+  the full suite found T16 / T17 / T07 / S25 / S29-S31 / etc.
+  failing on KDE-W — these failures are pre-existing on the
+  baseline (verified by stashing the session-13 changes and
+  re-running T16, which still failed with the same
+  `CodeTab.activate: no AX-tree button with accessibleName="Code"
+  found` error). Session 13's primitive doesn't fix the existing
+  flake; it lays groundwork that future sessions can build
+  flake-reduction patches against (e.g. promoting `activateTab`
+  to use `waitForAxNode` with a longer budget instead of a
+  one-shot snapshot would be the next session's natural follow-up).
+
+Tier 2 → Tier 2 candidates remaining for next session: the same
+list as session 12 — operon-mode navigation probe (still needs a
+debugger-attached Claude), schema-rev for `listRemotePluginsPage`
+/ `listSkillFiles` (same), Tier 3 read-only reframes (same). The
+new option for next session is **call-site migration to
+`waitForAxNode`** — promote `activateTab`'s one-shot snapshot to a
+proper retry, give T07's CSS poll a more durable wait shape, etc.
+That's a flake-reduction session shape rather than a
+coverage-expansion shape; the session 13 primitive made it tractable.
+
+---
+
 **Shipped session 12 (1 new spec, no primitive change):** T11_runtime
 (Tier 2 reframe — `seedFromHost` + multi-suffix registration probe
 over five install-flow handlers + dual-handler invocation across two
@@ -1642,35 +1752,22 @@ a primitive that needs a small extension:
  dependent), but if it ever becomes tractable, a
  `lib/displays.ts` mocking `screen.getAllDisplays()` would be
  the entry.
- **Unified DOM/AX loading + traversal primitive (FLAGGED session
-  12).** Existing wait/traversal primitives are scattered:
-  `electron.ts:waitForReady('userLoaded')` covers the post-login
-  webContents URL transition; `claudeai.ts` page-objects roll their
-  own `retryUntil` for AX-tree node lookups; `eipc.ts:waitForEipcChannel`
-  covers handler registration. The user reports lots of failures
-  because tests aren't waiting long enough for the DOM to render —
-  AX-tree queries fire before the relevant subtree is mounted, and
-  individual specs each pick their own `retryUntil` budget. Symptoms:
-  flaky AX-anchor lookups under cold-cache or slow-machine conditions;
-  premature `waitForReady('userLoaded')` resolution before claude.ai's
-  client-side router has hydrated the surface the test wants to query.
-  Proposed shape: **`lib/dom-ready.ts`** exporting one or more
-  composable wait helpers — e.g. `waitForAxNode(client, selector,
-  opts)` (retryUntil over the AX walker with a sensible default
-  budget, ~15-30s, plus a per-call override), `waitForAxTreeStable(client,
-  opts)` (no node count change for N consecutive ticks — proxy for
-  "render finished"), and `waitForRenderedSurface(client, surfaceKey)`
-  (case-doc-anchored surface markers — a small registry of known
-  anchors per surface so consumers don't roll their own AX selectors).
-  Should also unify the existing `claudeai.ts` activation methods
-  around the new helpers rather than each rolling its own retryUntil.
-  Touches enough specs that a session 13 primitive build would
-  reduce flake across T16/T17/T26/T07/H05 plus any future Code-tab
-  AX work — flag as the main bet for session 13 if the operon /
-  Tier 3 / schema-rev categories are blocked. Pre-work: audit
-  per-spec `retryUntil` budgets and AX-query sites to identify the
-  3-5 most-flaky callsites; build the primitive against those
-  specifically rather than speculatively.
+- **Unified DOM/AX loading + traversal primitive (LANDED session
+  13 as `lib/ax.ts`).** Threshold-driven extraction once T26 had to
+  redefine `snapshotAx` inline (for sessions 1-12 the only copy was
+  private to `claudeai.ts`, its sole consumer). The primitive surface
+  exports `snapshotAx`, `waitForAxNode`, `waitForAxNodes`, plus
+  re-exports of `RawElement` / `AxNode` / `axTreeToSnapshot` /
+  `waitForAxTreeStable` so consumers don't reach into
+  `explore/walker.ts` directly. `claudeai.ts` and T26 both consume
+  the shared substrate; future call-site migrations (e.g.
+  `activateTab` → `waitForAxNode`) are tractable now. The
+  speculative `waitForRenderedSurface(client, surfaceKey)` shape
+  was deliberately NOT shipped: no consumer asks for a
+  named-surface registry today; promote when a third consumer
+  crystallizes with a specific surface name. The CSS-querySelector
+  poll in T07 was deliberately NOT extracted — different
+  abstraction (DOM, not AX), no second consumer signal yet.

 ## Open questions for the parent agent

--- a/tools/test-harness/README.md
+++ b/tools/test-harness/README.md
@@ -120,11 +120,18 @@ against case-doc anchors; consumed by T19 / T20 / T22b / T31b / T33b /
 T38b) plus its session 8 invoke surface (`invokeEipcChannel` — calls
 a registered handler through the renderer-side wrapper at
 `window['claude.<scope>'].<Iface>.<method>`; consumed by T19 / T20 /
-T27 / T33c / T35b / T37b) — and the `createIsolation({ seedFromHost:
-true })` primitive that lets login-required tests run hermetically
-against a copy of the host's signed-in auth state (T07, T11_runtime,
-T16, T19, T20, T21, T22b, T26, T27, T31b, T33b, T33c, T35b, T37b,
-T38b).
+T27 / T33c / T35b / T37b), the `lib/ax.ts` AX-tree substrate
+(`snapshotAx` for one-shot reads + `waitForAxNode` / `waitForAxNodes`
+for predicate-based polling, plus re-exports of `RawElement` /
+`AxNode` / `axTreeToSnapshot` / `waitForAxTreeStable` from
+`explore/walker.ts` so consumers stay inside `lib/`;
+threshold-driven extraction in session 13 once T26 had to duplicate the
+formerly-private `snapshotAx` from `claudeai.ts`; consumed by
+`claudeai.ts` page-objects + T26) — and the
+`createIsolation({ seedFromHost: true })` primitive that lets
+login-required tests run hermetically against a copy of the host's
+signed-in auth state (T07, T11_runtime, T16, T19, T20, T21, T22b, T26, T27,
+T31b, T33b, T33c, T35b, T37b, T38b).

 Note on eipc channels: the `LocalSessions_$_*` and `CustomPlugins_$_*`
 channel names referenced in the case-doc Code anchors don't register