test(harness): session 13 lib/ax.ts AX substrate primitive (no spec, coverage unchanged at 97%)

Threshold-driven extraction of the AX-tree loading + traversal
substrate. `claudeai.ts` page-objects and `T26_routines_page_renders`
both carried inline copies of the same `snapshotAx` helper (T26's
even noted "premature abstraction at 1 consumer" — with two consumers
the threshold is met). Plus the user reports recurring AX-query flake.

Surface (`tools/test-harness/src/lib/ax.ts`):

- snapshotAx(inspector, opts) — single AX read with the stability
  gate. opts.fast skips the gate for inside-poll callers (matches
  the existing private-helper contract in claudeai.ts).
- waitForAxNode(inspector, predicate, opts) — repeatedly snapshot
  the tree and return the first matching RawElement, or null on
  timeout. Gates on stability once at the start (configurable),
  then iterates with fast: true. Built against the inline polling
  loops in CodeTab.activate, openPill, clickMenuItem, and T26's
  pre/post-click anchor scans — the existing call-sites are NOT
  migrated this session (per-spec retry budgets are tuned, changing
  them speculatively risks introducing flake).
- waitForAxNodes(inspector, predicate, opts) — same shape, returns
  every match. For consumers that want to enumerate.
- Re-exports: RawElement, axTreeToSnapshot, waitForAxTreeStable
  from explore/walker.ts, plus AxNode from lib/inspector.ts, so
  consumers stay inside lib/ instead of reaching into explore/.
  Walker remains the source of truth for AX-snapshot construction;
  lib/ax.ts is the runner-facing alias.
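
The snapshot + find + retry shape that waitForAxNode collapses can be sketched in isolation. This is a minimal illustration under stated assumptions, not the harness code: `FakeElement`, `pollForNode`, and `makeLateMountingSource` are hypothetical stand-ins for `RawElement`, `waitForAxNode`, and the inspector, and only the three element fields named in the lib/ax.ts comments are modelled.

```typescript
// Hypothetical stand-in for the walker's RawElement; only the fields
// mentioned in the lib/ax.ts comments are modelled here.
interface FakeElement {
  computedRole: string;
  accessibleName: string;
  backendDOMNodeId: number;
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Poll `read` until `predicate` matches an element or the budget runs
// out. Resolves to the first match, or null on timeout (never throws).
async function pollForNode(
  read: () => Promise<FakeElement[]>,
  predicate: (el: FakeElement) => boolean,
  opts: { timeoutMs?: number; intervalMs?: number } = {},
): Promise<FakeElement | null> {
  const deadline = Date.now() + (opts.timeoutMs ?? 5_000);
  const intervalMs = opts.intervalMs ?? 200;
  for (;;) {
    const found = (await read()).find(predicate);
    if (found) return found;
    // Stop if the next attempt would land past the deadline.
    if (Date.now() + intervalMs > deadline) return null;
    await sleep(intervalMs);
  }
}

// Fake snapshot source: the "Code" button only appears on the third
// read, imitating a subtree that mounts after the first queries fire.
function makeLateMountingSource(): () => Promise<FakeElement[]> {
  let reads = 0;
  return async () => {
    reads += 1;
    const shell: FakeElement[] = [
      { computedRole: 'RootWebArea', accessibleName: '', backendDOMNodeId: 1 },
    ];
    if (reads < 3) return shell;
    return [
      ...shell,
      { computedRole: 'button', accessibleName: 'Code', backendDOMNodeId: 42 },
    ];
  };
}
```

In the real harness the caller would then hand the match's `backendDOMNodeId` to `inspector.clickByBackendNodeId` if a click follows; the find step itself stays click-free.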

Refactors:

- claudeai.ts swaps its private snapshotAx for the shared one
  (5-line import change; call-sites unchanged).
- T26_routines_page_renders.spec.ts drops its inlined helper and
  imports from lib/ax.ts.

Phase 0 of session 13 found port 9229 detached (Claude was running
but Developer → Enable Main Process Debugger had not been clicked),
which blocked Categories A (operon-mode navigation probe) and C
(schema-rev for listRemotePluginsPage / listSkillFiles) — both need
runtime probing. Category B (Tier 3 read-only reframes) effectively
needed the debugger too. The PRIORITY-flagged DOM unification
primitive was tractable without it (pure static-analysis-driven
extraction), so session 13 pivoted there. Coverage stays at 74/76
(97%) since primitive-only sessions don't move the spec count.

What's NOT in lib/ax.ts:

- waitForRenderedSurface(client, surfaceKey) — the plan-doc proposal
  mentioned a named-surface registry but no consumer asks for it
  today; promote when a third consumer crystallizes with a specific
  surface in mind.
- CSS-querySelector primitive — T07's topbar poll is a different
  abstraction (DOM, not AX). No second consumer signal yet.
- Call-site retry budget changes — the per-spec budgets are tuned;
  speculative changes risk introducing flake. Migration to
  waitForAxNode is a future session's work.
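
When a call-site is eventually migrated, its tuned budget could travel with it as an explicit argument rather than being changed. A hedged sketch, assuming a hypothetical `requireNode` wrapper around any finder that shares waitForAxNode's null-on-timeout contract; neither `requireNode` nor `Finder` exists in the harness:

```typescript
// Hypothetical shape: any finder with waitForAxNode's contract
// (resolve to the match, or null when the budget runs out).
type Finder<T> = (opts: { timeoutMs: number }) => Promise<T | null>;

// Turn the null-on-timeout contract into a hard failure with a
// caller-supplied description, passing the call-site's own budget
// through unchanged.
async function requireNode<T>(
  find: Finder<T>,
  description: string,
  timeoutMs: number,
): Promise<T> {
  const found = await find({ timeoutMs });
  if (found === null) {
    throw new Error(
      `AX node not found within ${timeoutMs}ms: ${description}`,
    );
  }
  return found;
}
```

Specs that prefer a clean fail with a JSON snapshot attachment would skip the wrapper and branch on the null return directly.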

Verification: typecheck clean; H01-H03 canaries pass; T26 passes
(21.1s on KDE-W); T11_runtime spot-check passes. Pre-existing T16 /
T17 / T07 / S25 / S29-S31 flake is unchanged on the baseline (verified
by stashing the session-13 changes and re-running T16).

Co-Authored-By: Claude <claude@anthropic.com>
aaddrick
2026-05-03 23:56:47 -04:00
parent a8093a8e11
commit 3d47f33ccb
3 changed files with 270 additions and 54 deletions

tools/test-harness/src/lib/ax.ts

@@ -0,0 +1,255 @@
// AX-tree loading + traversal primitives — shared substrate for any
// test that reads from Chromium's accessibility tree.
//
// Why this exists
// ---------------
// Sessions 1-12 grew two parallel AX consumers without consolidating
// the loading shape:
//
// 1. `lib/claudeai.ts` page-objects (CodeTab.activate, openPill,
// clickMenuItem, findCompactPills) carry a private `snapshotAx`
// that gates on `waitForAxTreeStable` then calls
// `inspector.getAccessibleTree('claude.ai')` and converts via
// `axTreeToSnapshot`. Every page-object that polls for a node
// rolls its own retryUntil/while loop around that helper.
//
// 2. `src/runners/T26_routines_page_renders.spec.ts` re-implemented
// the same `snapshotAx` shape inline because the claudeai.ts
// version isn't exported. Its leading comment explicitly noted
// this was "premature abstraction" at 1 consumer; with 2 it is
// threshold-driven extraction.
//
// Plus the user reports recurring flake in tests that use the AX tree:
// queries fire before the relevant subtree is mounted, and individual
// specs each pick their own retryUntil budget. The proposed
// `waitForAxNode` primitive collapses the snapshot+find+retry shape
// into one helper with a single tunable budget per consumer, reducing
// both the surface area for budget drift and the duplication.
//
// What this primitive does
// ------------------------
// - `snapshotAx(inspector, opts)` — single AX tree read with the
// stability gate. Replaces the duplicated implementations in
// `claudeai.ts` (private) and `T26_routines_page_renders.spec.ts`
// (inlined). `opts.fast` skips the stability gate for inside-poll
// callers (matches the existing claudeai.ts contract).
// - `waitForAxNode(inspector, predicate, opts)` — repeatedly snapshot
// the AX tree and return the first element matching `predicate`,
// subject to a timeout. Built against the loops in `CodeTab.activate`
// (poll for compact pills), `openPill` (poll for menu items),
// `clickMenuItem` (poll for matching menuitem), and T26's pre/post-
// click anchor scans. The predicate carries the discrimination
// logic the caller already had inline; the primitive owns the
// stability-gate + retry loop.
// - Re-exports `RawElement`, `axTreeToSnapshot`, `waitForAxTreeStable`
// from `explore/walker.ts` so consumers don't need to reach across
// the lib/explore boundary themselves. The walker stays the source
// of truth for the AX-snapshot shape; this file is the runner-
// facing surface.
//
// Scope boundaries
// ----------------
// This is NOT a "wait for surface rendered" registry. The plan-doc
// proposal mentioned `waitForRenderedSurface(client, surfaceKey)`
// with a registry of named surface anchors — that's still
// speculative (no consumer asks for it). When a third consumer
// emerges that already knows it wants a named surface anchor (e.g.
// "the Code tab body has mounted"), promote the relevant claudeai.ts
// page-object into a registry entry. Today, `waitForAxNode` with a
// predicate covers every observed callsite.
//
// This is also NOT a CSS-querySelector primitive. T07 polls the DOM
// via `document.querySelector('[data-testid=...]')` for the topbar;
// that's a different abstraction (DOM, not AX) with no extraction
// signal yet — leave it inline in T07 until a second consumer
// surfaces.

import type { AxNode, InspectorClient } from './inspector.js';
import {
  type RawElement,
  axTreeToSnapshot,
  waitForAxTreeStable,
} from '../../explore/walker.js';
import { retryUntil } from './retry.js';

// Re-exports for consumer convenience. Anything that today imports
// `RawElement` / `axTreeToSnapshot` / `waitForAxTreeStable` from
// `../../explore/walker.js` can switch to this file as the import
// path. Keeping the walker as the source of truth — these are the
// runner-facing aliases.
export type { AxNode } from './inspector.js';
export {
  type RawElement,
  axTreeToSnapshot,
  waitForAxTreeStable,
} from '../../explore/walker.js';

// Re-export the AxNode -> RawElement[] conversion as a single import
// point. (Kept distinct from `axTreeToSnapshot`'s walker-side export
// so future renames in `explore/walker.ts` don't churn the runner-
// facing API.)
export interface SnapshotAxOptions {
  // Skip the upfront `waitForAxTreeStable` gate. Default false —
  // i.e. callers gate by default. Pass true inside polling loops
  // where the gate fights the loop: each iteration would block
  // waiting for "no node-count change" even when the change we're
  // polling for is exactly the AX tree updating.
  //
  // `waitForAxNode` itself uses fast=true on every iteration after
  // gating once at the start; consumers calling `snapshotAx` from
  // inside a hand-rolled loop should do the same.
  fast?: boolean;
  // AX-stability gate budget when `fast` is false. Default 10000ms
  // — matches the existing claudeai.ts/T26 inline implementations.
  // Increase for cold-cache cases on slow machines.
  stabilityTimeoutMs?: number;
  // Renderer URL filter for `inspector.getAccessibleTree`. Default
  // 'claude.ai'. Tests against a different webContents (find_in_page,
  // main_window) can override but the AX tree on those is much
  // simpler — `claude.ai` is the only one current consumers care
  // about.
  urlFilter?: string;
}

// Single AX-tree read, returning the walker's flat RawElement[]
// snapshot. Identical contract to the private `snapshotAx` formerly in
// `claudeai.ts` and the inlined one formerly in T26 — extracted here
// so both consumers share an implementation.
//
// Cost: ~800ms when the stability gate hits "stable" on the first
// pair of reads (interior-loop fast=true callers skip this); a few
// seconds on cold-cache. The AX tree itself is comparatively cheap
// to fetch and convert (~50-100ms).
export async function snapshotAx(
  inspector: InspectorClient,
  opts: SnapshotAxOptions = {},
): Promise<RawElement[]> {
  if (!opts.fast) {
    await waitForAxTreeStable(inspector, {
      minNodes: 1,
      timeoutMs: opts.stabilityTimeoutMs ?? 10_000,
    });
  }
  const url = opts.urlFilter ?? 'claude.ai';
  const nodes: AxNode[] = await inspector.getAccessibleTree(url);
  return axTreeToSnapshot(nodes);
}

export interface WaitForAxNodeOptions {
  // Total budget for the polling loop. Default 5000ms — matches the
  // claudeai.ts / T26 callsites that the primitive replaces. Override
  // upward for cold-cache or post-click cases (T26 uses 10s post-
  // click; CodeTab.activate uses 5s default but T16 passes 15s).
  timeoutMs?: number;
  // Per-iteration interval. Default 200ms — matches the existing
  // inline retryUntil({ interval: 200 }) calls. The AX tree fetch
  // itself dominates the loop cost; a shorter interval gives no
  // throughput benefit and a longer one delays the resolution.
  intervalMs?: number;
  // Renderer URL filter passed through to `snapshotAx`. Default
  // 'claude.ai'.
  urlFilter?: string;
  // Whether to gate on `waitForAxTreeStable` once before entering
  // the poll loop. Default true. When the caller has just mutated
  // the page (e.g. clicked a button and is waiting for the
  // resulting menu to render) the upfront stability gate is what
  // keeps the first iteration from racing the in-flight render.
  // After the upfront gate, every iteration uses fast=true so the
  // loop iterates without re-blocking on stability.
  stabilityGate?: boolean;
  // AX-stability gate budget for the upfront `waitForAxTreeStable`
  // when `stabilityGate` is true. Default 5000ms. Independent from
  // the outer poll budget — the gate is a hard precondition, not
  // part of the find loop.
  stabilityTimeoutMs?: number;
}

// Poll the AX tree until the predicate matches a node, or the budget
// runs out. Returns the matched RawElement on success, null on
// timeout.
//
// The predicate runs over RawElement (the walker-snapshot shape) so
// callers can use the same `el.computedRole === 'button' &&
// el.accessibleName === 'Code'` form they already have inline. The
// helper does NOT click the matched node — callers receive the
// RawElement and can pass `el.backendDOMNodeId` to
// `inspector.clickByBackendNodeId` if a click follows. Keeping click
// out of the find primitive lets composite consumers (e.g. "find then
// click then poll for the menu") chain cleanly.
//
// On timeout, returns null. Callers that want a hard fail with a
// diagnostic should pattern-match `if (!found) throw new Error(...)`
// — the primitive doesn't throw because some specs surface
// missing-node as a clean fail with a JSON snapshot attachment
// rather than an uncaught timeout.
//
// `waitForAxNode` takes no `name` parameter. A consumer that wraps a
// throw around the null return should put its own description of the
// target node in the error message so failure logs read meaningfully.
export async function waitForAxNode(
  inspector: InspectorClient,
  predicate: (el: RawElement) => boolean,
  opts: WaitForAxNodeOptions = {},
): Promise<RawElement | null> {
  const stabilityGate = opts.stabilityGate ?? true;
  if (stabilityGate) {
    await waitForAxTreeStable(inspector, {
      minNodes: 1,
      timeoutMs: opts.stabilityTimeoutMs ?? 5_000,
    });
  }
  return retryUntil(
    async () => {
      const elements = await snapshotAx(inspector, {
        fast: true,
        urlFilter: opts.urlFilter,
      });
      return elements.find(predicate) ?? null;
    },
    {
      timeout: opts.timeoutMs ?? 5_000,
      interval: opts.intervalMs ?? 200,
    },
  );
}

// Same shape as `waitForAxNode` but returns every match rather than
// the first. Useful for consumers that want to enumerate all menu
// items or all compact pills after a stability point — the
// findCompactPills caller in claudeai.ts is a one-shot snapshot
// today, but if a consumer needs to wait for "at least one compact
// pill" plus enumerate the resulting set, this avoids a second
// round-trip.
//
// Returns the non-empty array of matches on success, or null on
// timeout when no element ever matched. A successful call with zero
// matches is impossible by construction: the loop only resolves once
// the post-filter array is non-empty.
export async function waitForAxNodes(
  inspector: InspectorClient,
  predicate: (el: RawElement) => boolean,
  opts: WaitForAxNodeOptions = {},
): Promise<RawElement[] | null> {
  const stabilityGate = opts.stabilityGate ?? true;
  if (stabilityGate) {
    await waitForAxTreeStable(inspector, {
      minNodes: 1,
      timeoutMs: opts.stabilityTimeoutMs ?? 5_000,
    });
  }
  return retryUntil(
    async () => {
      const elements = await snapshotAx(inspector, {
        fast: true,
        urlFilter: opts.urlFilter,
      });
      const matches = elements.filter(predicate);
      return matches.length > 0 ? matches : null;
    },
    {
      timeout: opts.timeoutMs ?? 5_000,
      interval: opts.intervalMs ?? 200,
    },
  );
}
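
The stability gate that `snapshotAx` and `waitForAxNode` call lives in `explore/walker.ts` and is not part of this diff. Going only by the comments above ("no node-count change", stable "on the first pair of reads"), the idea might look roughly like the sketch below. `waitForStableCount`, its boolean return, and the 400ms default interval are assumptions made for illustration; the real `waitForAxTreeStable` may use different criteria and may signal timeout differently.

```typescript
const pause = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical gate: resolves true once two consecutive reads report
// the same node count (and at least `minNodes`), false on timeout.
// This is NOT the walker's implementation, only the general idea.
async function waitForStableCount(
  readCount: () => Promise<number>,
  opts: { minNodes?: number; timeoutMs?: number; intervalMs?: number } = {},
): Promise<boolean> {
  const minNodes = opts.minNodes ?? 1;
  const intervalMs = opts.intervalMs ?? 400; // assumed, not from the source
  const deadline = Date.now() + (opts.timeoutMs ?? 10_000);
  let previous = await readCount();
  while (Date.now() + intervalMs <= deadline) {
    await pause(intervalMs);
    const current = await readCount();
    // Two equal consecutive counts are treated as "stable".
    if (current === previous && current >= minNodes) return true;
    previous = current;
  }
  return false;
}
```

This also shows why polling loops pass `fast: true`: a gate of this kind needs at least two reads separated by an interval, so re-running it on every iteration would stall the loop while the tree is still changing, which is exactly the change the loop is waiting for.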

tools/test-harness/src/lib/claudeai.ts

@@ -29,12 +29,12 @@
 // - Menu items: any of `menuitem` / `menuitemradio` /
 // `menuitemcheckbox` (collected as MENU_ITEM_ROLES below).
-import type { AxNode, InspectorClient } from './inspector.js';
+import type { InspectorClient } from './inspector.js';
 import {
   type RawElement,
-  axTreeToSnapshot,
+  snapshotAx,
   waitForAxTreeStable,
-} from '../../explore/walker.js';
+} from './ax.js';
 import { retryUntil, sleep } from './retry.js';
 // All three CDP-exposed menu-item variants. Caller code wants to treat
@@ -52,36 +52,12 @@ const MENU_ITEM_ROLES = new Set<string>([
 // want, so excluding them by name is the load-bearing discriminator.
 const ROW_MORE_OPTIONS_RE = /^More options for /;
-interface SnapshotOpts {
-  // Skip the AX-tree stability gate. Default false — i.e. callers
-  // gate by default. Pass true inside polling loops where the gate
-  // fights the loop (each iteration would block waiting for stability
-  // even when the change we're polling for is the AX tree updating).
-  fast?: boolean;
-}
-// Fetch the live AX tree and convert into the walker's RawElement[]
-// snapshot shape. By default gates on `waitForAxTreeStable` first —
-// without it, the first read after a fresh page-load can return only
-// the RootWebArea + shell (~4 nodes) even when the DOM has hundreds
-// of interactive elements (Chromium populates AX async; see
-// docs/learnings/test-harness-ax-tree-walker.md §1). Cost is ~800ms
-// when already stable.
-//
-// Pass `{ fast: true }` inside polling loops — `openPill`'s
-// post-click menuitem search and `clickMenuItem`'s click-when-it-
-// arrives loop both want fast iterations after one upfront stability
-// gate, not stability re-checked on every poll.
-async function snapshotAx(
-  inspector: InspectorClient,
-  opts: SnapshotOpts = {},
-): Promise<RawElement[]> {
-  if (!opts.fast) {
-    await waitForAxTreeStable(inspector, { minNodes: 1, timeoutMs: 10_000 });
-  }
-  const nodes: AxNode[] = await inspector.getAccessibleTree('claude.ai');
-  return axTreeToSnapshot(nodes);
-}
+// `snapshotAx` and the stability gate are now in `lib/ax.ts` —
+// extracted there in session 13 once T26 had to redefine the same
+// helper inline (two consumers = threshold-driven extraction). Page-
+// objects below import via the lib aliases; consumers outside this
+// file should reach for `lib/ax.ts` directly rather than re-importing
+// through `lib/claudeai.ts`.
 // One of the three top-level pills. Click is fire-and-forget — the
 // router rerenders the tab body inline (no URL change on Code), so

tools/test-harness/src/runners/T26_routines_page_renders.spec.ts

@@ -3,12 +3,11 @@ import { launchClaude } from '../lib/electron.js';
 import { createIsolation, type Isolation } from '../lib/isolation.js';
 import { captureSessionEnv } from '../lib/diagnostics.js';
 import { retryUntil } from '../lib/retry.js';
-import type { InspectorClient } from '../lib/inspector.js';
 import {
   type RawElement,
-  axTreeToSnapshot,
+  snapshotAx,
   waitForAxTreeStable,
-} from '../../explore/walker.js';
+} from '../lib/ax.js';
 // T26 — Routines page renders.
 //
@@ -46,24 +45,10 @@ interface AxAnchorSnapshot {
   matches: AxAnchorMatch[];
 }
-// Pull a flat AX snapshot via the same primitives lib/claudeai.ts uses.
-// Inlined rather than importing a private helper from claudeai.ts —
-// snapshotAx isn't exported there and adding a public wrapper for one
-// runner is premature abstraction. `fast` skips the upfront stability
-// gate so polling loops don't burn ~800ms per iteration.
-async function snapshotAx(
-  inspector: InspectorClient,
-  opts: { fast?: boolean } = {},
-): Promise<RawElement[]> {
-  if (!opts.fast) {
-    await waitForAxTreeStable(inspector, {
-      minNodes: 1,
-      timeoutMs: 10_000,
-    });
-  }
-  const nodes = await inspector.getAccessibleTree('claude.ai');
-  return axTreeToSnapshot(nodes);
-}
+// `snapshotAx` (and `waitForAxTreeStable`) come from `lib/ax.ts` —
+// the shared AX-loading substrate. T26 was the second consumer to
+// reach for the helper (after `lib/claudeai.ts`'s page-objects),
+// which crossed the threshold for extraction in session 13.
 // Find every interactive element whose role+accessibleName matches one
 // of the supplied {role, name} pairs. Used both pre-click (to locate