claude-desktop-debian/docs/learnings/test-harness-ax-tree-walker.md
Aaddrick 3506c14918 test(harness): add Linux compatibility test harness (#579)
Build out a Playwright-based regression-detection harness covering
the compat-matrix surfaces (KDE-W, KDE-X, GNOME, Sway, i3, Niri,
packaging formats). Adds:

- Planning + decision docs under docs/testing/ — README, matrix,
  runbook, automation, cases/ (11 case files), quick-entry-closeout
- Playwright scaffolding (config, tsconfig)
- 78 spec runners under tools/test-harness/src/runners/ — T## case-
  doc runners and S## distribution/smoke runners
- Substrate primitives in tools/test-harness/src/lib/: AX-tree
  loader (snapshotAx + waitForAxNode + axTreeToSnapshot), focus-
  shifter, eipc-registry, niri-native bridge, drag-drop bridge,
  electron-mocks, claudeai page-objects, inspector client

S03 (DEB Depends declared) and S04 (RPM Requires declared) ship
marked test.fail() — they're regression detectors for the case-doc
gap (deb.sh emits no Depends:, rpm.sh sets AutoReqProv: no), and
the expected-failure shape lets them report green on every host
until upstream packaging starts declaring runtime deps.

127 files, no runtime changes; harness is opt-in via
'cd tools/test-harness && npx playwright test'.

Co-authored-by: Claude <claude@anthropic.com>
2026-05-04 23:17:37 -04:00


Test-harness AX-tree walker — non-obvious traps

Notes from the v6 → v7 fingerprint migration that switched tools/test-harness/explore/walker.ts from a renderer-side document.querySelectorAll IIFE to Chromium's accessibility tree (Accessibility.getFullAXTree over CDP). All five gotchas below cost a wasted live-walk to find; capturing them here so the next person debugging a 0-entry inventory or a redrive cascade can skip the discovery loop.

1. Accessibility.enable is async; the first getFullAXTree lies

Inspector clients call target.debugger.sendCommand('Accessibility.enable') before the first getFullAXTree. Both calls return immediately, but Chromium populates the AX tree asynchronously — the very first read can return a tree containing only the RootWebArea and a generic shell (4 nodes total) even when the DOM has hundreds of interactive elements. The walker's existing waitForStable is a DOM-mutation-quiescence observer with a 1.5s ceiling; on claude.ai's SPA the DOM mutates constantly so waitForStable returns at the ceiling without the AX tree ever catching up.

Fix: waitForAxTreeStable polls getFullAXTree until two consecutive reads return the same node count. Called once before the seed snapshot (with minNodes: 20 to gate against the 4-node "still loading" case), once after each navigateTo in redrivePath, and baked into every snapshotSurface call (with minNodes: 1 for the post-click case where the tree is already populated).
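The polling rule described above can be sketched roughly as follows. This is a minimal illustration, not the walker's actual code: the injected readNodeCount callback, the interval and timeout defaults, and the behavior on timeout are all assumptions. Only the "two consecutive equal reads" rule and the minNodes floor come from the notes.

```typescript
// Sketch only: everything except the minNodes option is a hypothetical
// stand-in, not the walker's real implementation.
type ReadNodeCount = () => Promise<number>;

interface StableOptions {
  minNodes: number;    // floor that rejects the 4-node "still loading" shell
  intervalMs?: number; // assumed poll interval
  timeoutMs?: number;  // assumed overall ceiling
}

// Settled means: two consecutive reads agree AND the count clears the floor.
function isSettled(prev: number | null, curr: number, minNodes: number): boolean {
  return prev !== null && prev === curr && curr >= minNodes;
}

async function waitForAxTreeStableSketch(
  readNodeCount: ReadNodeCount,
  opts: StableOptions,
): Promise<number> {
  const intervalMs = opts.intervalMs ?? 100;
  const deadline = Date.now() + (opts.timeoutMs ?? 10000);
  let prev: number | null = null;
  for (;;) {
    const curr = await readNodeCount();
    if (isSettled(prev, curr, opts.minNodes)) return curr;
    // Assumption: on timeout, return the last count and let the caller decide.
    if (Date.now() >= deadline) return curr;
    prev = curr;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In a real run, readNodeCount would presumably wrap a CDP Accessibility.getFullAXTree call and return the length of its nodes array. Injecting it keeps the settle rule testable without a live renderer.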

Symptom you'll see: seed entries: 0. Walker exits with no inventory. Stderr says walker: AX tree settled at 4 nodes (or similar small number).

2. navigateTo(sameUrl) is a no-op; redrives carry prior state

The walker's navigateTo(url) short-circuits when currentUrl === url (per the original v6 implementation). Every BFS pop re-navigates to startUrl to replay the recorded path against a clean state, but when currentUrl already matches startUrl the navigation is skipped. Anything a prior drill left behind (an open dialog, an expanded sidebar, a changed scroll position or focus, route params) carries into the next redrive's snapshots. clickById then suffix-matches the requested fingerprint against a contaminated surface and silently fails to find elements that were present on the seed surface.

Fix: redrivePath uses reloadPage(inspector) (which evals location.reload() in the renderer) instead of navigateTo(startUrl). The reload discards the React tree and forces a fresh mount even when the URL matches.
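A toy model may make the difference concrete. FakeRenderer and both functions below are hypothetical stand-ins rather than the harness API, and the start URL is an assumed placeholder. The model only mirrors the two behaviors described above: a same-URL navigate does nothing, while a reload always remounts.

```typescript
// Toy model, not the harness API: FakeRenderer and both functions are
// hypothetical stand-ins for the inspector-driven navigateTo / reloadPage.
interface FakeRenderer {
  url: string;
  openDialog: boolean; // state a prior drill left behind
  mounts: number;      // how many times the app tree was mounted
}

// Mirrors the v6 behavior: navigating to the current URL is a no-op.
function navigateTo(r: FakeRenderer, url: string): void {
  if (r.url === url) return; // short-circuit: leftover state survives
  r.url = url;
  r.openDialog = false;
  r.mounts += 1;
}

// Mirrors location.reload(): always remounts, whatever the URL is.
function reloadPage(r: FakeRenderer): void {
  r.openDialog = false;
  r.mounts += 1;
}

const startUrl = 'https://claude.ai/'; // assumed placeholder

const viaNavigate: FakeRenderer = { url: startUrl, openDialog: true, mounts: 1 };
navigateTo(viaNavigate, startUrl);
// viaNavigate.openDialog is still true: the redrive starts contaminated.

const viaReload: FakeRenderer = { url: startUrl, openDialog: true, mounts: 1 };
reloadPage(viaReload);
// viaReload.openDialog is false and mounts is 2: the redrive starts clean.
```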

Symptom you'll see: the first one or two BFS items succeed, then every subsequent redrive fails with clickById: no element matches "<seed-id>" on current surface, even though <seed-id> is a button you can confirm from the DevTools console is present on the page.

3. claude.ai uses flat dialog>button[] and complementary>button[], not role=list

The v7 plan's isListRowChild check assumes list rows use ARIA list semantics (option/listitem inside listbox/list). claude.ai exposes the connect-apps marketplace as a dialog with ~80 plain button children (no list wrapper) and the cowork sidebar as a complementary landmark with ~70 plain button children. Without the heuristic those buttons literal-match by name → each gets a unique stable entry → the BFS queues each individually for drilling → inventory bloats from 32 to 442+ entries and most drills fail because the per-row buttons are virtualized.

Fix: isListRowChild extended in two ways. (a) LIST_ROW_ROLES includes button, LIST_ANCESTOR_ROLES includes group. (b) A sibling-count fallback fires when siblingTotal >= 15 regardless of ancestor role — sits well above realistic toolbar sizes (≤10) and well below the smallest claude.ai marketplace (~80). Step 3 (positional fallback) also gates on !isListRowChild so list rows fall through to step 4's instance collapse instead of fragmenting into per-index positionals that can't fold.
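The extended check might look something like the sketch below. The RowCandidate shape is hypothetical, and so is the assumption that the sibling-count fallback still requires a list-row role. The role sets and the threshold of 15 are taken from the description above.

```typescript
// Role sets as described in the notes; the original ARIA roles plus the
// additions (button, group).
const LIST_ROW_ROLES = new Set<string>(['option', 'listitem', 'button']);
const LIST_ANCESTOR_ROLES = new Set<string>(['listbox', 'list', 'group']);
const SIBLING_COUNT_THRESHOLD = 15;

// Hypothetical input shape: the real walker's node type is not shown here.
interface RowCandidate {
  role: string;         // AX role of the node being fingerprinted
  parentRole: string;   // AX role of its nearest container
  siblingTotal: number; // assumed: same-role siblings under that container
}

function isListRowChild(c: RowCandidate): boolean {
  // Assumption: a non-row role never counts, even in a crowded container.
  if (!LIST_ROW_ROLES.has(c.role)) return false;
  // (a) Proper list semantics, or a group wrapper.
  if (LIST_ANCESTOR_ROLES.has(c.parentRole)) return true;
  // (b) Fallback: enough siblings to be a list, whatever the ancestor is.
  return c.siblingTotal >= SIBLING_COUNT_THRESHOLD;
}

// ~80 plain buttons directly under a dialog: treated as list rows.
const marketplaceCard: RowCandidate = { role: 'button', parentRole: 'dialog', siblingTotal: 80 };
// A handful of toolbar buttons: left alone.
const toolbarButton: RowCandidate = { role: 'button', parentRole: 'toolbar', siblingTotal: 6 };
```

With these inputs, isListRowChild(marketplaceCard) is true through the fallback and isListRowChild(toolbarButton) is false, which is the split the threshold is meant to produce.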

Symptom you'll see: dialog kind count balloons (>200). One surface dominates the surfaceBreakdown query in the inventory. Each marketplace card or sidebar row gets its own kind: structural entry with a slugified product name in the id-tail.

4. The more options for X per-row trigger needs its own shape

Cowork sidebar rows have a "⋮" menu next to each session whose aria-label is More options for <session title>. These don't match the cowork-session shape (which gates on status prefix), so even after cowork-session collapsed the session list, the sibling "More options for" buttons still emitted individually. Same for any future per-row action button claude.ai adds.

Fix: new INSTANCE_SHAPES entry row-more-options with regex /^More options for / and matching pattern. Generic enough to cover any per-row trigger that follows the <verb> for <row title> shape.
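As a rough illustration of how such an entry could be matched, consider the sketch below. The InstanceShape field names and the matchInstanceShape helper are assumptions. Only the entry name row-more-options and the regex come from the notes.

```typescript
// Hypothetical entry layout; the walker's real INSTANCE_SHAPES type may differ.
interface InstanceShape {
  shape: string; // name of the collapsed entry
  regex: RegExp; // tested against the node's accessible name
}

const INSTANCE_SHAPES: InstanceShape[] = [
  { shape: 'row-more-options', regex: /^More options for / },
];

// Returns the first matching shape name, or null when the node should be
// fingerprinted by its literal name instead.
function matchInstanceShape(accessibleName: string): string | null {
  for (const entry of INSTANCE_SHAPES) {
    if (entry.regex.test(accessibleName)) return entry.shape;
  }
  return null;
}
```

Because the regex is anchored at the start of the label, every per-row trigger collapses to the same shape regardless of the session title that follows the prefix.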

Symptom you'll see: after fixing (1)-(3), a fresh wave of redrive failures all matching more-options-for-X slugs.

5. Sidebar virtualization causes structural redrive misses; bump the threshold

claude.ai's cowork sidebar appears to virtualize the session list: each fresh page load exposes a slightly different subset of sessions in the AX tree (the membership itself differs, not just the ordering). The walker captures session N at seed time, but on a redrive after reloadPage, session N may no longer be in the tree. Each miss counts toward MAX_CONSECUTIVE_LOOKUP_FAILURES, and a stretch of 25+ consecutive cowork-row redrives can blow through the original threshold without the renderer being meaningfully wedged.

Fix: threshold bumped 25 → 75. The timeout counter (still 5 strikes) gates against actual renderer hangs; the lookup-failure counter is more about "discovered DOM has drifted from seed", and on a virtualized list a generous threshold is correct. Subtree pruning (already in place) keeps the bursts from compounding by dropping queue items whose path shares the failed step's prefix.
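The interaction between the consecutive-failure counter and subtree pruning can be sketched as below. The class, the helper names, and the path representation are hypothetical. It also assumes that a successful redrive resets the run and that the walk aborts once the run reaches the threshold. Only the constant's name and its value of 75 come from the notes.

```typescript
const MAX_CONSECUTIVE_LOOKUP_FAILURES = 75;

// Tracks a run of lookup misses. Assumption: any success resets the run.
class LookupFailureCounter {
  private run = 0;

  recordSuccess(): void {
    this.run = 0;
  }

  // Returns true when the walk should abort.
  recordFailure(): boolean {
    this.run += 1;
    return this.run >= MAX_CONSECUTIVE_LOOKUP_FAILURES;
  }
}

// Hypothetical path representation: one string per recorded click step.
type Path = string[];

function hasPrefix(path: Path, prefix: Path): boolean {
  return prefix.length <= path.length && prefix.every((step, i) => path[i] === step);
}

// Drop queued items whose path runs through the step that just failed, so
// one missing row does not produce a burst of further misses.
function pruneSubtree(queue: Path[], failedPrefix: Path): Path[] {
  return queue.filter((path) => !hasPrefix(path, failedPrefix));
}
```

Pruning removes the descendants of a missing row before they can each register as a miss, so the counter mostly sees independent misses rather than one miss repeated down a subtree.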

Symptom you'll see: the walker aborts mid-walk with 25 consecutive redrive lookup failures and the failed ids all share a common ariaPath prefix (root.complementary.button-by-name.X).

Driver: prefer walk-isolated.ts over explore walk

npm run explore:walk connects to whatever Node inspector is on :9229 — i.e. the host Claude Desktop the user is currently using. That mutates the host profile (visited surfaces, navigation history, route changes) and races with the human at the keyboard.

tools/test-harness/explore/walk-isolated.ts mirrors what H05 / U01 do: kills any running host instance, copies auth into a tmpdir (createIsolation({ seedFromHost: true })), spawns a fresh Electron with isolated XDG_CONFIG_HOME, attaches the inspector via SIGUSR1, runs the walk, tears down. Same flag set as explore walk plus --no-seed for the rare case you want a fresh-sign-in run. Use it.