mirror of
https://github.com/aaddrick/claude-desktop-debian.git
synced 2026-05-17 00:26:21 +03:00
docs(testing): session 14 plan/inventory + rotate session 15 prompt
Add session 14 status entry to runner-implementation-plan.md (call-site migration + T16 fix verification + T17-stays-flaky verification). Rotate the followup prompt for session 15: PRIORITY shape is T17 investigation + potential `openPill` / `clickMenuItem` migration if the failure trace shows an AX-polling-reachable cause; A / B / C unchanged from session 14 (still need debugger).

Co-Authored-By: Claude <claude@anthropic.com>
@@ -1,127 +1,127 @@
|
||||
# test-harness runner implementation — session 14 prompt
|
||||
# test-harness runner implementation — session 15 prompt
|
||||
|
||||
This file is meant to be **copied verbatim into a fresh Claude Code
|
||||
session** as the initial user message. Don't paraphrase it; the
|
||||
orchestration depends on the exact directives below.
|
||||
|
||||
You're picking up after a runner-implementation session that landed 1
|
||||
new primitive (`lib/ax.ts`) and NO new spec. Session 13 was a pivot:
|
||||
Phase 0 calibration found the debugger detached on the dev box (port
|
||||
9229 not listening — Claude was running but Developer → Enable Main
|
||||
Process Debugger had not been clicked), which blocked Categories A
|
||||
(operon-mode navigation probe) and C (schema-rev for
|
||||
`listRemotePluginsPage` / `listSkillFiles`) — both need runtime
|
||||
probing against a debugger-attached running Claude. Category B (Tier
|
||||
3 read-only reframes) ALSO effectively needed the debugger for the
|
||||
smoke-test investigation phase. Session 13 pivoted to the
|
||||
PRIORITY-flagged DOM unification primitive, which was tractable
|
||||
without the debugger because both consumer signals existed
|
||||
statically: `claudeai.ts` had a private `snapshotAx`, T26 had a
|
||||
duplicate inline copy explicitly noted as "premature abstraction at 1
|
||||
consumer", plus the user reported recurring AX-query flake. Coverage
|
||||
unchanged at 74/76 (97%) — primitive-only sessions don't move the
|
||||
spec count. Two commits on `docs/compat-matrix` expected (SHAs
|
||||
inserted after the test-harness commit lands — the user reviews and
|
||||
commits at the end of every session):
|
||||
call-site migration (no new spec, no new primitive). Session 14 was
|
||||
a flake-reduction session: Phase 0 calibration found the debugger
|
||||
detached on the dev box (port 9229 not listening — Claude was not
|
||||
running, or running but Developer → Enable Main Process Debugger had
|
||||
not been clicked), which blocked Categories A (operon-mode
|
||||
navigation probe), B (Tier 3 read-only reframes), and C (schema-rev
|
||||
for `listRemotePluginsPage` / `listSkillFiles`), all of which need
runtime probing against a debugger-attached Claude. Session 14 pivoted
to the PRIORITY Category D (call-site migration to `waitForAxNode`),
which was tractable without the debugger because the migration is a
pure shape-only refactor against the existing `lib/ax.ts` substrate.
Coverage
|
||||
unchanged at 74/76 (97%) — migration sessions don't move the spec
|
||||
count, but T16's pre-existing failure mode (`no AX-tree button with
|
||||
accessibleName="Code" found`) is fixed by the migration. Two commits
|
||||
on `docs/compat-matrix` expected (autonomous orchestration commits +
|
||||
pushes — the user reviews after the session):
|
||||
|
||||
- TBD — `test(harness): session 13 lib/ax.ts AX substrate primitive`
|
||||
(extracts `snapshotAx` from `claudeai.ts` private + T26 inlined
|
||||
duplicate; adds `waitForAxNode` / `waitForAxNodes` predicate-based
|
||||
polling helpers; re-exports `RawElement` / `AxNode` /
|
||||
`axTreeToSnapshot` / `waitForAxTreeStable` from `explore/walker.ts`
|
||||
so consumers stay inside `lib/`; refactors `claudeai.ts` and T26
|
||||
to consume the shared substrate).
|
||||
- TBD — `test(harness): session 14 migrate activateTab to
|
||||
waitForAxNode (no spec, coverage unchanged at 97%)`
|
||||
(migrates `activateTab` from one-shot snapshot to `waitForAxNode`
|
||||
with a configurable pre-click timeout; migrates
|
||||
`CodeTab.activate`'s post-click `retryUntil`-around-
|
||||
`findCompactPills` loop to `waitForAxNodes`; T16 passes 3/3 on
|
||||
KDE-W against the migrated form, was pre-existing-flaky on the
|
||||
baseline; T26 passes; T17 still pre-existing-flaky — verified by
|
||||
stash + retry).
|
||||
|
||||
The plan doc at
|
||||
[`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
|
||||
captures the tier classification and execution-time reclassifications.
|
||||
Its "Status (post-execution)" section is the source of truth for
|
||||
what's done and what's deferred — read **session 13** first, then
|
||||
**session 12**, then **session 11**, then **session 10**, then
|
||||
**session 9**, then **session 8**, then **session 7**, then **session
|
||||
6**, then **session 5**, then **session 4**, then **session 3**, then
|
||||
**session 2**, then **session 1** sub-sections.
|
||||
what's done and what's deferred — read **session 14** first, then
|
||||
**session 13**, then **session 12**, then **session 11**, then
|
||||
**session 10**, then **session 9**, then **session 8**, then **session
|
||||
7**, then **session 6**, then **session 5**, then **session 4**, then
|
||||
**session 3**, then **session 2**, then **session 1** sub-sections.
|
||||
|
||||
This session is a continuation, not a restart. Start by reading the
|
||||
plan doc's status sections.
|
||||
|
||||
### Big new findings from session 13
|
||||
### Big new findings from session 14
|
||||
|
||||
1. **Pre-existing T16 / T17 / T07 / S25 / S29-S31 flake confirmed
|
||||
on KDE-W against the unchanged baseline.** Running the full suite
|
||||
surfaced 12 failures, including T16 (CodeTab.activate: no AX-tree
|
||||
button with accessibleName="Code" found) and T17. Verified
|
||||
pre-existing by stashing the session-13 changes and re-running
|
||||
T16 — same failure. Session 13's primitive doesn't fix the existing
|
||||
flake; it lays groundwork. Future sessions can build flake-
|
||||
reduction patches against `lib/ax.ts`'s `waitForAxNode` (e.g.
|
||||
promote `activateTab`'s one-shot snapshot to a proper retry, or
|
||||
give T07's CSS-querySelector poll a more durable wait shape if
|
||||
that abstraction emerges).
|
||||
2. **`lib/ax.ts` is the new shared AX-tree substrate.** Surface:
|
||||
- `snapshotAx(inspector, opts)` — single AX read with the
|
||||
stability gate. `opts.fast` skips the gate for inside-poll
|
||||
callers (matches the existing `claudeai.ts`/T26 contract).
|
||||
- `waitForAxNode(inspector, predicate, opts)` — repeatedly
|
||||
snapshot the tree and return the first matching `RawElement`,
|
||||
null on timeout. Gates on stability once at the start
|
||||
(configurable), then iterates with `fast: true`. Built against
|
||||
the inline polling loops in `CodeTab.activate`, `openPill`,
|
||||
`clickMenuItem`, T26 pre/post-click anchor scans — but the
|
||||
existing call-sites are NOT migrated this session (their per-
|
||||
spec retry budgets are tuned and changing them speculatively
|
||||
risks flake). Future call-site migrations are tractable.
|
||||
- `waitForAxNodes(inspector, predicate, opts)` — same shape,
|
||||
returns every match. For consumers that want to enumerate.
|
||||
- Re-exports: `RawElement`, `AxNode`, `axTreeToSnapshot`,
|
||||
`waitForAxTreeStable` — so consumers stay inside `lib/`
|
||||
instead of reaching into `explore/walker.ts` directly.
|
||||
3. **The debugger-attachment precondition is binding.** Sessions 9
|
||||
through 12 did extensive runtime probing of the per-wc IPC
|
||||
1. **`activateTab` no-retry was the T16 failure mode.** Verified by
|
||||
stashing the migration and re-running T16 against the baseline —
|
||||
same `CodeTab.activate: no AX-tree button with accessibleName="Code"
|
||||
found` failure. The migration converts the pre-click snapshot from
|
||||
one-shot to a `waitForAxNode` poll, with the existing T16 budget
|
||||
(15s through `CodeTab.activate({ timeout })`) covering both the
|
||||
pre-click click-budget and the post-click pill poll. T16 passed
|
||||
3/3 in succession against the migrated form. Strong signal that
|
||||
"convert one-shot AX snapshots to `waitForAxNode` polling" is a
|
||||
high-leverage flake-reduction shape — this is the first migration
|
||||
that demonstrably fixed an existing failure.
|
||||
2. **T17 stays pre-existing-flaky.** T17 exercises the env-pill →
|
||||
Local → Select-folder → Open-folder chain via `openEnvPill` /
|
||||
`selectLocal` / `openFolderPicker`, which use `openPill` and
|
||||
`clickMenuItem` internally. Those weren't migrated this session
|
||||
(their post-click stability gates plus per-spec sleep budgets
|
||||
carry tuning the prompt explicitly cautioned against changing).
|
||||
T17's flake mode is unchanged by the migration; future sessions can
take it on if budget-tuning data warrants. The `openPill` while-loop
polls at 100ms per iteration; if the menu hasn't rendered within 5s,
it returns `{ opened: false, items: [] }`. Migrating to
`waitForAxNode` would flatten the loop shape but doesn't obviously
change the outcome, so the migration wasn't worth the budget-tuning
risk this session.
|
||||
3. **The debugger-attachment precondition is still binding.**
|
||||
Sessions 9-12 did extensive runtime probing of the per-wc IPC
|
||||
registry against the user's debugger-attached Claude. Without
|
||||
that probing, Categories A / B / C in this prompt are blocked at
|
||||
the smoke-test phase. If the user hasn't clicked Developer →
|
||||
Enable Main Process Debugger before the session starts, port 9229
|
||||
is closed and the categories pivot to either documentation work
|
||||
or the call-site-migration shape that doesn't need runtime
|
||||
probing. Phase 0 must check `ss -tln | grep ':9229'` (or `curl
|
||||
--max-time 2 http://127.0.0.1:9229/json`) before fanning out.
|
||||
or further call-site migration. Phase 0 must check `ss -tln |
|
||||
grep ':9229'` (or `curl --max-time 2 http://127.0.0.1:9229/json`)
|
||||
before fanning out.
|
||||
4. **The reframe pool remains essentially exhausted.** Same status
|
||||
as session 12 — every Tier 1 fingerprint with a tractable runtime
|
||||
sibling has been promoted. The remaining options are now: (a)
|
||||
call-site migration to `waitForAxNode` for flake reduction, (b)
|
||||
operon-mode navigation probe (still needs debugger), (c) schema-
|
||||
rev for `listRemotePluginsPage` / `listSkillFiles` (still needs
|
||||
debugger), (d) Tier 3 read-only reframes (most need user-account
|
||||
state). The natural next-session shape is (a) — flake reduction
|
||||
builds on session 13's primitive and doesn't need the debugger.
|
||||
as sessions 12-13 — every Tier 1 fingerprint with a tractable
|
||||
runtime sibling has been promoted. The remaining options are now:
|
||||
(a) further call-site migration to `waitForAxNode` for flake
|
||||
reduction (`openPill` / `clickMenuItem` / T26's pre-click
|
||||
`retryUntil` — though T26's needs a `context-was-destroyed`
|
||||
exception swallow), (b) operon-mode navigation probe (still needs
|
||||
debugger), (c) schema-rev for `listRemotePluginsPage` /
|
||||
`listSkillFiles` (still needs debugger), (d) Tier 3 read-only
|
||||
reframes (most need user-account state). Session 14 demonstrated
|
||||
migration can deliver a measurable bug-fix outcome; that
|
||||
continues to be the highest-leverage shape when the debugger is
|
||||
closed.
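
The one-shot versus polling distinction in the findings above can be made concrete with a small sketch. This is not the harness's `lib/ax.ts`: `AxElement`, `Snapshot`, `waitForAxNodeSketch`, and `lateButton` are hypothetical stand-ins, the defaults (`timeoutMs: 5000`, `intervalMs: 200`) are the ones this prompt quotes for `lib/ax.ts`, and the stability gate is left out for brevity.

```typescript
// Hypothetical stand-in for the harness's RawElement; real fields may differ.
interface AxElement {
	role: string;
	name: string;
}

// Stand-in for a fast snapshot call such as snapshotAx(inspector, { fast: true }).
type Snapshot = () => Promise<AxElement[]>;

interface WaitOpts {
	timeoutMs?: number;
	intervalMs?: number;
}

const sleep = (ms: number): Promise<void> =>
	new Promise((resolve) => setTimeout(resolve, ms));

// Repeatedly snapshot and return the first match, or null once the
// budget cannot fit another poll interval.
async function waitForAxNodeSketch(
	snapshot: Snapshot,
	predicate: (el: AxElement) => boolean,
	opts: WaitOpts = {},
): Promise<AxElement | null> {
	const timeoutMs = opts.timeoutMs ?? 5000;
	const intervalMs = opts.intervalMs ?? 200;
	const deadline = Date.now() + timeoutMs;
	for (;;) {
		const match = (await snapshot()).find(predicate);
		if (match) return match;
		if (Date.now() + intervalMs > deadline) return null;
		await sleep(intervalMs);
	}
}

// Fake tree whose "Code" button only shows up on the Nth snapshot,
// mimicking a tab bar that renders slightly after the window does.
function lateButton(appearsOnCall: number): Snapshot {
	let calls = 0;
	return async () => {
		calls += 1;
		return calls >= appearsOnCall ? [{ role: "button", name: "Code" }] : [];
	};
}

const isCodeButton = (el: AxElement): boolean =>
	el.role === "button" && el.name === "Code";
```

A one-shot read of `lateButton(3)` sees an empty tree, which is the shape of the T16 failure. The polling form reaches the third snapshot and returns the button. Callers with a known per-spec budget would pass it as `timeoutMs` rather than rely on the default.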
|
||||
|
||||
### Authoritative reference
|
||||
|
||||
Read these in order before fanning out:
|
||||
|
||||
- [`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
|
||||
— tier classification + status section. Read **session 13**, then
|
||||
**session 12**, then **session 11**, **session 10**, **session 9**,
|
||||
**session 8**, **session 7**, **session 6**, **session 5**, **session
|
||||
4**, **session 3**, **session 2**, then **session 1** "Status (post-
|
||||
execution)" sub-sections. The Tier-3 list (search for "## Tier 3")
|
||||
is the candidate pool for any further reframes.
|
||||
— tier classification + status section. Read **session 14**, then
|
||||
**session 13**, **session 12**, **session 11**, **session 10**,
|
||||
**session 9**, **session 8**, **session 7**, **session 6**,
|
||||
**session 5**, **session 4**, **session 3**, **session 2**, then
|
||||
**session 1** "Status (post-execution)" sub-sections. The Tier-3
|
||||
list (search for "## Tier 3") is the candidate pool for any further
|
||||
reframes.
|
||||
- [`tools/test-harness/README.md`](../../tools/test-harness/README.md)
|
||||
— runner conventions, the now-74-spec inventory, primitives in
|
||||
`lib/`, isolation defaults, the CDP-gate workaround, the eipc
|
||||
note, and the new `lib/ax.ts` substrate (session 13 addition;
|
||||
consumer list is `claudeai.ts` page-objects + T26).
|
||||
note, and `lib/ax.ts` substrate (session 13 addition; session 14
|
||||
migrated `activateTab` + `CodeTab.activate`'s post-click pill
|
||||
poll to use it).
|
||||
- [`docs/testing/cases/README.md`](cases/README.md) — case-doc
|
||||
structure and the four anchor scopes.
|
||||
- [`tools/test-harness/src/lib/`](../../tools/test-harness/src/lib/)
|
||||
— the existing primitives. Session 13 added `lib/ax.ts`; surface
|
||||
is `snapshotAx` / `waitForAxNode` / `waitForAxNodes` plus re-
|
||||
exports of `RawElement` / `AxNode` / `axTreeToSnapshot` /
|
||||
`waitForAxTreeStable`. The session 8 eipc surface
|
||||
(`getEipcChannels` / `findEipcChannel` / `findEipcChannels` /
|
||||
`waitForEipcChannel` / `waitForEipcChannels` / `invokeEipcChannel`
|
||||
on `lib/eipc.ts`) is unchanged.
|
||||
— the existing primitives. `lib/ax.ts` surface is `snapshotAx` /
|
||||
`waitForAxNode` / `waitForAxNodes` plus re-exports. The session 8
|
||||
eipc surface (`getEipcChannels` / `findEipcChannel` /
|
||||
`findEipcChannels` / `waitForEipcChannel` /
|
||||
`waitForEipcChannels` / `invokeEipcChannel` on `lib/eipc.ts`) is
|
||||
unchanged.
|
||||
- [`tools/test-harness/eipc-registry-probe.ts`](../../tools/test-harness/eipc-registry-probe.ts)
|
||||
— the session 7 read-only registry probe. Re-run against a
|
||||
debugger-attached Claude (`Developer → Enable Main Process
|
||||
@@ -131,13 +131,16 @@ Read these in order before fanning out:
|
||||
and run N candidate read-sides through M arg shapes; deleted
|
||||
after.
|
||||
- [`tools/test-harness/src/runners/`](../../tools/test-harness/src/runners/)
|
||||
— every existing spec is a template. Notable session 13
|
||||
— every existing spec is a template. Notable session 14
|
||||
candidates for follow-up:
|
||||
- `T26_routines_page_renders.spec.ts` — first consumer of
|
||||
`lib/ax.ts`'s exported `snapshotAx` (refactored from inline).
|
||||
Other AX-using specs (T16, T17, H05) still call through
|
||||
`claudeai.ts` page-objects which use the shared substrate
|
||||
transparently.
|
||||
- `T17_folder_picker.spec.ts` — the next test that would benefit
|
||||
from `openPill` / `clickMenuItem` migration. Pre-existing
|
||||
flake; current failure is a 60s timeout in the
|
||||
openEnvPill/selectLocal/openFolderPicker chain.
|
||||
- `T26_routines_page_renders.spec.ts` — has a pre-click
|
||||
`retryUntil` block with `context-was-destroyed` exception
|
||||
handling that could become a `waitForAxNode` call once the
|
||||
primitive grows error-class options.
|
||||
- [`docs/testing/cases/*.md`](cases/) — the spec each runner
|
||||
asserts. The **Code anchors:** field tells you exactly where
|
||||
upstream implements the feature.
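
The T26 note in the list above mentions `waitForAxNode` growing error-class options. One possible shape is sketched below, as an illustration only. `isTransient` is a hypothetical option name that is not part of the current `lib/ax.ts` surface, and the error message matched by `contextWasDestroyed` is an assumption about how a destroyed execution context is reported.

```typescript
// Hypothetical stand-ins; not the harness's real types.
interface AxElement {
	role: string;
	name: string;
}
type Snapshot = () => Promise<AxElement[]>;

interface TolerantWaitOpts {
	timeoutMs: number;
	intervalMs: number;
	// Hypothetical option: errors for which this returns true are
	// treated as "try again" rather than as a failure of the wait.
	isTransient?: (err: unknown) => boolean;
}

const sleep = (ms: number): Promise<void> =>
	new Promise((resolve) => setTimeout(resolve, ms));

async function waitForAxNodeTolerant(
	snapshot: Snapshot,
	predicate: (el: AxElement) => boolean,
	opts: TolerantWaitOpts,
): Promise<AxElement | null> {
	const deadline = Date.now() + opts.timeoutMs;
	for (;;) {
		try {
			const match = (await snapshot()).find(predicate);
			if (match) return match;
		} catch (err) {
			// Anything not classified as transient still fails the wait.
			if (!opts.isTransient?.(err)) throw err;
		}
		if (Date.now() + opts.intervalMs > deadline) return null;
		await sleep(opts.intervalMs);
	}
}

// Assumed message shape for a navigation that tears down the page context.
const contextWasDestroyed = (err: unknown): boolean =>
	err instanceof Error && /context was destroyed/i.test(err.message);
```

Keeping the classifier as a caller-supplied predicate would let T26 opt in to the swallow while other call-sites keep failing fast on unexpected errors.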
|
||||
@@ -146,40 +149,47 @@ Read these in order before fanning out:
|
||||
|
||||
**Realistic ceiling: ~1 new spec OR one substantive flake-reduction
|
||||
deliverable OR one investigation.** Sessions 9-12 each landed 1-2
|
||||
specs; session 13 landed only a primitive (debugger blocked).
|
||||
Coverage at 74/76 means the test budget naturally shifts toward
|
||||
either (a) flake reduction against `lib/ax.ts`'s primitive, (b)
|
||||
investigation that requires the debugger and was deferred from
|
||||
sessions 12-13, or (c) Tier 3 read-only reframes that the harness
|
||||
can construct from existing `seedFromHost` state.
|
||||
specs; session 13 landed only a primitive (debugger blocked); session
|
||||
14 landed only a migration (debugger blocked). Coverage at 74/76
|
||||
means the test budget naturally shifts toward either (a) further flake
|
||||
reduction by extending the migration shape, (b) investigation that
|
||||
requires the debugger and was deferred from sessions 12-14, or (c)
|
||||
Tier 3 read-only reframes that the harness can construct from
|
||||
existing `seedFromHost` state.
|
||||
|
||||
**Phase 0 MUST check the debugger BEFORE picking a category.** Run
|
||||
`ss -tln 2>/dev/null | grep ':9229'` (or
|
||||
`curl --max-time 2 http://127.0.0.1:9229/json`). If port 9229 is not
|
||||
listening, Categories A and C are hard-blocked. Pivot to D or B.
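
For a session that prefers to script the Phase 0 gate instead of shelling out, the check and the resulting pivot might look like the sketch below. It assumes Node 18+ (global `fetch` and `AbortController`). `debuggerReachable`, `planFromDebuggerState`, and `Phase0Plan` are hypothetical names, and the category split simply mirrors the sentence above.

```typescript
type Category = "A" | "B" | "C" | "D";

interface Phase0Plan {
	blocked: Category[];
	candidates: Category[];
}

// Pure decision step: which categories remain on the table.
// Mirrors the rule above: port closed means A and C are hard-blocked.
function planFromDebuggerState(open: boolean): Phase0Plan {
	return open
		? { blocked: [], candidates: ["D", "A", "B", "C"] }
		: { blocked: ["A", "C"], candidates: ["D", "B"] };
}

// Probe the inspector's HTTP endpoint, roughly equivalent to
// `curl --max-time 2 http://127.0.0.1:9229/json`.
async function debuggerReachable(
	url = "http://127.0.0.1:9229/json",
	timeoutMs = 2000,
): Promise<boolean> {
	const controller = new AbortController();
	const timer = setTimeout(() => controller.abort(), timeoutMs);
	try {
		const res = await fetch(url, { signal: controller.signal });
		return res.ok;
	} catch {
		// Connection refused or timed out: treat as "debugger detached".
		return false;
	} finally {
		clearTimeout(timer);
	}
}
```

A typical call would be `debuggerReachable().then(planFromDebuggerState)`. Note that the category table marks B as needing the debugger for smoke-test verification, so B as a closed-port fallback covers only the parts that avoid runtime probing.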
|
||||
|
||||
#### **PRIORITY: Call-site migration to `lib/ax.ts`'s
|
||||
`waitForAxNode` for flake reduction.** Session 13 landed the
|
||||
substrate; this session can promote the inline retry loops in
|
||||
`claudeai.ts` (`activateTab` is the strongest candidate — it does a
|
||||
one-shot snapshot with no retry, which is exactly the failure mode
|
||||
T16 hits). Smaller-scope candidates: `findCompactPills` (one-shot
|
||||
snapshot, no retry — same shape as `activateTab`), `openPill`'s
|
||||
post-click while-loop, `clickMenuItem`'s while-loop. Each migration
|
||||
is a localized refactor; verify by running the affected specs
|
||||
(T16/T17/T26/H05) and checking pass rate. Don't speculatively
|
||||
change the budget defaults — match the existing per-spec retry
|
||||
budgets so the migration is shape-only. **If this is what session
|
||||
14 ships, that's a strictly higher-impact outcome than another Tier
|
||||
2 / Tier 3 reframe — flake reduction touches every existing AX-
|
||||
using spec.** Doesn't need the debugger.
|
||||
#### **PRIORITY: Investigate why T17 stays flaky and decide on a
|
||||
migration-or-fix path.** Session 14's migration fixed T16's pre-
|
||||
existing failure mode. T17 is the next-clearest pre-existing-flaky
|
||||
spec on KDE-W; it shares plumbing with T16 (`CodeTab` → AX-driven
|
||||
clicks) but goes deeper through `openEnvPill` / `selectLocal` /
|
||||
`openFolderPicker`. The session 14 migration does NOT reach into
|
||||
those (they use `openPill` + `clickMenuItem`, both of which carry
|
||||
post-click stability gates and per-iteration sleep loops). The
|
||||
investigation: (1) read T17's failure trace from the most recent
|
||||
session-14 stashed run (under `tools/test-harness/results/local/
|
||||
test-output/T17_folder_picker-T17-—-Folder-picker-opens/`), (2)
|
||||
classify the failure (env-pill probe? Local item? Select-folder
|
||||
pill? Open-folder click?), (3) decide if (a) `openPill` migration
|
||||
to `waitForAxNode` would reach it, or (b) the budget defaults need
|
||||
tuning, or (c) the failure is from something orthogonal to AX
|
||||
polling. If (a), ship the migration. If (b), document the budget
|
||||
mismatch in plan-doc. If (c), defer to a future session with a
|
||||
clearer signal. **If this is what session 15 ships, that's a
|
||||
strictly higher-impact outcome than another Tier 2 / Tier 3 reframe
|
||||
— flake reduction touches every existing AX-using spec.** Doesn't
|
||||
need the debugger.
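
The classification step of that investigation can be written down as a small lookup, sketched here under assumptions: `FailedHelper`, `NextStep`, and `nextStepFor` are hypothetical names, and the mapping follows the failure classes this prompt lists for T17 rather than anything verified against a real trace.

```typescript
// Which helper in the T17 chain the trace points at (hypothetical labels).
type FailedHelper = "openEnvPill" | "selectLocal" | "openFolderPicker" | "unknown";

type NextStep =
	| "migrate openPill"
	| "migrate clickMenuItem"
	| "document and defer";

// Map the failing helper to the path the prompt suggests for it.
function nextStepFor(helper: FailedHelper): NextStep {
	switch (helper) {
		case "openEnvPill":
			// Env-pill menu never rendered: openPill's poll is the suspect.
			return "migrate openPill";
		case "selectLocal":
			// Menu rendered but the item was not found: clickMenuItem's poll.
			return "migrate clickMenuItem";
		default:
			// Folder-picker chain or anything else: likely outside AX polling.
			return "document and defer";
	}
}
```

Path (b), a budget mismatch, is not captured by this lookup because it needs timing data from the trace and not just the name of the failing helper.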
|
||||
|
||||
Four categories; pick ONE as the main bet, treat the others as
|
||||
fallback if the main bet hits an early blocker:
|
||||
|
||||
| # | Tests | Source | Notes |
|---|---|---|---|
| **D** call-site migration to `waitForAxNode` | `claudeai.ts` page-objects + T26 + future Code-tab AX work | `lib/ax.ts` (session 13 primitive) | The PRIORITY shape this session. Promote `activateTab`'s one-shot snapshot to use `waitForAxNode`; same for `findCompactPills`. Validate by re-running T16 / T17 / T26 / H05 against the migrated form. Doesn't need the debugger. Risk: changing the retry shape can introduce new flake if the budget defaults don't match the existing per-spec tuning; keep migrations shape-only, no budget changes. |
| **D** further call-site migration / T17 investigation | T17 / `claudeai.ts` `openPill` + `clickMenuItem` | `lib/ax.ts` (session 13 primitive) | The PRIORITY shape this session. Read T17's failure trace, decide if `openPill` migration would fix it, ship the migration if so. Same shape-only refactor risk as session 14: keep the per-spec retry budgets matching the existing tuning. Doesn't need the debugger. **Risk:** `openPill` and `clickMenuItem` carry post-click stability gates that `waitForAxNode` already covers via `stabilityGate: true`, so the migration shape should slot in cleanly, but each spec's overall budget needs verification. |
| **A** operon-mode navigation probe | n/a (investigation) + maybe small Tier 2 reframe | new probe + bundle grep for operon URL routes | Session 10 confirmed `OperonBootstrap.ensure` registers eagerly but the other 21 wrapper-exposed operon interfaces remain registry-unconfirmed. Outputs: either an operon-mode URL form recovered from the bundle (search for `operon`-keyed routes in `claude.ai/...` paths) plus a registry re-probe after navigation, OR a deferral note explaining why operon scope can't be reached without an operon-mode entry. **Needs debugger-attached Claude on port 9229.** |
| **B** Tier 3 read-only reframes | Pick from the Tier 3 list | T33c / T35b / T37b template + bundle grep | The Tier 3 list is full of login-required flows; some have read-only entry points that the harness CAN construct. Candidates: T22's `getPrChecks` read-side might accept a non-existent PR number / dry-run mode; T15's OAuth surface has read-only state queries. Most need the user-account-scoped state to fail-fast with a clean error rather than a real network roundtrip; investigate first. **Needs debugger for smoke-test verification.** |
| **C** Schema-rev for `listRemotePluginsPage` / `listSkillFiles` | Bundle grep | session 9 schema-rev pattern | Both methods rejected every smoke-tested arg shape during session 12's investigation. `listRemotePluginsPage` needs `limit: number` at position 0 (rejection: `Argument "limit" at position 0 ...`); `listSkillFiles` needs both `pluginId` and `skillName` (rejection: `Argument "skillName" at position 1 ...`). Bundle-grep on the rejection literals → resolve the schema → ship a narrowly-scoped Tier 2 invocation if it unblocks a case-doc claim. **Needs debugger to verify the recovered schema.** |
|
||||
@@ -189,27 +199,35 @@ only session that audits the existing AX call-sites and proposes a
|
||||
migration plan (without shipping) is also acceptable — pre-work for
|
||||
a future session that DOES land the migration.
|
||||
|
||||
#### Category D — call-site migration to `waitForAxNode`
|
||||
#### Category D — further call-site migration / T17 investigation
|
||||
|
||||
The plan: promote inline AX retry loops in `claudeai.ts` to use
|
||||
`waitForAxNode` from `lib/ax.ts`.
|
||||
The plan: investigate T17's pre-existing flake, decide on a fix path,
|
||||
ship if a `waitForAxNode`-shaped migration of `openPill` /
|
||||
`clickMenuItem` would reach it.
|
||||
|
||||
1. **Audit the call-sites.** `activateTab` does one-shot snapshot,
|
||||
no retry — direct candidate. `findCompactPills` same. `openPill`
|
||||
post-click while-loop and `clickMenuItem` while-loop both do
|
||||
snapshot+filter+sleep — convert to `waitForAxNode` /
|
||||
`waitForAxNodes` with the existing budget. T26's pre/post-click
|
||||
`retryUntil` blocks are also direct candidates.
|
||||
2. **Migrate one call-site at a time.** Run the affected specs after
|
||||
each migration (T16 / T17 / T26 / H05). Don't migrate all at
|
||||
once — one bad budget change can cascade across multiple specs.
|
||||
3. **Don't change the retry budgets.** The existing per-spec timeouts
|
||||
are tuned (CodeTab.activate uses 5s default but T16 passes 15s);
|
||||
match them when migrating.
|
||||
4. **Don't add new functionality.** This is a shape-only refactor.
|
||||
If a migration reveals a budget that's clearly wrong (e.g.
|
||||
`activateTab` has NO retry today, which is the T16 failure mode),
|
||||
that's a small bug-fix the migration corrects — but document it.
|
||||
1. **Read T17's most recent failure trace.** Either the session-14
   stashed-baseline trace (under `tools/test-harness/results/local/
   test-output/T17_folder_picker-T17-—-Folder-picker-opens/`) or run
   T17 fresh against the post-session-14 form. Classify the failure:
   - openEnvPill timeout? (would suggest `openPill` migration)
   - selectLocal timeout? (would suggest `clickMenuItem` migration)
   - openFolderPicker chain timeout? (suggests deeper issue)
   - Some other failure?
2. **If `openPill` migration would reach the failure**, migrate it.
   The shape: replace the post-click while-loop with
   `waitForAxNodes` filtered to MENU_ITEM_ROLES, with the existing
   `timeout` parameter as `timeoutMs`. Keep the upfront
   `waitForAxTreeStable` gate or pass `stabilityGate: true` to
   `waitForAxNodes`. Verify with T17 (or the originally-affected
   spec).
3. **If `clickMenuItem` migration would reach the failure**, same
   shape. Replace the while-loop with `waitForAxNode` filtered on
   role + textPattern, with the existing `timeout` as `timeoutMs`.
4. **If the failure is orthogonal to AX polling** (e.g. environmental,
   timing race outside the AX surface, dialog mock not installing),
   document and defer.
|
||||
|
||||
Doesn't need the debugger.
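
The `openPill` migration shape from step 2 can be sketched as a before/after pair. Everything here is a stand-in, not the code in `lib/claudeai.ts`: `AxElement`, `waitForAxNodesSketch`, `openPillLoop`, `openPillMigrated`, and `lateMenu` are hypothetical, the contents of `MENU_ITEM_ROLES` are assumed, and the sketch assumes the multi-match helper returns an empty array on timeout. The stability gate is omitted.

```typescript
// Hypothetical stand-ins; the harness's real types live in lib/ax.ts.
interface AxElement {
	role: string;
	name: string;
}
type Snapshot = () => Promise<AxElement[]>;
interface PillResult {
	opened: boolean;
	items: AxElement[];
}

// Assumed role set; the real MENU_ITEM_ROLES may differ.
const MENU_ITEM_ROLES = new Set(["menuitem", "menuitemradio", "menuitemcheckbox"]);
const isMenuItem = (el: AxElement): boolean => MENU_ITEM_ROLES.has(el.role);

const sleep = (ms: number): Promise<void> =>
	new Promise((resolve) => setTimeout(resolve, ms));

// Stand-in for waitForAxNodes: every match, or [] once the budget runs out.
async function waitForAxNodesSketch(
	snapshot: Snapshot,
	predicate: (el: AxElement) => boolean,
	opts: { timeoutMs: number; intervalMs: number },
): Promise<AxElement[]> {
	const deadline = Date.now() + opts.timeoutMs;
	for (;;) {
		const matches = (await snapshot()).filter(predicate);
		if (matches.length > 0) return matches;
		if (Date.now() + opts.intervalMs > deadline) return [];
		await sleep(opts.intervalMs);
	}
}

// Before: hand-rolled snapshot + filter + sleep loop, 100ms per iteration.
async function openPillLoop(snapshot: Snapshot, timeout = 5000): Promise<PillResult> {
	const deadline = Date.now() + timeout;
	while (Date.now() < deadline) {
		const items = (await snapshot()).filter(isMenuItem);
		if (items.length > 0) return { opened: true, items };
		await sleep(100);
	}
	return { opened: false, items: [] };
}

// After: same budget and cadence, loop replaced by the shared helper.
async function openPillMigrated(snapshot: Snapshot, timeout = 5000): Promise<PillResult> {
	const items = await waitForAxNodesSketch(snapshot, isMenuItem, {
		timeoutMs: timeout,
		intervalMs: 100,
	});
	return { opened: items.length > 0, items };
}

// Fake tree whose menu only renders on the Nth snapshot after the click.
function lateMenu(appearsOnCall: number): Snapshot {
	let calls = 0;
	return async () => {
		calls += 1;
		const pill = { role: "button", name: "Environment" };
		return calls >= appearsOnCall ? [pill, { role: "menuitem", name: "Local" }] : [pill];
	};
}
```

Passing the existing `timeout` straight through as `timeoutMs`, and keeping the 100ms cadence as `intervalMs`, is what keeps the refactor shape-only. Both forms return the same `PillResult` for a menu that renders late and for one that never renders.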
|
||||
|
||||
#### Category A — operon-mode navigation probe
|
||||
|
||||
@@ -275,11 +293,11 @@ consumer.
|
||||
|
||||
#### Main-side `invokeEipcChannel` fallback (NOT recommended this session)
|
||||
|
||||
Same status as sessions 8-13 — wait for a real consumer.
|
||||
Same status as sessions 8-14 — wait for a real consumer.
|
||||
|
||||
#### Launch event-subscription primitive (NOT recommended this session)
|
||||
|
||||
Same status as sessions 11-13 — wait for a real consumer.
|
||||
Same status as sessions 11-14 — wait for a real consumer.
|
||||
|
||||
#### `waitForRenderedSurface` registry (NOT recommended this session)
|
||||
|
||||
@@ -295,7 +313,7 @@ not AX). Wait for a second consumer before extracting.
|
||||
|
||||
### Constraints to respect (don't violate)
|
||||
|
||||
These are unchanged from sessions 1-13 and still load-bearing:
|
||||
These are unchanged from sessions 1-14 and still load-bearing:
|
||||
|
||||
- **Default isolation** unless the spec needs otherwise. Use
|
||||
`seedFromHost: true` for any test that depends on authenticated
|
||||
@@ -327,7 +345,18 @@ These are unchanged from sessions 1-13 and still load-bearing:
|
||||
`snapshotAx` for one-shot reads, `waitForAxNode` /
|
||||
`waitForAxNodes` for predicate-based polling. Don't reach into
|
||||
`explore/walker.ts` directly — re-exports go through `lib/ax.ts`.
|
||||
Consumers in session 13: `lib/claudeai.ts` page-objects + T26.
|
||||
Consumers in session 14: `lib/claudeai.ts`'s `activateTab` +
|
||||
`CodeTab.activate` post-click pill poll (migrated from one-shot
|
||||
/ hand-rolled retryUntil), plus T26.
|
||||
- **For call-site migrations to `waitForAxNode`: keep the per-spec
|
||||
retry budgets matching the existing tuning.** Session 14
|
||||
finding. The defaults in `lib/ax.ts` (`timeoutMs: 5000`,
|
||||
`intervalMs: 200`) are reasonable starting values, but any
|
||||
caller with a known per-spec budget should pass it through. The
|
||||
one acceptable bug-fix during migration is when the existing
|
||||
call-site had NO retry at all (e.g. `activateTab`'s pre-click
|
||||
one-shot snapshot) — adding a budget is the fix the migration
|
||||
delivers, and the prompt explicitly authorized it.
|
||||
- **`lib/input.ts` is X11-only.** Strict gate.
|
||||
- **`lib/input-niri.ts` is Niri-only.** Strict gate.
|
||||
- **Don't speculate on `lib/input-wayland.ts` dispatcher.**
|
||||
@@ -351,9 +380,10 @@ These are unchanged from sessions 1-13 and still load-bearing:
|
||||
- **Tabs in TS, ~80-char wrap as the existing files do.**
|
||||
- **Don't break existing runners.** `npm run typecheck` must stay
|
||||
clean. H01-H05 are the canaries; `npm test` must still pass them
|
||||
after every commit. Note that T16/T17/T07/S25/S29-S31/S04 etc.
|
||||
after every commit. Note that T17/T07/S25/S29-S31/S04 etc.
|
||||
are pre-existing-flaky on KDE-W per session 13's full-suite run
|
||||
— they're NOT canaries; baseline failures don't block work.
|
||||
(T16 fixed by session 14) — they're NOT canaries; baseline
|
||||
failures don't block work.
|
||||
- **Always grep the installed asar** to verify a fingerprint
|
||||
string is present.
|
||||
- **For mock-then-call: the helper goes in
|
||||
@@ -371,13 +401,14 @@ These are unchanged from sessions 1-13 and still load-bearing:
|
||||
`curl --max-time 2 http://127.0.0.1:9229/json`). If port 9229 is
|
||||
open, A / B / C are tractable; if closed, pivot to D or
|
||||
documentation-only.
|
||||
3. Read the plan doc's "Status (post-execution)" session 13 section,
|
||||
then read `lib/ax.ts`'s API + `T26` and `claudeai.ts`'s
|
||||
integration. Confirm you understand the `snapshotAx` /
|
||||
`waitForAxNode` / `waitForAxNodes` surface.
|
||||
3. Read the plan doc's "Status (post-execution)" session 14 section,
|
||||
then read `lib/ax.ts`'s API + `lib/claudeai.ts`'s post-session-14
|
||||
migration shape. Confirm you understand the `waitForAxNode` /
|
||||
`waitForAxNodes` consumer pattern.
|
||||
4. Pick ONE Category as the main bet:
|
||||
- **D** (PRIORITY when debugger is closed): pick 1-2 call-sites
|
||||
in `claudeai.ts` to migrate, list which.
|
||||
- **D** (PRIORITY when debugger is closed): read T17's failure
|
||||
trace; classify the failure; decide if `openPill` /
|
||||
`clickMenuItem` migration would reach it.
|
||||
- **A**: bundle grep + per-URL navigation + registry re-probe.
|
||||
- **B**: pick a Tier 3 candidate, smoke-test the read-side, decide
|
||||
ship or defer.
|
||||
@@ -390,9 +421,9 @@ Don't fan out.
|
||||
|
||||
#### Phase 1 — fan-out batch
|
||||
|
||||
For Category D (call-site migration):
|
||||
- Single subagent migrates 1-2 call-sites in `claudeai.ts` to use
|
||||
`waitForAxNode`. Verify by running T16 / T17 / T26 / H05.
|
||||
For Category D (further migration / T17 investigation):
|
||||
- Single subagent reads T17's trace, classifies, ships the migration
|
||||
if applicable. Verify by running T16 / T17 / T26 / H05.
|
||||
|
||||
For Category A (operon investigation):
|
||||
- Single subagent does bundle-grep for operon URL routes + per-URL
|
||||
@@ -409,7 +440,7 @@ For Category C (schema-rev):
|
||||
against the user's debugger-attached running Claude.
|
||||
|
||||
Cap at ~1 spec OR ~1 primitive migration total — same scope as
|
||||
sessions 9-13.
|
||||
sessions 9-14.
|
||||
|
||||
#### Per-subagent prompt shape
|
||||
|
||||
@@ -423,11 +454,13 @@ Read in order:
|
||||
the most-recent-template that fits)
|
||||
- tools/test-harness/src/runners/<closest-template>.spec.ts
|
||||
- tools/test-harness/src/lib/ (the primitives you'll reuse —
|
||||
including session 13's `lib/ax.ts`)
|
||||
including session 13's `lib/ax.ts` and session 14's migration
|
||||
examples in `lib/claudeai.ts`)
|
||||
- CLAUDE.md (project conventions)
|
||||
|
||||
Write tools/test-harness/src/runners/<TARGET>_short_name.spec.ts
|
||||
[ AND/OR tools/test-harness/src/lib/<NEW-PRIMITIVE>.ts ].
|
||||
[ AND/OR tools/test-harness/src/lib/<NEW-PRIMITIVE>.ts
|
||||
AND/OR edits to tools/test-harness/src/lib/claudeai.ts ].
|
||||
|
||||
[per-task specifics: pattern (seedFromHost / mock-then-call /
|
||||
asar fingerprint / shared isolation / new-primitive-build /
|
||||
@@ -449,7 +482,7 @@ If the target isn't reasonable to implement (anchors don't resolve
|
||||
to anything assertable, the test depends on state you can't
|
||||
construct, the existing primitives don't cover the surface), DO
|
||||
NOT write a stub. Report under Open questions and stop. Sessions
|
||||
1-13 had cumulative ~17 "stop and report" outcomes that were the
|
||||
1-14 had cumulative ~17 "stop and report" outcomes that were the
|
||||
right call.
|
||||
|
||||
Report shape (~150 words):
|
||||
@@ -492,7 +525,7 @@ After fan-out returns:
|
||||
|
||||
### Self-correction loop
|
||||
|
||||
Same as sessions 1-13:
|
||||
Same as sessions 1-14:
|
||||
|
||||
1. Subagent typecheck failure → re-spawn with explicit fix
|
||||
instruction.
|
||||
@@ -506,10 +539,11 @@ Same as sessions 1-13:
|
||||
5. Migration breaks an existing spec → roll back the migration; the
|
||||
per-spec retry budget was load-bearing and the primitive
|
||||
defaults didn't match. Document the budget mismatch in plan-doc.
|
||||
6. **Carry-over from session 5/6/7/8/9/10/11/12/13:** If the chosen
|
||||
Category's investigation doesn't resolve / requires schema-rev
|
||||
that exceeds budget after 2-3 approaches, STOP. Don't keep
|
||||
digging — pivot to a fallback Category. Document what was tried.
|
||||
6. **Carry-over from session 5/6/7/8/9/10/11/12/13/14:** If the
|
||||
chosen Category's investigation doesn't resolve / requires
|
||||
schema-rev that exceeds budget after 2-3 approaches, STOP. Don't
|
||||
keep digging — pivot to a fallback Category. Document what was
|
||||
tried.
|
||||
7. **Carry-over from session 10:** If a registration probe surfaces
|
||||
"registered but uninvocable", document and defer rather than
|
||||
building the main-side fallback speculatively.
|
||||
@@ -543,8 +577,9 @@ Stop and write the final report when one of:
|
||||
spec says, mark it as Tier 3 / blocked / primitive-gap and
|
||||
don't write a placeholder.
|
||||
- **Don't break existing runners.** H01-H05 are the canaries.
|
||||
T16 / T17 / T07 / S25 / S29-S31 are pre-existing-flaky on KDE-W
|
||||
per session 13's full-suite run — those are NOT canaries.
|
||||
T17 / T07 / S25 / S29-S31 are pre-existing-flaky on KDE-W
|
||||
per session 13's full-suite run (T16 fixed by session 14) —
|
||||
those are NOT canaries.
|
||||
- **Don't restructure `lib/`** beyond targeted additions.
|
||||
Premature abstractions are wrong abstractions.
|
||||
- **Don't run destructive Tier 3 tests** that write to the user's
|
||||
@@ -569,7 +604,8 @@ Stop and write the final report when one of:
|
||||
this — wait for a third consumer with a specific named surface.
|
||||
- **Don't change the existing per-spec retry budgets when migrating
|
||||
to `waitForAxNode`.** The budgets are tuned. Migration is shape-
|
||||
only.
|
||||
only — except when the call-site has NO retry at all (the
|
||||
session-14-authorized bug-fix shape).
|
||||
- **Don't reach into `explore/walker.ts` for AX types/helpers.**
|
||||
`lib/ax.ts` re-exports `RawElement` / `AxNode` /
|
||||
`axTreeToSnapshot` / `waitForAxTreeStable` — use those.
|
||||
@@ -579,7 +615,7 @@ Stop and write the final report when one of:
|
||||
### Final report format
|
||||
|
||||
```markdown
|
||||
## Runner implementation summary (session 14)
|
||||
## Runner implementation summary (session 15)
|
||||
|
||||
- Main-bet category: D | A | B | C
|
||||
- Specs landed: N
|
||||
@@ -652,6 +688,11 @@ git diff --stat
|
||||
`snapshotAx` for one-shot reads. Re-exports keep
|
||||
`explore/walker.ts` types accessible without crossing the
|
||||
lib/explore boundary.
|
||||
- **For call-site migrations to `waitForAxNode` (session 14
|
||||
finding):** keep per-spec retry budgets matching the existing
|
||||
tuning. Migration is shape-only EXCEPT when the call-site had
|
||||
NO retry at all — adding a budget is the bug-fix the migration
|
||||
delivers.
|
||||
- **For asar fingerprints: ALWAYS grep the installed asar
|
||||
first.** Build-reference is beautified; the bundle is
|
||||
minified.
|
||||
|
||||
@@ -18,6 +18,116 @@ work begins.
|
||||
|
||||
## Status (post-execution)
|
||||
|
||||
**Shipped session 14 (1 call-site migration, no new spec):**
|
||||
`activateTab` and `CodeTab.activate` in `lib/claudeai.ts` migrated
|
||||
from hand-rolled retry loops to session 13's `lib/ax.ts` substrate.
|
||||
This is a flake-reduction session — the priority shape called out in
|
||||
session 13's followup as the natural next step once the substrate
|
||||
landed. Phase 0 calibration found the debugger detached on the dev
|
||||
box (port 9229 not listening), which blocked Categories A / B / C
|
||||
(operon-mode navigation probe + Tier 3 read-only reframes + schema-
|
||||
rev for `listRemotePluginsPage` / `listSkillFiles` — all needing
|
||||
runtime probing against debugger-attached Claude). The PRIORITY
|
||||
Category D (call-site migration) was the highest-impact deliverable
|
||||
that didn't require the debugger.
|
||||
|
||||
Coverage stays at 74/76 (97%) — migration session, no spec landed.
|
||||
The matrix coverage doesn't reflect call-site migrations; those show
|
||||
up as flake-reduction in existing specs (T16's pre-existing `no
|
||||
AX-tree button with accessibleName="Code" found` failure mode is
|
||||
fixed by session 14's migration).
|
||||
|
||||
Two commits on `docs/compat-matrix` expected (the orchestration
|
||||
directive supersedes "the user reviews and commits" — autonomous
|
||||
commit + push at end of session):
|
||||
|
||||
- TBD — `test(harness): session 14 migrate activateTab to
|
||||
waitForAxNode (no spec, coverage unchanged at 97%)`
|
||||
(migrates `activateTab` from one-shot snapshot to
|
||||
`waitForAxNode` with a configurable pre-click timeout; migrates
|
||||
`CodeTab.activate`'s post-click `retryUntil`-around-
|
||||
`findCompactPills` loop to `waitForAxNodes`; T16 passes 3/3 on
|
||||
KDE-W against the migrated form, was pre-existing-flaky on the
|
||||
baseline; T26 still passes (regression check); T17 still pre-
|
||||
existing-flaky (verified by stash + retry — failure shape
|
||||
unchanged-by-migration).
|
||||
|
||||
Session 14 findings + reclassifications:
|
||||
|
||||
- **T16 fix landed.** Session 13 documented T16 as pre-existing-
|
||||
flaky on KDE-W with the failure mode `CodeTab.activate: no
|
||||
AX-tree button with accessibleName="Code" found`. Verified by
|
||||
stashing session 13's changes and re-running T16 against the
|
||||
baseline — same failure. Session 14's migration converts the
|
||||
pre-click `activateTab` from a one-shot AX snapshot into a
|
||||
`waitForAxNode` poll. The Code button is now waited-for up to
|
||||
the caller's budget (T16 passes 15s through `CodeTab.activate`)
|
||||
rather than checked-once. T16 passed 3/3 in succession against
|
||||
the migrated form.
|
||||
- **`activateTab` API change is additive.** New optional `opts:
|
||||
{ timeout?: number }` parameter; default 5000ms matches the
|
||||
`lib/ax.ts` defaults. Existing callers (just `CodeTab.activate`)
|
||||
pass through their own timeout. No breaking shape change to
|
||||
return type or first/second positional args.
|
||||
- **`CodeTab.activate` post-click loop migrated.** The hand-rolled
|
||||
`retryUntil(async () => { const pills = await
|
||||
findCompactPills(...); return pills.length > 0 ? pills : null; },
|
||||
{ timeout, interval: 200 })` block is structurally identical to
|
||||
`waitForAxNodes` with the compact-pill predicate inlined. The
|
||||
predicate (role: 'button' + hasPopup: 'menu' + non-empty
|
||||
accessibleName + not a `^More options for ` row trigger) is
|
||||
copy-pasted from `findCompactPills` to keep the page-object
|
||||
free-standing without changing observable shape. `waitForAxNodes`
|
||||
carries the existing 200ms interval and overall budget through
|
||||
via `intervalMs` / `timeoutMs`.
|
||||
- **`findCompactPills` not migrated.** It's used in three call-
|
||||
sites: (a) inside `CodeTab.activate`'s formerly-hand-rolled
|
||||
retry — migrated; (b) T16's diagnostic capture on failure
|
||||
(line 91, expects fail-fast / wants whatever's currently on the
|
||||
page); (c) T16's post-activate diagnostic (already-stable, one-
|
||||
shot-by-design). Migrating `findCompactPills` itself would push
|
||||
unwanted retry latency into the diagnostic path, so the helper
|
||||
stays a one-shot snapshot — only the retry shape moved into
|
||||
`CodeTab.activate`.
|
||||
- **`openPill` / `clickMenuItem` not migrated.** Both have
|
||||
post-click stability gates + sleep-based polling loops that
|
||||
could in principle be `waitForAxNode`-shaped, but each carries
|
||||
per-spec budget tuning (T17 / openFolderPicker chain uses
|
||||
`openPill { timeout: 1500 }` and `clickMenuItem { timeout:
|
||||
1500 }` defaults) that the prompt explicitly cautions against
|
||||
changing speculatively. The migration was scoped to the
|
||||
highest-impact call-site (the T16 fix) plus the cleanest shape
|
||||
match (`CodeTab.activate`'s post-click pill poll). Future
|
||||
sessions can take `openPill` / `clickMenuItem` if a third
|
||||
consumer signals.
|
||||
- **T17 unchanged-by-migration.** T17 was reported pre-existing-
|
||||
flaky on KDE-W per session 13's full-suite run. Verified that
|
||||
status by stashing the migration and re-running T17 — same
|
||||
60s timeout. T17 exercises the env-pill → Local → Select-folder
|
||||
→ Open-folder chain via `openEnvPill` / `selectLocal` /
|
||||
`openFolderPicker`, which use `openPill` and `clickMenuItem`
|
||||
internally. Those weren't migrated this session (per above), so
|
||||
T17's flake mode is unchanged and is pre-existing rather than
|
||||
a session-14 regression.
|
||||
- **No primitive change.** `lib/ax.ts`'s `waitForAxNode` /
|
||||
`waitForAxNodes` cover both migration sites unchanged. No new
|
||||
`WaitForAxNodeOptions` flags needed.
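
The one-shot-to-poll change described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the `lib/ax.ts` implementation: `AxNodeLike`, `Snapshot`, `WaitOptions`, `waitForNodesSketch`, and `isCompactPill` are hypothetical names, and only the `timeoutMs` / `intervalMs` option names, the 5000ms default, the 200ms interval, and the predicate clauses are taken from the notes.

```typescript
// Hypothetical stand-ins; the real types and helpers live in lib/ax.ts.
interface AxNodeLike {
  role: string;
  accessibleName: string;
  hasPopup?: string;
}

type Snapshot = () => Promise<AxNodeLike[]>;

interface WaitOptions {
  timeoutMs?: number;  // overall budget; 5000 mirrors the default quoted above
  intervalMs?: number; // delay between snapshots; 200 mirrors the old loop
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll the snapshot until at least one node matches or the budget runs out.
// An empty array means "nothing matched within the budget".
async function waitForNodesSketch(
  snapshot: Snapshot,
  predicate: (n: AxNodeLike) => boolean,
  opts: WaitOptions = {},
): Promise<AxNodeLike[]> {
  const timeoutMs = opts.timeoutMs ?? 5000;
  const intervalMs = opts.intervalMs ?? 200;
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const matches = (await snapshot()).filter(predicate);
    if (matches.length > 0) return matches;
    if (Date.now() >= deadline) return [];
    await sleep(intervalMs);
  }
}

// Compact-pill predicate with the four clauses listed in the bullet above.
const isCompactPill = (n: AxNodeLike): boolean =>
  n.role === 'button' &&
  n.hasPopup === 'menu' &&
  n.accessibleName.length > 0 &&
  !/^More options for /.test(n.accessibleName);
```

Under these assumptions, a diagnostic path that wants fail-fast behavior keeps calling the snapshot once and filtering, while only the activation path pays the polling latency.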
|
||||
|
||||
Tier 2 → Tier 2 candidates remaining for next session: same as
|
||||
session 12 / 13 — operon-mode navigation probe (still needs
|
||||
debugger), schema-rev for `listRemotePluginsPage` / `listSkillFiles`
|
||||
(still needs debugger), Tier 3 read-only reframes (login-required).
|
||||
The new shape unlocked this session: **further call-site migrations**
|
||||
in `lib/claudeai.ts` — `openPill`'s post-click while-loop and
|
||||
`clickMenuItem`'s while-loop are tractable when a follow-up signal
|
||||
warrants. Plus migrating T26's pre-click `retryUntil` (carries a
|
||||
`context-was-destroyed` retry — `waitForAxNode` doesn't currently
|
||||
swallow that exception class, so it'd need a primitive extension or
|
||||
a wrapper). Coverage at 74/76 (97%) with the test budget naturally
|
||||
shifting toward flake reduction now that the substrate exists.
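
The "wrapper" option mentioned above for T26's `context-was-destroyed` retry might look something like the sketch below. Everything here is an assumption for illustration: `retryOnTransient`, `looksLikeContextDestroyed`, and the message-matching heuristic are hypothetical and not part of `lib/ax.ts`, and the real error shape would need to be checked against an actual T26 trace.

```typescript
// Hypothetical wrapper; nothing here is the lib/ax.ts API.
const pause = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Assumption: the transient failure is recognizable from the error message.
const looksLikeContextDestroyed = (err: unknown): boolean =>
  err instanceof Error && /context was destroyed/i.test(err.message);

// Re-run `attempt` while it fails with a transient error and budget remains.
async function retryOnTransient<T>(
  attempt: () => Promise<T>,
  opts: {
    timeoutMs: number;
    intervalMs: number;
    isTransient?: (err: unknown) => boolean;
  },
): Promise<T> {
  const isTransient = opts.isTransient ?? looksLikeContextDestroyed;
  const deadline = Date.now() + opts.timeoutMs;
  for (;;) {
    try {
      return await attempt();
    } catch (err) {
      // Non-transient errors, or a spent budget, are surfaced to the caller.
      if (!isTransient(err) || Date.now() >= deadline) throw err;
      await pause(opts.intervalMs);
    }
  }
}
```

Wrapping the wait call rather than extending the primitive would leave the `waitForAxNode` options surface unchanged; the cost is that the outer and inner budgets have to be chosen so they do not multiply.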
|
||||
|
||||
---
|
||||
|
||||
**Shipped session 13 (1 new primitive, no new spec):** `lib/ax.ts` —
|
||||
shared AX-tree loading + traversal substrate, threshold-driven
|
||||
extraction. The plan-doc had flagged "Unified DOM/AX loading +
|
||||
|
||||