mirror of
https://github.com/aaddrick/claude-desktop-debian.git
synced 2026-05-17 00:26:21 +03:00
docs(testing): session 13 plan/inventory + rotate session 14 prompt
- runner-implementation-plan.md: session 13 status section (lib/ax.ts
  primitive shipped, no new spec, coverage stays at 74/76 = 97% since
  primitive-only sessions don't move the spec count; Phase 0 found
  debugger detached on dev box which blocked Categories A/B/C; pivoted
  to the PRIORITY DOM unification primitive). Updated the "Primitive
  gaps to flag" entry — DOM/AX loading + traversal primitive moved from
  FLAGGED to LANDED with the consumer list and the deliberately-deferred
  shapes (waitForRenderedSurface registry, CSS-querySelector primitive).
- README.md: lib/ax.ts entry in the substrate-primitives note; session
  13 consumer list (claudeai.ts page-objects + T26). Spec count
  unchanged at 74.
- runner-implementation-followup-prompt.md: rotated for session 14.
  Adds new Category D (call-site migration to waitForAxNode for flake
  reduction) as the PRIORITY shape — doesn't need the debugger, builds
  on session 13's primitive. Carries forward Categories A / B / C (still
  need debugger). Phase 0 must check port 9229 BEFORE picking a
  category. Reading order updated: session 13 first.

Co-Authored-By: Claude <claude@anthropic.com>
@@ -1,178 +1,215 @@
# test-harness runner implementation — session 13 prompt
# test-harness runner implementation — session 14 prompt

This file is meant to be **copied verbatim into a fresh Claude Code
session** as the initial user message. Don't paraphrase it; the
orchestration depends on the exact directives below.

You're picking up after a runner-implementation session that landed 1
new spec (T11_runtime) by way of registering five install-flow
suffixes plus invoking BOTH case-doc-anchored read-side getters across
TWO distinct impl objects (CustomPlugins + LocalPlugins). First cross-
impl-object dual invocation. No primitive change. Coverage 73/76 (96%)
→ 74/76 (97%). Two commits on `docs/compat-matrix` expected (SHAs
new primitive (`lib/ax.ts`) and NO new spec. Session 13 was a pivot:
Phase 0 calibration found the debugger detached on the dev box (port
9229 not listening — Claude was running but Developer → Enable Main
Process Debugger had not been clicked), which blocked Categories A
(operon-mode navigation probe) and C (schema-rev for
`listRemotePluginsPage` / `listSkillFiles`) — both need runtime
probing against a debugger-attached running Claude. Category B (Tier
3 read-only reframes) ALSO effectively needed the debugger for the
smoke-test investigation phase. Session 13 pivoted to the
PRIORITY-flagged DOM unification primitive, which was tractable
without the debugger because both consumer signals existed
statically: `claudeai.ts` had a private `snapshotAx`, T26 had a
duplicate inline copy explicitly noted as "premature abstraction at 1
consumer", plus the user reported recurring AX-query flake. Coverage
unchanged at 74/76 (97%) — primitive-only sessions don't move the
spec count. Two commits on `docs/compat-matrix` expected (SHAs
inserted after the test-harness commit lands — the user reviews and
commits at the end of every session):

- TBD — `test(harness): session 12 T11 plugin install runtime`
  (Tier 2 reframe; multi-suffix `waitForEipcChannels` over the
  install-flow suffixes — `CustomPlugins/installPlugin` (case-doc
  :507181) / `uninstallPlugin` / `updatePlugin` /
  `listInstalledPlugins` / `LocalPlugins/getPlugins` — plus dual
  `invokeEipcChannel` across TWO impl objects:
  `CustomPlugins_$_listInstalledPlugins` with `args = [[]]` (empty
  `egressAllowedDomains`, T33c pattern) and `LocalPlugins_$_getPlugins`
  with `args = []`; passes on KDE-W in 28.8s cold).
- TBD — `test(harness): session 13 lib/ax.ts AX substrate primitive`
  (extracts `snapshotAx` from `claudeai.ts` private + T26 inlined
  duplicate; adds `waitForAxNode` / `waitForAxNodes` predicate-based
  polling helpers; re-exports `RawElement` / `AxNode` /
  `axTreeToSnapshot` / `waitForAxTreeStable` from `explore/walker.ts`
  so consumers stay inside `lib/`; refactors `claudeai.ts` and T26
  to consume the shared substrate).
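The dual invocation named in the session 12 entry above can be sketched roughly as follows. This is a minimal illustration, not the T11_runtime spec itself: the `Invoke` type is a hypothetical stand-in for `invokeEipcChannel` (its real signature may differ), and only the two channel names and arg shapes are taken from the text.

```typescript
// Hypothetical stand-in for invokeEipcChannel; the real signature may differ.
type Invoke = (channel: string, args: unknown[]) => Promise<unknown>;

// Each probe pairs a channel name with the arg shape quoted in the text.
const READ_SIDE_PROBES: ReadonlyArray<{ channel: string; args: unknown[] }> = [
  // [[]] is the empty egressAllowedDomains shape (T33c pattern).
  { channel: "CustomPlugins_$_listInstalledPlugins", args: [[]] },
  { channel: "LocalPlugins_$_getPlugins", args: [] },
];

// Invoke one read-side per impl object and assert each result is an array.
// Returns the observed array length per channel for later reporting.
async function probeBothImpls(invoke: Invoke): Promise<Record<string, number>> {
  const lengths: Record<string, number> = {};
  for (const { channel, args } of READ_SIDE_PROBES) {
    const result = await invoke(channel, args);
    if (!Array.isArray(result)) {
      throw new Error(`${channel}: expected an array, got ${typeof result}`);
    }
    lengths[channel] = result.length;
  }
  return lengths;
}
```

Keeping the probes in one table makes the mixed arg shapes (`[[]]` versus `[]`) visible side by side, which is the point of the cross-impl pattern.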

The plan doc at
[`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
captures the tier classification and execution-time reclassifications.
Its "Status (post-execution)" section is the source of truth for
what's done and what's deferred — read **session 12** first, then
**session 11**, then **session 10**, then **session 9**, then **session
8**, then **session 7**, then **session 6**, then **session 5**, then
**session 4**, then **session 3**, then **session 2**, then **session
1** sub-sections.
what's done and what's deferred — read **session 13** first, then
**session 12**, then **session 11**, then **session 10**, then
**session 9**, then **session 8**, then **session 7**, then **session
6**, then **session 5**, then **session 4**, then **session 3**, then
**session 2**, then **session 1** sub-sections.

This session is a continuation, not a restart. Start by reading the
plan doc's status sections.

### Big new findings from session 12
### Big new findings from session 13

1. **`LocalPlugins` registers 15 methods, `CustomPlugins` 16.**
   Smoke-test against the user's debugger-attached running Claude
   surfaced the full method list. Cleanly invocable read-sides:
   `LocalPlugins.getPlugins()` → array (length 0 on dev box),
   `LocalPlugins.getDownloadedRemotePlugins()` → array,
   `CustomPlugins.listInstalledPlugins([[]])` → array,
   `CustomPlugins.listMarketplaces([[]])` → array (also T33c),
   `CustomPlugins.listAvailablePlugins([[]])` → array (also T33c),
   `CustomPlugins.getCachedCommands()` → array,
   `CustomPlugins.getInstallCounts()` → null,
   `CustomPlugins.getAndClearMigrationIssues()` → null,
   `CustomPlugins.listLocalOrgPlugins()` → array. Three methods need
   pluginId at position 0 but accept any string (not just real plugin
   IDs): `getPluginOAuthStatus`, `getPluginCliStatus`,
   `getPluginShimOps`. **Two methods need extra args not derivable
   from a fresh isolation:** `LocalPlugins.listSkillFiles` (positional
   `pluginId` + `skillName` — `[]` rejects, `[cwd]` rejects too,
   needs both); `CustomPlugins.listRemotePluginsPage` (positional
   `limit: number` at 0 — every smoke-tested arg shape rejected;
   schema-rev would resolve this via grep on the `Argument "limit" at
   position 0` literal).
2. **Cross-impl-object dual invocation is the strongest Tier 2
   pattern** when the case-doc surface spans two interfaces. T11's
   install flow involves both `CustomPlugins.*` (the API/marketplace
   side that drives install) and `LocalPlugins.*` (the local-fs side
   where plugins land). T11_runtime invokes one read-side from each
   rather than picking one. Strictly stronger than single-interface
   coverage — proves the install plumbing crosses both impls intact.
   Mixed-arg-shape fine (one needs `[[]]`, another `[]`); same as
   T21's mixed-shape (one returns array, another returns boolean).
3. **The Tier 2 reframe pool is essentially exhausted.** Every Tier 1
   fingerprint with a tractable runtime sibling has been promoted.
   The remaining deferred items are Tier 3 (login-required write-side
   flows), Tier 4 (out of scope), or schema-rev work to unblock the
   still-rejecting read-sides surfaced this session
   (`listRemotePluginsPage`, `listSkillFiles`).
1. **Pre-existing T16 / T17 / T07 / S25 / S29-S31 flake confirmed
   on KDE-W against the unchanged baseline.** Running the full suite
   surfaced 12 failures, including T16 (CodeTab.activate: no AX-tree
   button with accessibleName="Code" found) and T17. Verified
   pre-existing by stashing the session-13 changes and re-running
   T16 — same failure. Session 13's primitive doesn't fix the existing
   flake; it lays groundwork. Future sessions can build flake-
   reduction patches against `lib/ax.ts`'s `waitForAxNode` (e.g.
   promote `activateTab`'s one-shot snapshot to a proper retry, or
   give T07's CSS-querySelector poll a more durable wait shape if
   that abstraction emerges).
2. **`lib/ax.ts` is the new shared AX-tree substrate.** Surface:
   - `snapshotAx(inspector, opts)` — single AX read with the
     stability gate. `opts.fast` skips the gate for inside-poll
     callers (matches the existing `claudeai.ts`/T26 contract).
   - `waitForAxNode(inspector, predicate, opts)` — repeatedly
     snapshot the tree and return the first matching `RawElement`,
     null on timeout. Gates on stability once at the start
     (configurable), then iterates with `fast: true`. Built against
     the inline polling loops in `CodeTab.activate`, `openPill`,
     `clickMenuItem`, T26 pre/post-click anchor scans — but the
     existing call-sites are NOT migrated this session (their per-
     spec retry budgets are tuned and changing them speculatively
     risks flake). Future call-site migrations are tractable.
   - `waitForAxNodes(inspector, predicate, opts)` — same shape,
     returns every match. For consumers that want to enumerate.
   - Re-exports: `RawElement`, `AxNode`, `axTreeToSnapshot`,
     `waitForAxTreeStable` — so consumers stay inside `lib/`
     instead of reaching into `explore/walker.ts` directly.
3. **The debugger-attachment precondition is binding.** Sessions 9
   through 12 did extensive runtime probing of the per-wc IPC
   registry against the user's debugger-attached Claude. Without
   that probing, Categories A / B / C in this prompt are blocked at
   the smoke-test phase. If the user hasn't clicked Developer →
   Enable Main Process Debugger before the session starts, port 9229
   is closed and the categories pivot to either documentation work
   or the call-site-migration shape that doesn't need runtime
   probing. Phase 0 must check `ss -tln | grep ':9229'` (or `curl
   --max-time 2 http://127.0.0.1:9229/json`) before fanning out.
4. **The reframe pool remains essentially exhausted.** Same status
   as session 12 — every Tier 1 fingerprint with a tractable runtime
   sibling has been promoted. The remaining options are now: (a)
   call-site migration to `waitForAxNode` for flake reduction, (b)
   operon-mode navigation probe (still needs debugger), (c) schema-
   rev for `listRemotePluginsPage` / `listSkillFiles` (still needs
   debugger), (d) Tier 3 read-only reframes (most need user-account
   state). The natural next-session shape is (a) — flake reduction
   builds on session 13's primitive and doesn't need the debugger.
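The polling behaviour described for `waitForAxNode` in finding 2 (gate on stability once, then iterate with `fast: true`, return null on timeout) can be sketched as below. This is an illustrative stand-in under stated assumptions, not the code in `lib/ax.ts`: `RawElementLike`, `SnapshotFn`, `PollOptions`, and `pollForAxNode` are hypothetical names, and the snapshot function is injected so the sketch does not depend on the real inspector.

```typescript
// Hypothetical stand-in for RawElement from explore/walker.ts.
interface RawElementLike {
  role: string;
  accessibleName: string;
}

// Hypothetical option names; the real opts shape is not shown in the text.
interface PollOptions {
  timeoutMs: number; // total budget for the whole wait
  intervalMs: number; // pause between snapshots
}

// Injected snapshot reader: fast=false means "apply the stability gate".
type SnapshotFn = (opts: { fast: boolean }) => Promise<RawElementLike[]>;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function pollForAxNode(
  snapshot: SnapshotFn,
  predicate: (el: RawElementLike) => boolean,
  opts: PollOptions,
): Promise<RawElementLike | null> {
  const deadline = Date.now() + opts.timeoutMs;
  let fast = false; // first read is gated, later reads skip the gate
  while (true) {
    const elements = await snapshot({ fast });
    const hit = elements.find(predicate);
    if (hit) return hit;
    if (Date.now() >= deadline) return null; // timed out without a match
    fast = true;
    await sleep(opts.intervalMs);
  }
}
```

Returning null instead of throwing leaves the error message to the caller, so each page-object can keep its own wording.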

### Authoritative reference

Read these in order before fanning out:

- [`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
  — tier classification + status section. Read **session 12**, then
  **session 11**, **session 10**, **session 9**, **session 8**,
  **session 7**, **session 6**, **session 5**, **session 4**, **session
  3**, **session 2**, then **session 1** "Status (post-execution)"
  sub-sections. The Tier-3 list (search for "## Tier 3") is the
  candidate pool for any further reframes.
  — tier classification + status section. Read **session 13**, then
  **session 12**, then **session 11**, **session 10**, **session 9**,
  **session 8**, **session 7**, **session 6**, **session 5**, **session
  4**, **session 3**, **session 2**, then **session 1** "Status (post-
  execution)" sub-sections. The Tier-3 list (search for "## Tier 3")
  is the candidate pool for any further reframes.
- [`tools/test-harness/README.md`](../../tools/test-harness/README.md)
  — runner conventions, the now-74-spec inventory, primitives in
  `lib/`, isolation defaults, the CDP-gate workaround, the eipc
  note (covers registry walk, renderer-wrapper invocation, the
  schema-rev pattern from session 9, the foundational-getAll
  pattern from session 10, the dual-case-doc-anchored-read-side
  pattern from session 11, and the cross-impl-object dual
  invocation pattern from session 12).
  note, and the new `lib/ax.ts` substrate (session 13 addition;
  consumer list is `claudeai.ts` page-objects + T26).
- [`docs/testing/cases/README.md`](cases/README.md) — case-doc
  structure and the four anchor scopes.
- [`tools/test-harness/src/lib/`](../../tools/test-harness/src/lib/)
  — the existing primitives. No session 12 additions; surface remains
  the session 8 shape (`getEipcChannels` / `findEipcChannel` /
  `findEipcChannels` / `waitForEipcChannel` / `waitForEipcChannels` /
  `invokeEipcChannel` on `lib/eipc.ts`).
  — the existing primitives. Session 13 added `lib/ax.ts`; surface
  is `snapshotAx` / `waitForAxNode` / `waitForAxNodes` plus re-
  exports of `RawElement` / `AxNode` / `axTreeToSnapshot` /
  `waitForAxTreeStable`. The session 8 eipc surface
  (`getEipcChannels` / `findEipcChannel` / `findEipcChannels` /
  `waitForEipcChannel` / `waitForEipcChannels` / `invokeEipcChannel`
  on `lib/eipc.ts`) is unchanged.
- [`tools/test-harness/eipc-registry-probe.ts`](../../tools/test-harness/eipc-registry-probe.ts)
  — the session 7 read-only registry probe. Re-run against a
  debugger-attached Claude (`Developer → Enable Main Process
  Debugger` from the menu) to capture the current registry shape.
  Session 12 used a small one-off smoke-test in the test-harness
  dir (`localplugins-smoke.ts` — clones the InspectorClient
  connection pattern from eipc-registry-probe.ts, dumps full
  method lists for plugin-related interfaces, runs N candidate
  read-sides through M arg shapes, reports `[OK]` / `[REJ]` per
  probe; deleted after).
  Sessions 11 / 12 used small one-off smoke-tests in the test-
  harness dir that clone the InspectorClient connection pattern
  and run N candidate read-sides through M arg shapes; deleted
  after.
- [`tools/test-harness/src/runners/`](../../tools/test-harness/src/runners/)
  — every existing spec is a template. Notable session 12 templates:
  - `T11_runtime.spec.ts` — multi-suffix `waitForEipcChannels` over
    install-flow suffixes + dual `invokeEipcChannel` across TWO impl
    objects (CustomPlugins + LocalPlugins). Pattern for any case-doc
    test whose surface spans two interfaces — invoke a read-side from
    each rather than picking one.
  — every existing spec is a template. Notable session 13
  candidates for follow-up:
  - `T26_routines_page_renders.spec.ts` — first consumer of
    `lib/ax.ts`'s exported `snapshotAx` (refactored from inline).
    Other AX-using specs (T16, T17, H05) still call through
    `claudeai.ts` page-objects which use the shared substrate
    transparently.
- [`docs/testing/cases/*.md`](cases/) — the spec each runner
  asserts. The **Code anchors:** field tells you exactly where
  upstream implements the feature.

### Tests in scope this session

**Realistic ceiling: ~1 new spec OR one investigation + maybe a
narrowly-scoped Tier 2 / schema-rev landing.** Sessions 9-12 each
landed 1-2 specs. With coverage at 74/76, the test budget naturally
shifts toward investigation, schema-rev for still-rejecting read-
sides, or operon-mode probing. Session 13's main bet should aim for
1 spec OR one substantive investigation deliverable.
**Realistic ceiling: ~1 new spec OR one substantive flake-reduction
deliverable OR one investigation.** Sessions 9-12 each landed 1-2
specs; session 13 landed only a primitive (debugger blocked).
Coverage at 74/76 means the test budget naturally shifts toward
either (a) flake reduction against `lib/ax.ts`'s primitive, (b)
investigation that requires the debugger and was deferred from
sessions 12-13, or (c) Tier 3 read-only reframes that the harness
can construct from existing `seedFromHost` state.

#### **PRIORITY: Unify DOM loading + traversal primitives.** Take
this on first if budget allows — the user is reporting a real,
recurring flake: tests fail because they aren't waiting long enough
for the DOM to render, AX-tree queries fire before the relevant
subtree is mounted, and each spec picks its own `retryUntil` budget.
Existing wait primitives are scattered: `electron.ts:waitForReady('userLoaded')`
(post-login URL transition), `claudeai.ts` page-objects (each rolls
its own `retryUntil` for AX lookups), `eipc.ts:waitForEipcChannel`
(handler registration). No unified "wait for surface rendered"
primitive exists. Proposed shape is **`lib/dom-ready.ts`** with
`waitForAxNode` / `waitForAxTreeStable` / `waitForRenderedSurface`
helpers — see plan-doc "Primitive gaps to flag" → "Unified DOM/AX
loading + traversal primitive" for the full proposal. Pre-work:
audit per-spec `retryUntil` budgets and AX-query sites in
`claudeai.ts` + flaky test runners to identify the 3-5 most-flaky
callsites; build the primitive against those specifically (not
speculatively). Threshold-driven extraction, same way `eipc.ts` /
`input.ts` / `electron-mocks.ts` came out of consumer pressure
rather than design-up-front. **If this primitive is what session
13 ships, that's a strictly higher-impact outcome than another
Tier 2 / Tier 3 reframe — flake reduction touches every existing
AX-using spec (T07, T16, T17, T26, H05) and unblocks future
Code-tab AX work.**
**Phase 0 MUST check the debugger BEFORE picking a category.** Run
`ss -tln 2>/dev/null | grep ':9229'` (or
`curl --max-time 2 http://127.0.0.1:9229/json`). If port 9229 is not
listening, Categories A and C are hard-blocked. Pivot to D or B.
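For a harness-side equivalent of the Phase 0 shell check above, a rough TypeScript sketch using Node's `net` module follows. The function names `isPortListening` and `categoriesFor` are assumptions made for illustration and are not part of the harness; the category ordering follows the text (D first as the priority, and only D treated as fully tractable when the port is closed).

```typescript
import * as net from "node:net";

// Try a TCP connect to the inspector port; resolve true only if it succeeds.
function isPortListening(
  port: number,
  host = "127.0.0.1",
  timeoutMs = 2000,
): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.connect({ port, host });
    const finish = (result: boolean): void => {
      socket.destroy();
      resolve(result);
    };
    socket.setTimeout(timeoutMs);
    socket.once("connect", () => finish(true));
    socket.once("timeout", () => finish(false));
    socket.once("error", () => finish(false));
  });
}

type Category = "A" | "B" | "C" | "D";

// Map the debugger state onto the categories worth attempting, in priority
// order. With the port closed, only D is treated as fully tractable.
function categoriesFor(debuggerListening: boolean): Category[] {
  return debuggerListening ? ["D", "A", "B", "C"] : ["D"];
}

// Example Phase 0 gate (hypothetical wiring).
async function phaseZero(): Promise<Category[]> {
  return categoriesFor(await isPortListening(9229));
}
```

A plain TCP connect only proves something is listening on 9229; the `curl .../json` form in the text additionally confirms it is an inspector endpoint.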

**Category A (operon-mode navigation probe)** is the natural next
step. The other 21 wrapper-exposed operon interfaces remain registry-
unconfirmed; if any URL form recovered from the bundle surfaces
additional handlers, that's a tractable Tier 2 reframe. **Category B
(Tier 3 read-only reframes)** picks the lowest-hanging Tier 3 spec
where a non-destructive read-side might be invocable from a fresh
isolation. **Category C (schema-rev for the rejecting read-sides)**
unblocks `listRemotePluginsPage` or `listSkillFiles` via grep on
the rejection literal — small-scope, useful as a fallback.
#### **PRIORITY: Call-site migration to `lib/ax.ts`'s
`waitForAxNode` for flake reduction.** Session 13 landed the
substrate; this session can promote the inline retry loops in
`claudeai.ts` (`activateTab` is the strongest candidate — it does a
one-shot snapshot with no retry, which is exactly the failure mode
T16 hits). Smaller-scope candidates: `findCompactPills` (one-shot
snapshot, no retry — same shape as `activateTab`), `openPill`'s
post-click while-loop, `clickMenuItem`'s while-loop. Each migration
is a localized refactor; verify by running the affected specs
(T16/T17/T26/H05) and checking pass rate. Don't speculatively
change the budget defaults — match the existing per-spec retry
budgets so the migration is shape-only. **If this is what session
14 ships, that's a strictly higher-impact outcome than another Tier
2 / Tier 3 reframe — flake reduction touches every existing AX-
using spec.** Doesn't need the debugger.

Three categories — pick ONE as the main bet, treat the others as
fallback if the main bet hits an early blocker:

| # | Tests | Source | Notes |
|---|---|---|---|
| **A** operon-mode navigation probe | n/a (investigation) + maybe small Tier 2 reframe | new probe + bundle grep for operon URL routes | Session 10 confirmed `OperonBootstrap.ensure` registers eagerly but the other 21 wrapper-exposed operon interfaces remain registry-unconfirmed. Outputs: either an operon-mode URL form recovered from the bundle (search for `operon`-keyed routes in `claude.ai/...` paths) plus a registry re-probe after navigation, OR a deferral note explaining why operon scope can't be reached without an operon-mode entry. |
| **B** Tier 3 read-only reframes | Pick from the Tier 3 list | T33c / T35b / T37b template + bundle grep | The Tier 3 list is full of login-required flows; some have read-only entry points that the harness CAN construct. Candidates: T22's `getPrChecks` read-side might accept a non-existent PR number / dry-run mode; T15's OAuth surface has read-only state queries. Most need the user-account-scoped state to fail-fast with a clean error rather than a real network roundtrip — investigate first. |
| **C** Schema-rev for `listRemotePluginsPage` / `listSkillFiles` | Bundle grep | session 9 schema-rev pattern | Both methods rejected every smoke-tested arg shape during session 12's investigation. `listRemotePluginsPage` needs `limit: number` at position 0 (rejection: `Argument "limit" at position 0 ...`); `listSkillFiles` needs both `pluginId` and `skillName` (rejection: `Argument "skillName" at position 1 ...`). Bundle-grep on the rejection literals → resolve the schema → ship a narrowly-scoped Tier 2 invocation if it unblocks a case-doc claim. Smaller scope than A or B; useful as a fallback. |
| **D** call-site migration to `waitForAxNode` | `claudeai.ts` page-objects + T26 + future Code-tab AX work | `lib/ax.ts` (session 13 primitive) | The PRIORITY shape this session. Promote `activateTab`'s one-shot snapshot to use `waitForAxNode`; same for `findCompactPills`. Validate by re-running T16 / T17 / T26 / H05 against the migrated form. Doesn't need the debugger. Risk: changing the retry shape can introduce new flake if the budget defaults don't match the existing per-spec tuning — keep migrations shape-only, no budget changes. |
| **A** operon-mode navigation probe | n/a (investigation) + maybe small Tier 2 reframe | new probe + bundle grep for operon URL routes | Session 10 confirmed `OperonBootstrap.ensure` registers eagerly but the other 21 wrapper-exposed operon interfaces remain registry-unconfirmed. Outputs: either an operon-mode URL form recovered from the bundle (search for `operon`-keyed routes in `claude.ai/...` paths) plus a registry re-probe after navigation, OR a deferral note explaining why operon scope can't be reached without an operon-mode entry. **Needs debugger-attached Claude on port 9229.** |
| **B** Tier 3 read-only reframes | Pick from the Tier 3 list | T33c / T35b / T37b template + bundle grep | The Tier 3 list is full of login-required flows; some have read-only entry points that the harness CAN construct. Candidates: T22's `getPrChecks` read-side might accept a non-existent PR number / dry-run mode; T15's OAuth surface has read-only state queries. Most need the user-account-scoped state to fail-fast with a clean error rather than a real network roundtrip — investigate first. **Needs debugger for smoke-test verification.** |
| **C** Schema-rev for `listRemotePluginsPage` / `listSkillFiles` | Bundle grep | session 9 schema-rev pattern | Both methods rejected every smoke-tested arg shape during session 12's investigation. `listRemotePluginsPage` needs `limit: number` at position 0 (rejection: `Argument "limit" at position 0 ...`); `listSkillFiles` needs both `pluginId` and `skillName` (rejection: `Argument "skillName" at position 1 ...`). Bundle-grep on the rejection literals → resolve the schema → ship a narrowly-scoped Tier 2 invocation if it unblocks a case-doc claim. **Needs debugger to verify the recovered schema.** |

If port 9229 is closed, only D is fully tractable. A documentation-
only session that audits the existing AX call-sites and proposes a
migration plan (without shipping) is also acceptable — pre-work for
a future session that DOES land the migration.

#### Category D — call-site migration to `waitForAxNode`

The plan: promote inline AX retry loops in `claudeai.ts` to use
`waitForAxNode` from `lib/ax.ts`.

1. **Audit the call-sites.** `activateTab` does one-shot snapshot,
   no retry — direct candidate. `findCompactPills` same. `openPill`
   post-click while-loop and `clickMenuItem` while-loop both do
   snapshot+filter+sleep — convert to `waitForAxNode` /
   `waitForAxNodes` with the existing budget. T26's pre/post-click
   `retryUntil` blocks are also direct candidates.
2. **Migrate one call-site at a time.** Run the affected specs after
   each migration (T16 / T17 / T26 / H05). Don't migrate all at
   once — one bad budget change can cascade across multiple specs.
3. **Don't change the retry budgets.** The existing per-spec timeouts
   are tuned (CodeTab.activate uses 5s default but T16 passes 15s);
   match them when migrating.
4. **Don't add new functionality.** This is a shape-only refactor.
   If a migration reveals a budget that's clearly wrong (e.g.
   `activateTab` has NO retry today, which is the T16 failure mode),
   that's a small bug-fix the migration corrects — but document it.
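A shape-only migration that keeps the existing budgets, as steps 3 and 4 above require, might look roughly like this. It is a sketch under assumptions: `AxElement`, `WaitForNode`, and `findTabButton` are hypothetical names, the waiter is injected rather than imported from `lib/ax.ts`, and the 5s default and 15s override are the figures quoted in step 3.

```typescript
// Hypothetical element shape; the real RawElement may carry more fields.
interface AxElement {
  role: string;
  accessibleName: string;
}

// Hypothetical waiter signature standing in for waitForAxNode.
type WaitForNode = (
  predicate: (el: AxElement) => boolean,
  opts: { timeoutMs: number },
) => Promise<AxElement | null>;

// The 5s default quoted in the text; callers such as T16 pass 15s.
const DEFAULT_ACTIVATE_TIMEOUT_MS = 5_000;

// Replace a one-shot snapshot lookup with a bounded wait. The caller's
// budget is passed through unchanged so the refactor stays shape-only.
async function findTabButton(
  waitForNode: WaitForNode,
  tabName: string,
  opts: { timeoutMs?: number } = {},
): Promise<AxElement> {
  const timeoutMs = opts.timeoutMs ?? DEFAULT_ACTIVATE_TIMEOUT_MS;
  const button = await waitForNode(
    (el) => el.role === "button" && el.accessibleName === tabName,
    { timeoutMs },
  );
  if (!button) {
    throw new Error(
      `no AX-tree button with accessibleName="${tabName}" found within ${timeoutMs}ms`,
    );
  }
  return button;
}
```

Because the timeout is a pass-through, a spec that already supplies its own budget keeps it, and only callers that relied on the one-shot behaviour gain a retry.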

#### Category A — operon-mode navigation probe

@@ -188,14 +225,12 @@ The plan: find an operon-mode URL form and verify whether the other
   "window.location.href = '<URL>'")`. After each navigation, re-run
   the registry probe and check the operon scope's interface count.
3. **If any URL surfaces additional operon handlers**, ship a small
   Tier 2 reframe spec (e.g. probe `OperonBootstrap.ensure` invocation
   shape, or assert the lazy-registration count).
   Tier 2 reframe spec.
4. **If none of the candidate URLs surface additional handlers**,
   document as "operon scope handlers register lazily on a navigation
   we can't easily construct from the harness" and defer.

This is the smaller-scope category — investigation + maybe one
spec landing.
**Needs debugger-attached Claude on port 9229.**

#### Category B — Tier 3 read-only reframes

@@ -209,14 +244,12 @@ is invocable from a fresh `seedFromHost` isolation.
   scope. The exceptions are read-side anchors that just need
   user-account-scoped data to assert against.
2. **Smoke-test the candidate read-side** with various arg shapes.
   For example, T22's `LocalSessions.getPrChecks(prUrl)` might accept
   a fake URL string and return an empty/error array shape that
   asserts the impl is wired without making a real GitHub call —
   investigate.
3. **Ship a Tier 2 reframe** if the read-side resolves cleanly.
4. **Defer** if every candidate requires real account state to assert
   meaningfully.

**Needs debugger for smoke-test verification.**

#### Category C — Schema-rev for rejecting read-sides

The plan: resolve the validator schema for `listRemotePluginsPage` /
@@ -228,15 +261,10 @@ a case-doc claim.
   9 finding). Read ~2KB around the hit to surface the full schema.
2. **Smoke-test the recovered schema** against the user's debugger-
   attached running Claude.
3. **Connect the resolved invocation to a case-doc claim.** If
   neither method connects to an existing case-doc test, the schema
   knowledge is a finding for the plan-doc but not a spec to ship.
3. **Connect the resolved invocation to a case-doc claim.**
4. **Ship a Tier 2 invocation** if a case-doc claim is unblocked.
   `listRemotePluginsPage` could potentially extend T33's plugin
   browser coverage with a paginated listing assertion.

This is the smallest-scope category — best fallback if A and B are
blocked.
**Needs debugger to verify the recovered schema.**
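The "grep the rejection literal, then read ~2KB around the hit" step in Category C can be approximated with a small helper like the one below. This is a hedged sketch: `contextAround` is a hypothetical name, the bundle is assumed to be already loaded as a string (no bundle path is given in the text, so none is invented here), and the radius is counted in characters rather than bytes.

```typescript
// Return a window of text around every occurrence of `literal` in `source`.
// A radius of 1024 characters on each side gives roughly 2KB per hit.
function contextAround(
  source: string,
  literal: string,
  radiusChars = 1024,
): string[] {
  const windows: string[] = [];
  if (literal.length === 0) return windows; // avoid an endless scan
  let from = 0;
  while (true) {
    const idx = source.indexOf(literal, from);
    if (idx === -1) break;
    const start = Math.max(0, idx - radiusChars);
    const end = Math.min(source.length, idx + literal.length + radiusChars);
    windows.push(source.slice(start, end));
    from = idx + literal.length; // continue after this hit
  }
  return windows;
}

// Example with the rejection literal quoted in the text; the surrounding
// validator code here is invented filler, not real bundle content.
const sampleBundle =
  "function v(a){/* ... */}" +
  'throw new Error(\'Argument "limit" at position 0 failed\');' +
  "function w(b){/* ... */}";
const hits = contextAround(sampleBundle, 'Argument "limit" at position 0', 16);
console.log(hits.length, hits[0]);
```

Each returned window is what a reader would then inspect by hand to recover the validator's expected argument shape.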

#### Cross-compositor focus-shifter expansion (NOT recommended this session)

@@ -247,27 +275,32 @@ consumer.

#### Main-side `invokeEipcChannel` fallback (NOT recommended this session)

If a future spec needs to invoke a `claude.settings/*` handler that
only registers on the find_in_page or main_window webContents (where
the renderer is at `file://` and the wrapper isn't exposed), the
main-side direct-call path is documented in session 8's Status
section. Don't add it speculatively — wait for a real consumer.
Same status as sessions 8-13 — wait for a real consumer.

#### Launch event-subscription primitive (NOT recommended this session)

Session 11 noted that `window['claude.web'].Launch` exposes 5 `on*`
event subscribers + `activeServersStore` not visible in
`_invokeHandlers`. No consumer asks for an event-probe primitive
yet — wait for one.
Same status as sessions 11-13 — wait for a real consumer.

#### `waitForRenderedSurface` registry (NOT recommended this session)

Session 13's `lib/ax.ts` deliberately did NOT ship a named-surface
registry; promote when a third consumer crystallizes with a specific
surface name in mind.

#### CSS-querySelector primitive (NOT recommended this session)

Session 13's `lib/ax.ts` covers AX-tree consumers only. T07's CSS-
querySelector poll for the topbar is a different abstraction (DOM,
not AX). Wait for a second consumer before extracting.

### Constraints to respect (don't violate)

These are unchanged from sessions 1-12 and still load-bearing:
These are unchanged from sessions 1-13 and still load-bearing:

- **Default isolation** unless the spec needs otherwise. Use
  `seedFromHost: true` for any test that depends on authenticated
  renderer state — never assume default isolation gets past
  `/login`. T11_runtime/T16/T19/T20/T21/T26/T22b/T27/T31b/T33b/T33c/T35b/T37b/T38b
  `/login`. T07/T11_runtime/T16/T17/T19/T20/T21/T26/T22b/T27/T31b/T33b/T33c/T35b/T37b/T38b
|
||||
are the templates.
|
||||
- **eipc handlers register on `webContents.ipc._invokeHandlers`,
|
||||
NOT global `ipcMain._invokeHandlers`.** Session 7 finding. Use
|
||||
@@ -278,57 +311,28 @@ These are unchanged from sessions 1-12 and still load-bearing:
- **eipc invocation goes through the renderer-side wrapper at
  `window['claude.<scope>'].<Iface>.<method>`.** Session 8 finding.
  Use `lib/eipc.ts`'s `invokeEipcChannel` rather than rolling
  main-side direct calls — the wrapper honors the per-handler origin
  gate honestly. Main-side direct calls work but require spoofing
  `senderFrame.url`; reserved as a fallback for non-claude.ai
  webContents (no current consumer).
  main-side direct calls.
- **For arg validator schema-rev: try smoke-test first, then grep
  the rejection message literal.** Session 9 finding. When
  `invokeEipcChannel` rejects with `Argument "<name>" at position N
  ... failed to pass validation`, that exact string lives inline in
  the validator block. One grep on the literal resolves the
  location; reading ~2KB around it surfaces the full schema. Cheaper
  than runtime closure inspection in most cases. Session 11 finding:
  for trivial `typeof === 'string'` validators, the smoke-test
  resolves the shape in one round-trip — bundle-grep is unnecessary
  overhead for simple validators. Session 12: most plugin-side
  validators were resolvable by smoke-test alone (15-method
  enumeration with 3-5 arg shapes per method costs ~5 minutes).
  the rejection message literal.** Session 9 finding. Trivial
  validators (`typeof === 'string'` / similar) resolve in one
  round-trip. Elaborate validators get the bundle-grep treatment.
- **For session-scoped Tier 2 reframes: `LocalSessions/getAll` is
  the foundational read-side surrogate.** Session 10 finding. When
  a case-doc test's anchors are write-side LocalSessions handlers
  with no read-side equivalent, ship a registration probe over the
  case-doc-anchored suffixes PLUS a single
  `invokeEipcChannel('LocalSessions_$_getAll', [])` array-shape
  assertion as the read-side surrogate.
  the foundational read-side surrogate.** Session 10 finding.
- **For Tier 2 reframes with case-doc-anchored read-side handlers:
  invoke the case-doc-anchored handlers directly.** Session 11
  finding. When the case-doc has read-side anchors with resolvable
  arg shapes (like T21's `getConfiguredServices(cwd)` /
  `getAutoVerify(cwd)`), prefer invoking those over a foundational
  surrogate. Mixed-shape dual invocation (one returns array, another
  returns boolean) is fine — assert each shape independently.
  finding. Mixed-shape dual invocation is fine.
- **For Tier 2 reframes spanning two interfaces: invoke a read-side
  from each.** Session 12 finding. When the case-doc surface spans
  two impl objects (T11's CustomPlugins + LocalPlugins), invoke one
  read-side from each rather than picking one. Cross-impl-object
  dual invocation proves the plumbing crosses both impls intact —
  strictly stronger than single-interface coverage. Mixed-arg-shape
  fine (one needs `[[]]`, another `[]`).
- **`lib/input.ts` is X11-only.** Strict `XDG_SESSION_TYPE ===
  'x11'` gate. Wayland consumers must skip — don't try to bolt
  Wayland into the file.
- **`lib/input-niri.ts` is Niri-only.** Strict
  `XDG_CURRENT_DESKTOP === 'niri'` gate. Sway / Hyprland / River
  consumers must skip or live in their own per-compositor files.
  from each.** Session 12 finding (T11_runtime template).
- **For AX-tree consumers: use `lib/ax.ts`.** Session 13 finding.
  `snapshotAx` for one-shot reads, `waitForAxNode` /
  `waitForAxNodes` for predicate-based polling. Don't reach into
  `explore/walker.ts` directly — re-exports go through `lib/ax.ts`.
  Consumers in session 13: `lib/claudeai.ts` page-objects + T26.
- **`lib/input.ts` is X11-only.** Strict gate.
- **`lib/input-niri.ts` is Niri-only.** Strict gate.
- **Don't speculate on `lib/input-wayland.ts` dispatcher.**
  Per-compositor files until a second Wayland consumer (Sway /
  Hyprland / River) lands. With only S14 on Niri, a dispatcher
  is ceremony.
- **Code-tab AX anchors stay in plan-doc until a consumer needs
  them.** Don't preemptively add `CodeTab.activateTopTab()` to
  `claudeai.ts` — session 5's anchors block out the work for
  whenever a future consumer surfaces.
  them.**
- **CDP auth gate is alive** — runtime SIGUSR1 attach via
  `app.attachInspector()`, never Playwright's `_electron.launch()`
  or `chromium.connectOverCDP()`.
@@ -336,61 +340,49 @@ These are unchanged from sessions 1-12 and still load-bearing:
  `webContents.getAllWebContents()` not
  `BrowserWindow.getAllWindows()`. Constructor-level wraps don't
  work; use prototype-method hooks.
- **`skipUnlessRow()` always first.** First line of every `test()`
  body when the test is row-gated.
- **`skipUnlessRow()` always first.**
- **No fixed sleeps.** `retryUntil` from `lib/retry.ts`, or
  Playwright auto-wait. Fixed `sleep(N)` is a smell. (Exception:
  short sleeps inside hand-rolled retry loops that catch typed
  errors and short-circuit; see S11 / S14 for the pattern.)
  Playwright auto-wait, or `waitForAxNode` from `lib/ax.ts`.
  (Exception: short sleeps inside hand-rolled retry loops that
  catch typed errors and short-circuit; see S11 / S14.)
- **Diagnostics on every run.** `testInfo.attach()` the artefacts.
  Single-shot JSON dumps for multi-state tests (S11, S14, S31,
  T11_runtime, T19, T20, T21, T22b, T27, T31b, T33b, T33c, T35b,
  T37b, T38b pattern) are cleaner than 5+ separate attachments.
- **Tag with annotations.** `severity:` and `surface:` on every
  test so JUnit carries them through to matrix-regen.
- **Tabs in TS, ~80-char wrap as the existing files do.** Match
  surrounding style.
- **Tabs in TS, ~80-char wrap as the existing files do.**
- **Don't break existing runners.** `npm run typecheck` must stay
  clean. H01-H05 are the canaries; `npm test` must still pass them
  after every commit.
  after every commit. Note that T16/T17/T07/S25/S29-S31/S04 etc.
  are pre-existing-flaky on KDE-W per session 13's full-suite run
  — they're NOT canaries; baseline failures don't block work.
- **Always grep the installed asar** to verify a fingerprint
  string is present (and how often) BEFORE shipping.
  Build-reference is beautified — strings differ from the minified
  bundle.
  string is present.
- **For mock-then-call: the helper goes in
  `lib/electron-mocks.ts`,** not `lib/claudeai.ts`.
  `lib/electron-mocks.ts`.**
- **Marker windows / sacrificial host processes always die in
  `finally`.** S11 / S14 are the templates — `marker.kill()` runs
  before `app.close()` so the kill happens even if the spec throws.
- **Never log handler response BODIES into JUnit.** T37b's pattern
  (response type + length only, never the body) is correct for any
  invocation that returns user-account-scoped content. Memory bodies
  may contain personal or sensitive content; MCP server tokens may
  contain credentials; scheduled-task instructions may reference
  internal projects; marketplace `pluginContext`-filtered listings
  may surface internal-org marketplace pointers. T11_runtime's
  defensive default extends the pattern: installed-plugin entries may
  include workspace paths and plugin IDs that reveal org-internal
  marketplace pointers when the user is in an org; configured dev
  service entries (T21) may include workspace paths from auto-detect.
  `finally`.**
- **Never log handler response BODIES into JUnit.**
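
The last constraint above (record the response type and length, never the body) can be sketched as a small helper. This is a minimal illustration under assumptions, not the harness's actual code: `summarizeResponse` and the `ResponseSummary` shape are hypothetical names, and attaching the summary through `testInfo.attach()` is left to the spec.

```typescript
// Hypothetical helper (not part of lib/eipc.ts): reduce a handler
// response to its type and size so the body never reaches JUnit.
type ResponseSummary = {
	type: string; // "array", "null", or the typeof result
	length: number | null; // element / character / key count, if any
};

function summarizeResponse(value: unknown): ResponseSummary {
	if (value === null) return { type: 'null', length: null };
	if (Array.isArray(value)) return { type: 'array', length: value.length };
	if (typeof value === 'string') {
		return { type: 'string', length: value.length };
	}
	if (typeof value === 'object') {
		return { type: 'object', length: Object.keys(value as object).length };
	}
	return { type: typeof value, length: null };
}

// The summary is safe to serialize; the workspace path in the body is not.
const summary = summarizeResponse([{ path: '/home/user/project' }]);
console.log(JSON.stringify(summary)); // prints {"type":"array","length":1}
```

A spec would serialize only the summary into its single-shot JSON dump, so the attachment still shows whether the handler returned an array and how many entries it held.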

### Phases

#### Phase 0 — calibration

1. `cd tools/test-harness && npm run typecheck` — should pass.
2. Read the plan doc's "Status (post-execution)" session 12 section,
   then read `lib/eipc.ts`'s `invokeEipcChannel` API +
   `T11_runtime.spec.ts` leading comments. Confirm you understand the
   cross-impl-object dual invocation pattern.
3. Pick ONE Category as the main bet. Each has a different shape:
2. **Check debugger:** `ss -tln 2>/dev/null | grep ':9229'` (or
   `curl --max-time 2 http://127.0.0.1:9229/json`). If port 9229 is
   open, A / B / C are tractable; if closed, pivot to D or
   documentation-only.
3. Read the plan doc's "Status (post-execution)" session 13 section,
   then read `lib/ax.ts`'s API + `T26` and `claudeai.ts`'s
   integration. Confirm you understand the `snapshotAx` /
   `waitForAxNode` / `waitForAxNodes` surface.
4. Pick ONE Category as the main bet:
   - **D** (PRIORITY when debugger is closed): pick 1-2 call-sites
     in `claudeai.ts` to migrate, list which.
   - **A**: bundle grep + per-URL navigation + registry re-probe.
   - **B**: pick a Tier 3 candidate, smoke-test the read-side, decide
     ship or defer.
   - **C**: bundle grep on rejection literals, schema-rev, smoke-test
     the resolved shape, decide ship or defer.
   List which approaches you'll try in what order, with the cap at
   2-3 distinct approaches before STOP AND REPORT.
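
The debugger check in Phase 0 can also be scripted from Node. The sketch below is illustrative only: `isDebuggerListening` and `tractableCategories` are hypothetical names rather than harness exports, and it assumes a Node version with a global `fetch`. It queries the same `/json` endpoint as the `curl` command above and applies the rule that A / B / C need the debugger while D does not.

```typescript
// Hypothetical helper: true if the inspector's HTTP endpoint answers
// on 127.0.0.1 within the timeout (same URL as the curl check).
async function isDebuggerListening(
	port = 9229,
	timeoutMs = 2000,
): Promise<boolean> {
	const controller = new AbortController();
	const timer = setTimeout(() => controller.abort(), timeoutMs);
	try {
		const res = await fetch(`http://127.0.0.1:${port}/json`, {
			signal: controller.signal,
		});
		return res.ok;
	} catch {
		// Connection refused, timeout, or any other network failure.
		return false;
	} finally {
		clearTimeout(timer);
	}
}

type Category = 'A' | 'B' | 'C' | 'D';

// Mirrors the Phase 0 rule: A / B / C need the debugger, D does not.
function tractableCategories(debuggerOpen: boolean): Category[] {
	return debuggerOpen ? ['D', 'A', 'B', 'C'] : ['D'];
}

async function main(): Promise<void> {
	const open = await isDebuggerListening();
	console.log(`debugger open: ${open}`);
	console.log(`tractable: ${tractableCategories(open).join(', ')}`);
}

void main();
```

Keeping the category decision in a pure function separates it from the network probe, so the rule can be checked without a running Claude.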

If Phase 0 surfaces a problem (typecheck failing, primitives unclear,
the chosen Category's prerequisites don't hold), stop and report.
@@ -398,6 +390,10 @@ Don't fan out.

#### Phase 1 — fan-out batch

For Category D (call-site migration):
- Single subagent migrates 1-2 call-sites in `claudeai.ts` to use
  `waitForAxNode`. Verify by running T16 / T17 / T26 / H05.
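
A call-site migration of this kind keeps the existing wait budget and changes only the shape of the loop. The sketch below shows the idea under stated assumptions: the option names `timeoutMs` / `intervalMs` are hypothetical (this document does not show the real `waitForAxNode` options), and `waitForMatch` is a generic stand-in for the primitive, not its implementation.

```typescript
// Hypothetical shapes: the real `waitForAxNode` options are not shown
// in this document, so `timeoutMs` / `intervalMs` are assumptions.
type LoopBudget = { attempts: number; delayMs: number };
type PollOptions = { timeoutMs: number; intervalMs: number };

// Convert a hand-rolled "N attempts, sleep D between" budget into a
// timeout/interval pair covering roughly the same total wait.
function toPollOptions(budget: LoopBudget): PollOptions {
	return {
		timeoutMs: budget.attempts * budget.delayMs,
		intervalMs: budget.delayMs,
	};
}

// Generic predicate poll standing in for the primitive: take a
// snapshot, return the first match, or null once the budget is spent.
async function waitForMatch<T>(
	snapshot: () => Promise<T[]>,
	predicate: (node: T) => boolean,
	opts: PollOptions,
): Promise<T | null> {
	const deadline = Date.now() + opts.timeoutMs;
	for (;;) {
		const hit = (await snapshot()).find(predicate);
		if (hit !== undefined) return hit;
		if (Date.now() >= deadline) return null;
		await new Promise((r) => setTimeout(r, opts.intervalMs));
	}
}

// Before: for (let i = 0; i < 20; i++) { ...scan...; await sleep(250); }
// After: the same budget, expressed as options for the poll.
const migrated = toPollOptions({ attempts: 20, delayMs: 250 });
console.log(migrated);
```

If a migrated call-site starts failing, comparing the converted options against the primitive's defaults is a quick way to spot a budget mismatch before rolling back.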

For Category A (operon investigation):
- Single subagent does bundle-grep for operon URL routes + per-URL
  registry re-probe. Report findings; if a Tier 2 reframe is
@@ -405,23 +401,20 @@ For Category A (operon investigation):

For Category B (Tier 3 read-only reframes):
- Spawn ONE subagent for the candidate read-side investigation
  (smoke-test + bundle-grep if needed). Treat as exploratory; report
  findings before committing to a spec shape.
- Cap re-spawns at 2-3 distinct approaches; if no read-side resolves
  cleanly, STOP AND REPORT.
  (smoke-test + bundle-grep if needed).

For Category C (schema-rev):
- Single subagent does bundle-grep on the rejection literals, surfaces
  the validator schemas, smoke-tests the recovered shapes against the
  user's debugger-attached running Claude. If a recovered schema
  unblocks a case-doc claim, ship; otherwise document and defer.
- Single subagent does bundle-grep on the rejection literals,
  surfaces the validator schemas, smoke-tests the recovered shapes
  against the user's debugger-attached running Claude.

Cap at ~1 spec total — same scope as session 12's T11_runtime.
Cap at ~1 spec OR ~1 primitive migration total — same scope as
sessions 9-13.

#### Per-subagent prompt shape

```
You're implementing ONE [test-harness runner | primitive |
You're implementing ONE [test-harness runner | primitive migration |
investigation] for <TARGET>.

Read in order:
@@ -429,7 +422,8 @@ Read in order:
- tools/test-harness/README.md (conventions; status section names
  the most-recent-template that fits)
- tools/test-harness/src/runners/<closest-template>.spec.ts
- tools/test-harness/src/lib/ (the primitives you'll reuse)
- tools/test-harness/src/lib/ (the primitives you'll reuse —
  including session 13's `lib/ax.ts`)
- CLAUDE.md (project conventions)

Write tools/test-harness/src/runners/<TARGET>_short_name.spec.ts
@@ -437,8 +431,8 @@ Write tools/test-harness/src/runners/<TARGET>_short_name.spec.ts

[per-task specifics: pattern (seedFromHost / mock-then-call /
asar fingerprint / shared isolation / new-primitive-build /
investigation), assertion shape, skip rules, key constraint
warnings]
investigation / call-site migration), assertion shape, skip rules,
key constraint warnings]

Constraints:
- Tabs, ~80-char wrap.
@@ -446,7 +440,8 @@ Constraints:
- testInfo.attach() the diagnostics from the spec's "Diagnostics
  on failure" block.
- Tag with severity + surface annotations.
- No fixed sleeps. retryUntil or Playwright auto-wait.
- No fixed sleeps. retryUntil, Playwright auto-wait, or
  waitForAxNode.
- npm run typecheck must stay clean after your edits.
- Don't commit. The user reviews and commits.

@@ -454,17 +449,17 @@ If the target isn't reasonable to implement (anchors don't resolve
to anything assertable, the test depends on state you can't
construct, the existing primitives don't cover the surface), DO
NOT write a stub. Report under Open questions and stop. Sessions
1-12 had cumulative ~17 "stop and report" outcomes that were the
1-13 had cumulative ~17 "stop and report" outcomes that were the
right call.

Report shape (~150 words):
## <TARGET> [runner | primitive | investigation]
## <TARGET> [runner | primitive | investigation | migration]

- File written: tools/test-harness/src/runners/<filename>.spec.ts
  [or lib/<newfile>.ts]
  [or lib/<newfile>.ts or modified lib/<existing>.ts]
- Layer: file probe | argv probe | L1 | L2 (xprop) | L2 (DBus) |
  pgrep | new-primitive | investigation
- Assertion shape: <one sentence>
  pgrep | new-primitive | investigation | migration
- Assertion shape (or migration shape): <one sentence>
- Skip rules: <which rows + why>
- Verification path: <typecheck + run result>
- Open questions: <caveats>
@@ -475,49 +470,49 @@ Report shape (~150 words):
After fan-out returns:

1. `cd tools/test-harness && npm run typecheck` — must stay clean.
2. Run the new runners against KDE-W (the dev box) — but flag the
   user first if any are destructive (seedFromHost kills running
   Claude). Capture pass/skip/fail per spec for the matrix.
2. Run the new / migrated runners against KDE-W (the dev box) — but
   flag the user first if any are destructive (seedFromHost kills
   running Claude). Capture pass/skip/fail per spec for the matrix.
3. Update [`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
   "Status (post-execution)" section to reflect newly-shipped
   specs and any reclassifications discovered mid-flight.
   specs / primitive migrations and any reclassifications.
4. Update [`tools/test-harness/README.md`](../../tools/test-harness/README.md)
   inventory table.
5. Write a final report listing:
   - Specs landed (pass / skip / needs-tuning per row)
   - Specs landed / migrations completed (pass / skip / needs-tuning per row)
   - Primitives landed (with API shape)
   - Specs deferred (with the per-test rationale)
   - Specs reclassified (Tier 3 → Tier 2, Tier 2 → Tier 1, etc.)
   - Updated coverage stat (was 74/76 = 97%, now N/76 = M%)
6. Don't commit. The user reviews and commits.
6. Commit and push to `docs/compat-matrix` (the orchestration
   directive at the top of the followup supersedes "don't commit").
7. Rotate this prompt: rewrite
   `docs/testing/runner-implementation-followup-prompt.md` for
   the NEXT session's deferred items.

### Self-correction loop

Same as sessions 1-12:
Same as sessions 1-13:

1. Subagent typecheck failure → re-spawn with explicit fix
   instruction.
2. Subagent claims a runner exists but `git status` shows no new
   file → re-spawn with explicit "use the Write tool" instruction.
2. Subagent claims a runner / migration exists but `git status`
   shows no new file → re-spawn with explicit "use the Write tool"
   instruction.
3. Two subagents wrote runners that share a primitive but with
   different shapes → factor into `lib/<topic>.ts` BEFORE
   shipping.
4. Spec passes locally but the assertion is actually trivial (e.g.
   an unauthenticated launch where the handler check vacuously
   passes because no handlers are registered) → re-examine the
   assertion shape.
5. **Carry-over from session 5/6/7/8/9/10/11/12:** If the chosen
   different shapes → factor into `lib/<topic>.ts` BEFORE shipping.
4. Spec passes locally but the assertion is actually trivial →
   re-examine the assertion shape.
5. Migration breaks an existing spec → roll back the migration; the
   per-spec retry budget was load-bearing and the primitive
   defaults didn't match. Document the budget mismatch in plan-doc.
6. **Carry-over from session 5/6/7/8/9/10/11/12/13:** If the chosen
   Category's investigation doesn't resolve / requires schema-rev
   that exceeds budget after 2-3 approaches, STOP. Don't keep
   digging — pivot to a fallback Category. Document what was tried.
6. **Carry-over from session 10:** If a registration probe surfaces
   "registered but uninvocable" (handler is on the registry but the
   renderer-side wrapper isn't exposed for the relevant scope or the
   validator rejects every smoke-test arg shape), document and
   defer rather than building the main-side fallback speculatively.
7. **Carry-over from session 10:** If a registration probe surfaces
   "registered but uninvocable", document and defer rather than
   building the main-side fallback speculatively.

Cap re-spawns at 2 per file. Past that, mark as needing human
review and move on.
@@ -534,76 +529,61 @@ Stop and write the final report when one of:
   tests.** Stop, propose where the new primitive should live in
   `lib/`. Future session adds the primitive first, then resumes.
4. **Session budget hits ~1 new spec OR one new primitive
   landing.** Stop, synthesize, leave the rest for the next
   session.
   landing OR one substantive call-site migration.** Stop,
   synthesize, leave the rest for the next session.
5. **All categories blocked after 2-3 attempts each.** Document the
   findings as plan-doc additions and stop — coverage is at 97%, a
   no-spec session that surfaces deferral notes is fine.

### What you should NOT do

- **Don't try to land Category A + B + C in one batch.** Pick
- **Don't try to land Category D + A + B + C in one batch.** Pick
  ONE as the main bet.
- **Don't ship stubs.** If a runner can't actually assert what the
  spec says, mark it as Tier 3 / blocked / primitive-gap and
  don't write a placeholder. The cumulative seventeen "stop and
  report" outcomes from sessions 1-12 were the right call — every
  one revealed a real constraint.
  don't write a placeholder.
- **Don't break existing runners.** H01-H05 are the canaries.
  T16 / T17 / T07 / S25 / S29-S31 are pre-existing-flaky on KDE-W
  per session 13's full-suite run — those are NOT canaries.
- **Don't restructure `lib/`** beyond targeted additions.
  Premature abstractions are wrong abstractions.
  `electron-mocks.ts` (session 3), `input.ts` (session 4),
  `input-niri.ts` (session 6), and `eipc.ts` registry walker
  (session 7) + invocation surface (session 8) were
  threshold-driven extractions, not speculative.
- **Don't run destructive Tier 3 tests** that write to the user's
  real claude.ai account (T22 PR write, T27 scheduling write, T29
  worktree creation, T34 OAuth, T36 hooks-fire-on-prompt-submit).
  Only the *read-only reframes* of those are in scope.
- **Don't introspect `ipcMain._invokeHandlers` for `claude.web`
  eipc channels.** Session 7 confirmed those use the per-wc IPC
  scope. Use `lib/eipc.ts`'s primitive (which targets the per-wc
  scope) instead.
- **Don't call `invokeEipcChannel` for write-side handlers** —
  `start*`, `set*`, `write*`, `run*`, `openIn*`, `delete*`,
  `cancel*`, `reset*`, `installPlugin`, `uninstallPlugin`,
  `updatePlugin`, `enablePlugin`, `uploadPlugin`, `syncRemotePlugins`.
  The primitive doesn't enforce a read-only allowlist; the safety
  property is that case-doc-anchored suffixes are read-side OR
  case-doc-anchored write-side suffixes are tested via REGISTRATION
  ONLY (`waitForEipcChannels`), never invoked. T11_runtime / T19 /
  T20 / T21 ship registration probes over write-side suffixes — that's
  the safe pattern.
  eipc channels.** Use `lib/eipc.ts`.
- **Don't call `invokeEipcChannel` for write-side handlers.**
- **Don't bolt other compositors into `lib/input-niri.ts`.**
  Sway / Hyprland / River each get their own per-compositor file
  if a consumer surfaces.
- **Don't bolt Wayland into `lib/input.ts`.** X11-strict gate is
  load-bearing.
- **Don't bolt Wayland into `lib/input.ts`.**
- **Don't speculate on a `lib/input-wayland.ts` dispatcher.**
  Per-compositor files until a second Wayland consumer lands.
- **Don't preemptively build `CodeTab.activateTopTab()` /
  `startNewSession()`.** Session 5 captured the AX anchors but
  T36 Phase 2 (the only known consumer) was reclassified out.
  `startNewSession()`.**
- **Don't add a main-side `invokeEipcChannel` fallback
  speculatively.** Build it only if a concrete consumer needs to
  invoke through a non-claude.ai webContents. Premature primitives
  leak design debt.
  speculatively.**
- **Don't speculate on a Launch event-subscription primitive.**
  Session 11 noted that `window['claude.web'].Launch` exposes 5
  `on*` event subscribers + `activeServersStore` not visible in
  `_invokeHandlers`. No consumer asks for an event-probe primitive
  yet. Wait for one.
- **Don't extract T07's CSS-querySelector poll into `lib/ax.ts`.**
  That's a different abstraction (DOM, not AX). Wait for a second
  CSS-poll consumer before extracting.
- **Don't add a `waitForRenderedSurface(client, surfaceKey)`
  registry to `lib/ax.ts`.** Session 13 deliberately deferred
  this — wait for a third consumer with a specific named surface.
- **Don't change the existing per-spec retry budgets when migrating
  to `waitForAxNode`.** The budgets are tuned. Migration is
  shape-only.
- **Don't reach into `explore/walker.ts` for AX types/helpers.**
  `lib/ax.ts` re-exports `RawElement` / `AxNode` /
  `axTreeToSnapshot` / `waitForAxTreeStable` — use those.
- **Don't implement the #569 power-inhibit patch in this
  session.** That's a separate workstream.
- **Don't commit.** The user reviews and commits.

### Final report format

```markdown
## Runner implementation summary (session 13)
## Runner implementation summary (session 14)

- Main-bet category: A | B | C
- Main-bet category: D | A | B | C
- Specs landed: N
- Migrations completed: N
- Primitives landed: N
- Reclassified mid-flight: N (with reasons)
- Coverage: was 74/76 (97%), now <NEW>/76 (<PCT>%)
@@ -614,7 +594,7 @@ Stop and write the final report when one of:

| Cat | Test ID | File | Assertion shape | Status |
|---|---|---|---|---|
| A | <test_id> | <file>.spec.ts | … | ✓ pass / skip / fail |
| D | <call-site> | <file>.ts | … | ✓ pass / skip / fail |
| ... |

## Notable findings
@@ -624,9 +604,7 @@ Stop and write the final report when one of:
- ...

## Files touched
git status output (tools/test-harness/src/runners/*.spec.ts +
maybe lib/* primitives if extraction was needed; possibly plan-doc /
README updates).
git status output.

## Diff summary
git diff --stat
@@ -639,79 +617,44 @@ git diff --stat
- Each subagent's Write calls land directly in the working tree.
- The grounding probe (`tools/test-harness/grounding-probe.ts`)
  can help when implementing a runner that asserts runtime API
  state — capture once with `npm run grounding-probe -- --launch
  --include-synthetic`, grep the output for the IPC channel /
  accelerator / API your runner needs to assert against.
  state.
- The eipc-registry probe (`tools/test-harness/eipc-registry-probe.ts`)
  is the dedicated tool for inspecting per-wc IPC handler state.
  Useful when designing new probes or auditing for upstream drift.
  Connects to a debugger-attached running Claude on port 9229.
- For seedFromHost specs, the host MUST have a signed-in Claude
  Desktop. The primitive throws with a clear message if not.
  Document the prerequisite in your runner's leading comment if
  it's the first one to add seedFromHost coverage to a new
  surface.
- For tests that touch the AX tree, `claudeai.ts` page-objects
  are the right substrate — see `T17_folder_picker.spec.ts` for
  the end-to-end example. Don't query DOM by CSS selector unless
  `claudeai.ts` doesn't already cover the surface. Code-tab
  session-opener anchors are documented in plan-doc session 5;
  don't add them to `claudeai.ts` unless a consumer surfaces.
- For mock-then-call: helpers live in `lib/electron-mocks.ts`
  (extracted in session 3). See T24's leading comment for the
  `Promise<boolean>` variant + T25's for the void variant.
- For tests that touch the AX tree, **`lib/ax.ts`** is the new
  shared substrate. `claudeai.ts` page-objects are still the
  right substrate for renderer-UI domain operations (CodeTab,
  compact pills, menu items) — they consume `lib/ax.ts`
  internally. Don't query DOM by CSS selector unless `claudeai.ts`
  doesn't already cover the surface.
- For mock-then-call: helpers live in `lib/electron-mocks.ts`.
- For focus-shifting (X11 only): `lib/input.ts` exports
  `focusOtherWindow` + `spawnMarkerWindow`. See S11 for the
  end-to-end consumer pattern.
- For Wayland-native focus-shifting (Niri only): `lib/input-niri.ts`
  exports the same shape with `niri msg --json` IPC + `foot`
  marker. See S14 for the end-to-end consumer pattern.
  `focusOtherWindow` + `spawnMarkerWindow`.
- For Wayland-native focus-shifting (Niri only): `lib/input-niri.ts`.
- For eipc registry walking: `lib/eipc.ts` exports
  `getEipcChannels` / `findEipcChannel` / `findEipcChannels` /
  `waitForEipcChannel` / `waitForEipcChannels` against
  `webContents.ipc._invokeHandlers`. See T11_runtime / T19 / T20 /
  T21 / T22b / T31b / T33b / T38b for end-to-end consumer patterns.
- For eipc invocation: `lib/eipc.ts` exports `invokeEipcChannel`
  (renderer-side wrapper at
  `window['claude.<scope>'].<Iface>.<method>`). See T11_runtime / T19 /
  T20 / T21 / T27 / T33c / T35b / T37b for end-to-end consumer patterns.
  `waitForEipcChannel` / `waitForEipcChannels`.
- For eipc invocation: `lib/eipc.ts` exports `invokeEipcChannel`.
  Only call read-side suffixes; the primitive doesn't enforce a
  read-only allowlist. Cross-impl-object dual invocation pattern is
  T11_runtime; single-interface dual is T21 / T33c.
  read-only allowlist.
- **For arg validator schema-rev (sessions 9 / 11 / 12 findings):**
  when invocation rejects with `Argument "<name>" at position N ...
  failed to pass validation`, FIRST try smoke-testing common arg
  shapes against the user's debugger-attached Claude (session 11's
  `launch-cwd-smoke.ts` / session 12's `localplugins-smoke.ts`
  pattern — clone the InspectorClient connection, iterate over arg
  shape candidates, report `[OK]` / `[REJ]` per shape). For trivial
  validators (`typeof === 'string'` / similar), this resolves the
  schema in one round-trip and avoids needing bundle-grep. For more
  elaborate validators, fall back to grep on the bundled `index.js`
  for the literal rejection string; validator block sits ~50-200
  chars before the throw site. See plan-doc session 9 status section
  for the byte offsets of the two CustomPlugins validators (5013601
  / 5018821) as worked examples.
  smoke-test first, bundle-grep on rejection literal as fallback.
- **For session-scoped Tier 2 reframes (session 10 finding):**
  `LocalSessions/getAll` is the foundational read-side surrogate
  when case-doc anchors are write-side. Pattern: `args = []`,
  returns `Array<Session>`. T19 and T20 are the templates.
  `LocalSessions/getAll` foundational read-side surrogate.
- **For Tier 2 reframes with case-doc-anchored read-side handlers
  (session 11 finding):** invoke the case-doc-anchored handlers
  directly rather than using a foundational surrogate. Mixed-shape
  dual invocation is fine. T21 is the template (one returns array,
  another returns boolean — assert each shape independently).
  (session 11 finding):** invoke directly. Mixed-shape OK.
- **For Tier 2 reframes spanning two interfaces (session 12
  finding):** invoke a read-side from each impl object. T11_runtime
  is the template (CustomPlugins/listInstalledPlugins array +
  LocalPlugins/getPlugins array — proves the install plumbing
  crosses both impls intact). Mixed-arg-shape fine.
  finding):** invoke a read-side from each impl object.
- **For AX-tree polling (session 13 finding):** `lib/ax.ts`'s
  `waitForAxNode` / `waitForAxNodes` for predicate-based polling.
  `snapshotAx` for one-shot reads. Re-exports keep
  `explore/walker.ts` types accessible without crossing the
  lib/explore boundary.
- **For asar fingerprints: ALWAYS grep the installed asar
  first.** Build-reference is beautified; the bundle is
  minified. Case-doc text may be the user-facing form, not the
  bundle form (e.g. `~/.claude.json` vs `.claude.json`). T18
  reads `mainView.js`, not `index.js` — `lib/asar.ts`'s
  `readAsarFile(filename, asarPath)` already handles this.
  minified.
```bash
cd tools/test-harness && node -e "
const {extractFile} = require('@electron/asar');

@@ -18,6 +18,116 @@ work begins.
|
||||
|
||||
## Status (post-execution)

**Shipped session 13 (1 new primitive, no new spec):** `lib/ax.ts` —
shared AX-tree loading + traversal substrate, threshold-driven
extraction. The plan-doc had flagged "Unified DOM/AX loading +
traversal primitive" in session 12 as the natural priority for
session 13 if the operon / Tier 3 / schema-rev categories were
blocked. Phase 0 of session 13 found the debugger detached on the
dev box (port 9229 not listening), which blocked Categories A and C
(operon-mode navigation probe + schema-rev for `listRemotePluginsPage`
/ `listSkillFiles` — both need runtime probing against the user's
debugger-attached running Claude). Category B (Tier 3 read-only
reframes) ALSO effectively required the debugger for the smoke-test
investigation phase. The PRIORITY (DOM unification) primitive
landed as the strongly-supported alternative — two
threshold-driven extraction signals (T26 had duplicated `snapshotAx`
from claudeai.ts, plus user-reported flake in AX-tree queries).

Coverage stays at 74/76 (97%) — primitive-only session, no spec
landed. The matrix coverage doesn't reflect primitive landings;
those show up in the `lib/` surface and are picked up by future
spec consumers.

Two commits on `docs/compat-matrix` expected (SHAs inserted after
the test-harness commit lands — the user reviews and commits at the
end of every session):

- TBD — `test(harness): session 13 lib/ax.ts AX substrate primitive`
  (extracts `snapshotAx` + adds `waitForAxNode` / `waitForAxNodes`;
  refactors `claudeai.ts` and `T26_routines_page_renders.spec.ts` to
  consume the shared substrate instead of carrying duplicate
  implementations; passes typecheck + H01-H03 canaries + T26 +
  T11_runtime spot-checks on KDE-W).
|
||||
|
||||
Session 13 findings + reclassifications:

- **`lib/ax.ts` primitive surface.** Threshold-driven extraction
  hitting 2 consumers (the formerly-private `snapshotAx` in
  `claudeai.ts` + the explicit duplicate in T26 noted as
  "premature abstraction at 1 consumer"). Surface:
  - `snapshotAx(inspector, opts)` — single AX read with a stability
    gate. `opts.fast` skips the gate for inside-poll callers
    (matches the existing internal contract).
  - `waitForAxNode(inspector, predicate, opts)` — repeatedly
    snapshot the tree and return the first matching `RawElement`,
    or null on timeout. Gates on stability once at the start
    (configurable), then iterates with `fast: true`. Built against
    the inline polling loops in `CodeTab.activate`, `openPill`,
    `clickMenuItem`, T26 pre/post-click anchor scans.
  - `waitForAxNodes(inspector, predicate, opts)` — same shape,
    returns every match. For consumers that want to enumerate.
  - Re-exports: `RawElement`, `AxNode`, `axTreeToSnapshot`,
    `waitForAxTreeStable` — so consumers don't have to reach into
    `explore/walker.ts` themselves. Walker stays the source of
    truth for AX-snapshot construction; this file is the
    runner-facing alias.
- **Refactor scope was minimal.** `claudeai.ts` swaps its private
  `snapshotAx` for the shared one (5-line import change). T26
  drops its inlined helper and imports from `lib/ax.ts`. No
  call-site rewrites — the predicate-based polling in
  `CodeTab.activate` / `openPill` / `clickMenuItem` is unchanged
  this session. Future sessions can opportunistically migrate
  hand-rolled retry loops to `waitForAxNode` when re-touching
  those code paths; not forced this session because the call-site
  retry patterns each carry per-spec budget tuning that the
  primitive's defaults need to validate against real flake data.
- **Why no spec landed.** Phase 0 calibration found port 9229
  detached (Claude was running but debugger wasn't attached via
  Developer → Enable Main Process Debugger). Categories A and C
  strictly need runtime probing against the debugger; Category B
  needs the debugger for the smoke-test verification phase (per
  session-12 pattern). The PRIORITY primitive build was the
  highest-impact deliverable that didn't require the debugger —
  pure static-analysis-driven extraction with two existing
  consumers as the threshold signal. Primitive-only sessions are
  in scope per the followup prompt's termination criteria
  ("Session budget hits ~1 new spec OR one new primitive
  landing").
- **What's NOT in `lib/ax.ts`.** Did NOT add a
  `waitForRenderedSurface(client, surfaceKey)` registry — the
  plan-doc flag mentioned it but no consumer asks for a named
  surface anchor today; promote when a third consumer crystallizes
  with a specific surface name in mind. Did NOT extract T07's
  CSS-querySelector poll loop — that's a different abstraction
  (DOM, not AX) with no second consumer signal yet. Did NOT
  rewrite call-site retry budgets in `claudeai.ts` — the budgets
  are tuned per-spec and changing them speculatively risks
  introducing flake rather than removing it.
- **Pre-existing T16 / T17 flake confirmed unchanged.** Running
  the full suite found T16 / T17 / T07 / S25 / S29-S31 / etc.
  failing on KDE-W — these failures are pre-existing on the
  baseline (verified by stashing the session-13 changes and
  re-running T16, which still failed with the same
  `CodeTab.activate: no AX-tree button with accessibleName="Code"
  found` error). Session 13's primitive doesn't fix the existing
  flake; it lays groundwork that future sessions can build
  flake-reduction patches against (e.g. promoting `activateTab`
  to use `waitForAxNode` with a longer budget instead of a
  one-shot snapshot would be the next session's natural follow-up).

Tier 2 → Tier 2 candidates remaining for next session: the same
list as session 12 — operon-mode navigation probe (still needs a
debugger-attached Claude), schema-rev for `listRemotePluginsPage`
/ `listSkillFiles` (same), Tier 3 read-only reframes (same). The
new option for next session is **call-site migration to
`waitForAxNode`** — promote `activateTab`'s one-shot snapshot to a
proper retry, give T07's CSS poll a more durable wait shape, etc.
That's a flake-reduction session shape rather than a
coverage-expansion shape; the session 13 primitive made it tractable.
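
The migration from a one-shot snapshot to a predicate poll can be sketched roughly as follows. This is an illustration under stated assumptions, not the shipped `lib/ax.ts` code: `pollForAxNode`, the `Snapshot` callback (standing in for something like `snapshotAx(inspector, { fast: true })`), the two-field `RawElement`, and the default budgets are all hypothetical.

```typescript
// Hypothetical element shape; the real `RawElement` comes from
// `explore/walker.ts` and may carry different fields.
interface RawElement {
  role: string;
  name: string;
}

interface PollOpts {
  timeoutMs?: number; // total budget before giving up (assumed default below)
  intervalMs?: number; // delay between snapshots (assumed default below)
}

// Stand-in for a fast AX snapshot: returns a flat list of nodes.
type Snapshot = () => Promise<RawElement[]>;

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Snapshot repeatedly and return the first node matching `predicate`,
// or null once the budget is exhausted.
async function pollForAxNode(
  snapshot: Snapshot,
  predicate: (el: RawElement) => boolean,
  opts: PollOpts = {},
): Promise<RawElement | null> {
  const timeoutMs = opts.timeoutMs ?? 15000;
  const intervalMs = opts.intervalMs ?? 250;
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const nodes = await snapshot();
    const match = nodes.find(predicate);
    if (match) return match;
    // Stop if another sleep would overshoot the budget.
    if (Date.now() + intervalMs > deadline) return null;
    await sleep(intervalMs);
  }
}

// Example: a fake tree where the "Code" button only mounts on the third read,
// which a one-shot lookup would have missed.
async function demo(): Promise<RawElement | null> {
  let reads = 0;
  const fakeSnapshot: Snapshot = async () =>
    ++reads < 3 ? [] : [{ role: "button", name: "Code" }];
  return pollForAxNode(
    fakeSnapshot,
    (el) => el.role === "button" && el.name === "Code",
    { timeoutMs: 1000, intervalMs: 10 },
  );
}
```

Keeping the budget as a per-call option is one way to preserve the per-spec tuning that the findings above say should not be changed speculatively.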
|
||||
|
||||
---

**Shipped session 12 (1 new spec, no primitive change):** T11_runtime
(Tier 2 reframe — `seedFromHost` + multi-suffix registration probe
over five install-flow handlers + dual-handler invocation across two
|
||||
@@ -1642,35 +1752,22 @@ a primitive that needs a small extension:
|
||||
  dependent), but if it ever becomes tractable, a
  `lib/displays.ts` mocking `screen.getAllDisplays()` would be
  the entry.
- **Unified DOM/AX loading + traversal primitive (FLAGGED session
  12).** Existing wait/traversal primitives are scattered:
  `electron.ts:waitForReady('userLoaded')` covers the post-login
  webContents URL transition; `claudeai.ts` page-objects roll their
  own `retryUntil` for AX-tree node lookups; `eipc.ts:waitForEipcChannel`
  covers handler registration. The user reports lots of failures
  because tests aren't waiting long enough for the DOM to render —
  AX-tree queries fire before the relevant subtree is mounted, and
  individual specs each pick their own `retryUntil` budget. Symptoms:
  flaky AX-anchor lookups under cold-cache or slow-machine conditions;
  premature `waitForReady('userLoaded')` resolution before claude.ai's
  client-side router has hydrated the surface the test wants to query.
  Proposed shape: **`lib/dom-ready.ts`** exporting one or more
  composable wait helpers — e.g. `waitForAxNode(client, selector,
  opts)` (retryUntil over the AX walker with a sensible default
  budget, ~15-30s, plus a per-call override), `waitForAxTreeStable(client,
  opts)` (no node count change for N consecutive ticks — proxy for
  "render finished"), and `waitForRenderedSurface(client, surfaceKey)`
  (case-doc-anchored surface markers — a small registry of known
  anchors per surface so consumers don't roll their own AX selectors).
  Should also unify the existing `claudeai.ts` activation methods
  around the new helpers rather than each rolling its own retryUntil.
  Touches enough specs that a session 13 primitive build would
  reduce flake across T16/T17/T26/T07/H05 plus any future Code-tab
  AX work — flag as the main bet for session 13 if the operon /
  Tier 3 / schema-rev categories are blocked. Pre-work: audit
  per-spec `retryUntil` budgets and AX-query sites to identify the
  3-5 most-flaky callsites; build the primitive against those
  specifically rather than speculatively.
- **Unified DOM/AX loading + traversal primitive (LANDED session
  13 as `lib/ax.ts`).** Threshold-driven extraction once T26 had to
  redefine `snapshotAx` inline (after `claudeai.ts`'s private copy
  was the only consumer for sessions 1-12). The primitive surface
  exports `snapshotAx`, `waitForAxNode`, `waitForAxNodes`, plus
  re-exports of `RawElement` / `AxNode` / `axTreeToSnapshot` /
  `waitForAxTreeStable` so consumers don't reach into
  `explore/walker.ts` directly. `claudeai.ts` and T26 both consume
  the shared substrate; future call-site migrations (e.g.
  `activateTab` → `waitForAxNode`) are tractable now. The
  speculative `waitForRenderedSurface(client, surfaceKey)` shape
  was deliberately NOT shipped — no consumer asks for a
  named-surface registry today; promote when a third consumer
  crystallizes with a specific surface name. The CSS-querySelector
  poll in T07 was deliberately NOT extracted — different
  abstraction (DOM, not AX), no second consumer signal yet.
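
The stability signal mentioned in the flagged entry ("no node count change for N consecutive ticks") can be expressed as a small pure check over sampled node counts. The sketch below is only an assumption about what such a gate might look like; `isAxTreeStable` is a hypothetical name, and the real `waitForAxTreeStable` in `explore/walker.ts` may use a different signal or signature.

```typescript
// Returns true when the node count has not changed across the last
// `requiredTicks` ticks, i.e. the final `requiredTicks + 1` samples are equal.
// `nodeCounts` is the history of AX-tree node counts, oldest first.
function isAxTreeStable(nodeCounts: number[], requiredTicks: number): boolean {
  // Not enough history yet to observe `requiredTicks` unchanged ticks.
  if (requiredTicks < 1 || nodeCounts.length < requiredTicks + 1) return false;
  const tail = nodeCounts.slice(-(requiredTicks + 1));
  return tail.every((count) => count === tail[0]);
}

// Example histories: the tree is still growing in the first case and has
// settled at 57 nodes for two ticks in the second.
const stillRendering = isAxTreeStable([10, 42, 57, 57], 2);
const settled = isAxTreeStable([10, 42, 57, 57, 57], 2);
```

A poller could sample the count on each tick, append it to the history, and stop waiting once this check passes or its budget runs out.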
|
||||
|
||||
## Open questions for the parent agent
|
||||
|
||||
|
||||
@@ -120,11 +120,18 @@ against case-doc anchors; consumed by T19 / T20 / T22b / T31b / T33b /
|
||||
T38b) plus its session 8 invoke surface (`invokeEipcChannel` — calls
a registered handler through the renderer-side wrapper at
`window['claude.<scope>'].<Iface>.<method>`; consumed by T19 / T20 /
T27 / T33c / T35b / T37b) — and the `createIsolation({ seedFromHost:
true })` primitive that lets login-required tests run hermetically
against a copy of the host's signed-in auth state (T07, T11_runtime,
T16, T19, T20, T21, T22b, T26, T27, T31b, T33b, T33c, T35b, T37b,
T38b).
T27 / T33c / T35b / T37b), the `lib/ax.ts` AX-tree substrate
(`snapshotAx` for one-shot reads + `waitForAxNode` / `waitForAxNodes`
for predicate-based polling, plus re-exports of `RawElement` /
`AxNode` / `axTreeToSnapshot` / `waitForAxTreeStable` from
`explore/walker.ts` so consumers stay inside `lib/`;
threshold-driven extraction in session 13 once T26 had to duplicate
the formerly-private `snapshotAx` from `claudeai.ts`; consumed by
`claudeai.ts` page-objects + T26) — and the
`createIsolation({ seedFromHost: true })` primitive that lets
login-required tests run hermetically against a copy of the host's
signed-in auth state (T07, T11_runtime, T16, T19, T20, T21, T22b,
T26, T27, T31b, T33b, T33c, T35b, T37b, T38b).
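
The wrapper path `window['claude.<scope>'].<Iface>.<method>` quoted above suggests a three-level lookup. The sketch below shows one way such a lookup could be written against a plain object. It is not the `invokeEipcChannel` implementation: `resolveEipcMethod`, the `WindowLike` type, and the `example` scope and `list` method used in the sample data are all hypothetical.

```typescript
type Handler = (...args: unknown[]) => unknown;
type IfaceMap = { [method: string]: Handler | undefined };
type ScopeMap = { [iface: string]: IfaceMap | undefined };
// Assumed shape: the window exposes one object per `claude.<scope>` key.
type WindowLike = { [key: string]: ScopeMap | undefined };

// Walk `win['claude.<scope>'][iface][method]` and return the handler,
// or null if any level of the path is missing.
function resolveEipcMethod(
  win: WindowLike,
  scope: string,
  iface: string,
  method: string,
): Handler | null {
  const handler = win[`claude.${scope}`]?.[iface]?.[method];
  return typeof handler === "function" ? handler : null;
}

// Sample data only; the scope and method names here are made up.
const fakeWindow: WindowLike = {
  "claude.example": {
    CustomPlugins: { list: () => ["plugin-a"] },
  },
};

const listHandler = resolveEipcMethod(fakeWindow, "example", "CustomPlugins", "list");
const missingHandler = resolveEipcMethod(fakeWindow, "example", "LocalPlugins", "list");
```

Returning null for a missing path, instead of throwing, lets a registration probe distinguish "handler not registered yet" from a failure inside the handler itself.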
|
||||
|
||||
Note on eipc channels: the `LocalSessions_$_*` and `CustomPlugins_$_*`
channel names referenced in the case-doc Code anchors don't register