docs(testing): session 16 verify T17 seedFromHost + schema-rev for listRemotePluginsPage / listSkillFiles + flag orchestrator STOP for session 17

Final session of the sessions-13-to-16 autonomous orchestration run.

Verified session 15's T17 seedFromHost migration end-to-end against
the dev box: bare 60s Playwright timeout is GONE, seedFromHost clones
host config, waitForReady('userLoaded') resolves to a post-login URL
(https://claude.ai/epitaxy), dialog mock installs, and the session-14
CodeTab.activate({ timeout: 15_000 }) AX migration succeeds first try.
T17 reaches a NEW failure mode at the next chain step
(openFolderPicker after selectLocal — Select-folder pill doesn't
render on /epitaxy workspace route, likely needs /new context).
Classified as renderer-state-dependent, not openPill / clickMenuItem
loop — ruling out sessions 14-15's parked AX migration hypothesis
once and for all. Deferred for a future session (needs careful /new
navigation primitive).

Schema-rev resolved both deferred validators by bundle inspection of
app.asar (no smoke-test possible — T17's seedFromHost step killed the
debugger-attached leaked isolations as expected):

- CustomPlugins.listRemotePluginsPage(limit: number, offset: number)
- LocalPlugins.listSkillFiles(pluginId: string, skillName: string,
  pluginContext?: opaque)

Neither shipped as a Tier 2 invocation — listRemotePluginsPage is
not anchored in any case doc (T33 anchors listMarketplaces +
listAvailablePlugins, both already covered by T33b/T33c);
listSkillFiles is meaningful only with an installed plugin, which
needs Tier 3 destructive setup explicitly forbidden by the
constraints. Schemas captured in plan-doc as a deferred reframe.

Coverage stays at 74/76 (97%) — verification + investigation, no
spec landed.

Orchestration-level summary (sessions 13-16):
- Coverage start 74/76 (97%) → end 74/76 (97%) — NO net coverage
  gain across 4 sessions
- Net deliverables: 1 primitive (lib/ax.ts, session 13), 1 AX
  migration (activateTab + CodeTab.activate, session 14, fixed T16
  pre-existing-flake), 1 structural fix (T17 seedFromHost, session
  15, verified working session 16), 1 verification + 1 schema-rev
  investigation (session 16)
- Why coverage stalled: structural ceiling reached. Remaining 2
  specs need real claude.ai account write-side state which the
  harness can't construct without violating the Tier 3 destructive
  constraint.

Followup prompt rotated for session 17 with a STOP flag at the top —
session 17 will only run if the user manually triggers another
orchestration AND at least one of four preconditions holds (real
signed-in debugger-attached Claude, real-account write-side fixture,
renderer-drift event, or new IPC surface).

Co-Authored-By: Claude <claude@anthropic.com>
This commit is contained in:
aaddrick
2026-05-04 00:30:52 -04:00
parent 14ccb61596
commit 0a1f8071e9
3 changed files with 333 additions and 611 deletions

View File

@@ -1,645 +1,215 @@
# test-harness runner implementation — session 16 prompt
# test-harness runner implementation — session 17 prompt
This file is meant to be **copied verbatim into a fresh Claude Code
session** as the initial user message. Don't paraphrase it; the
orchestration depends on the exact directives below.
You're picking up after a runner-implementation session that landed 1
structural fix (T17 migrated from legacy `CLAUDE_TEST_USE_HOST_CONFIG=1`
auth path to `seedFromHost: true`, no new spec, no AX migration).
Session 15 was an investigation session: Phase 0 calibration found
port 9229 listening BUT the attached process was a leaked test
isolation at `claude.ai/login` rather than the user's auth-bearing
Claude — every webContents URL on that process was either `find_in_page`,
`/login`, or `main_window/index.html`, and the user-data-dir was
`/tmp/claude-test-*`. That made Categories A (operon-mode probe) / B
(Tier 3 read-only reframes) / C (schema-rev) all soft-blocked: the
debugger was technically attached, but to the wrong process for any
auth-required investigation. Session 15 pivoted to investigating T17's
pre-existing flake (the PRIORITY directive) and discovered the failure
was structural rather than AX-polling-related — the spec was using the
legacy `CLAUDE_TEST_USE_HOST_CONFIG=1` / `isolation: null` shape, and
when run without that env var fell through to a fresh isolation with no
auth, where `waitForUserLoaded`'s 90s default budget gets preempted by
Playwright's 60s spec timeout. Coverage unchanged at 74/76 (97%) —
structural fixes don't move the spec count, but T17 should now succeed
when host is signed in (rather than auto-failing with a bare 60s
timeout). Two commits on `docs/compat-matrix` expected (autonomous
orchestration commits + pushes — the user reviews after the session):
> **ORCHESTRATION STOPPED AFTER SESSION 16.** This prompt is rotated
> for completeness only. **Session 17 will NOT run automatically** —
> the autonomous orchestration was halted at the end of session 16
> after coverage stalled at 74/76 (97%) for four consecutive sessions
> (13, 14, 15, 16). To resume, the user must manually trigger another
> orchestration run AND meet at least one of these preconditions:
>
> 1. **Real signed-in Claude Desktop running with `--inspect=9229`**
> on the dev box (debugger-attached, signed in, NOT a leaked test
> isolation). This unblocks Categories A (operon-mode probe) and
> B (Tier 3 read-only reframes that need auth-bearing renderer
> state).
> 2. **A real claude.ai account fixture for write-side state.** The
> remaining 2 specs (matrix coverage 74/76 → 76/76) need real
> write-side state (e.g. an installed plugin to exercise
> `LocalPlugins.listSkillFiles`, or a deep-linked deferred install
> intent for T11). The Tier 3 destructive constraint
> (`Don't run destructive Tier 3 write-side tests`) explicitly
> forbids the harness constructing this state itself.
> 3. **Renderer-drift event** that requires re-anchoring page-objects
> (e.g. claude.ai redesign breaks `findCompactPills`,
> `clickMenuItem`, etc.). Triggers a defensive-migration session.
> 4. **New IPC surface** added by upstream that the harness should
> cover (e.g. a new `claude.web` interface, a new eipc method
> that's case-doc-anchored).
>
> If none of those preconditions hold, the orchestration should NOT
> resume — further sessions will produce documentation-only or
> marginal output. The structural ceiling of the harness without
> real-account fixtures is 74/76 (97%); we're already there.
- TBD — `test(harness): session 15 migrate T17 to seedFromHost +
prune unused RawElement import (no spec, coverage unchanged at 97%)`
(T17 spec rewrite swapping the `CLAUDE_TEST_USE_HOST_CONFIG=1` +
`isolation: null` branch for the canonical `seedFromHost: true`
pattern; prunes unused `RawElement` re-export import in
`lib/claudeai.ts` per session 14's leftover hint; typecheck clean;
T17 not actually run this session — see below).
You're picking up after session 16 of the test-harness runner
implementation work. Session 16 was the final session of the
sessions-13-to-16 orchestration run and produced: T17 verification
(session-15 structural fix VERIFIED — bare 60s timeout gone, new
failure mode at `openFolderPicker` post-`selectLocal` classified as
renderer-state-dependent and deferred), schema-rev for
`listRemotePluginsPage` / `listSkillFiles` (both schemas resolved by
bundle inspection — neither shipped as a Tier 2 invocation because
`listRemotePluginsPage` is not anchored in any case doc, and
`listSkillFiles` needs Tier 3 destructive setup). NO coverage gain.
Plan-doc updated. Followup-prompt rotated with the STOP flag (this
document).
The plan doc at
[`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
captures the tier classification and execution-time reclassifications.
Its "Status (post-execution)" section is the source of truth for
what's done and what's deferred — read **session 15** first, then
**session 14**, then **session 13**, then **session 12**, then
**session 11**, then **session 10**, then **session 9**, then **session
8**, then **session 7**, then **session 6**, then **session 5**, then
**session 4**, then **session 3**, then **session 2**, then **session
1** sub-sections.
what's done and what's deferred — read **session 16** first, then
**session 15**, **session 14**, **session 13**, **session 12**,
**session 11**, **session 10**, **session 9**, **session 8**,
**session 7**, **session 6**, **session 5**, **session 4**, **session
3**, **session 2**, then **session 1** sub-sections.
This session is a continuation, not a restart. Start by reading the
plan doc's status sections.
plan doc's status sections AND verifying at least one of the
preconditions above holds. If none hold, STOP and report; don't try
to fan out.
### Big new findings from session 15
### Session 16 final findings (key context for any session-17 attempt)
1. **T17 flake was structural, not AX-polling.** The trace showed
bare 60s Playwright timeout with NO `renderer-url` attachment —
meaning the test never reached line 49's attach call, which
means it never resolved `waitForReady('userLoaded')` at line 40.
Root cause: T17 was the last spec on the legacy
`CLAUDE_TEST_USE_HOST_CONFIG=1` / `isolation: null` shape — every
other auth-required spec (T07, T16, T19, T20, T21, T22b, T26,
T27, T31b, T33b/c, T35b, T37b, T38b) had moved to `seedFromHost:
true`. Without that env var (which CI / orchestration didn't
set), T17 fell through to a fresh isolation with no auth, hit
`/login`, and `waitForUserLoaded`'s 90s budget got preempted by
the 60s spec timeout. **Session 14's hypothesis was wrong** —
the AX click chain in `openPill` / `clickMenuItem` was never
reached, so migrating those wouldn't have fixed anything.
2. **`openPill` / `clickMenuItem` migration parked.** With T17's
actual flake explained by the auth-path mismatch, there's no
remaining flake-evidence pulling for the AX migration that
sessions 14-15 considered. `openPill`'s while-loop and
`clickMenuItem`'s while-loop work fine when the auth path is
correct. Don't migrate speculatively — wait for a third
consumer to surface with budget-tuning evidence.
3. **Phase 0 must distinguish "port open" from "port attached to
user's signed-in Claude".** Session 14 saw port 9229 closed and
correctly classified as debugger-detached. Session 15 saw port
9229 OPEN but attached to a leaked test isolation at /login —
Categories A/B/C still soft-blocked. The right Phase 0 probe:
`evalInMain` listing webContents and checking that AT LEAST one
URL is `https://claude.ai/<not /login>`. If every webContents is
`/login` or `find_in_page` or `main_window`, treat it the same
as port-closed for auth-required investigations. Session 15's
one-off probe shape (kept inline in the report, deleted after):
1. **T17's session-15 structural fix VERIFIED.** Bare 60s timeout is
gone. `seedFromHost` clones the host's signed-in config,
`waitForReady('userLoaded')` resolves to a post-login URL
(`https://claude.ai/epitaxy` on the dev box), the dialog mock
installs, and `CodeTab.activate({ timeout: 15_000 })` (session 14
migration) succeeds first try.
2. **T17's NEW failure mode is renderer-state-dependent, not AX.**
After `selectLocal()` clicks the Local menuitem, the Select-folder
pill never appears within 4s. The URL during the run was
`/epitaxy` — the user's workspace route. The folder-picker UI
may only render on `/new` (or a fresh project), not on a workspace
already containing files. To unblock: navigate to `/new`
post-userLoaded BEFORE `openFolderPicker()`. NOT shipped session
16 — needs a careful navigation primitive that doesn't break
existing seedFromHost specs.
3. **`openPill` / `clickMenuItem` migration STILL parked.** Session
16's T17 trace confirmed the env-pill open + Local click both
succeeded, ruling out the AX-polling-loop hypothesis once and for
all. Don't migrate those speculatively.
4. **Schema-rev resolved both deferred validators.**
`CustomPlugins.listRemotePluginsPage(limit: number, offset:
number)`. `LocalPlugins.listSkillFiles(pluginId: string,
skillName: string, pluginContext?: opaque)`. Neither shipped as a
Tier 2 invocation: `listRemotePluginsPage` is not anchored in any
case doc; `listSkillFiles` needs Tier 3 destructive setup.
5. **Coverage stalled at 74/76 (97%) for 4 consecutive sessions.**
Sessions 13-16 net deliverables: 1 primitive, 1 AX migration, 1
structural fix, 1 verification + 1 schema-rev investigation.
Without real-account fixtures, the harness's structural ceiling
is 74/76. The remaining 2 specs need real-account write-side
state.
```ts
const wcs = await client.evalInMain(`
const { webContents } = process.mainModule.require('electron');
return webContents.getAllWebContents().map((w) => ({
id: w.id, url: w.getURL(), title: w.getTitle(),
}));
`);
```
### What a future session 17 might attempt (only if preconditions hold)
4. **Leaked `/tmp/claude-test-*` dirs accumulating on dev box.**
Multiple test isolations from prior sessions have leaked their
tmpdirs and (in some cases) their Electron child processes.
`ls /tmp/ | grep claude-test` showed several. The session 15
T17 spec wasn't run because killing those leaked Electron
processes might also kill the user's real running Claude (PID
ambiguity from `ps`). A future session can either (a) verify
no real Claude is running before invoking T17, or (b) just
accept the seedFromHost kill side effect and let the user
re-launch Claude after the session.
5. **Productivity signal is dimming.** Sessions 13-15 collectively
produced one new primitive (`lib/ax.ts`), one substantive AX
migration (`activateTab` + `CodeTab.activate`), and one
structural fix (T17 seedFromHost). NO coverage gain in those
three sessions. The remaining categories without an
auth-bearing debugger-attached Claude are mostly exhausted.
Next session should prioritise (a) running T17 to verify the
seedFromHost fix actually resolves the timeout, and (b) checking
whether a Category C schema-rev probe against the leaked /login
isolation is tractable (validators don't need auth, only
invocation does — worth a 15-min investigation). If both turn
up empty, the orchestrator should seriously consider stopping —
at 97% coverage with no clear high-leverage shapes left,
further sessions are likely to produce documentation-only or
marginal-improvement deliverables.
If precondition 1 (real signed-in debugger-attached Claude) holds:
- **Operon-mode probe** (Category A from sessions 13-16). Run
`eipc-registry-probe.ts` against the user's Claude with operon mode
toggled on/off, capture the diff in registered channels. May
surface a new case-doc-coverable handler.
- **Schema-rev smoke-test** for the session-16-resolved schemas
against the live debugger. `listRemotePluginsPage(limit: 10,
offset: 0)` should return an array shape; `listSkillFiles('some-
installed-plugin', 'some-skill')` would test the LocalPlugins
handler's auth path.
If precondition 2 (real-account write-side fixture) holds:
- **T11 runtime invocation.** With an installed plugin in
`~/.claude/plugins/`, the post-install state can be probed via
`listSkillFiles` and the slash-menu skills would assert the
case-doc claim "skills appear in the slash menu" (T11 step 3).
- **T17 navigation fix.** Add a `/new` navigation primitive to
`claudeai.ts`'s `CodeTab` so `openFolderPicker` works on a fresh
project route. Verify T17 reaches the dialog mock fired assertion.
If precondition 3 or 4 holds:
- **Defensive page-object refactor.** Re-snapshot the AX tree at the
Customize panel and Plugin browser modal, refresh case-doc
inventory anchors, migrate any decayed selectors.
### Termination signal interpretation
If session 17 is triggered without any precondition met, the right
move is the same as session 16's STOP recommendation: write a one-
paragraph "preconditions not met, no work shipped" plan-doc update
and terminate. Don't burn a session on documentation-only output.
### Constraints to respect (unchanged from sessions 1-16)
- Use `seedFromHost: true` for any auth-required spec — never
`CLAUDE_TEST_USE_HOST_CONFIG=1` / `isolation: null` (legacy shape
removed in session 15).
- eipc handlers register on `webContents.ipc._invokeHandlers`, NOT
global `ipcMain._invokeHandlers`. Use `lib/eipc.ts`.
- For arg validator schema-rev: smoke-test first, fall back to
bundle-grep on the rejection literal.
- For AX-tree consumers: use `lib/ax.ts` (`snapshotAx` /
`waitForAxNode` / `waitForAxNodes`).
- For call-site migrations to `waitForAxNode`: keep per-spec retry
budgets matching existing tuning.
- `lib/input.ts` is X11-only. `lib/input-niri.ts` is Niri-only. CDP
auth gate is alive (runtime SIGUSR1 attach, never Playwright
`_electron.launch()`). BrowserWindow Proxy gotcha — use
`webContents.getAllWebContents()`. `skipUnlessRow()` always first.
- No fixed sleeps. `retryUntil` from `lib/retry.ts`, Playwright
auto-wait, or `waitForAxNode` from `lib/ax.ts`.
- Diagnostics on every run via `testInfo.attach()`. Tag with
`severity:` and `surface:` annotations.
- Tabs in TS, ~80-char wrap.
- Don't break existing runners. H01-H05 are the canaries.
- `npm run typecheck` must stay clean.
- Don't run destructive Tier 3 write-side tests.
### Authoritative reference
Read these in order before fanning out:
- [`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
— tier classification + status section. Read **session 15**, then
**session 14**, **session 13**, **session 12**, **session 11**,
**session 10**, **session 9**, **session 8**, **session 7**,
**session 6**, **session 5**, **session 4**, **session 3**,
**session 2**, then **session 1** "Status (post-execution)"
sub-sections. The Tier-3 list (search for "## Tier 3") is the
candidate pool for any further reframes.
— tier classification + status sections.
- [`tools/test-harness/README.md`](../../tools/test-harness/README.md)
— runner conventions, the now-74-spec inventory, primitives in
`lib/`, isolation defaults (T17 now seedFromHost per session 15),
the CDP-gate workaround, the eipc note, and `lib/ax.ts` substrate.
— runner conventions, the 74-spec inventory, primitives in
`lib/`, isolation defaults.
- [`docs/testing/cases/README.md`](cases/README.md) — case-doc
structure and the four anchor scopes.
- [`tools/test-harness/src/lib/`](../../tools/test-harness/src/lib/)
— the existing primitives. `lib/ax.ts` surface is `snapshotAx` /
`waitForAxNode` / `waitForAxNodes` plus re-exports. The session 8
eipc surface (`getEipcChannels` / `findEipcChannel` /
`findEipcChannels` / `waitForEipcChannel` /
`waitForEipcChannels` / `invokeEipcChannel` on `lib/eipc.ts`) is
unchanged.
- [`tools/test-harness/eipc-registry-probe.ts`](../../tools/test-harness/eipc-registry-probe.ts)
— the session 7 read-only registry probe. Re-run against an
auth-bearing debugger-attached Claude (`Developer → Enable Main
Process Debugger` from the menu, signed-in) to capture the
current registry shape.
— the existing primitives.
- [`tools/test-harness/src/runners/`](../../tools/test-harness/src/runners/)
— every existing spec is a template. Notable session 15
candidates for follow-up:
- `T17_folder_picker.spec.ts` — newly migrated to seedFromHost.
Run to verify the 60s timeout is gone. If T17 now passes, the
structural fix shipped session 15 is verified.
- Schema-rev for `listRemotePluginsPage` / `listSkillFiles` —
rejection literals can be bundle-grepped without auth, and the
validator runs auth-independent if /login state lets us
invoke through the renderer-side wrapper. Session 12 found
`listRemotePluginsPage` needs `limit: number` at position 0
and `listSkillFiles` needs both `pluginId` and `skillName`.
- [`docs/testing/cases/*.md`](cases/) — the spec each runner
asserts. The **Code anchors:** field tells you exactly where
upstream implements the feature.
— every existing spec is a template.
### Tests in scope this session
**Realistic ceiling: ~1 verification run OR ~1 schema-rev investigation
OR a "stop the orchestration" recommendation.** Sessions 9-12 each
landed 1-2 specs; session 13 landed only a primitive (debugger
blocked); session 14 landed only a migration (debugger blocked);
session 15 landed only a structural fix (debugger soft-blocked).
Coverage at 74/76 means the test budget naturally shifts toward
verification, low-stakes investigation, or the orchestration
termination decision.
**Phase 0 MUST check the debugger-attachment quality, not just port
status.** Run `ss -tln 2>/dev/null | grep ':9229'` for port. If open,
also run an `evalInMain` probe to enumerate webContents URLs — if no
URL is `https://claude.ai/<not /login>`, treat as soft-blocked for
auth-required categories. Probe shape (kept inline; delete after):
```ts
import { InspectorClient } from './src/lib/inspector.js';
const client = await InspectorClient.connect(9229);
const wcs = await client.evalInMain<unknown>(`
const { webContents } = process.mainModule.require('electron');
return webContents.getAllWebContents().map((w) => ({
id: w.id, url: w.getURL(), title: w.getTitle(),
}));
`);
console.log(wcs); client.close();
```
If every URL is `/login` or `find_in_page` or `main_window/index.html`,
the debugger is attached to a leaked test isolation, not the user's
Claude. Categories A and most of B are blocked. Category C may still
be tractable since validators run auth-independent — try the schema-
rev probe against the /login wrapper.
#### **PRIORITY: Verify T17's session 15 seedFromHost migration
actually resolves the 60s timeout.** Session 15 didn't run T17 because
the dev box had ambiguous Electron processes (some leaked test
isolations, possibly the user's real Claude — `ps` couldn't
disambiguate cleanly). Session 16's first action:
1. Check `pgrep -af "ozone-platform=x11.*app.asar"` and
`ps -o pid,user-data-dir` to identify whether any real-Claude
process is running (real Claude has a non-`/tmp/claude-test-*`
user-data-dir, typically nothing or `~/.config/Claude`).
2. If only test cruft is running, run T17 (`npx playwright test
T17 --reporter=list`). The test will kill those leaked
processes via `seedFromHost`'s host-Claude-kill semantics —
that's actually a desirable cleanup side effect.
3. If a real Claude IS running, **flag clearly in the report
before running**, then run T17. The user accepted the
`seedFromHost` kill side effect when authorising autonomous
orchestration; just be transparent about it.
4. Capture pass/skip/fail. Update the matrix coverage doc if
T17 now passes.
5. If T17 still fails, classify the new failure mode (is it now
AX-polling? Folder picker chain? Mock not installing?) and
decide whether to fix or defer.
This is **strictly higher-impact than session 14/15's
spec-implementation work** because it produces a concrete
pass/fail data point that resolves a 2-session-old hypothesis.
Doesn't need the debugger.
Three categories — pick the verification run as the main bet, treat
the others as fallback if the main bet hits an early blocker:
| # | Tests | Source | Notes |
|---|---|---|---|
| **D-verify** T17 verification run (PRIORITY) | T17 | session 15 migration | Run T17 against the dev box. If pass, log it. If fail, classify the new failure mode. **Side effect: kills any running Claude (the user's, or leaked test cruft). Flag in the report.** Doesn't need the debugger. |
| **C** Schema-rev for `listRemotePluginsPage` / `listSkillFiles` | Bundle grep | session 9 schema-rev pattern | Both methods rejected every smoke-tested arg shape during session 12's investigation. `listRemotePluginsPage` needs `limit: number` at position 0 (rejection: `Argument "limit" at position 0 ...`); `listSkillFiles` needs both `pluginId` and `skillName` (rejection: `Argument "skillName" at position 1 ...`). Bundle-grep on the rejection literals → resolve the schema → ship a narrowly-scoped Tier 2 invocation if it unblocks a case-doc claim. **Tractable against a /login isolation since validators run auth-independent.** |
| **STOP** Orchestrator stop recommendation | n/a | session 15 productivity signal | Coverage at 97%, three consecutive non-coverage sessions, remaining categories soft- or hard-blocked. If D-verify and C both produce nothing tractable, formally recommend the orchestrator stop. Documentation-only sessions are still acceptable per the followup termination criteria, but consecutive ones with no improvement signal are noise. |
#### Category D-verify — T17 verification run
The plan: run the post-session-15 T17 against the dev box and capture
the result. Pass = the structural fix landed correctly. Fail = the
hypothesis was incomplete; classify and decide.
1. **Disambiguate running Claude processes.** `pgrep -af
"ozone-platform=x11.*app.asar"`; for each, `cat
/proc/<pid>/cmdline | tr '\0' '\n' | grep user-data-dir` (or
inspect via `ps` cmdline). If only `/tmp/claude-test-*`
user-data-dirs, no real Claude is running.
2. **Run T17.** `cd tools/test-harness && npx playwright test
T17_folder_picker --reporter=list 2>&1 | tee
/tmp/t17-session16.log`.
3. **Classify.**
- Pass: structural fix verified. Update plan-doc / matrix.
- Skip with "seedFromHost unavailable": means host has no
`~/.config/Claude/Local State`. Should be rare on the dev
box but possible if config was wiped between sessions.
- Skip with "seeded auth did not reach post-login URL":
auth was seeded but stale. User needs to re-sign-in
manually. Don't try to reseed automatically.
- Fail with NEW failure mode: classify the failure (AX
click? openFolderPicker chain? dialog mock?). If it's
now in `openPill` / `clickMenuItem`, sessions 14/15's
speculation has finally hit; ship the AX migration.
Otherwise document and defer.
4. **Don't restructure T17's body** unless step 3 surfaces a
real new bug. Keep changes scoped to whatever the verification
surfaces.
Doesn't need the debugger.
#### Category C — Schema-rev for rejecting read-sides
The plan: resolve the validator schema for `listRemotePluginsPage` /
`listSkillFiles` via bundle grep, ship invocations if either unblocks
a case-doc claim. Tractable against a /login isolation since
validators run auth-independent.
1. **Grep on the rejection literal** in the bundled `index.js`.
Validator block sits ~50-200 chars before the throw site (session
9 finding). Read ~2KB around the hit to surface the full schema.
2. **Smoke-test the recovered schema** against the user's debugger-
attached running Claude (or, if auth-soft-blocked as in session 15,
against the /login isolation — validators run regardless of auth).
3. **Connect the resolved invocation to a case-doc claim.**
4. **Ship a Tier 2 invocation** if a case-doc claim is unblocked.
Auth-independent for the validator; auth-bearing for any handler that
actually returns plugin / skill data. If the validator resolves but
the handler fails on auth, document the schema in plan-doc as a
deferred reframe and move on.
#### STOP recommendation
If D-verify resolves cleanly (pass or stable skip) and C produces no
shippable spec after the schema-rev investigation, the productivity
signal for further sessions is squarely "documentation-only with no
clear next-step deliverable." The orchestrator should stop. State
this plainly in the final report; don't keep cycling.
### Constraints to respect (don't violate)
These are unchanged from sessions 1-15 and still load-bearing:
- **Default isolation** unless the spec needs otherwise. Use
`seedFromHost: true` for any test that depends on authenticated
renderer state — never assume default isolation gets past
`/login`. T07/T11_runtime/T16/T17/T19/T20/T21/T26/T22b/T27/T31b/T33b/T33c/T35b/T37b/T38b
are the templates. **T17 was migrated to this shape in session 15.**
- **eipc handlers register on `webContents.ipc._invokeHandlers`,
NOT global `ipcMain._invokeHandlers`.** Session 7 finding. Use
`lib/eipc.ts` rather than rolling a new walker.
- **eipc invocation goes through the renderer-side wrapper at
`window['claude.<scope>'].<Iface>.<method>`.** Session 8 finding.
Use `lib/eipc.ts`'s `invokeEipcChannel` rather than rolling
main-side direct calls.
- **For arg validator schema-rev: try smoke-test first, then grep
the rejection message literal.** Session 9 finding.
- **For AX-tree consumers: use `lib/ax.ts`.** Session 13 finding.
`snapshotAx` for one-shot reads, `waitForAxNode` /
`waitForAxNodes` for predicate-based polling.
- **For call-site migrations to `waitForAxNode`: keep the per-spec
retry budgets matching the existing tuning.** Session 14
finding. Migration is shape-only EXCEPT when the call-site has
NO retry at all — adding a budget is the bug-fix the migration
delivers.
- **For test specs that depend on host auth: use `seedFromHost:
true`.** Session 15 finding. The legacy `CLAUDE_TEST_USE_HOST_CONFIG=1`
/ `isolation: null` shape collides with Playwright's 60s spec
timeout when the env var isn't set; `seedFromHost` gives a clean
skip-or-pass shape. T17 was the last spec on the legacy shape.
- **`lib/input.ts` is X11-only.** Strict gate.
- **`lib/input-niri.ts` is Niri-only.** Strict gate.
- **CDP auth gate is alive** — runtime SIGUSR1 attach via
`app.attachInspector()`, never Playwright's `_electron.launch()`
or `chromium.connectOverCDP()`.
- **BrowserWindow Proxy gotcha** — use
`webContents.getAllWebContents()` not
`BrowserWindow.getAllWindows()`.
- **`skipUnlessRow()` always first.**
- **No fixed sleeps.** `retryUntil` from `lib/retry.ts`, or
Playwright auto-wait, or `waitForAxNode` from `lib/ax.ts`.
- **Diagnostics on every run.** `testInfo.attach()` the artefacts.
- **Tag with annotations.** `severity:` and `surface:` on every
test so JUnit carries them through to matrix-regen.
- **Tabs in TS, ~80-char wrap as the existing files do.**
- **Don't break existing runners.** `npm run typecheck` must stay
clean. H01-H05 are the canaries; `npm test` must still pass them
after every commit. Note that T07 / S25 / S29-S31 / S04 etc.
may be pre-existing-flaky on KDE-W — they're NOT canaries;
baseline failures don't block work.
- **Always grep the installed asar** to verify a fingerprint
string is present.
### Phases
#### Phase 0 — calibration
### Phase 0 — calibration (mandatory before fanning out)
1. `cd tools/test-harness && npm run typecheck` — should pass.
2. **Check debugger ATTACHMENT QUALITY (not just port).** First
`ss -tln 2>/dev/null | grep ':9229'`. If port open, also probe
webContents via `evalInMain` (see "Big new findings" §3 for
the probe shape). If every URL is `/login` /
`find_in_page` / `main_window`, treat as soft-blocked.
3. **Disambiguate running Claude processes.** Required before any
`seedFromHost` spec. `pgrep -af "ozone-platform=x11.*app.asar"`
+ cmdline inspection for user-data-dir.
4. Read the plan doc's "Status (post-execution)" session 15 section,
then read T17's session-15 form and the seedFromHost convention.
5. Pick the main bet:
- **D-verify** (PRIORITY): run T17, classify the result.
- **C**: bundle grep on rejection literals, schema-rev,
smoke-test the resolved shape against the /login isolation.
- **STOP**: if both above produce nothing tractable, recommend
stopping the orchestration.
2. Check debugger ATTACHMENT QUALITY (not just port). `ss -tln |
grep ':9229'`. If port open, probe webContents via `evalInMain`:
If Phase 0 surfaces a problem (typecheck failing, primitives unclear,
the chosen Category's prerequisites don't hold), stop and report.
Don't fan out.
```ts
import { InspectorClient } from './src/lib/inspector.js';
const client = await InspectorClient.connect(9229);
const wcs = await client.evalInMain<unknown>(`
const { webContents } = process.mainModule.require('electron');
return webContents.getAllWebContents().map((w) => ({
id: w.id, url: w.getURL(), title: w.getTitle(),
}));
`);
console.log(wcs); client.close();
```
#### Phase 1 — fan-out batch
For Category D-verify (T17 run):
- Single subagent (or do directly — it's a single-command run +
trace inspection) runs T17 and classifies. Verify by checking
pass/skip/fail and any new failure-mode trace.
For Category C (schema-rev):
- Single subagent does bundle-grep on the rejection literals,
surfaces the validator schemas, smoke-tests the recovered shapes
against the user's debugger-attached running Claude (or /login
isolation if soft-blocked).
Cap at ~1 spec OR ~1 verification + 1 schema-rev — same scope as
sessions 9-15.
#### Per-subagent prompt shape
```
You're implementing ONE [verification run | primitive migration |
investigation] for <TARGET>.
Read in order:
- docs/testing/cases/<FILE>.md (focus on <TARGET>'s Code anchors)
- tools/test-harness/README.md (conventions; status section names
the most-recent-template that fits)
- tools/test-harness/src/runners/<closest-template>.spec.ts
- tools/test-harness/src/lib/ (the primitives you'll reuse —
including session 13's `lib/ax.ts` and session 15's seedFromHost
T17 migration)
- CLAUDE.md (project conventions)
[per-task specifics: pattern (verification run / mock-then-call /
asar fingerprint / shared isolation / new-primitive-build /
investigation / call-site migration), assertion shape, skip rules,
key constraint warnings]
Constraints:
- Tabs, ~80-char wrap.
- Use lib/* primitives; don't reinvent.
- testInfo.attach() the diagnostics from the spec's "Diagnostics
on failure" block.
- Tag with severity + surface annotations.
- No fixed sleeps. retryUntil, Playwright auto-wait, or
waitForAxNode.
- npm run typecheck must stay clean after your edits.
- Don't commit. The user reviews and commits.
If the target isn't reasonable to implement (anchors don't resolve
to anything assertable, the test depends on state you can't
construct, the existing primitives don't cover the surface), DO
NOT write a stub. Report under Open questions and stop.
Report shape (~150 words):
## <TARGET> [verification | primitive | investigation | migration]
- File written: tools/test-harness/src/runners/<filename>.spec.ts
[or lib/<newfile>.ts or modified lib/<existing>.ts]
- Layer: file probe | argv probe | L1 | L2 (xprop) | L2 (DBus) |
pgrep | new-primitive | investigation | migration | verification
- Assertion shape (or migration shape): <one sentence>
- Skip rules: <which rows + why>
- Verification path: <typecheck + run result>
- Open questions: <caveats>
```
#### Phase 2 — synthesis
After fan-out returns:
1. `cd tools/test-harness && npm run typecheck` — must stay clean.
2. Run the new / migrated runners against KDE-W (the dev box) — but
flag the user first if any are destructive (seedFromHost kills
running Claude). Capture pass/skip/fail per spec for the matrix.
3. Update [`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
"Status (post-execution)" section to reflect newly-shipped
specs / primitive migrations and any reclassifications.
4. Update [`tools/test-harness/README.md`](../../tools/test-harness/README.md)
inventory table.
5. Write a final report listing:
- Specs landed / migrations completed (pass / skip / needs-tuning per row)
- Primitives landed (with API shape)
- Specs deferred (with the per-test rationale)
- Specs reclassified (Tier 3 → Tier 2, Tier 2 → Tier 1, etc.)
- Updated coverage stat (was 74/76 = 97%, now N/76 = M%)
6. Commit and push to `docs/compat-matrix` (the orchestration
directive at the top of the followup supersedes "don't commit").
7. Rotate this prompt: rewrite
`docs/testing/runner-implementation-followup-prompt.md` for
the NEXT session's deferred items.
### Self-correction loop
Same as sessions 1-15:
1. Subagent typecheck failure → re-spawn with explicit fix
instruction.
2. Subagent claims a runner / migration exists but `git status`
shows no new file → re-spawn with explicit "use the Write tool"
instruction.
3. Two subagents wrote runners that share a primitive but with
different shapes → factor into `lib/<topic>.ts` BEFORE shipping.
4. Spec passes locally but the assertion is actually trivial → re-
examine the assertion shape.
5. Migration breaks an existing spec → roll back the migration; the
per-spec retry budget was load-bearing and the primitive
defaults didn't match.
6. **Carry-over from sessions 5-15:** If the chosen Category's
investigation doesn't resolve / requires schema-rev that exceeds
budget after 2-3 approaches, STOP. Don't keep digging — pivot
to a fallback Category. Document what was tried.
7. **Carry-over from session 10:** If a registration probe surfaces
"registered but uninvocable", document and defer rather than
building the main-side fallback speculatively.
Cap re-spawns at 2 per file. Past that, mark as needing human
review and move on.
### Termination conditions
Stop and write the final report when one of:
1. **Main-bet Category target landed and typecheck-clean.** Write
coverage update, stop.
2. **Hit re-spawn cap on 2+ tasks.** Stop, write up which are
blocked.
3. **Discovered a primitive gap that breaks 5+ Tier 2/Tier 3
tests.** Stop, propose where the new primitive should live in
`lib/`. Future session adds the primitive first, then resumes.
4. **Session budget hits ~1 verification + 1 schema-rev landing.**
Stop, synthesize, leave the rest for the next session.
5. **All categories blocked / unproductive after 2-3 attempts
each.** Document the findings as plan-doc additions, **and
recommend the orchestrator stop the campaign** — coverage at
97%, three+ consecutive non-coverage sessions, dimming
productivity signal.
### What you should NOT do
- **Don't try to land D-verify + C in one batch.** Pick D-verify
first; if that resolves cleanly, take C as a stretch goal.
- **Don't ship stubs.** If a runner can't actually assert what the
spec says, mark it as Tier 3 / blocked / primitive-gap and
don't write a placeholder.
- **Don't break existing runners.** H01-H05 are the canaries.
- **Don't restructure `lib/`** beyond targeted additions.
Premature abstractions are wrong abstractions.
- **Don't run destructive Tier 3 tests** that write to the user's
real claude.ai account.
- **Don't introspect `ipcMain._invokeHandlers` for `claude.web`
eipc channels.** Use `lib/eipc.ts`.
- **Don't call `invokeEipcChannel` for write-side handlers.**
- **Don't bolt other compositors into `lib/input-niri.ts`.**
- **Don't bolt Wayland into `lib/input.ts`.**
- **Don't speculate on a `lib/input-wayland.ts` dispatcher.**
- **Don't preemptively build `CodeTab.activateTopTab()` /
`startNewSession()`.**
- **Don't add a main-side `invokeEipcChannel` fallback
speculatively.**
- **Don't speculate on a Launch event-subscription primitive.**
- **Don't extract T07's CSS-querySelector poll into `lib/ax.ts`.**
That's a different abstraction (DOM, not AX). Wait for a second
CSS-poll consumer before extracting.
- **Don't add a `waitForRenderedSurface(client, surfaceKey)`
registry to `lib/ax.ts`.** Session 13 deliberately deferred
this — wait for a third consumer with a specific named surface.
- **Don't migrate `openPill` / `clickMenuItem` to `waitForAxNode`
speculatively.** Session 15 confirmed T17's flake didn't need
it; without a third consumer signal, it's premature optimisation.
- **Don't reach into `explore/walker.ts` for AX types/helpers.**
`lib/ax.ts` re-exports — use those.
- **Don't implement the #569 power-inhibit patch in this
session.** That's a separate workstream.
- **Don't keep cycling on documentation-only sessions.** If
D-verify and C both turn up empty, formally recommend the
orchestrator stop the campaign rather than burning another
session of compute on marginal output.
### Final report format
```markdown
## Runner implementation summary (session 16)
- Main-bet category: D-verify | C | STOP
- Specs landed: N
- Migrations completed: N
- Primitives landed: N
- Verifications run: N
- Reclassified mid-flight: N (with reasons)
- Coverage: was 74/76 (97%), now <NEW>/76 (<PCT>%)
- Typecheck: clean | <errors>
- KDE-W test run: <pass/skip/fail counts>
## Per-spec breakdown
| Cat | Test ID | File | Assertion shape | Status |
|---|---|---|---|---|
| D-verify | T17 | T17_folder_picker.spec.ts | … | ✓ pass / skip / fail |
| ... |
## Notable findings
- ...
## Open questions
- ...
## Stop recommendation
- Yes / no, with rationale.
## Files touched
git status output.
## Diff summary
git diff --stat
```
If every URL is `/login` / `find_in_page` / `main_window`, treat
as soft-blocked for auth-required investigations.
3. Disambiguate running Claude processes. `pgrep -af
"ozone-platform=x11.*app.asar"`; for each, inspect cmdline for
`user-data-dir`. Real Claude has
`~/.config/Claude` (or no user-data-dir flag); leaked test
isolations have `/tmp/claude-test-*`.
4. **Verify at least one precondition for resuming the orchestration
holds.** If none hold, write a "no preconditions met" plan-doc
update and STOP. Don't fan out.
### Operational notes
- Subagents are launched in parallel via a single message with
multiple Agent tool calls. Don't serialise.
- Each subagent's Write calls land directly in the working tree.
- The grounding probe (`tools/test-harness/grounding-probe.ts`)
can help when implementing a runner that asserts runtime API
state.
- The eipc-registry probe (`tools/test-harness/eipc-registry-probe.ts`)
is the dedicated tool for inspecting per-wc IPC handler state.
Connects to a debugger-attached running Claude on port 9229.
- For seedFromHost specs, the host MUST have a signed-in Claude
Desktop. The primitive throws with a clear message if not.
- For tests that touch the AX tree, **`lib/ax.ts`** is the shared
substrate.
- For mock-then-call: helpers live in `lib/electron-mocks.ts`.
- For focus-shifting (X11 only): `lib/input.ts` exports
`focusOtherWindow` + `spawnMarkerWindow`.
- For Wayland-native focus-shifting (Niri only): `lib/input-niri.ts`.
- For eipc registry walking: `lib/eipc.ts` exports
`getEipcChannels` / `findEipcChannel` / `findEipcChannels` /
`waitForEipcChannel` / `waitForEipcChannels`.
- For eipc invocation: `lib/eipc.ts` exports `invokeEipcChannel`.
Only call read-side suffixes; the primitive doesn't enforce a
read-only allowlist.
- **For arg validator schema-rev (sessions 9 / 11 / 12 findings):**
smoke-test first, bundle-grep on rejection literal as fallback.
- **For session-scoped Tier 2 reframes (session 10 finding):**
`LocalSessions/getAll` foundational read-side surrogate.
- **For Tier 2 reframes with case-doc-anchored read-side handlers
(session 11 finding):** invoke directly. Mixed-shape OK.
- **For Tier 2 reframes spanning two interfaces (session 12
finding):** invoke a read-side from each impl object.
- **For AX-tree polling (session 13 finding):** `lib/ax.ts`'s
`waitForAxNode` / `waitForAxNodes` for predicate-based polling.
- **For call-site migrations to `waitForAxNode` (session 14
finding):** keep per-spec retry budgets matching the existing
tuning.
- **For auth-required spec migrations (session 15 finding):**
use `seedFromHost: true`, NOT `CLAUDE_TEST_USE_HOST_CONFIG=1` /
`isolation: null`. The legacy shape collides with Playwright's
60s spec timeout.
- **For asar fingerprints: ALWAYS grep the installed asar
first.** Build-reference is beautified; the bundle is
minified.
- For the bundle-grep schema-rev pattern (sessions 9, 11, 12, 16
precedents):
```bash
cd tools/test-harness && node -e "
const {extractFile} = require('@electron/asar');
@@ -648,10 +218,21 @@ git diff --stat
'.vite/build/index.js'
);
const s = buf.toString('utf8');
for (const k of ['<your-needle>', '<another>']) {
console.log(k, '->', s.split(k).length - 1);
}
const idx = s.indexOf('<rejection-literal>');
console.log(s.slice(Math.max(0, idx - 1500), idx + 500));
"
```
Begin with Phase 0. Don't fan out until calibration succeeds.
- For seedFromHost specs: host MUST have a signed-in Claude.
`seedFromHost`'s host-claude-kill semantics will tear down any
running Claude process — flag clearly in the report before
invoking when the user's real Claude is running.
- For AX-tree polling: `lib/ax.ts`'s `waitForAxNode` /
`waitForAxNodes` for predicate-based polling.
- The eipc-registry probe (`tools/test-harness/eipc-registry-probe.ts`)
is the dedicated tool for inspecting per-wc IPC handler state.
Begin with Phase 0. Don't fan out until at least one of the
preconditions for resuming the orchestration is verified to hold.

View File

@@ -18,6 +18,140 @@ work begins.
## Status (post-execution)
**Shipped session 16 (verification + schema-rev investigation, no new spec):**
T17's session-15 `seedFromHost` migration verified end-to-end against
the dev box: the bare 60s Playwright timeout is GONE, `seedFromHost`
clones the host's signed-in config, `waitForReady('userLoaded')`
resolves to `https://claude.ai/epitaxy` (post-login), the dialog mock
installs, and `CodeTab.activate({ timeout: 15_000 })` (session 14
migration) succeeds. T17 reaches a NEW failure mode at the next chain
step: `CodeTab.openFolderPicker: "Select folder…" pill did not open
within 4s after Local was clicked` — the env-pill open + Local click
both succeed, but the Select-folder pill doesn't render in the URL
state we reach (`/epitaxy`, the user's workspace, NOT `/new`). Per the
session-15 followup classification rules: this is NOT in `openPill` /
`clickMenuItem`'s post-click loops (those work — the env pill opened
and Local was found and clicked); the failure is one chain step later,
likely renderer-state-dependent (the workspace route doesn't expose a
local-folder picker the same way `/new` does). Don't migrate
`openPill` / `clickMenuItem` speculatively — that's been the standing
deferral since session 14. Document and defer the new failure mode.
Category C schema-rev (`listRemotePluginsPage` / `listSkillFiles`)
**resolved** by bundle inspection of
`/usr/lib/claude-desktop/node_modules/electron/dist/resources/app.asar`
(extracted via `@electron/asar`):
- `CustomPlugins.listRemotePluginsPage(limit: number, offset: number)`
— both positional, both numbers. Validator block sits at
`'$eipc_message$_..._$_claude.web_$_CustomPlugins_$_listRemotePluginsPage'`,
with explicit `typeof r!="number"` / `typeof n!="number"` checks
preceding the throw. Result validator `VUi(s)`.
- `LocalPlugins.listSkillFiles(pluginId: string, skillName: string,
pluginContext?: opaque)` — two required strings + optional context
arg validated by `sc(s)` (the same shared validator used elsewhere
for plugin-context blobs). Result validator `bUi(o)`.
**No Tier 2 invocation shipped for either** because neither method
connects to a case-doc claim:
- `listRemotePluginsPage` is NOT anchored in any case doc. T33 anchors
`listMarketplaces` (`:71392`) and `listAvailablePlugins` (`:71534`)
— both already covered by T33b/T33c — but `listRemotePluginsPage`
is a separate read-side surface (paginated remote-plugin list) that
the case docs don't claim. Shipping a probe just to exercise the
validator with no assertion bound to a real-product behaviour would
be a stub.
- `listSkillFiles` is `LocalPlugins`-scoped and meaningful only with
an installed plugin (T11 step 3: "verify its skills appear in the
slash menu"). Reaching that requires the destructive Tier 3 install
path, which the constraints explicitly forbid. The validator
resolves auth-independent, but the underlying handler needs real
account state.
Schemas captured in plan-doc as a deferred reframe so a future session
with a real-account install fixture can ship the invocation.
Coverage stays at 74/76 (97%) — verification + investigation, no spec
landed.
Two commits on `docs/compat-matrix` expected (the orchestration
directive supersedes "the user reviews and commits" — autonomous
commit + push at end of session):
- TBD — `test(harness): session 16 verify T17 seedFromHost fix +
schema-rev for listRemotePluginsPage / listSkillFiles (no spec,
coverage unchanged at 97%)` (no code change beyond the doc updates;
T17 verification run + schema-rev bundle inspection captured in
the plan-doc).
- TBD — `docs(testing): session 16 plan/inventory + flag orchestrator
STOP for session 17`.
Session 16 findings + reclassifications:
- **Session 15's structural T17 fix VERIFIED.** The pre-fix bare 60s
timeout was real and is gone. `seedFromHost` clones host config,
the renderer reaches a post-login URL, mocks install, and tab
activation succeeds. Session 14's `activateTab` /
`CodeTab.activate` AX migration also verified — `activate({
timeout: 15_000 })` resolved on the FIRST run with no flake.
- **T17's NEW failure mode classified as renderer-state, not AX.**
Post-`selectLocal` the Select-folder pill never appeared; this is
upstream of `openPill`'s click loop (the env pill opened, Local
was clicked successfully). The trace shows the URL is
`https://claude.ai/epitaxy` — the user's workspace route, not
`/new`. The folder-picker UI may only render on `/new` (or a
fresh project), not on a workspace already containing files.
Future fix: navigate to `/new` post-userLoaded before invoking
`openFolderPicker`. NOT shipped this session — needs a careful
navigation primitive that doesn't break existing seedFromHost
specs.
- **`openPill` / `clickMenuItem` migration STILL parked.** Sessions
14/15 speculated about migrating these; session 15 walked it back;
session 16 confirms session 15's call. The new T17 failure is one
chain step later, NOT in the post-click polling loops.
- **Schema-rev cleanly resolved both deferred validators.** Session 9
pattern (bundle-grep on the rejection literal) works as expected.
No smoke-test was needed because the validator literal IS the
schema source of truth (typeof checks are explicit in source).
Smoke-test against a live debugger-attached Claude wasn't possible
this session because T17's seedFromHost step killed the leaked
isolations and tore down the debugger.
- **No case-doc connection for either resolved schema.**
`listRemotePluginsPage` is paginated remote-plugin enumeration
(a separate surface from T33's `listMarketplaces` /
`listAvailablePlugins` already covered). `listSkillFiles` needs
real account state via a Tier 3 install. Both are documented for
future revisit, neither shipped as a runner.
- **Three Tier 4 blockers crystallised.** Sessions 13-16 collectively
confirm the remaining un-runner'd specs all sit behind one of:
(a) write-side state on a real claude.ai account (Tier 3
destructive — explicitly forbidden); (b) renderer-state-dependent
UI that the harness can't construct without account-side fixtures
(T17's `/new` requirement); (c) auth-bearing debugger-attached
Claude that exists only when a real signed-in app is running on
the dev box (which the session-13 onwards sessions have been
unable to keep alive across orchestration runs because seedFromHost
kills it). At 74/76 (97%), the structural ceiling for the harness
is reached; the remaining 2 specs need real-account write-side
fixtures.
**ORCHESTRATION-LEVEL STOP RECOMMENDATION (session 16 final).**
Sessions 13-16 produced: 1 primitive (`lib/ax.ts` — session 13), 1
substantive AX migration (`activateTab` + `CodeTab.activate` —
session 14), 1 structural fix (T17 seedFromHost — session 15), 1
verification + 1 schema-rev investigation (session 16). NO coverage
gain across 4 sessions. Coverage start 74/76 → end 74/76 (97%
throughout). The structural ceiling is reached. Future sessions
should be triggered manually — only when (a) the user has a real
signed-in Claude they're willing to dedicate to a debugger-attached
session, or (b) a new test-harness primitive opportunity surfaces
from product changes (e.g. claude.ai renderer drift requiring
refactoring, new IPC surfaces requiring registry walking). The
autonomous orchestration is being stopped after session 16.
---
**Shipped session 15 (1 structural fix, no new spec, no AX migration):**
T17 migrated from the legacy `CLAUDE_TEST_USE_HOST_CONFIG=1` /
`isolation: null` auth path to the canonical `seedFromHost: true`

View File

@@ -140,7 +140,14 @@ T27, T31b, T33b, T33c, T35b, T37b, T38b — session 15 migrated T17
from the legacy `CLAUDE_TEST_USE_HOST_CONFIG=1` / `isolation: null`
shape to `seedFromHost`, fixing a pre-existing 60s spec-timeout
flake where the unauth'd default isolation polled `userLoaded` past
Playwright's spec budget).
Playwright's spec budget; session 16 verified the migration end-to-
end — `seedFromHost` clones the host's signed-in config,
`waitForReady('userLoaded')` resolves to a post-login URL, and the
session-14 `CodeTab.activate({ timeout: 15_000 })` succeeds; T17
now reaches a NEW failure mode at the next chain step
(`openFolderPicker` after `selectLocal`, `Select folder…` pill
doesn't render on `/epitaxy` workspace route — likely needs `/new`
context, deferred for a future session).
Note on eipc channels: the `LocalSessions_$_*` and `CustomPlugins_$_*`
channel names referenced in the case-doc Code anchors don't register