mirror of
https://github.com/aaddrick/claude-desktop-debian.git
synced 2026-05-17 00:26:21 +03:00
docs(testing): session 16 verify T17 seedFromHost + schema-rev for listRemotePluginsPage / listSkillFiles + flag orchestrator STOP for session 17
Final session of the sessions-13-to-16 autonomous orchestration run.
Verified session 15's T17 seedFromHost migration end-to-end against
the dev box: bare 60s Playwright timeout is GONE, seedFromHost clones
host config, waitForReady('userLoaded') resolves to a post-login URL
(https://claude.ai/epitaxy), dialog mock installs, and the session-14
CodeTab.activate({ timeout: 15_000 }) AX migration succeeds first try.
T17 reaches a NEW failure mode at the next chain step
(openFolderPicker after selectLocal — Select-folder pill doesn't
render on /epitaxy workspace route, likely needs /new context).
Classified as renderer-state-dependent, not openPill / clickMenuItem
loop — ruling out sessions 14-15's parked AX migration hypothesis
once and for all. Deferred for a future session (needs careful /new
navigation primitive).
Schema-rev resolved both deferred validators by bundle inspection of
app.asar (no smoke-test possible — T17's seedFromHost step killed the
debugger-attached leaked isolations as expected):
- CustomPlugins.listRemotePluginsPage(limit: number, offset: number)
- LocalPlugins.listSkillFiles(pluginId: string, skillName: string,
pluginContext?: opaque)
Neither shipped as a Tier 2 invocation — listRemotePluginsPage is
not anchored in any case doc (T33 anchors listMarketplaces +
listAvailablePlugins, both already covered by T33b/T33c);
listSkillFiles is meaningful only with an installed plugin, which
needs Tier 3 destructive setup explicitly forbidden by the
constraints. Schemas captured in plan-doc as a deferred reframe.
Coverage stays at 74/76 (97%) — verification + investigation, no
spec landed.
Orchestration-level summary (sessions 13-16):
- Coverage start 74/76 (97%) → end 74/76 (97%) — NO net coverage
gain across 4 sessions
- Net deliverables: 1 primitive (lib/ax.ts, session 13), 1 AX
migration (activateTab + CodeTab.activate, session 14, fixed T16
pre-existing-flake), 1 structural fix (T17 seedFromHost, session
15, verified working session 16), 1 verification + 1 schema-rev
investigation (session 16)
- Why coverage stalled: structural ceiling reached. Remaining 2
specs need real claude.ai account write-side state which the
harness can't construct without violating the Tier 3 destructive
constraint.
Followup prompt rotated for session 17 with a STOP flag at the top —
session 17 will only run if the user manually triggers another
orchestration AND at least one of four preconditions holds (real
signed-in debugger-attached Claude, real-account write-side fixture,
renderer-drift event, or new IPC surface).
Co-Authored-By: Claude <claude@anthropic.com>
@@ -1,645 +1,215 @@
|
||||
# test-harness runner implementation — session 16 prompt
|
||||
# test-harness runner implementation — session 17 prompt
|
||||
|
||||
This file is meant to be **copied verbatim into a fresh Claude Code
|
||||
session** as the initial user message. Don't paraphrase it; the
|
||||
orchestration depends on the exact directives below.
|
||||
|
||||
You're picking up after a runner-implementation session that landed 1
|
||||
structural fix (T17 migrated from legacy `CLAUDE_TEST_USE_HOST_CONFIG=1`
|
||||
auth path to `seedFromHost: true`, no new spec, no AX migration).
|
||||
Session 15 was an investigation session: Phase 0 calibration found
|
||||
port 9229 listening BUT the attached process was a leaked test
|
||||
isolation at `claude.ai/login` rather than the user's auth-bearing
|
||||
Claude — every webContents URL on that process was either `find_in_page`,
|
||||
`/login`, or `main_window/index.html`, and the user-data-dir was
|
||||
`/tmp/claude-test-*`. That made Categories A (operon-mode probe) / B
|
||||
(Tier 3 read-only reframes) / C (schema-rev) all soft-blocked: the
|
||||
debugger was technically attached, but to the wrong process for any
|
||||
auth-required investigation. Session 15 pivoted to investigating T17's
|
||||
pre-existing flake (the PRIORITY directive) and discovered the failure
|
||||
was structural rather than AX-polling-related — the spec was using the
|
||||
legacy `CLAUDE_TEST_USE_HOST_CONFIG=1` / `isolation: null` shape, and
|
||||
when run without that env var fell through to a fresh isolation with no
|
||||
auth, where `waitForUserLoaded`'s 90s default budget gets preempted by
|
||||
Playwright's 60s spec timeout. Coverage unchanged at 74/76 (97%) —
|
||||
structural fixes don't move the spec count, but T17 should now succeed
|
||||
when host is signed in (rather than auto-failing with a bare 60s
|
||||
timeout). Two commits on `docs/compat-matrix` expected (autonomous
|
||||
orchestration commits + pushes — the user reviews after the session):
|
||||
> **ORCHESTRATION STOPPED AFTER SESSION 16.** This prompt is rotated
> for completeness only. **Session 17 will NOT run automatically** —
> the autonomous orchestration was halted at the end of session 16
> after coverage stalled at 74/76 (97%) for four consecutive sessions
> (13, 14, 15, 16). To resume, the user must manually trigger another
> orchestration run AND meet at least one of these preconditions:
>
> 1. **Real signed-in Claude Desktop running with `--inspect=9229`**
>    on the dev box (debugger-attached, signed in, NOT a leaked test
>    isolation). This unblocks Categories A (operon-mode probe) and
>    B (Tier 3 read-only reframes that need auth-bearing renderer
>    state).
> 2. **A real claude.ai account fixture for write-side state.** The
>    remaining 2 specs (matrix coverage 74/76 → 76/76) need real
>    write-side state (e.g. an installed plugin to exercise
>    `LocalPlugins.listSkillFiles`, or a deep-linked deferred install
>    intent for T11). The Tier 3 destructive constraint
>    (`Don't run destructive Tier 3 write-side tests`) explicitly
>    forbids the harness constructing this state itself.
> 3. **Renderer-drift event** that requires re-anchoring page-objects
>    (e.g. claude.ai redesign breaks `findCompactPills`,
>    `clickMenuItem`, etc.). Triggers a defensive-migration session.
> 4. **New IPC surface** added by upstream that the harness should
>    cover (e.g. a new `claude.web` interface, a new eipc method
>    that's case-doc-anchored).
>
> If none of those preconditions hold, the orchestration should NOT
> resume — further sessions will produce documentation-only or
> marginal output. The structural ceiling of the harness without
> real-account fixtures is 74/76 (97%); we're already there.
|
||||
|
||||
- TBD — `test(harness): session 15 migrate T17 to seedFromHost +
|
||||
prune unused RawElement import (no spec, coverage unchanged at 97%)`
|
||||
(T17 spec rewrite swapping the `CLAUDE_TEST_USE_HOST_CONFIG=1` +
|
||||
`isolation: null` branch for the canonical `seedFromHost: true`
|
||||
pattern; prunes unused `RawElement` re-export import in
|
||||
`lib/claudeai.ts` per session 14's leftover hint; typecheck clean;
|
||||
T17 not actually run this session — see below).
|
||||
You're picking up after session 16 of the test-harness runner
|
||||
implementation work. Session 16 was the final session of the
|
||||
sessions-13-to-16 orchestration run and produced: T17 verification
|
||||
(session-15 structural fix VERIFIED — bare 60s timeout gone, new
|
||||
failure mode at `openFolderPicker` post-`selectLocal` classified as
|
||||
renderer-state-dependent and deferred), schema-rev for
|
||||
`listRemotePluginsPage` / `listSkillFiles` (both schemas resolved by
|
||||
bundle inspection — neither shipped as a Tier 2 invocation because
|
||||
`listRemotePluginsPage` is not anchored in any case doc, and
|
||||
`listSkillFiles` needs Tier 3 destructive setup). NO coverage gain.
|
||||
Plan-doc updated. Followup-prompt rotated with the STOP flag (this
|
||||
document).
|
||||
|
||||
The plan doc at
|
||||
[`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
|
||||
captures the tier classification and execution-time reclassifications.
|
||||
Its "Status (post-execution)" section is the source of truth for
|
||||
what's done and what's deferred — read **session 15** first, then
|
||||
**session 14**, then **session 13**, then **session 12**, then
|
||||
**session 11**, then **session 10**, then **session 9**, then **session
|
||||
8**, then **session 7**, then **session 6**, then **session 5**, then
|
||||
**session 4**, then **session 3**, then **session 2**, then **session
|
||||
1** sub-sections.
|
||||
what's done and what's deferred — read **session 16** first, then
|
||||
**session 15**, **session 14**, **session 13**, **session 12**,
|
||||
**session 11**, **session 10**, **session 9**, **session 8**,
|
||||
**session 7**, **session 6**, **session 5**, **session 4**, **session
|
||||
3**, **session 2**, then **session 1** sub-sections.
|
||||
|
||||
This session is a continuation, not a restart. Start by reading the
|
||||
plan doc's status sections.
|
||||
plan doc's status sections AND verifying at least one of the
|
||||
preconditions above holds. If none hold, STOP and report; don't try
|
||||
to fan out.
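
The gate described above (at least one of the four preconditions, otherwise stop and report) can be sketched as a small pure function. This is only an illustration: the `Preconditions` interface, its field names, and `decideSession17` are hypothetical and do not exist in the harness.

```ts
// Hypothetical shape; none of these names come from the harness.
interface Preconditions {
	signedInDebuggerAttached: boolean; // precondition 1
	realAccountWriteFixture: boolean; // precondition 2
	rendererDrift: boolean; // precondition 3
	newIpcSurface: boolean; // precondition 4
}

type Decision =
	| { action: 'proceed'; met: string[] }
	| { action: 'stop'; reason: string };

// Proceed only if at least one precondition holds; otherwise stop.
function decideSession17(p: Preconditions): Decision {
	const checks: Array<[string, boolean]> = [
		['signedInDebuggerAttached', p.signedInDebuggerAttached],
		['realAccountWriteFixture', p.realAccountWriteFixture],
		['rendererDrift', p.rendererDrift],
		['newIpcSurface', p.newIpcSurface],
	];
	const met = checks.filter(([, holds]) => holds).map(([name]) => name);
	if (met.length === 0) {
		return {
			action: 'stop',
			reason: 'preconditions not met, no work shipped',
		};
	}
	return { action: 'proceed', met };
}
```

Returning the list of met preconditions, rather than a bare boolean, lets the session report state which branch of the plan it is following.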
|
||||
|
||||
### Big new findings from session 15
|
||||
### Session 16 final findings (key context for any session-17 attempt)
|
||||
|
||||
1. **T17 flake was structural, not AX-polling.** The trace showed
|
||||
bare 60s Playwright timeout with NO `renderer-url` attachment —
|
||||
meaning the test never reached line 49's attach call, which
|
||||
means it never resolved `waitForReady('userLoaded')` at line 40.
|
||||
Root cause: T17 was the last spec on the legacy
|
||||
`CLAUDE_TEST_USE_HOST_CONFIG=1` / `isolation: null` shape — every
|
||||
other auth-required spec (T07, T16, T19, T20, T21, T22b, T26,
|
||||
T27, T31b, T33b/c, T35b, T37b, T38b) had moved to `seedFromHost:
|
||||
true`. Without that env var (which CI / orchestration didn't
|
||||
set), T17 fell through to a fresh isolation with no auth, hit
|
||||
`/login`, and `waitForUserLoaded`'s 90s budget got preempted by
|
||||
the 60s spec timeout. **Session 14's hypothesis was wrong** —
|
||||
the AX click chain in `openPill` / `clickMenuItem` was never
|
||||
reached, so migrating those wouldn't have fixed anything.
|
||||
2. **`openPill` / `clickMenuItem` migration parked.** With T17's
|
||||
actual flake explained by the auth-path mismatch, there's no
|
||||
remaining flake-evidence pulling for the AX migration that
|
||||
sessions 14-15 considered. `openPill`'s while-loop and
|
||||
`clickMenuItem`'s while-loop work fine when the auth path is
|
||||
correct. Don't migrate speculatively — wait for a third
|
||||
consumer to surface with budget-tuning evidence.
|
||||
3. **Phase 0 must distinguish "port open" from "port attached to
|
||||
user's signed-in Claude".** Session 14 saw port 9229 closed and
|
||||
correctly classified as debugger-detached. Session 15 saw port
|
||||
9229 OPEN but attached to a leaked test isolation at /login —
|
||||
Categories A/B/C still soft-blocked. The right Phase 0 probe:
|
||||
`evalInMain` listing webContents and checking that AT LEAST one
|
||||
URL is `https://claude.ai/<not /login>`. If every webContents is
|
||||
`/login` or `find_in_page` or `main_window`, treat it the same
|
||||
as port-closed for auth-required investigations. Session 15's
|
||||
one-off probe shape (kept inline in the report, deleted after):
|
||||
1. **T17's session-15 structural fix VERIFIED.** Bare 60s timeout is
   gone. `seedFromHost` clones the host's signed-in config,
   `waitForReady('userLoaded')` resolves to a post-login URL
   (`https://claude.ai/epitaxy` on the dev box), the dialog mock
   installs, and `CodeTab.activate({ timeout: 15_000 })` (session 14
   migration) succeeds first try.
2. **T17's NEW failure mode is renderer-state-dependent, not AX.**
   After `selectLocal()` clicks the Local menuitem, the Select-folder
   pill never appears within 4s. The URL during the run was
   `/epitaxy` — the user's workspace route. The folder-picker UI
   may only render on `/new` (or a fresh project), not on a workspace
   already containing files. To unblock: navigate to `/new`
   post-userLoaded BEFORE `openFolderPicker()`. NOT shipped session
   16 — needs a careful navigation primitive that doesn't break
   existing seedFromHost specs.
3. **`openPill` / `clickMenuItem` migration STILL parked.** Session
   16's T17 trace confirmed the env-pill open + Local click both
   succeeded, ruling out the AX-polling-loop hypothesis once and for
   all. Don't migrate those speculatively.
4. **Schema-rev resolved both deferred validators.**
   `CustomPlugins.listRemotePluginsPage(limit: number, offset:
   number)`. `LocalPlugins.listSkillFiles(pluginId: string,
   skillName: string, pluginContext?: opaque)`. Neither shipped as a
   Tier 2 invocation: `listRemotePluginsPage` is not anchored in any
   case doc; `listSkillFiles` needs Tier 3 destructive setup.
5. **Coverage stalled at 74/76 (97%) for 4 consecutive sessions.**
   Sessions 13-16 net deliverables: 1 primitive, 1 AX migration, 1
   structural fix, 1 verification + 1 schema-rev investigation.
   Without real-account fixtures, the harness's structural ceiling
   is 74/76. The remaining 2 specs need real-account write-side
   state.
|
||||
|
||||
```ts
const wcs = await client.evalInMain(`
	const { webContents } = process.mainModule.require('electron');
	return webContents.getAllWebContents().map((w) => ({
		id: w.id, url: w.getURL(), title: w.getTitle(),
	}));
`);
```
|
||||
### What a future session 17 might attempt (only if preconditions hold)
|
||||
|
||||
4. **Leaked `/tmp/claude-test-*` dirs accumulating on dev box.**
|
||||
Multiple test isolations from prior sessions have leaked their
|
||||
tmpdirs and (in some cases) their Electron child processes.
|
||||
`ls /tmp/ | grep claude-test` showed several. The session 15
|
||||
T17 spec wasn't run because killing those leaked Electron
|
||||
processes might also kill the user's real running Claude (PID
|
||||
ambiguity from `ps`). A future session can either (a) verify
|
||||
no real Claude is running before invoking T17, or (b) just
|
||||
accept the seedFromHost kill side effect and let the user
|
||||
re-launch Claude after the session.
|
||||
5. **Productivity signal is dimming.** Sessions 13-15 collectively
|
||||
produced one new primitive (`lib/ax.ts`), one substantive AX
|
||||
migration (`activateTab` + `CodeTab.activate`), and one
|
||||
structural fix (T17 seedFromHost). NO coverage gain in those
|
||||
three sessions. The remaining categories without an
|
||||
auth-bearing debugger-attached Claude are mostly exhausted.
|
||||
Next session should prioritise (a) running T17 to verify the
|
||||
seedFromHost fix actually resolves the timeout, and (b) checking
|
||||
whether a Category C schema-rev probe against the leaked /login
|
||||
isolation is tractable (validators don't need auth, only
|
||||
invocation does — worth a 15-min investigation). If both turn
|
||||
up empty, the orchestrator should seriously consider stopping —
|
||||
at 97% coverage with no clear high-leverage shapes left,
|
||||
further sessions are likely to produce documentation-only or
|
||||
marginal-improvement deliverables.
|
||||
If precondition 1 (real signed-in debugger-attached Claude) holds:
|
||||
|
||||
- **Operon-mode probe** (Category A from sessions 13-16). Run
|
||||
`eipc-registry-probe.ts` against the user's Claude with operon mode
|
||||
toggled on/off, capture the diff in registered channels. May
|
||||
surface a new case-doc-coverable handler.
|
||||
- **Schema-rev smoke-test** for the session-16-resolved schemas
|
||||
against the live debugger. `listRemotePluginsPage(limit: 10,
|
||||
offset: 0)` should return an array shape; `listSkillFiles('some-
|
||||
installed-plugin', 'some-skill')` would test the LocalPlugins
|
||||
handler's auth path.
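
The operon-mode probe above boils down to a set difference over two channel listings. A minimal sketch, assuming each probe run yields a plain array of channel-name strings; `diffChannels` is a hypothetical helper, not part of `eipc-registry-probe.ts` or `lib/eipc.ts`.

```ts
interface ChannelDiff {
	added: string[]; // registered only with operon mode on
	removed: string[]; // registered only with operon mode off
}

// Compare the channel names captured with operon mode off and on.
function diffChannels(off: string[], on: string[]): ChannelDiff {
	const offSet = new Set(off);
	const onSet = new Set(on);
	return {
		added: on.filter((name) => !offSet.has(name)),
		removed: off.filter((name) => !onSet.has(name)),
	};
}
```

Any entry in `added` is a candidate for a new case-doc-coverable handler. An empty diff is also a useful data point for the report.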
|
||||
|
||||
If precondition 2 (real-account write-side fixture) holds:
|
||||
|
||||
- **T11 runtime invocation.** With an installed plugin in
  `~/.claude/plugins/`, the post-install state can be probed via
  `listSkillFiles`, and checking the slash menu for the plugin's
  skills would assert the case-doc claim "skills appear in the slash
  menu" (T11 step 3).
- **T17 navigation fix.** Add a `/new` navigation primitive to
  `claudeai.ts`'s `CodeTab` so `openFolderPicker` works on a fresh
  project route. Verify T17 reaches the dialog-mock-fired assertion.
|
||||
|
||||
If precondition 3 or 4 holds:
|
||||
|
||||
- **Defensive page-object refactor.** Re-snapshot the AX tree at the
|
||||
Customize panel and Plugin browser modal, refresh case-doc
|
||||
inventory anchors, migrate any decayed selectors.
|
||||
|
||||
### Termination signal interpretation
|
||||
|
||||
If session 17 is triggered without any precondition met, the right
|
||||
move is the same as session 16's STOP recommendation: write a one-
|
||||
paragraph "preconditions not met, no work shipped" plan-doc update
|
||||
and terminate. Don't burn a session on documentation-only output.
|
||||
|
||||
### Constraints to respect (unchanged from sessions 1-16)
|
||||
|
||||
- Use `seedFromHost: true` for any auth-required spec — never
|
||||
`CLAUDE_TEST_USE_HOST_CONFIG=1` / `isolation: null` (legacy shape
|
||||
removed in session 15).
|
||||
- eipc handlers register on `webContents.ipc._invokeHandlers`, NOT
|
||||
global `ipcMain._invokeHandlers`. Use `lib/eipc.ts`.
|
||||
- For arg validator schema-rev: smoke-test first, fall back to
|
||||
bundle-grep on the rejection literal.
|
||||
- For AX-tree consumers: use `lib/ax.ts` (`snapshotAx` /
|
||||
`waitForAxNode` / `waitForAxNodes`).
|
||||
- For call-site migrations to `waitForAxNode`: keep per-spec retry
|
||||
budgets matching existing tuning.
|
||||
- `lib/input.ts` is X11-only. `lib/input-niri.ts` is Niri-only. CDP
|
||||
auth gate is alive (runtime SIGUSR1 attach, never Playwright
|
||||
`_electron.launch()`). BrowserWindow Proxy gotcha — use
|
||||
`webContents.getAllWebContents()`. `skipUnlessRow()` always first.
|
||||
- No fixed sleeps. `retryUntil` from `lib/retry.ts`, Playwright
|
||||
auto-wait, or `waitForAxNode` from `lib/ax.ts`.
|
||||
- Diagnostics on every run via `testInfo.attach()`. Tag with
|
||||
`severity:` and `surface:` annotations.
|
||||
- Tabs in TS, ~80-char wrap.
|
||||
- Don't break existing runners. H01-H05 are the canaries.
|
||||
- `npm run typecheck` must stay clean.
|
||||
- Don't run destructive Tier 3 write-side tests.
|
||||
|
||||
### Authoritative reference
|
||||
|
||||
Read these in order before fanning out:
|
||||
|
||||
- [`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
|
||||
— tier classification + status section. Read **session 15**, then
|
||||
**session 14**, **session 13**, **session 12**, **session 11**,
|
||||
**session 10**, **session 9**, **session 8**, **session 7**,
|
||||
**session 6**, **session 5**, **session 4**, **session 3**,
|
||||
**session 2**, then **session 1** "Status (post-execution)"
|
||||
sub-sections. The Tier-3 list (search for "## Tier 3") is the
|
||||
candidate pool for any further reframes.
|
||||
— tier classification + status sections.
|
||||
- [`tools/test-harness/README.md`](../../tools/test-harness/README.md)
|
||||
— runner conventions, the now-74-spec inventory, primitives in
|
||||
`lib/`, isolation defaults (T17 now seedFromHost per session 15),
|
||||
the CDP-gate workaround, the eipc note, and `lib/ax.ts` substrate.
|
||||
— runner conventions, the 74-spec inventory, primitives in
|
||||
`lib/`, isolation defaults.
|
||||
- [`docs/testing/cases/README.md`](cases/README.md) — case-doc
|
||||
structure and the four anchor scopes.
|
||||
- [`tools/test-harness/src/lib/`](../../tools/test-harness/src/lib/)
|
||||
— the existing primitives. `lib/ax.ts` surface is `snapshotAx` /
|
||||
`waitForAxNode` / `waitForAxNodes` plus re-exports. The session 8
|
||||
eipc surface (`getEipcChannels` / `findEipcChannel` /
|
||||
`findEipcChannels` / `waitForEipcChannel` /
|
||||
`waitForEipcChannels` / `invokeEipcChannel` on `lib/eipc.ts`) is
|
||||
unchanged.
|
||||
- [`tools/test-harness/eipc-registry-probe.ts`](../../tools/test-harness/eipc-registry-probe.ts)
|
||||
— the session 7 read-only registry probe. Re-run against an
|
||||
auth-bearing debugger-attached Claude (`Developer → Enable Main
|
||||
Process Debugger` from the menu, signed-in) to capture the
|
||||
current registry shape.
|
||||
— the existing primitives.
|
||||
- [`tools/test-harness/src/runners/`](../../tools/test-harness/src/runners/)
|
||||
— every existing spec is a template. Notable session 15
|
||||
candidates for follow-up:
|
||||
- `T17_folder_picker.spec.ts` — newly migrated to seedFromHost.
|
||||
Run to verify the 60s timeout is gone. If T17 now passes, the
|
||||
structural fix shipped session 15 is verified.
|
||||
- Schema-rev for `listRemotePluginsPage` / `listSkillFiles` —
|
||||
rejection literals can be bundle-grepped without auth, and the
|
||||
validator runs auth-independent if /login state lets us
|
||||
invoke through the renderer-side wrapper. Session 12 found
|
||||
`listRemotePluginsPage` needs `limit: number` at position 0
|
||||
and `listSkillFiles` needs both `pluginId` and `skillName`.
|
||||
- [`docs/testing/cases/*.md`](cases/) — the spec each runner
|
||||
asserts. The **Code anchors:** field tells you exactly where
|
||||
upstream implements the feature.
|
||||
— every existing spec is a template.
|
||||
|
||||
### Tests in scope this session
|
||||
|
||||
**Realistic ceiling: ~1 verification run OR ~1 schema-rev investigation
|
||||
OR a "stop the orchestration" recommendation.** Sessions 9-12 each
|
||||
landed 1-2 specs; session 13 landed only a primitive (debugger
|
||||
blocked); session 14 landed only a migration (debugger blocked);
|
||||
session 15 landed only a structural fix (debugger soft-blocked).
|
||||
Coverage at 74/76 means the test budget naturally shifts toward
|
||||
verification, low-stakes investigation, or the orchestration
|
||||
termination decision.
|
||||
|
||||
**Phase 0 MUST check the debugger-attachment quality, not just port
|
||||
status.** Run `ss -tln 2>/dev/null | grep ':9229'` for port. If open,
|
||||
also run an `evalInMain` probe to enumerate webContents URLs — if no
|
||||
URL is `https://claude.ai/<not /login>`, treat as soft-blocked for
|
||||
auth-required categories. Probe shape (kept inline; delete after):
|
||||
|
||||
```ts
import { InspectorClient } from './src/lib/inspector.js';

// Attach to the main-process inspector on the debugger port.
const client = await InspectorClient.connect(9229);
// Enumerate every webContents so the URLs can be checked for a
// signed-in (non-/login) claude.ai route.
const wcs = await client.evalInMain<unknown>(`
	const { webContents } = process.mainModule.require('electron');
	return webContents.getAllWebContents().map((w) => ({
		id: w.id, url: w.getURL(), title: w.getTitle(),
	}));
`);
console.log(wcs);
client.close();
```
|
||||
|
||||
If every URL is `/login` or `find_in_page` or `main_window/index.html`,
|
||||
the debugger is attached to a leaked test isolation, not the user's
|
||||
Claude. Categories A and most of B are blocked. Category C may still
|
||||
be tractable since validators run auth-independent — try the schema-
|
||||
rev probe against the /login wrapper.
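
The "port open vs. attached to a signed-in Claude" decision above can be expressed as a pure check over the probed URLs. This is a sketch under assumptions: `isAuthBearingUrl` and `classifyAttachment` are hypothetical names, and the `/login` matching (exact path, sub-path, or query string) is one reading of `https://claude.ai/<not /login>`.

```ts
type AttachQuality = 'auth-bearing' | 'soft-blocked';

// True for a claude.ai route other than /login.
function isAuthBearingUrl(url: string): boolean {
	const origin = 'https://claude.ai';
	if (!url.startsWith(origin + '/')) return false;
	const path = url.slice(origin.length);
	const isLogin =
		path === '/login' ||
		path.startsWith('/login/') ||
		path.startsWith('/login?');
	return !isLogin;
}

// One auth-bearing webContents is enough; otherwise treat the
// attachment the same as a closed port for auth-required work.
function classifyAttachment(urls: string[]): AttachQuality {
	return urls.some(isAuthBearingUrl) ? 'auth-bearing' : 'soft-blocked';
}
```

Feeding it the `url` fields from the probe result gives a single value to record in the Phase 0 report.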
|
||||
|
||||
#### **PRIORITY: Verify T17's session 15 seedFromHost migration
|
||||
actually resolves the 60s timeout.** Session 15 didn't run T17 because
|
||||
the dev box had ambiguous Electron processes (some leaked test
|
||||
isolations, possibly the user's real Claude — `ps` couldn't
|
||||
disambiguate cleanly). Session 16's first action:
|
||||
|
||||
1. Check `pgrep -af "ozone-platform=x11.*app.asar"` and
   `ps -o pid,args` to identify whether any real-Claude
   process is running (real Claude has a non-`/tmp/claude-test-*`
   user-data-dir, typically unset or `~/.config/Claude`).
|
||||
2. If only test cruft is running, run T17 (`npx playwright test
|
||||
T17 --reporter=list`). The test will kill those leaked
|
||||
processes via `seedFromHost`'s host-Claude-kill semantics —
|
||||
that's actually a desirable cleanup side effect.
|
||||
3. If a real Claude IS running, **flag clearly in the report
|
||||
before running**, then run T17. The user accepted the
|
||||
`seedFromHost` kill side effect when authorising autonomous
|
||||
orchestration; just be transparent about it.
|
||||
4. Capture pass/skip/fail. Update the matrix coverage doc if
|
||||
T17 now passes.
|
||||
5. If T17 still fails, classify the new failure mode (is it now
|
||||
AX-polling? Folder picker chain? Mock not installing?) and
|
||||
decide whether to fix or defer.
|
||||
|
||||
This is **strictly higher-impact than session 14/15's
|
||||
spec-implementation work** because it produces a concrete
|
||||
pass/fail data point that resolves a 2-session-old hypothesis.
|
||||
Doesn't need the debugger.
|
||||
|
||||
Three categories — pick the verification run as the main bet, treat
|
||||
the others as fallback if the main bet hits an early blocker:
|
||||
|
||||
| # | Tests | Source | Notes |
|---|---|---|---|
| **D-verify** T17 verification run (PRIORITY) | T17 | session 15 migration | Run T17 against the dev box. If pass, log it. If fail, classify the new failure mode. **Side effect: kills any running Claude (the user's, or leaked test cruft). Flag in the report.** Doesn't need the debugger. |
| **C** Schema-rev for `listRemotePluginsPage` / `listSkillFiles` | Bundle grep | session 9 schema-rev pattern | Both methods rejected every smoke-tested arg shape during session 12's investigation. `listRemotePluginsPage` needs `limit: number` at position 0 (rejection: `Argument "limit" at position 0 ...`); `listSkillFiles` needs both `pluginId` and `skillName` (rejection: `Argument "skillName" at position 1 ...`). Bundle-grep on the rejection literals → resolve the schema → ship a narrowly-scoped Tier 2 invocation if it unblocks a case-doc claim. **Tractable against a /login isolation since validators run auth-independent.** |
| **STOP** Orchestrator stop recommendation | n/a | session 15 productivity signal | Coverage at 97%, three consecutive non-coverage sessions, remaining categories soft- or hard-blocked. If D-verify and C both produce nothing tractable, formally recommend the orchestrator stop. Documentation-only sessions are still acceptable per the followup termination criteria, but consecutive ones with no improvement signal are noise. |
|
||||
|
||||
#### Category D-verify — T17 verification run
|
||||
|
||||
The plan: run the post-session-15 T17 against the dev box and capture
|
||||
the result. Pass = the structural fix landed correctly. Fail = the
|
||||
hypothesis was incomplete; classify and decide.
|
||||
|
||||
1. **Disambiguate running Claude processes.** `pgrep -af
|
||||
"ozone-platform=x11.*app.asar"`; for each, `cat
|
||||
/proc/<pid>/cmdline | tr '\0' '\n' | grep user-data-dir` (or
|
||||
inspect via `ps` cmdline). If only `/tmp/claude-test-*`
|
||||
user-data-dirs, no real Claude is running.
|
||||
2. **Run T17.** `cd tools/test-harness && npx playwright test
|
||||
T17_folder_picker --reporter=list 2>&1 | tee
|
||||
/tmp/t17-session16.log`.
|
||||
3. **Classify.**
|
||||
- Pass: structural fix verified. Update plan-doc / matrix.
|
||||
- Skip with "seedFromHost unavailable": means host has no
|
||||
`~/.config/Claude/Local State`. Should be rare on the dev
|
||||
box but possible if config was wiped between sessions.
|
||||
- Skip with "seeded auth did not reach post-login URL":
|
||||
auth was seeded but stale. User needs to re-sign-in
|
||||
manually. Don't try to reseed automatically.
|
||||
- Fail with NEW failure mode: classify the failure (AX
|
||||
click? openFolderPicker chain? dialog mock?). If it's
|
||||
now in `openPill` / `clickMenuItem`, sessions 14/15's
|
||||
speculation has finally hit; ship the AX migration.
|
||||
Otherwise document and defer.
|
||||
4. **Don't restructure T17's body** unless step 3 surfaces a
|
||||
real new bug. Keep changes scoped to whatever the verification
|
||||
surfaces.
|
||||
|
||||
Doesn't need the debugger.
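
The disambiguation in step 1 can be sketched as a check over the raw contents of `/proc/<pid>/cmdline` (NUL-separated arguments). The helper names are hypothetical, and the sketch assumes the flag appears in the `--user-data-dir=<path>` form.

```ts
const USER_DATA_FLAG = '--user-data-dir=';

// Extract the user-data-dir from a NUL-separated cmdline string,
// or null when the flag is absent.
function userDataDirFromCmdline(cmdline: string): string | null {
	const hit = cmdline
		.split('\0')
		.find((arg) => arg.startsWith(USER_DATA_FLAG));
	return hit === undefined ? null : hit.slice(USER_DATA_FLAG.length);
}

// A process counts as test cruft only if it points at a
// /tmp/claude-test-* dir; no flag at all may be the real Claude.
function isTestIsolation(cmdline: string): boolean {
	const dir = userDataDirFromCmdline(cmdline);
	return dir !== null && dir.startsWith('/tmp/claude-test-');
}
```

If every matched process satisfies `isTestIsolation`, no real Claude is running; any process that fails the check should be flagged in the report before T17 is run.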
|
||||
|
||||
#### Category C — Schema-rev for rejecting read-sides
|
||||
|
||||
The plan: resolve the validator schema for `listRemotePluginsPage` /
|
||||
`listSkillFiles` via bundle grep, ship invocations if either unblocks
|
||||
a case-doc claim. Tractable against a /login isolation since
|
||||
validators run auth-independent.
|
||||
|
||||
1. **Grep on the rejection literal** in the bundled `index.js`.
|
||||
Validator block sits ~50-200 chars before the throw site (session
|
||||
9 finding). Read ~2KB around the hit to surface the full schema.
|
||||
2. **Smoke-test the recovered schema** against the user's debugger-
|
||||
attached running Claude (or, if auth-soft-blocked as in session 15,
|
||||
against the /login isolation — validators run regardless of auth).
|
||||
3. **Connect the resolved invocation to a case-doc claim.**
|
||||
4. **Ship a Tier 2 invocation** if a case-doc claim is unblocked.
|
||||
|
||||
Auth-independent for the validator; auth-bearing for any handler that
|
||||
actually returns plugin / skill data. If the validator resolves but
|
||||
the handler fails on auth, document the schema in plan-doc as a
|
||||
deferred reframe and move on.
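
Step 1's "read ~2KB around the hit" can be sketched as a windowed search over the bundle text. `windowsAround` is a hypothetical helper, and loading the bundle into a string is left to the caller.

```ts
// Return a slice of `radius` chars on each side of every occurrence
// of `literal` in `bundle` (radius 1024 gives roughly 2KB per hit).
function windowsAround(
	bundle: string,
	literal: string,
	radius = 1024,
): string[] {
	if (literal.length === 0) return [];
	const out: string[] = [];
	let from = 0;
	for (;;) {
		const hit = bundle.indexOf(literal, from);
		if (hit === -1) break;
		const start = Math.max(0, hit - radius);
		const end = Math.min(bundle.length, hit + literal.length + radius);
		out.push(bundle.slice(start, end));
		from = hit + literal.length;
	}
	return out;
}
```

Calling it with the known prefix of a rejection message, for example `Argument "limit" at position 0`, returns the surrounding text where the validator block is expected to sit.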
|
||||
|
||||
#### STOP recommendation
|
||||
|
||||
If D-verify resolves cleanly (pass or stable skip) and C produces no
|
||||
shippable spec after the schema-rev investigation, the productivity
|
||||
signal for further sessions is squarely "documentation-only with no
|
||||
clear next-step deliverable." The orchestrator should stop. State
|
||||
this plainly in the final report; don't keep cycling.
|
||||
|
||||
### Constraints to respect (don't violate)
|
||||
|
||||
These are unchanged from sessions 1-15 and still load-bearing:
|
||||
|
||||
- **Default isolation** unless the spec needs otherwise. Use
|
||||
`seedFromHost: true` for any test that depends on authenticated
|
||||
renderer state — never assume default isolation gets past
|
||||
`/login`. T07/T11_runtime/T16/T17/T19/T20/T21/T26/T22b/T27/T31b/T33b/T33c/T35b/T37b/T38b
|
||||
are the templates. **T17 was migrated to this shape in session 15.**
|
||||
- **eipc handlers register on `webContents.ipc._invokeHandlers`,
|
||||
NOT global `ipcMain._invokeHandlers`.** Session 7 finding. Use
|
||||
`lib/eipc.ts` rather than rolling a new walker.
|
||||
- **eipc invocation goes through the renderer-side wrapper at
|
||||
`window['claude.<scope>'].<Iface>.<method>`.** Session 8 finding.
|
||||
Use `lib/eipc.ts`'s `invokeEipcChannel` rather than rolling
|
||||
main-side direct calls.
|
||||
- **For arg validator schema-rev: try smoke-test first, then grep
|
||||
the rejection message literal.** Session 9 finding.
|
||||
- **For AX-tree consumers: use `lib/ax.ts`.** Session 13 finding.
|
||||
`snapshotAx` for one-shot reads, `waitForAxNode` /
|
||||
`waitForAxNodes` for predicate-based polling.
|
||||
- **For call-site migrations to `waitForAxNode`: keep the per-spec
|
||||
retry budgets matching the existing tuning.** Session 14
|
||||
finding. Migration is shape-only EXCEPT when the call-site has
|
||||
NO retry at all — adding a budget is the bug-fix the migration
|
||||
delivers.
|
||||
- **For test specs that depend on host auth: use `seedFromHost:
|
||||
true`.** Session 15 finding. The legacy `CLAUDE_TEST_USE_HOST_CONFIG=1`
|
||||
/ `isolation: null` shape collides with Playwright's 60s spec
|
||||
timeout when the env var isn't set; `seedFromHost` gives a clean
|
||||
skip-or-pass shape. T17 was the last spec on the legacy shape.
|
||||
- **`lib/input.ts` is X11-only.** Strict gate.
|
||||
- **`lib/input-niri.ts` is Niri-only.** Strict gate.
|
||||
- **CDP auth gate is alive** — runtime SIGUSR1 attach via
|
||||
`app.attachInspector()`, never Playwright's `_electron.launch()`
|
||||
or `chromium.connectOverCDP()`.
|
||||
- **BrowserWindow Proxy gotcha** — use
|
||||
`webContents.getAllWebContents()` not
|
||||
`BrowserWindow.getAllWindows()`.
|
||||
- **`skipUnlessRow()` always first.**
|
||||
- **No fixed sleeps.** `retryUntil` from `lib/retry.ts`, or
|
||||
Playwright auto-wait, or `waitForAxNode` from `lib/ax.ts`.
|
||||
- **Diagnostics on every run.** `testInfo.attach()` the artefacts.
|
||||
- **Tag with annotations.** `severity:` and `surface:` on every
|
||||
test so JUnit carries them through to matrix-regen.
|
||||
- **Tabs in TS, ~80-char wrap as the existing files do.**
|
||||
- **Don't break existing runners.** `npm run typecheck` must stay
|
||||
clean. H01-H05 are the canaries; `npm test` must still pass them
|
||||
after every commit. Note that T07 / S25 / S29-S31 / S04 etc.
|
||||
may be pre-existing-flaky on KDE-W — they're NOT canaries;
|
||||
baseline failures don't block work.
|
||||
- **Always grep the installed asar** to verify a fingerprint
|
||||
string is present.
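
The "no fixed sleeps" constraint can be illustrated with a minimal polling helper. This is a sketch only: `pollUntil` and `PollOptions` are hypothetical names chosen for illustration, not the actual `retryUntil` export from `lib/retry.ts`, whose signature is not shown in this document. It shows the idea of an explicit retry budget instead of a blind delay.

```typescript
// Hypothetical sketch of predicate-based polling with an explicit budget.
// NOT the real lib/retry.ts API; names and option shape are assumptions.
interface PollOptions {
	timeoutMs: number; // total budget before giving up
	intervalMs: number; // delay between attempts
}

async function pollUntil<T>(
	probe: () => Promise<T | null | undefined>,
	opts: PollOptions,
): Promise<T> {
	const deadline = Date.now() + opts.timeoutMs;
	let attempts = 0;
	for (;;) {
		attempts++;
		const value = await probe();
		// Any non-nullish value counts as success.
		if (value !== null && value !== undefined) return value;
		if (Date.now() >= deadline) {
			throw new Error(
				`pollUntil: no result after ${attempts} attempts / ${opts.timeoutMs}ms`,
			);
		}
		await new Promise<void>((r) => setTimeout(r, opts.intervalMs));
	}
}
```

A call site that previously had no retry at all would gain a budget here, which matches the session 14 note that adding a budget is the bug-fix such a migration delivers.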
|
||||
|
||||
### Phases
|
||||
|
||||
#### Phase 0 — calibration
|
||||
### Phase 0 — calibration (mandatory before fanning out)
|
||||
|
||||
1. `cd tools/test-harness && npm run typecheck` — should pass.
|
||||
2. **Check debugger ATTACHMENT QUALITY (not just port).** First
|
||||
`ss -tln 2>/dev/null | grep ':9229'`. If port open, also probe
|
||||
webContents via `evalInMain` (see "Big new findings" §3 for
|
||||
the probe shape). If every URL is `/login` /
|
||||
`find_in_page` / `main_window`, treat as soft-blocked.
|
||||
3. **Disambiguate running Claude processes.** Required before any
|
||||
`seedFromHost` spec. `pgrep -af "ozone-platform=x11.*app.asar"`
|
||||
+ cmdline inspection for user-data-dir.
|
||||
4. Read the plan doc's "Status (post-execution)" session 15 section,
|
||||
then read T17's session-15 form and the seedFromHost convention.
|
||||
5. Pick the main bet:
|
||||
- **D-verify** (PRIORITY): run T17, classify the result.
|
||||
- **C**: bundle grep on rejection literals, schema-rev,
|
||||
smoke-test the resolved shape against the /login isolation.
|
||||
- **STOP**: if both above produce nothing tractable, recommend
|
||||
stopping the orchestration.
|
||||
2. Check debugger ATTACHMENT QUALITY (not just port). `ss -tln |
|
||||
grep ':9229'`. If port open, probe webContents via `evalInMain`:
|
||||
|
||||
If Phase 0 surfaces a problem (typecheck failing, primitives unclear,
|
||||
the chosen Category's prerequisites don't hold), stop and report.
|
||||
Don't fan out.
|
||||
```ts
import { InspectorClient } from './src/lib/inspector.js';

// Attach to the debugger-attached Claude on the inspector port.
const client = await InspectorClient.connect(9229);

// Enumerate every webContents in the main process with its URL and title.
const wcs = await client.evalInMain<unknown>(`
	const { webContents } = process.mainModule.require('electron');
	return webContents.getAllWebContents().map((w) => ({
		id: w.id, url: w.getURL(), title: w.getTitle(),
	}));
`);
console.log(wcs);
client.close();
```
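
The soft-blocked rule from the calibration step can be applied mechanically to the output of the probe above. The following is a sketch under assumptions: `WebContentsInfo` and `isSoftBlocked` are hypothetical names, and matching the three markers by substring is a guess at how the URLs look, not something the harness defines.

```typescript
// Shape of one entry returned by the probe above.
interface WebContentsInfo {
	id: number;
	url: string;
	title: string;
}

// Markers named in the calibration step; substring matching is an assumption.
const UNAUTHENTICATED_MARKERS = ['/login', 'find_in_page', 'main_window'];

function isSoftBlocked(wcs: WebContentsInfo[]): boolean {
	// No webContents at all leaves nothing authenticated to inspect.
	if (wcs.length === 0) return true;
	// Soft-blocked only when EVERY URL matches one of the markers.
	return wcs.every((wc) =>
		UNAUTHENTICATED_MARKERS.some((m) => wc.url.includes(m)),
	);
}
```

A single post-login URL (for example the `/epitaxy` route seen in session 16) is enough for the check to return `false`.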
|
||||
|
||||
#### Phase 1 — fan-out batch
|
||||
|
||||
For Category D-verify (T17 run):
|
||||
- Single subagent (or do directly — it's a single-command run +
|
||||
trace inspection) runs T17 and classifies. Verify by checking
|
||||
pass/skip/fail and any new failure-mode trace.
|
||||
|
||||
For Category C (schema-rev):
|
||||
- Single subagent does bundle-grep on the rejection literals,
|
||||
surfaces the validator schemas, smoke-tests the recovered shapes
|
||||
against the user's debugger-attached running Claude (or /login
|
||||
isolation if soft-blocked).
|
||||
|
||||
Cap at ~1 spec OR ~1 verification + 1 schema-rev — same scope as
|
||||
sessions 9-15.
|
||||
|
||||
#### Per-subagent prompt shape
|
||||
|
||||
```
|
||||
You're implementing ONE [verification run | primitive migration |
|
||||
investigation] for <TARGET>.
|
||||
|
||||
Read in order:
|
||||
- docs/testing/cases/<FILE>.md (focus on <TARGET>'s Code anchors)
|
||||
- tools/test-harness/README.md (conventions; status section names
|
||||
the most-recent-template that fits)
|
||||
- tools/test-harness/src/runners/<closest-template>.spec.ts
|
||||
- tools/test-harness/src/lib/ (the primitives you'll reuse —
|
||||
including session 13's `lib/ax.ts` and session 15's seedFromHost
|
||||
T17 migration)
|
||||
- CLAUDE.md (project conventions)
|
||||
|
||||
[per-task specifics: pattern (verification run / mock-then-call /
|
||||
asar fingerprint / shared isolation / new-primitive-build /
|
||||
investigation / call-site migration), assertion shape, skip rules,
|
||||
key constraint warnings]
|
||||
|
||||
Constraints:
|
||||
- Tabs, ~80-char wrap.
|
||||
- Use lib/* primitives; don't reinvent.
|
||||
- testInfo.attach() the diagnostics from the spec's "Diagnostics
|
||||
on failure" block.
|
||||
- Tag with severity + surface annotations.
|
||||
- No fixed sleeps. retryUntil, Playwright auto-wait, or
|
||||
waitForAxNode.
|
||||
- npm run typecheck must stay clean after your edits.
|
||||
- Don't commit. The user reviews and commits.
|
||||
|
||||
If the target isn't reasonable to implement (anchors don't resolve
|
||||
to anything assertable, the test depends on state you can't
|
||||
construct, the existing primitives don't cover the surface), DO
|
||||
NOT write a stub. Report under Open questions and stop.
|
||||
|
||||
Report shape (~150 words):
|
||||
## <TARGET> [verification | primitive | investigation | migration]
|
||||
|
||||
- File written: tools/test-harness/src/runners/<filename>.spec.ts
|
||||
[or lib/<newfile>.ts or modified lib/<existing>.ts]
|
||||
- Layer: file probe | argv probe | L1 | L2 (xprop) | L2 (DBus) |
|
||||
pgrep | new-primitive | investigation | migration | verification
|
||||
- Assertion shape (or migration shape): <one sentence>
|
||||
- Skip rules: <which rows + why>
|
||||
- Verification path: <typecheck + run result>
|
||||
- Open questions: <caveats>
|
||||
```
|
||||
|
||||
#### Phase 2 — synthesis
|
||||
|
||||
After fan-out returns:
|
||||
|
||||
1. `cd tools/test-harness && npm run typecheck` — must stay clean.
|
||||
2. Run the new / migrated runners against KDE-W (the dev box) — but
|
||||
flag the user first if any are destructive (seedFromHost kills
|
||||
running Claude). Capture pass/skip/fail per spec for the matrix.
|
||||
3. Update [`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
|
||||
"Status (post-execution)" section to reflect newly-shipped
|
||||
specs / primitive migrations and any reclassifications.
|
||||
4. Update [`tools/test-harness/README.md`](../../tools/test-harness/README.md)
|
||||
inventory table.
|
||||
5. Write a final report listing:
|
||||
- Specs landed / migrations completed (pass / skip / needs-tuning per row)
|
||||
- Primitives landed (with API shape)
|
||||
- Specs deferred (with the per-test rationale)
|
||||
- Specs reclassified (Tier 3 → Tier 2, Tier 2 → Tier 1, etc.)
|
||||
- Updated coverage stat (was 74/76 = 97%, now N/76 = M%)
|
||||
6. Commit and push to `docs/compat-matrix` (the orchestration
|
||||
directive at the top of the followup supersedes "don't commit").
|
||||
7. Rotate this prompt: rewrite
|
||||
`docs/testing/runner-implementation-followup-prompt.md` for
|
||||
the NEXT session's deferred items.
|
||||
|
||||
### Self-correction loop
|
||||
|
||||
Same as sessions 1-15:
|
||||
|
||||
1. Subagent typecheck failure → re-spawn with explicit fix
|
||||
instruction.
|
||||
2. Subagent claims a runner / migration exists but `git status`
|
||||
shows no new file → re-spawn with explicit "use the Write tool"
|
||||
instruction.
|
||||
3. Two subagents wrote runners that share a primitive but with
|
||||
different shapes → factor into `lib/<topic>.ts` BEFORE shipping.
|
||||
4. Spec passes locally but the assertion is actually trivial →
re-examine the assertion shape.
|
||||
5. Migration breaks an existing spec → roll back the migration; the
|
||||
per-spec retry budget was load-bearing and the primitive
|
||||
defaults didn't match.
|
||||
6. **Carry-over from sessions 5-15:** If the chosen Category's
|
||||
investigation doesn't resolve / requires schema-rev that exceeds
|
||||
budget after 2-3 approaches, STOP. Don't keep digging — pivot
|
||||
to a fallback Category. Document what was tried.
|
||||
7. **Carry-over from session 10:** If a registration probe surfaces
|
||||
"registered but uninvocable", document and defer rather than
|
||||
building the main-side fallback speculatively.
|
||||
|
||||
Cap re-spawns at 2 per file. Past that, mark as needing human
|
||||
review and move on.
|
||||
|
||||
### Termination conditions
|
||||
|
||||
Stop and write the final report when one of:
|
||||
|
||||
1. **Main-bet Category target landed and typecheck-clean.** Write
|
||||
coverage update, stop.
|
||||
2. **Hit re-spawn cap on 2+ tasks.** Stop, write up which are
|
||||
blocked.
|
||||
3. **Discovered a primitive gap that breaks 5+ Tier 2/Tier 3
|
||||
tests.** Stop, propose where the new primitive should live in
|
||||
`lib/`. Future session adds the primitive first, then resumes.
|
||||
4. **Session budget hits ~1 verification + 1 schema-rev landing.**
|
||||
Stop, synthesize, leave the rest for the next session.
|
||||
5. **All categories blocked / unproductive after 2-3 attempts
|
||||
each.** Document the findings as plan-doc additions, **and
|
||||
recommend the orchestrator stop the campaign** — coverage at
|
||||
97%, three+ consecutive non-coverage sessions, dimming
|
||||
productivity signal.
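
Condition 5 can be expressed as a small check over per-session coverage. This is an illustrative sketch, not part of the harness: `SessionResult`, `trailingFlatSessions` and `shouldRecommendStop` are hypothetical names, and the default threshold of three flat sessions is taken from the "three+ consecutive non-coverage sessions" wording above.

```typescript
// Hypothetical record of coverage after each session.
interface SessionResult {
	session: number;
	covered: number; // specs with a runner after this session
	total: number;
}

// Coverage as a whole-number percentage, e.g. 74/76 rounds to 97.
function coveragePercent(r: SessionResult): number {
	return Math.round((r.covered / r.total) * 100);
}

// Count trailing sessions whose coverage did not rise versus the one before.
function trailingFlatSessions(history: SessionResult[]): number {
	let flat = 0;
	for (let i = history.length - 1; i > 0; i--) {
		if (history[i].covered > history[i - 1].covered) break;
		flat++;
	}
	return flat;
}

function shouldRecommendStop(
	history: SessionResult[],
	threshold = 3,
): boolean {
	return trailingFlatSessions(history) >= threshold;
}
```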
|
||||
|
||||
### What you should NOT do
|
||||
|
||||
- **Don't try to land D-verify + C in one batch.** Pick D-verify
|
||||
first; if that resolves cleanly, take C as a stretch goal.
|
||||
- **Don't ship stubs.** If a runner can't actually assert what the
|
||||
spec says, mark it as Tier 3 / blocked / primitive-gap and
|
||||
don't write a placeholder.
|
||||
- **Don't break existing runners.** H01-H05 are the canaries.
|
||||
- **Don't restructure `lib/`** beyond targeted additions.
|
||||
Premature abstractions are wrong abstractions.
|
||||
- **Don't run destructive Tier 3 tests** that write to the user's
|
||||
real claude.ai account.
|
||||
- **Don't introspect `ipcMain._invokeHandlers` for `claude.web`
|
||||
eipc channels.** Use `lib/eipc.ts`.
|
||||
- **Don't call `invokeEipcChannel` for write-side handlers.**
|
||||
- **Don't bolt other compositors into `lib/input-niri.ts`.**
|
||||
- **Don't bolt Wayland into `lib/input.ts`.**
|
||||
- **Don't speculate on a `lib/input-wayland.ts` dispatcher.**
|
||||
- **Don't preemptively build `CodeTab.activateTopTab()` /
|
||||
`startNewSession()`.**
|
||||
- **Don't add a main-side `invokeEipcChannel` fallback
|
||||
speculatively.**
|
||||
- **Don't speculate on a Launch event-subscription primitive.**
|
||||
- **Don't extract T07's CSS-querySelector poll into `lib/ax.ts`.**
|
||||
That's a different abstraction (DOM, not AX). Wait for a second
|
||||
CSS-poll consumer before extracting.
|
||||
- **Don't add a `waitForRenderedSurface(client, surfaceKey)`
|
||||
registry to `lib/ax.ts`.** Session 13 deliberately deferred
|
||||
this — wait for a third consumer with a specific named surface.
|
||||
- **Don't migrate `openPill` / `clickMenuItem` to `waitForAxNode`
|
||||
speculatively.** Session 15 confirmed T17's flake didn't need
|
||||
it; without a third consumer signal, it's premature optimisation.
|
||||
- **Don't reach into `explore/walker.ts` for AX types/helpers.**
|
||||
`lib/ax.ts` re-exports — use those.
|
||||
- **Don't implement the #569 power-inhibit patch in this
|
||||
session.** That's a separate workstream.
|
||||
- **Don't keep cycling on documentation-only sessions.** If
|
||||
D-verify and C both turn up empty, formally recommend the
|
||||
orchestrator stop the campaign rather than burning another
|
||||
session of compute on marginal output.
|
||||
|
||||
### Final report format
|
||||
|
||||
```markdown
## Runner implementation summary (session 16)

- Main-bet category: D-verify | C | STOP
- Specs landed: N
- Migrations completed: N
- Primitives landed: N
- Verifications run: N
- Reclassified mid-flight: N (with reasons)
- Coverage: was 74/76 (97%), now <NEW>/76 (<PCT>%)
- Typecheck: clean | <errors>
- KDE-W test run: <pass/skip/fail counts>

## Per-spec breakdown

| Cat | Test ID | File | Assertion shape | Status |
|---|---|---|---|---|
| D-verify | T17 | T17_folder_picker.spec.ts | … | ✓ pass / skip / fail |
| ... |

## Notable findings
- ...

## Open questions
- ...

## Stop recommendation
- Yes / no, with rationale.

## Files touched
git status output.

## Diff summary
git diff --stat
```
|
||||
If every URL is `/login` / `find_in_page` / `main_window`, treat
|
||||
as soft-blocked for auth-required investigations.
|
||||
3. Disambiguate running Claude processes. `pgrep -af
|
||||
"ozone-platform=x11.*app.asar"`; for each, inspect cmdline for
|
||||
`user-data-dir`. Real Claude has
|
||||
`~/.config/Claude` (or no user-data-dir flag); leaked test
|
||||
isolations have `/tmp/claude-test-*`.
|
||||
4. **Verify at least one precondition for resuming the orchestration
|
||||
holds.** If none hold, write a "no preconditions met" plan-doc
|
||||
update and STOP. Don't fan out.
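
The disambiguation in step 3 amounts to classifying each `pgrep -af` cmdline by its `user-data-dir`. A minimal sketch follows, assuming the flag appears as Chromium's `--user-data-dir=<path>` or `--user-data-dir <path>`; `classifyClaudeProcess` and `extractUserDataDir` are hypothetical helpers, not harness exports.

```typescript
type ClaudeProcessKind = 'real' | 'leaked-isolation' | 'unknown';

// Pull the user-data-dir value out of a cmdline, if the flag is present.
function extractUserDataDir(cmdline: string): string | null {
	// Accepts both --user-data-dir=/path and --user-data-dir /path.
	const m = cmdline.match(/--user-data-dir[= ](\S+)/);
	return m ? m[1] : null;
}

function classifyClaudeProcess(cmdline: string): ClaudeProcessKind {
	const dir = extractUserDataDir(cmdline);
	// No flag at all means the default profile, i.e. the real Claude.
	if (dir === null) return 'real';
	// Leaked test isolations live under /tmp/claude-test-*.
	if (dir.startsWith('/tmp/claude-test-')) return 'leaked-isolation';
	// An explicit ~/.config/Claude (already expanded) is also the real one.
	if (/\/\.config\/Claude\/?$/.test(dir)) return 'real';
	return 'unknown';
}
```

Anything classified as `'real'` is what a `seedFromHost` run would tear down, so that is the case to flag to the user before invoking.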
|
||||
|
||||
### Operational notes
|
||||
|
||||
- Subagents are launched in parallel via a single message with
|
||||
multiple Agent tool calls. Don't serialise.
|
||||
- Each subagent's Write calls land directly in the working tree.
|
||||
- The grounding probe (`tools/test-harness/grounding-probe.ts`)
|
||||
can help when implementing a runner that asserts runtime API
|
||||
state.
|
||||
- The eipc-registry probe (`tools/test-harness/eipc-registry-probe.ts`)
|
||||
is the dedicated tool for inspecting per-wc IPC handler state.
|
||||
Connects to a debugger-attached running Claude on port 9229.
|
||||
- For seedFromHost specs, the host MUST have a signed-in Claude
|
||||
Desktop. The primitive throws with a clear message if not.
|
||||
- For tests that touch the AX tree, **`lib/ax.ts`** is the shared
|
||||
substrate.
|
||||
- For mock-then-call: helpers live in `lib/electron-mocks.ts`.
|
||||
- For focus-shifting (X11 only): `lib/input.ts` exports
|
||||
`focusOtherWindow` + `spawnMarkerWindow`.
|
||||
- For Wayland-native focus-shifting (Niri only): `lib/input-niri.ts`.
|
||||
- For eipc registry walking: `lib/eipc.ts` exports
|
||||
`getEipcChannels` / `findEipcChannel` / `findEipcChannels` /
|
||||
`waitForEipcChannel` / `waitForEipcChannels`.
|
||||
- For eipc invocation: `lib/eipc.ts` exports `invokeEipcChannel`.
|
||||
Only call read-side suffixes; the primitive doesn't enforce a
|
||||
read-only allowlist.
|
||||
- **For arg validator schema-rev (sessions 9 / 11 / 12 findings):**
|
||||
smoke-test first, bundle-grep on rejection literal as fallback.
|
||||
- **For session-scoped Tier 2 reframes (session 10 finding):**
|
||||
`LocalSessions/getAll` foundational read-side surrogate.
|
||||
- **For Tier 2 reframes with case-doc-anchored read-side handlers
|
||||
(session 11 finding):** invoke directly. Mixed-shape OK.
|
||||
- **For Tier 2 reframes spanning two interfaces (session 12
|
||||
finding):** invoke a read-side from each impl object.
|
||||
- **For AX-tree polling (session 13 finding):** `lib/ax.ts`'s
|
||||
`waitForAxNode` / `waitForAxNodes` for predicate-based polling.
|
||||
- **For call-site migrations to `waitForAxNode` (session 14
|
||||
finding):** keep per-spec retry budgets matching the existing
|
||||
tuning.
|
||||
- **For auth-required spec migrations (session 15 finding):**
|
||||
use `seedFromHost: true`, NOT `CLAUDE_TEST_USE_HOST_CONFIG=1` /
|
||||
`isolation: null`. The legacy shape collides with Playwright's
|
||||
60s spec timeout.
|
||||
- **For asar fingerprints: ALWAYS grep the installed asar
|
||||
first.** Build-reference is beautified; the bundle is
|
||||
minified.
|
||||
- For the bundle-grep schema-rev pattern (sessions 9, 11, 12, 16
|
||||
precedents):
|
||||
|
||||
```bash
cd tools/test-harness && node -e "
const {extractFile} = require('@electron/asar');
@@ -648,10 +218,21 @@ git diff --stat
'.vite/build/index.js'
);
const s = buf.toString('utf8');
for (const k of ['<your-needle>', '<another>']) {
	console.log(k, '->', s.split(k).length - 1);
}
const idx = s.indexOf('<rejection-literal>');
console.log(s.slice(Math.max(0, idx - 1500), idx + 500));
"
```
|
||||
|
||||
Begin with Phase 0. Don't fan out until calibration succeeds.
|
||||
- For seedFromHost specs: host MUST have a signed-in Claude.
|
||||
`seedFromHost`'s host-claude-kill semantics will tear down any
|
||||
running Claude process — flag clearly in the report before
|
||||
invoking when the user's real Claude is running.
|
||||
|
||||
- For AX-tree polling: `lib/ax.ts`'s `waitForAxNode` /
|
||||
`waitForAxNodes` for predicate-based polling.
|
||||
|
||||
- The eipc-registry probe (`tools/test-harness/eipc-registry-probe.ts`)
|
||||
is the dedicated tool for inspecting per-wc IPC handler state.
|
||||
|
||||
Begin with Phase 0. Don't fan out until at least one of the
|
||||
preconditions for resuming the orchestration is verified to hold.
|
||||
|
||||
@@ -18,6 +18,140 @@ work begins.
|
||||
|
||||
## Status (post-execution)
|
||||
|
||||
**Shipped session 16 (verification + schema-rev investigation, no new spec):**
|
||||
T17's session-15 `seedFromHost` migration verified end-to-end against
|
||||
the dev box: the bare 60s Playwright timeout is GONE, `seedFromHost`
|
||||
clones the host's signed-in config, `waitForReady('userLoaded')`
|
||||
resolves to `https://claude.ai/epitaxy` (post-login), the dialog mock
|
||||
installs, and `CodeTab.activate({ timeout: 15_000 })` (session 14
|
||||
migration) succeeds. T17 reaches a NEW failure mode at the next chain
|
||||
step: `CodeTab.openFolderPicker: "Select folder…" pill did not open
|
||||
within 4s after Local was clicked` — the env-pill open + Local click
|
||||
both succeed, but the Select-folder pill doesn't render in the URL
|
||||
state we reach (`/epitaxy`, the user's workspace, NOT `/new`). Per the
|
||||
session-15 followup classification rules: this is NOT in `openPill` /
|
||||
`clickMenuItem`'s post-click loops (those work — the env pill opened
|
||||
and Local was found and clicked); the failure is one chain step later,
|
||||
likely renderer-state-dependent (the workspace route doesn't expose a
|
||||
local-folder picker the same way `/new` does). Don't migrate
|
||||
`openPill` / `clickMenuItem` speculatively — that's been the standing
|
||||
deferral since session 14. Document and defer the new failure mode.
|
||||
|
||||
Category C schema-rev (`listRemotePluginsPage` / `listSkillFiles`)
|
||||
**resolved** by bundle inspection of
|
||||
`/usr/lib/claude-desktop/node_modules/electron/dist/resources/app.asar`
|
||||
(extracted via `@electron/asar`):
|
||||
|
||||
- `CustomPlugins.listRemotePluginsPage(limit: number, offset: number)`
|
||||
— both positional, both numbers. Validator block sits at
|
||||
`'$eipc_message$_..._$_claude.web_$_CustomPlugins_$_listRemotePluginsPage'`,
|
||||
with explicit `typeof r!="number"` / `typeof n!="number"` checks
|
||||
preceding the throw. Result validator `VUi(s)`.
|
||||
- `LocalPlugins.listSkillFiles(pluginId: string, skillName: string,
|
||||
pluginContext?: opaque)` — two required strings + optional context
|
||||
arg validated by `sc(s)` (the same shared validator used elsewhere
|
||||
for plugin-context blobs). Result validator `bUi(o)`.
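
For reference, the two recovered signatures can be written down as TypeScript tuple types with matching pre-flight guards. This is a sketch, not code from the bundle: the type and function names are hypothetical, `PluginContext` is modelled as `unknown` because the source treats it as opaque, and the guards only mirror the `typeof` checks described above (they do not reproduce `sc(s)` or the result validators).

```typescript
// Opaque in the bundle; modelled as unknown here (an assumption).
type PluginContext = unknown;

// CustomPlugins.listRemotePluginsPage: two positional numbers.
type ListRemotePluginsPageArgs = [limit: number, offset: number];

// LocalPlugins.listSkillFiles: two required strings plus optional context.
type ListSkillFilesArgs = [
	pluginId: string,
	skillName: string,
	pluginContext?: PluginContext,
];

// Hypothetical guard mirroring the typeof-number checks.
function isListRemotePluginsPageArgs(
	args: unknown[],
): args is ListRemotePluginsPageArgs {
	return (
		args.length === 2 &&
		typeof args[0] === 'number' &&
		typeof args[1] === 'number'
	);
}

// Hypothetical guard; the optional third argument is left unchecked.
function isListSkillFilesArgs(args: unknown[]): args is ListSkillFilesArgs {
	return (
		(args.length === 2 || args.length === 3) &&
		typeof args[0] === 'string' &&
		typeof args[1] === 'string'
	);
}
```

A future session with an install fixture could run such a guard before calling `invokeEipcChannel`, so that a shape mistake fails locally rather than as a validator rejection.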
|
||||
|
||||
**No Tier 2 invocation shipped for either** because neither method
|
||||
connects to a case-doc claim:
|
||||
|
||||
- `listRemotePluginsPage` is NOT anchored in any case doc. T33 anchors
|
||||
`listMarketplaces` (`:71392`) and `listAvailablePlugins` (`:71534`)
|
||||
— both already covered by T33b/T33c — but `listRemotePluginsPage`
|
||||
is a separate read-side surface (paginated remote-plugin list) that
|
||||
the case docs don't claim. Shipping a probe just to exercise the
|
||||
validator with no assertion bound to a real-product behaviour would
|
||||
be a stub.
|
||||
- `listSkillFiles` is `LocalPlugins`-scoped and meaningful only with
|
||||
an installed plugin (T11 step 3: "verify its skills appear in the
|
||||
slash menu"). Reaching that requires the destructive Tier 3 install
|
||||
path, which the constraints explicitly forbid. The validator
|
||||
resolves independently of auth, but the underlying handler needs real
|
||||
account state.
|
||||
|
||||
Schemas captured in plan-doc as a deferred reframe so a future session
|
||||
with a real-account install fixture can ship the invocation.
|
||||
|
||||
Coverage stays at 74/76 (97%) — verification + investigation, no spec
|
||||
landed.
|
||||
|
||||
Two commits on `docs/compat-matrix` expected (the orchestration
|
||||
directive supersedes "the user reviews and commits" — autonomous
|
||||
commit + push at end of session):
|
||||
|
||||
- TBD — `test(harness): session 16 verify T17 seedFromHost fix +
|
||||
schema-rev for listRemotePluginsPage / listSkillFiles (no spec,
|
||||
coverage unchanged at 97%)` (no code change beyond the doc updates;
|
||||
T17 verification run + schema-rev bundle inspection captured in
|
||||
the plan-doc).
|
||||
- TBD — `docs(testing): session 16 plan/inventory + flag orchestrator
|
||||
STOP for session 17`.
|
||||
|
||||
Session 16 findings + reclassifications:
|
||||
|
||||
- **Session 15's structural T17 fix VERIFIED.** The pre-fix bare 60s
|
||||
timeout was real and is gone. `seedFromHost` clones host config,
|
||||
the renderer reaches a post-login URL, mocks install, and tab
|
||||
activation succeeds. Session 14's `activateTab` /
|
||||
`CodeTab.activate` AX migration also verified — `activate({
|
||||
timeout: 15_000 })` resolved on the FIRST run with no flake.
|
||||
- **T17's NEW failure mode classified as renderer-state, not AX.**
|
||||
Post-`selectLocal` the Select-folder pill never appeared; this is
|
||||
upstream of `openPill`'s click loop (the env pill opened, Local
|
||||
was clicked successfully). The trace shows the URL is
|
||||
`https://claude.ai/epitaxy` — the user's workspace route, not
|
||||
`/new`. The folder-picker UI may only render on `/new` (or a
|
||||
fresh project), not on a workspace already containing files.
|
||||
Future fix: navigate to `/new` post-userLoaded before invoking
|
||||
`openFolderPicker`. NOT shipped this session — needs a careful
|
||||
navigation primitive that doesn't break existing seedFromHost
|
||||
specs.
|
||||
- **`openPill` / `clickMenuItem` migration STILL parked.** Sessions
|
||||
14/15 speculated about migrating these; session 15 walked it back;
|
||||
session 16 confirms session 15's call. The new T17 failure is one
|
||||
chain step later, NOT in the post-click polling loops.
|
||||
- **Schema-rev cleanly resolved both deferred validators.** Session 9
|
||||
pattern (bundle-grep on the rejection literal) works as expected.
|
||||
No smoke-test was needed because the validator literal IS the
|
||||
schema source of truth (typeof checks are explicit in source).
|
||||
Smoke-test against a live debugger-attached Claude wasn't possible
|
||||
this session because T17's seedFromHost step killed the leaked
|
||||
isolations and tore down the debugger.
|
||||
- **No case-doc connection for either resolved schema.**
|
||||
`listRemotePluginsPage` is paginated remote-plugin enumeration
|
||||
(a separate surface from T33's `listMarketplaces` /
|
||||
`listAvailablePlugins` already covered). `listSkillFiles` needs
|
||||
real account state via a Tier 3 install. Both are documented for
|
||||
future revisit, neither shipped as a runner.
|
||||
- **Three Tier 4 blockers crystallised.** Sessions 13-16 collectively
|
||||
confirm the remaining un-runner'd specs all sit behind one of:
|
||||
(a) write-side state on a real claude.ai account (Tier 3
|
||||
destructive — explicitly forbidden); (b) renderer-state-dependent
|
||||
UI that the harness can't construct without account-side fixtures
|
||||
(T17's `/new` requirement); (c) auth-bearing debugger-attached
|
||||
Claude that exists only when a real signed-in app is running on
|
||||
the dev box (which sessions 13 onwards have been
|
||||
unable to keep alive across orchestration runs because seedFromHost
|
||||
kills it). At 74/76 (97%), the structural ceiling for the harness
|
||||
is reached; the remaining 2 specs need real-account write-side
|
||||
fixtures.
|
||||
|
||||
**ORCHESTRATION-LEVEL STOP RECOMMENDATION (session 16 final).**
|
||||
Sessions 13-16 produced: 1 primitive (`lib/ax.ts` — session 13), 1
|
||||
substantive AX migration (`activateTab` + `CodeTab.activate` —
|
||||
session 14), 1 structural fix (T17 seedFromHost — session 15), 1
|
||||
verification + 1 schema-rev investigation (session 16). NO coverage
|
||||
gain across 4 sessions. Coverage start 74/76 → end 74/76 (97%
|
||||
throughout). The structural ceiling is reached. Future sessions
|
||||
should be triggered manually — only when (a) the user has a real
|
||||
signed-in Claude they're willing to dedicate to a debugger-attached
|
||||
session, or (b) a new test-harness primitive opportunity surfaces
|
||||
from product changes (e.g. claude.ai renderer drift requiring
|
||||
refactoring, new IPC surfaces requiring registry walking). The
|
||||
autonomous orchestration is being stopped after session 16.
|
||||
|
||||
---
|
||||
|
||||
**Shipped session 15 (1 structural fix, no new spec, no AX migration):**
|
||||
T17 migrated from the legacy `CLAUDE_TEST_USE_HOST_CONFIG=1` /
|
||||
`isolation: null` auth path to the canonical `seedFromHost: true`
|
||||
|
||||
@@ -140,7 +140,14 @@ T27, T31b, T33b, T33c, T35b, T37b, T38b — session 15 migrated T17
|
||||
from the legacy `CLAUDE_TEST_USE_HOST_CONFIG=1` / `isolation: null`
|
||||
shape to `seedFromHost`, fixing a pre-existing 60s spec-timeout
|
||||
flake where the unauth'd default isolation polled `userLoaded` past
|
||||
Playwright's spec budget).
|
||||
Playwright's spec budget; session 16 verified the migration
end-to-end: `seedFromHost` clones the host's signed-in config,
|
||||
`waitForReady('userLoaded')` resolves to a post-login URL, and the
|
||||
session-14 `CodeTab.activate({ timeout: 15_000 })` succeeds; T17
|
||||
now reaches a NEW failure mode at the next chain step
|
||||
(`openFolderPicker` after `selectLocal`, `Select folder…` pill
|
||||
doesn't render on `/epitaxy` workspace route — likely needs `/new`
|
||||
context, deferred for a future session).
|
||||
|
||||
Note on eipc channels: the `LocalSessions_$_*` and `CustomPlugins_$_*`
|
||||
channel names referenced in the case-doc Code anchors don't register
|
||||
|
||||