docs(testing): session 10 plan/inventory + rotate session 11 prompt

- Plan-doc Status: session 10 sub-section (T19/T20 + Launch finding +
  operon partial answer + LocalSessions read-side enumeration).
- README inventory: T19/T20 rows; eipc primitive consumer lists
  (`waitForEipcChannels` and `invokeEipcChannel`) extended with T19/T20.
- Followup-prompt: session 11 candidates — Category A (T21 dev server
  preview, now tractable since Launch registers 25 handlers; needs cwd
  schema-rev), Category B (T11 plugin install runtime upgrade via
  LocalPlugins read-sides), Category C (operon-mode navigation probe).

Co-Authored-By: Claude <claude@anthropic.com>
aaddrick
2026-05-03 22:40:36 -04:00
parent cd1ad67f9a
commit 4c9a2ac951
3 changed files with 403 additions and 308 deletions


@@ -1,104 +1,116 @@
# test-harness runner implementation — session 10 prompt
# test-harness runner implementation — session 11 prompt
This file is meant to be **copied verbatim into a fresh Claude Code
session** as the initial user message. Don't paraphrase it; the
orchestration depends on the exact directives below.
You're picking up after a runner-implementation session that landed 1
new spec (T33c) by way of reverse-engineering the
`CustomPlugins/listMarketplaces` arg validator. No primitive change.
Coverage 69/76 (91%) → 70/76 (92%). One commit on `docs/compat-matrix`
expected (SHA inserted after the test-harness commit lands — the user
reviews and commits at the end of every session):
You're picking up after a runner-implementation session that landed 2
new specs (T19 + T20) by way of registering the case-doc-anchored
write-side eipc surfaces plus invoking the foundational read-side
`LocalSessions/getAll` as the read-side surrogate. No primitive
change. Coverage 70/76 (92%) → 72/76 (95%). Two commits on
`docs/compat-matrix` expected (SHAs inserted after the test-harness
commit lands — the user reviews and commits at the end of every
session):
- TBD — `test(harness): session 9 T33c plugin browser invocation`
(Tier 2 invocation upgrade of T33b; schema-rev surfaced the
byte-identical hand-rolled validator on both `listMarketplaces` and
`listAvailablePlugins`; minimal valid arg is `[[]]` — empty
egressAllowedDomains, omit pluginContext; passes on KDE-W in 39.2s
with array shape on both invocations).
- TBD — `test(harness): session 10 T19/T20 runtime probes`
(Tier 2 reframes; multi-suffix `waitForEipcChannels` over the
case-doc-anchored write-side suffixes — `startShellPty` / `writeShellPty`
/ `stopShellPty` / `resizeShellPty` / `getShellPtyBuffer` for T19,
`readSessionFile` / `writeSessionFile` / `pickSessionFile` for T20
— plus single `invokeEipcChannel('LocalSessions_$_getAll', [])`
array-shape assertion as the foundational read-side surrogate;
passes on KDE-W in 23.4s + 27.7s sequential).
The plan doc at
[`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
captures the tier classification and execution-time reclassifications.
Its "Status (post-execution)" section is the source of truth for
what's done and what's deferred — read **session 9** first, then
**session 8**, then **session 7**, then **session 6**, then **session
5**, then **session 4**, then **session 3**, then **session 2**, then
**session 1** sub-sections.
what's done and what's deferred — read **session 10** first, then
**session 9**, then **session 8**, then **session 7**, then **session
6**, then **session 5**, then **session 4**, then **session 3**, then
**session 2**, then **session 1** sub-sections.
This session is a continuation, not a restart. Start by reading the
plan doc's status sections.
### Big new findings from session 9
### Big new findings from session 10
1. **Hand-rolled positional arg validators.** Both
`claude.web/CustomPlugins/listMarketplaces` and `listAvailablePlugins`
use byte-identical inline `Array.isArray(...) && r.every(a => typeof
a === "string")` checks for `egressAllowedDomains: string[]` (arg 0,
required) plus an optional `pluginContext` checked by a closed-over
`sc(...)` requiring `mode: string`. NOT Zod for args — the result
validator IS Zod, runs after the impl returns. Validator blocks at
bytes 5013601 / 5018821 in the bundled `index.js` (single-line
minified bundle, ~15 MB, byte offsets not line numbers). Minimal
valid arg: `args = [[]]`. The empty allow-list is the safety
property — if the underlying impl is the CLI-shelling variant, it
forwards as the spawned subprocess's permitted domains.
2. **Two impl variants exist.** Both methods have a CLI-shelling impl
(`runCommand(["plugin", ...], { timeout: 30s/60s, allowedDomains: A
})`) AND a native impl (reads `knownMarketplacesFile` /
`marketplacesDir` directly). Selection logic isn't called out in
the registered handler's closure source; both variants return the
same `Array<…>` shape on success. T33c's `Array.isArray(result) ===
true` assertion holds regardless of which is active. Test budget
bumped to 180s to accommodate worst-case sequential CLI timeouts.
3. **Validator rejection messages are the cheapest grep target.** When
`invokeEipcChannel` rejects with `Argument "<name>" at position N
... failed to pass validation`, the verbatim rejection string in
the inline validator block is the entry point — single grep on the
literal error message resolves to the exact validator location in
the bundle. Save this pattern for any future schema-rev session
where invocation fails with a structured rejection.
4. **Bundle grep + runtime closure inspection converged independently.**
Two parallel investigations (subagents read the bundle vs. read
`Function.prototype.toString` of the registered handler via the
debugger-attached running Claude on :9229) produced byte-identical
validator literals and the same minimal arg shape. High confidence
on the schema. Worth using the dual-approach pattern again when a
future schema-rev needs cross-checking — both paths are cheap and
the false-positive rate goes to zero when they agree.
5. **`mainView.js` exposes 9 wrapper namespaces but only 5 currently
have registry-confirmed handlers** on the claude.ai webContents.
Carryover from session 8: `claude.operon` exposes 22 interfaces in
the renderer wrapper but session 7's registry walk on `/epitaxy`
and `/new` saw zero operon handlers registered. Either operon
handlers register lazily on operon-mode entry, or the wrapper is
exposed even when the handler isn't yet registered (in which case
`invokeEipcChannel` would fail with "no handler registered with
suffix"). Same uncertainty applies to `claude.web/Launch/*`
(relevant for T21 dev server preview): wrapper present, registry
un-confirmed. Worth a one-liner probe before any operon-scope or
Launch-scope spec lands.
1. **`claude.web/Launch` IS registered on claude.ai with 25 handlers.**
Overturns session 7's per-interface map (which captured /epitaxy
with cowork loaded but didn't list Launch). Session 10's registry
probe re-run on /epitaxy with an active session saw all 25:
`getLogs`, `stopServer`, `showPreview`, `hidePreview`,
`startFromConfig`, `getConfiguredServices`, `getAutoVerify`,
`setAutoVerify`, `deployPreview`, `destroyPreview`, `pickHtmlFile`,
`loadHtmlPreview`, `goBack`, `goForward`, `refreshPreview`,
`navigatePreview`, `getPreviewUrl`, `setPreviewColorScheme`,
`setPreviewViewport`, `clearPreviewViewport`,
`capturePreviewScreenshot`, `suggestDeployName`, `unpublishDeploy`,
`toggleSelectionMode`, `activeServers_$store$_getState`. T21 is now
tractable as a Tier 2 reframe.
2. **Launch invocation is `cwd`-gated.** Smoke-testing
`Launch/getConfiguredServices` and `Launch/getAutoVerify` with
`args = []` was rejected with `Argument "cwd" at position 0 to method
"<name>" in interface "Launch" failed to pass validation`. Schema-rev
it via the rejection-message grep pattern (session 9 finding): the
validator block sits ~50-200 chars before the throw site in the
bundled `index.js`. T21 ships once the cwd format is recovered.
3. **`claude.operon/OperonBootstrap.ensure` registers eagerly on
claude.ai** (1 handler). Partial answer to session 8's open
question. The other 21 wrapper-exposed operon interfaces remain
registry-unconfirmed; they likely lazy-register on operon-mode
entry. Worth a follow-up navigation probe — operon-mode URL form
TBD (search `claude.ai/...` paths in the bundle for `operon`-keyed
routes).
4. **`LocalSessions/getAll` is the foundational read-side surrogate
for any session-scoped Tier 2 reframe.** Pattern: `args = []`,
returns `Array<Session>`, the case-doc connection is "this surface
binds to a LocalSession; getAll proves the LocalSessions impl
object is reachable through the renderer wrapper". T19 (terminal
binds to session) and T20 (file pane edits session-bound files)
both ship with this. Reuse for any future LocalSessions-scoped
case-doc test where the case-doc anchor is write-side.
5. **Smoke-test enumeration of LocalSessions read-sides.** The
following all invoke cleanly with `args = []`:
- `getAll`, `getInstalledEditors`, `getDetectedProjects`,
`isVSCodeInstalled`, `getSSHConfigs`, `getTrustedSSHHosts`,
`getDefaultEffort`, `getSupportedCommands`
These DO require args (rejected on smoke-test):
- `getDefaultPermissionMode` rejects on `cwd` arg
- `getSSHSupportedCommands` rejects on `config` arg (SSH config
object)
The full LocalSessions method list (117 methods) is in the registry
probe dump — if a future session needs to identify a specific
read-side, dump and grep there rather than re-enumerating.
6. **Filename convention for first-runtime-probe siblings.** When a
case-doc test has no Tier 1 fingerprint sibling and the Tier 2
reframe is the FIRST runner against that case-doc, name it
`T<NN>_runtime.spec.ts` (no `b` / `c` letter suffix). T19 / T20
followed this — same as T26 / T27 from earlier sessions. Use `b` /
`c` only when there's an earlier sibling to disambiguate against
(T22b after T22; T33c after T33b).
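The structured rejection string from finding 2 is the input to every schema-rev step, so it helps to split it into fields before grepping. A minimal sketch, assuming the message always has the exact wording quoted in finding 2; `parseEipcRejection` is a hypothetical helper, not part of `lib/eipc.ts`:

```typescript
// Hypothetical helper (not in lib/eipc.ts). Parses the rejection
// wording quoted in finding 2; any other wording returns null.
interface EipcRejection {
	arg: string;
	position: number;
	method: string;
	iface: string;
}

const REJECTION_RE =
	/Argument "([^"]+)" at position (\d+) to method "([^"]+)" in interface "([^"]+)" failed to pass validation/;

function parseEipcRejection(message: string): EipcRejection | null {
	const m = REJECTION_RE.exec(message);
	if (!m) return null;
	return {
		arg: m[1],
		position: Number(m[2]),
		method: m[3],
		iface: m[4],
	};
}
```

A `null` return means the failure was not a validator rejection (for example "no handler registered with suffix"), which is itself a useful signal for the exposure-vs-registration question.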
### Authoritative reference
Read these in order before fanning out:
- [`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
— tier classification + status section. Read **session 9**,
**session 8**, **session 7**, **session 6**, **session 5**,
**session 4**, **session 3**, **session 2**, then **session 1**
— tier classification + status section. Read **session 10**, then
**session 9**, **session 8**, **session 7**, **session 6**, **session
5**, **session 4**, **session 3**, **session 2**, then **session 1**
"Status (post-execution)" sub-sections. The Tier-3 list (search for
"## Tier 3") is the candidate pool for further reframes.
- [`tools/test-harness/README.md`](../../tools/test-harness/README.md)
— runner conventions, the now-70-spec inventory, primitives in
— runner conventions, the now-72-spec inventory, primitives in
`lib/`, isolation defaults, the CDP-gate workaround, the eipc
note (now updated to cover registry walk, renderer-wrapper
invocation, AND the schema-rev pattern from session 9).
note (covers registry walk, renderer-wrapper invocation, the
schema-rev pattern from session 9, and the foundational-getAll
pattern from session 10).
- [`docs/testing/cases/README.md`](cases/README.md) — case-doc
structure and the four anchor scopes.
- [`tools/test-harness/src/lib/`](../../tools/test-harness/src/lib/)
— the existing primitives. No session 9 additions; surface remains
— the existing primitives. No session 10 additions; surface remains
the session 8 shape (`getEipcChannels` / `findEipcChannel` /
`findEipcChannels` / `waitForEipcChannel` / `waitForEipcChannels` /
`invokeEipcChannel` on `lib/eipc.ts`).
@@ -106,166 +118,136 @@ Read these in order before fanning out:
— the session 7 read-only registry probe. Re-run against a
debugger-attached Claude (`Developer → Enable Main Process
Debugger` from the menu) to capture the current registry shape.
Session 10 used the existing probe verbatim plus a small
per-interface method-list dump (written to /tmp at capture time and
deleted afterwards).
- [`tools/test-harness/src/runners/`](../../tools/test-harness/src/runners/)
— every existing spec is a template. Notable session 9 template:
- `T33c_plugin_browser_invocation.spec.ts` — multi-suffix
`waitForEipcChannels` + per-suffix `invokeEipcChannel` loop with
`args = [[]]` for both methods, `Array.isArray` shape assertion,
180s budget for CLI-spawn worst case. Pattern for any Tier 2
invocation upgrade where the validator requires a positional arg
AND the impl may shell out to a subprocess.
— every existing spec is a template. Notable session 10 templates:
- `T19_runtime.spec.ts` / `T20_runtime.spec.ts` — multi-suffix
`waitForEipcChannels` over case-doc-anchored write-side suffixes
+ single `invokeEipcChannel('LocalSessions_$_getAll', [])` for
foundational read-side reachability. Pattern for any case-doc
test whose anchors are write-side and for which no read-side
equivalent invokes cleanly with `args = []`.
- [`docs/testing/cases/*.md`](cases/) — the spec each runner
asserts. The **Code anchors:** field tells you exactly where
upstream implements the feature.
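The T19/T20 template shape listed above reduces to two checks: every case-doc-anchored suffix is registered, and the `getAll` result is an array. A sketch of that logic on plain data, assuming channel names can be matched by suffix; `missingSuffixes` and `evaluateRuntimeProbe` are hypothetical names, and the real specs go through `waitForEipcChannels` / `invokeEipcChannel`, whose signatures are not reproduced here:

```typescript
// Sketch only: models the two assertions on plain data. The real
// specs use lib/eipc.ts; these helpers are hypothetical.

// Return the suffixes that no registered channel name ends with.
function missingSuffixes(
	registered: readonly string[],
	suffixes: readonly string[],
): string[] {
	return suffixes.filter(
		(suffix) => !registered.some((name) => name.endsWith(suffix)),
	);
}

interface ProbeVerdict {
	missing: string[];
	getAllIsArray: boolean;
	ok: boolean;
}

// Combine the registration probe with the read-side surrogate check.
function evaluateRuntimeProbe(
	registered: readonly string[],
	suffixes: readonly string[],
	getAllResult: unknown,
): ProbeVerdict {
	const missing = missingSuffixes(registered, suffixes);
	const getAllIsArray = Array.isArray(getAllResult);
	return { missing, getAllIsArray, ok: missing.length === 0 && getAllIsArray };
}
```

Returning the missing suffixes rather than a bare boolean keeps the single-shot JSON diagnostic useful when only one handler fails to register.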
### Tests in scope this session
**Realistic ceiling: ~2 new specs OR one investigation + one new
spec landing.** Session 9 was at the lower end (1 spec + 0 primitives)
because the schema-rev was the work; now that the validator pattern
is documented, follow-on Tier 2 invocation upgrades that need similar
schema work are cheaper. Session 8's upper end (3 specs + 1 primitive
extension) was a near-identical-shape batch; session 10's main bet
should aim for the lower-middle (2 specs OR 1 investigation + 1 spec).
**Realistic ceiling: ~1-2 new specs OR one investigation + one new
spec landing.** Session 10 landed 2 specs without primitive change;
session 9 landed 1 spec. Session 11's main bet should aim for 1-2.
**Category B remains the natural next step** but its case-doc anchors
point at write-side handlers — needs read-side reframes before
shipping. Category C (operon / Launch exposure-vs-registration) is
still on the table from session 8 and is the smallest-scope option.
**Category A (T21 dev server preview) is now the natural next step.**
Launch IS registered (session 10 finding); only the cwd-arg
schema-rev separates it from invocation. Category B (T11 plugin install
runtime upgrade) is a parallel option using the same pattern.
Category C (operon-mode navigation probe) is investigation-shaped.
Three categories — pick ONE as the main bet, treat the others as
fallback if the main bet hits an early blocker:
| # | Tests | Source | Notes |
|---|---|---|---|
| **A** T19/T20 read-side reframes | T19, T20 (read-side) | T33c template + `lib/eipc.ts` invokeEipcChannel | Case-doc anchors are write-side (`startShellPty`, `writeSessionFile`). Investigate read-side equivalents in the registry first — `LocalSessions_$_listSessions`, `LocalSessions_$_getSessionInfo`, `LocalSessions_$_readSessionFile` are candidates per session 7's per-interface map (117 LocalSessions handlers total). Schema-rev each before invocation — the validator-rejection-grep pattern from session 9 applies. Risk: read-side handlers may not have one-to-one case-doc anchor mapping; the spec body has to motivate why the read-side reframe asserts the same surface as the write-side case-doc claim. |
| **B** Launch scope + T21 | T21 (dev server preview) | exposure-vs-registration probe + new spec | Confirm `claude.web/Launch/*` handlers register on the claude.ai webContents at all (session 7 mapped 53 distinct interfaces; Launch wasn't in the surfaced list — could be lazy-register on `.claude/launch.json` presence, or exposed-but-not-registered). If registered, ship a Tier 2 reframe similar to T33c. If wrapper-only, document and pivot. |
| **C** operon scope exposure-vs-registration probe | n/a (investigation) | new probe, possibly small Tier 1 reframe | Confirm whether operon handlers register on claude.ai webContents at any point, or only on operon-mode entry. Outputs: either a Tier 2 reframe of an operon case-doc test, OR a deferral note explaining why operon scope can't be reached without an operon-mode session. Smaller scope than A or B. |
| **A** T21 dev server preview | T21 | T19/T20 template + `lib/eipc.ts` invokeEipcChannel + bundle grep for `cwd` validator | `claude.web/Launch` registers 25 handlers (session 10 finding). T21's case-doc claim is "dev server preview pane"; the read-side reframe targets `Launch/getAutoVerify` or `Launch/getConfiguredServices`, both of which reject with `Argument "cwd" at position 0` on `args = []`. Schema-rev cwd via rejection-grep (session 9 pattern); cwd is likely just a string filesystem path. Then ship a Tier 2 invocation runner asserting the array-or-object shape. Risk: cwd validation may be more elaborate than a string (might need an existing-directory check); have a fallback path that uses the harness's isolation tmpdir as cwd. Smaller than session 10's Category A: single-suffix invocation, with a multi-suffix registration probe only if you want it as a belt-and-suspenders check. |
| **B** T11 plugin install runtime upgrade | T11 | T19/T20 template + read-side `LocalPlugins` enumeration | Session 7's registry probe surfaced 15 `LocalPlugins_*` handlers. T11 currently is a Tier 1 fingerprint only. Read-side candidate: `LocalPlugins/getPlugins` (likely returns array of installed plugins; needs schema-rev or smoke-test first). Same pattern as T19/T20 — registration probe + foundational read-side invocation. Risk: getPlugins may need a cwd or plugin-context arg; smoke-test first. |
| **C** operon-mode navigation probe | n/a (investigation) + maybe small Tier 2 reframe | new probe + bundle grep for operon URL routes | Session 10 confirmed `OperonBootstrap.ensure` registers eagerly but the other 21 operon interfaces remain registry-unconfirmed. Outputs: either an operon-mode URL form recovered from the bundle (search for `operon`-keyed routes in `claude.ai/...` paths) plus a registry re-probe after navigation, OR a deferral note explaining why operon scope can't be reached without an operon-mode entry. Smaller scope than A or B. |
#### Category A — T19/T20 read-side reframes
#### Category A — T21 dev server preview
The plan: pick read-side `LocalSessions_$_*` getters that map to the
write-side case-doc claims for T19 (integrated terminal) and T20
(file pane), then ship Tier 2 invocation runners against each.
The plan: schema-rev the `cwd` validator on `Launch/getAutoVerify`
or `Launch/getConfiguredServices`, then ship a Tier 2 invocation
runner.
**Investigation phase first** — case-doc anchors are write-side,
need read-side equivalents:
**Investigation phase first** — cwd format isn't yet known:
1. **Re-run `eipc-registry-probe.ts`** against a debugger-attached
Claude. Filter for `LocalSessions_$_*` and look for `list*` /
`get*` / `read*` patterns. Session 7 catalogued 117 LocalSessions
handlers but only listed sample method names (4 per interface).
The full list lives in `/tmp/eipc-registry-probe.json` from a
re-run. Candidate read-sides:
- For T19 (terminal): `LocalSessions_$_listSessions`,
`LocalSessions_$_getSessionInfo`, `LocalSessions_$_readPty`?
- For T20 (file pane): `LocalSessions_$_readSessionFile`,
`LocalSessions_$_listSessionFiles`?
2. **Schema-rev each candidate** using the session 9 pattern:
- First call: smoke test against the user's debugger-attached
Claude with `args = []`. Capture the rejection error.
- If rejection error includes `Argument "<name>" at position N
... failed to pass validation`, grep the bundle for the literal
rejection string to find the validator block.
- If the call succeeds with `[]`, you don't need schema-rev — go
straight to runner shape.
- If the call succeeds but returns a non-array shape, decide
whether the assertion shape is `Array.isArray` (T33c, T27) or
`non-array object` (T35b) or `string | null` (T37b) based on
what the read-side returns.
3. **Motivate the reframe** in the leading comment. T19's case-doc
claim is "integrated terminal opens"; the read-side reframe is
e.g. "the per-session listing handler is wired and returns an
array — the terminal-spawn path consumes this list to attach to
an existing PTY". The connection to the case-doc surface needs
to be plausible, not just "this handler returns an array".
1. **Re-run the smoke test** against the user's debugger-attached
Claude with various cwd shapes (mirror session 10's
`/tmp/eipc-smoke-test.ts`):
- `args = ['']` (empty string)
- `args = ['/tmp']` (existing directory)
- `args = ['/nonexistent']` (non-existent path — does the validator
gate on existence?)
- `args = [os.homedir()]` (home dir; a literal `'/home/$USER'` is
not expanded inside a JS string)
- `args = ['.']` (relative path)
- `args = [process.cwd()]` (test CWD itself)
- `args = [{ path: '/tmp' }]` (object form — some validators wrap)
2. **Capture rejection messages.** If the rejection reads like
`Argument "cwd" ... must be a string`, cwd is a flat string. If it
reads like `must be an absolute path`, cwd must be absolute. If an
invocation succeeds and returns an array/object, you have the shape.
3. **Schema-rev the validator** via bundle grep on the rejection
message literal (session 9 finding). The validator block sits
~50-200 chars before the throw site.
4. **Motivate the reframe** in the leading comment. T21's case-doc
claim is "dev server preview pane starts on Preview → Start"; the
read-side reframe is e.g. "the configured-services / auto-verify
getters are wired and return their documented shape — the Preview
dropdown populates from this surface". The connection to the
case-doc surface needs to be plausible, not just "this handler
returns an array".
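The cwd-shape sweep in steps 1-2 can be sketched as a loop that records either the result shape or the rejection text per candidate. This is an illustration, not the session 10 smoke-test script: the `Invoke` type is a hypothetical stand-in for whatever wraps `invokeEipcChannel`, and the `Launch_$_getAutoVerify` suffix form is assumed by analogy with `LocalSessions_$_getAll`.

```typescript
// Hypothetical invoker type; stands in for a wrapper around
// invokeEipcChannel. Not the harness API.
type Invoke = (suffix: string, args: unknown[]) => Promise<unknown>;

interface Candidate {
	label: string;
	args: unknown[];
}

interface SmokeResult {
	label: string;
	ok: boolean;
	// Result shape on success, rejection text on failure.
	detail: string;
}

function describeShape(value: unknown): string {
	if (Array.isArray(value)) return 'array';
	if (value === null) return 'null';
	return typeof value;
}

// Try each candidate in order; never throw, always record.
async function smokeTestCwdShapes(
	invoke: Invoke,
	suffix: string,
	candidates: readonly Candidate[],
): Promise<SmokeResult[]> {
	const results: SmokeResult[] = [];
	for (const { label, args } of candidates) {
		try {
			const value = await invoke(suffix, args);
			results.push({ label, ok: true, detail: describeShape(value) });
		} catch (err) {
			const detail = err instanceof Error ? err.message : String(err);
			results.push({ label, ok: false, detail });
		}
	}
	return results;
}

// A subset of the shapes listed in step 1.
const CWD_CANDIDATES: Candidate[] = [
	{ label: 'empty string', args: [''] },
	{ label: 'existing dir', args: ['/tmp'] },
	{ label: 'missing dir', args: ['/nonexistent'] },
	{ label: 'relative', args: ['.'] },
	{ label: 'object form', args: [{ path: '/tmp' }] },
];
```

Recording the rejection text verbatim matters because that literal is the grep target for step 3.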
**Approaches to investigate (in order):**
1. **Re-run the registry probe against the user's running
debugger-attached Claude.** Cheapest signal — captures the full
`LocalSessions_$_*` method list as seen from main. Mirror the
existing probe; don't rewrite.
2. **Smoke-test candidate read-side suffixes** with `args = []`.
Capture rejections. The validator-rejection grep pattern from
session 9 resolves the schema cheaply.
3. **If a candidate's invocation succeeds**, draft a Tier 2 spec
using T33c's shape (multi-suffix if both T19 and T20 read-sides
land, single-suffix otherwise).
1. **Smoke-test cwd shapes** against the user's debugger-attached
Claude. Cheapest signal — directly probes what the validator
accepts.
2. **Bundle grep on the rejection message literal** for any rejection
not resolved by smoke-test alone. The validator block is
byte-adjacent to the throw site.
3. **Draft Tier 2 spec** using T19_runtime / T20_runtime shape
(multi-suffix `waitForEipcChannels` over the read-side getters, plus
`invokeEipcChannel` on the resolved cwd shape).
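Because the bundle is a single minified line, a line-oriented grep returns the whole file; what is needed is the window of characters just before each hit. A sketch of that extraction, where `contextBefore` is a hypothetical helper and the 200-char default simply mirrors the "~50-200 chars" note rather than a measured constant:

```typescript
// Sketch: return the characters immediately preceding each
// occurrence of `literal` in `bundle`. Offsets are JS string
// indices, which equal byte offsets only for pure-ASCII content.
function contextBefore(
	bundle: string,
	literal: string,
	span = 200,
): Array<{ offset: number; before: string }> {
	const hits: Array<{ offset: number; before: string }> = [];
	if (literal.length === 0) return hits;
	let from = 0;
	for (;;) {
		const offset = bundle.indexOf(literal, from);
		if (offset === -1) break;
		hits.push({
			offset,
			before: bundle.slice(Math.max(0, offset - span), offset),
		});
		from = offset + literal.length;
	}
	return hits;
}
```

Feed it the bundle text (read once into memory) and the fixed part of the rejection message; widen `span` if the validator block turns out to start further back.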
If Category A turns up empty after 2-3 distinct read-side candidates
(none invoke cleanly with `args = []`, all require schema-rev that
exceeds the session budget, OR the registry walk doesn't surface a
plausible read-side equivalent), STOP AND REPORT. Don't keep
digging — pivot to Category B or C.
If Category A's cwd schema doesn't resolve cleanly after 2-3 attempts
(rejections include shape constraints not derivable from the bundle,
all attempts fail validation, or the validator demands an existing
directory and the test isolation tmpdir doesn't qualify), STOP AND
REPORT. Pivot to Category B or C.
#### Category B — Launch scope + T21
#### Category B — T11 plugin install runtime upgrade
The plan: confirm `claude.web/Launch/*` handlers register on the
claude.ai webContents (session 7's per-interface map didn't list
them; either lazy-register on `.claude/launch.json` presence, or
exposed-but-not-registered).
The plan: confirm `LocalPlugins/*` is a tractable invocation surface,
then ship a Tier 2 reframe.
1. **Re-run `eipc-registry-probe.ts`** filtering for
`claude.web_$_Launch_$_*`. If non-empty, treat similarly to
T33c: pick a read-side getter, schema-rev with the rejection-
grep pattern, ship a Tier 2 invocation runner.
2. **If empty**, navigate the running Claude to a project with
`.claude/launch.json` and re-run. If still empty, document
"Launch scope handlers register lazily on a path we can't
construct from the harness" and defer.
3. **If wrapper-exposed without registry-side handlers**,
document as a known limitation alongside the operon finding.
`LocalPlugins_$_*`. Session 7 surfaced 15 handlers but only listed
4 sample method names per interface. Dump the full method list.
2. **Smoke-test candidate read-sides** with `args = []`:
- `getPlugins`, `getDownloadedRemotePlugins`, `syncRemotePlugins`,
`listSkillFiles` (sample names from session 7)
- Capture rejections. Schema-rev via bundle grep if needed.
3. **Draft Tier 2 spec** as `T11_runtime.spec.ts` — registration
probe + foundational read-side invocation. The case-doc connection:
"T11 verifies the plugin install code path; the LocalPlugins
listing handler is wired and returns the documented array shape".
T21's case-doc claim is "dev server preview pane" — needs
`.claude/launch.json` AND a real project to fully exercise. The
Tier 2 reframe is "the Launch dispatch handler is registered AND
returns the documented shape on a known fixture path"; needs more
investigation than T19/T20.
Skip this category unless Category A is blocked AND Category C is
unappealing.
Skip this category unless Category A's read-side candidates turn
up empty AND Category C is also unappealing.
#### Category C — operon-mode navigation probe
#### Category C — operon / Launch exposure-vs-registration probe
The plan: find an operon-mode URL form and verify whether the other
21 operon interfaces register lazily.
The plan: write a small read-only probe (mirror
`eipc-registry-probe.ts`'s shape) that asks two questions for each
of operon and Launch:
1. **Bundle grep for operon URL routes.** Search the bundled
`index.js` and `mainView.js` for `operon`-keyed paths (e.g.
`/operon/...`, `claude.ai/operon`, etc.). Compile a candidate URL
list.
2. **Navigate the user's debugger-attached running Claude** to each
candidate URL via `inspector.evalInRenderer('claude.ai',
"window.location.href = '<URL>'")`. After each navigation, re-run
the registry probe and check the operon scope's interface count.
3. **If any URL surfaces additional operon handlers**, ship a small
Tier 2 reframe spec (e.g. probe `OperonBootstrap.ensure` invocation
shape, or assert the lazy-registration count).
4. **If none of the candidate URLs surface additional handlers**,
document as "operon scope handlers register lazily on a navigation
we can't easily construct from the harness" and defer.
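Step 1's route search can be sketched as a regex pass over the bundle text. The pattern below is a guess at how routes might appear (quoted string literals starting with `/` and containing `operon`); routes assembled dynamically would not match, so an empty result is not proof of absence. `findOperonPaths` is a hypothetical name.

```typescript
// Sketch: collect distinct quoted path-like literals that contain
// "operon". The regex encodes an assumption about route syntax.
function findOperonPaths(bundle: string): string[] {
	const re = /["'`](\/[A-Za-z0-9_\-/:.]*operon[A-Za-z0-9_\-/:.]*)["'`]/gi;
	const found = new Set<string>();
	let m: RegExpExecArray | null = re.exec(bundle);
	while (m !== null) {
		found.add(m[1]);
		m = re.exec(bundle);
	}
	return Array.from(found).sort();
}
```

The sorted, de-duplicated list is the candidate URL list for step 2.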
1. **At fresh launch + post-login**, are any operon / Launch handlers
registered on the claude.ai webContents? Session 7's registry walk
on `/epitaxy` and `/new` didn't surface them in the per-interface
summary (which lists every `(scope, iface)` pair). Confirm — re-
run the registry walker, filter by `scope === 'claude.operon'` and
`scope === 'claude.web' && iface.startsWith('Launch')`, capture
the count.
2. **After navigating to operon-mode URL or a launch-config'd
project**, do the missing handlers appear? Operon-mode URLs are
TBD — search `claude.ai/...` paths in the bundle for `operon`-
keyed routes. Launch-config navigation needs `.claude/launch.json`
in the working folder.
3. **Independently**, does the renderer-side wrapper expose
`window['claude.operon']` / `window['claude.web'].Launch`
regardless of registration status? (Yes per session 8 for
operon; un-confirmed for Launch.)
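The per-`(scope, iface)` count in question 1 can be computed from a flat list of channel names. A sketch, assuming names end in `<scope>_$_<iface>_$_<method>` (inferred from forms like `claude.web_$_Launch_$_*`; any leading prefix is ignored). `parseChannel` and `countByInterface` are hypothetical helpers, not the probe's own code.

```typescript
// Assumption: channel names end in "<scope>_$_<iface>_$_<method>".
interface ChannelParts {
	scope: string;
	iface: string;
	method: string;
}

function parseChannel(name: string): ChannelParts | null {
	const parts = name.split('_$_');
	if (parts.length < 3) return null;
	// Take the last three segments so any prefix is dropped.
	const [scope, iface, method] = parts.slice(-3);
	return { scope, iface, method };
}

// Count handlers per "scope/iface" key; unparseable names are skipped.
function countByInterface(names: readonly string[]): Map<string, number> {
	const counts = new Map<string, number>();
	for (const name of names) {
		const p = parseChannel(name);
		if (p === null) continue;
		const key = `${p.scope}/${p.iface}`;
		counts.set(key, (counts.get(key) ?? 0) + 1);
	}
	return counts;
}
```

Method names such as `activeServers_$store$_getState` survive the split intact because `_$store$_` never contains the `_$_` separator.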
Outputs:
- If operon / Launch handlers register on claude.ai eagerly: write a
one-liner Tier 2 reframe spec for the highest-priority case-doc
target.
- If they register lazily on a navigation we can't easily construct:
document the prerequisite in plan-doc Status section as a Tier 3
item ("requires operon-mode navigation primitive" /
"requires .claude/launch.json fixture"), and don't ship a probe.
- If the wrapper is exposed without registered handlers: document as
a known limitation of `invokeEipcChannel` (will fail with "no
handler registered" even though `window['claude.<scope>']` is
present).
This is a smaller-scope category — investigation + maybe one spec
landing. Best fallback if Category A's read-side candidates turn up
empty.
This is the smallest-scope category — investigation + maybe one
spec landing. Best fallback if Category A is blocked.
#### Cross-compositor focus-shifter expansion (NOT recommended this session)
@@ -284,13 +266,13 @@ section. Don't add it speculatively — wait for a real consumer.
### Constraints to respect (don't violate)
These are unchanged from sessions 1-9 and still load-bearing:
These are unchanged from sessions 1-10 and still load-bearing:
- **Default isolation** unless the spec needs otherwise. Use
`seedFromHost: true` for any test that depends on authenticated
renderer state — never assume default isolation gets past
`/login`. T16/T26/T22b/T27/T31b/T33b/T33c/T35b/T37b/T38b are the
templates.
`/login`. T16/T19/T20/T26/T22b/T27/T31b/T33b/T33c/T35b/T37b/T38b
are the templates.
- **eipc handlers register on `webContents.ipc._invokeHandlers`,
NOT global `ipcMain._invokeHandlers`.** Session 7 finding. Use
`lib/eipc.ts` rather than rolling a new walker. The framing
@@ -312,6 +294,15 @@ These are unchanged from sessions 1-9 and still load-bearing:
~2KB around it surfaces the full schema. Cheaper than runtime
closure inspection in most cases (closure inspection is a good
cross-check).
- **For session-scoped Tier 2 reframes: `LocalSessions/getAll` is
the foundational read-side surrogate.** Session 10 finding. When
a case-doc test's anchors are write-side LocalSessions handlers,
ship a registration probe over the case-doc-anchored suffixes
PLUS a single `invokeEipcChannel('LocalSessions_$_getAll', [])`
array-shape assertion as the read-side surrogate. The case-doc
connection: "this surface binds to a LocalSession; getAll proves
the LocalSessions impl object is reachable through the renderer
wrapper".
- **`lib/input.ts` is X11-only.** Strict `XDG_SESSION_TYPE ===
'x11'` gate. Wayland consumers must skip — don't try to bolt
Wayland into the file.
@@ -325,10 +316,7 @@ These are unchanged from sessions 1-9 and still load-bearing:
- **Code-tab AX anchors stay in plan-doc until a consumer needs
them.** Don't preemptively add `CodeTab.activateTopTab()` to
`claudeai.ts` — session 5's anchors block out the work for
whenever a future consumer surfaces. T19/T20 read-side reframes
may need them; pre-flight check before adding to `claudeai.ts`
whether the read-side path actually exercises the AX surface or
only the IPC.
whenever a future consumer surfaces.
- **CDP auth gate is alive** — runtime SIGUSR1 attach via
`app.attachInspector()`, never Playwright's `_electron.launch()`
or `chromium.connectOverCDP()`.
@@ -344,8 +332,8 @@ These are unchanged from sessions 1-9 and still load-bearing:
errors and short-circuit; see S11 / S14 for the pattern.)
- **Diagnostics on every run.** `testInfo.attach()` the artefacts.
Single-shot JSON dumps for multi-state tests (S11, S14, S31,
T22b, T27, T31b, T33b, T33c, T35b, T37b, T38b pattern) are
cleaner than 5+ separate attachments.
T19, T20, T22b, T27, T31b, T33b, T33c, T35b, T37b, T38b pattern)
are cleaner than 5+ separate attachments.
- **Tag with annotations.** `severity:` and `surface:` on every
test so JUnit carries them through to matrix-regen.
- **Tabs in TS, ~80-char wrap as the existing files do.** Match
@@ -369,25 +357,26 @@ These are unchanged from sessions 1-9 and still load-bearing:
contain credentials; scheduled-task instructions may reference
internal projects; marketplace `pluginContext`-filtered listings
may surface internal-org marketplace pointers (T33c's defensive
default).
default). T19/T20's `getAll` defensive default extends the
pattern: session metadata may include user-account-scoped paths
and titles.
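What the specs' defensive default does in detail is not spelled out here. One possible approach, offered purely as an assumption, is to attach only the shape of a `getAll`-style result (array-ness, length, key names) and never its values; `summariseShape` is a hypothetical helper.

```typescript
// Hypothetical diagnostic helper: summarise a result without copying
// any values (paths, titles) into the attachment.
interface ShapeSummary {
	isArray: boolean;
	length: number | null;
	// Sorted union of top-level keys across object elements.
	keys: string[];
}

function summariseShape(result: unknown): ShapeSummary {
	if (!Array.isArray(result)) {
		return { isArray: false, length: null, keys: [] };
	}
	const keys = new Set<string>();
	for (const item of result) {
		if (item !== null && typeof item === 'object') {
			for (const k of Object.keys(item)) keys.add(k);
		}
	}
	return {
		isArray: true,
		length: result.length,
		keys: Array.from(keys).sort(),
	};
}
```

The summary still supports the `Array.isArray` assertion and a single-shot JSON dump while keeping session metadata out of test artefacts.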
### Phases
#### Phase 0 — calibration
1. `cd tools/test-harness && npm run typecheck` — should pass.
2. Read the plan doc's "Status (post-execution)" session 9 section,
2. Read the plan doc's "Status (post-execution)" session 10 section,
then read `lib/eipc.ts`'s `invokeEipcChannel` API +
`T33c_plugin_browser_invocation.spec.ts` leading comments.
Confirm you understand the multi-suffix invocation pattern, the
schema-rev approach (rejection-message grep), and the 180s
timeout budget.
`T19_runtime.spec.ts` / `T20_runtime.spec.ts` leading comments.
Confirm you understand the multi-suffix registration + foundational
read-side invocation pattern.
3. Pick ONE Category as the main bet. For Category A, plan the
approach: (a) re-run the registry probe to enumerate
LocalSessions read-sides, (b) smoke-test candidates with `args =
[]`, (c) schema-rev any rejections via bundle grep. List which
approaches you'll try in what order, with the cap at 2-3
distinct approaches before STOP AND REPORT.
approach: (a) smoke-test cwd shapes against `Launch/getAutoVerify`,
(b) bundle-grep any rejection literal for shape constraints, (c)
draft the Tier 2 invocation spec. List which approaches you'll try
in what order, with the cap at 2-3 distinct approaches before STOP
AND REPORT.
If Phase 0 surfaces a problem (typecheck failing, primitives unclear,
the chosen Category's prerequisites don't hold), stop and report.
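The bundle-grep step in the Category A plan can be sketched as follows. This is a minimal illustration, not harness code: `contextBeforeLiteral` is a hypothetical helper, the 200-character default window is an assumption taken from the "~50-200 chars before the throw site" observation, and `fakeBundle` is a stand-in string rather than real `index.js` content.

```typescript
// Hypothetical helper (not part of lib/eipc.ts): return the text that
// precedes the first occurrence of a rejection literal in a bundle, so
// the inline validator block can be read by eye.
function contextBeforeLiteral(
	bundle: string,
	literal: string,
	window = 200,
): string | null {
	const at = bundle.indexOf(literal);
	if (at === -1) return null; // literal not present in this bundle
	return bundle.slice(Math.max(0, at - window), at);
}

// Stand-in for the minified bundle; the real file is far larger.
const fakeBundle =
	'function v(e){if(typeof e!="string")throw new Error("failed to pass validation")}';

const ctx = contextBeforeLiteral(fakeBundle, 'failed to pass validation', 40);
console.log(ctx);
```

If the literal occurs more than once in the real bundle, the first hit may belong to a different validator, so the helper would need to iterate over all offsets.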
@@ -395,32 +384,26 @@ Don't fan out.
#### Phase 1 — fan-out batch
For Category A (T19/T20 read-side reframes):
- Spawn ONE subagent per read-side candidate (or one per
investigation approach if candidates aren't yet identified):
registry-probe re-run, smoke-test of candidate suffixes, schema-
rev of any rejections. Treat as exploratory; report findings
For Category A (T21 dev server preview):
- Spawn ONE subagent for the cwd schema-rev investigation
(smoke-test + bundle-grep). Treat as exploratory; report findings
before committing to a spec shape. The user's debugger-attached
running Claude is a great target for verification probes (mirror
session 7's `eipc-registry-probe.ts` shape and session 9's
bundle-grep pattern).
- Cap re-spawns at 2-3 distinct approaches; if all empty, STOP AND
REPORT. Pivot to Category B or C if budget remains.
- If a candidate's schema is recoverable AND invocation lands
cleanly with valid args, second batch: ship `T19c` /
`T20c` (or whatever the file-naming convention dictates given
the existing T19/T20 files don't exist yet — use `_runtime`
suffix as session 7 did for T22b / T31b / T33b / T38b, OR
`_invocation` suffix as session 9 did for T33c, depending on
whether registration siblings T19b / T20b are also being
shipped).
- Cap at ~2 specs total — don't try to land both if the first one
surfaces an unexpected issue.
running Claude is a great target for verification probes.
- Cap re-spawns at 2-3 distinct approaches; if cwd schema doesn't
resolve, STOP AND REPORT. Pivot to Category B or C if budget
remains.
- If schema is recoverable AND invocation lands cleanly with valid
args, second batch: ship `T21_runtime.spec.ts`.
- Cap at ~1 spec total — T21 is single-suffix invocation, smaller
scope than T19/T20.
For Category C (operon / Launch scope probe):
- Single subagent writes the registry-walk probe modeled on
`eipc-registry-probe.ts`. User runs it (or you run via the
debugger if attached). Report findings; if a Tier 2 reframe is
For Category B (T11 plugin install runtime upgrade):
- Same shape as Category A — investigate read-side `LocalPlugins`
candidates, smoke-test, schema-rev, ship `T11_runtime.spec.ts`.
For Category C (operon-mode navigation probe):
- Single subagent does bundle-grep for operon URL routes + per-URL
registry re-probe. Report findings; if a Tier 2 reframe is
tractable, ship one spec.
#### Per-subagent prompt shape
@@ -459,16 +442,8 @@ If the target isn't reasonable to implement (anchors don't resolve
to anything assertable, the test depends on state you can't
construct, the existing primitives don't cover the surface), DO
NOT write a stub. Report under Open questions and stop. Sessions
1-9 had cumulative ~16 "stop and report" outcomes that were the
right call (S20 deferral, T05 reshape, T07 needs seedFromHost,
T08 needs setState('close'), S28 reclassification, T38 framing,
session-3 eipc-registry finding, T37 fixture-readback deferral,
S14 primitive-gap then primitive-build, T35/T36 Phase 2 deferrals,
T18 Tier 1 reframe, T36 Phase 2 reclassification to Tier 3/4,
session-6 lib/input-niri.ts shipped untested-on-niri, session-7
per-wc IPC scope finding overturning the session-3 closure-local
conclusion, session-8 renderer-wrapper-vs-main-side decision,
session-9 schema-rev cross-check via dual investigation).
1-10 had cumulative ~17 "stop and report" outcomes that were the
right call.
Report shape (~150 words):
## <TARGET> [runner | primitive | investigation]
@@ -501,7 +476,7 @@ After fan-out returns:
- Primitives landed (with API shape)
- Specs deferred (with the per-test rationale)
- Specs reclassified (Tier 3 → Tier 2, Tier 2 → Tier 1, etc.)
- Updated coverage stat (was 70/76 = 92%, now N/76 = M%)
- Updated coverage stat (was 72/76 = 95%, now N/76 = M%)
6. Don't commit. The user reviews and commits.
7. Rotate this prompt: rewrite
`docs/testing/runner-implementation-followup-prompt.md` for
@@ -509,7 +484,7 @@ After fan-out returns:
### Self-correction loop
Same as sessions 1-9:
Same as sessions 1-10:
1. Subagent typecheck failure → re-spawn with explicit fix
instruction.
@@ -521,19 +496,16 @@ Same as sessions 1-9:
4. Spec passes locally but the assertion is actually trivial (e.g.
an unauthenticated launch where the handler check vacuously
passes because no handlers are registered) → re-examine the
assertion shape. The lesson from sessions 3 and 7: verify the
assertion is meaningful, not just that it passes.
5. **Carry-over from session 5/6/7/8/9:** If pursuing Category A
and the read-side candidates turn up empty / require schema-rev
that exceeds budget after 2-3 approaches, STOP. Don't keep
digging — pivot to Category B or C. Document what was tried.
6. **NEW for session 10:** If a Category A read-side reframe
surfaces a "registered but uninvocable" pattern (handler is on
the registry but the renderer-side wrapper isn't exposed for
the relevant scope), that's the same shape as session 8's
find_in_page / main_window observation — document it and
defer rather than building the main-side fallback
speculatively.
assertion shape.
5. **Carry-over from session 5/6/7/8/9/10:** If pursuing Category A
and the cwd schema doesn't resolve / requires schema-rev that
exceeds budget after 2-3 approaches, STOP. Don't keep digging —
pivot to Category B or C. Document what was tried.
6. **Carry-over from session 10:** If a registration probe surfaces
"registered but uninvocable" (handler is on the registry but the
renderer-side wrapper isn't exposed for the relevant scope or the
validator rejects every smoke-test arg shape), document and
defer rather than building the main-side fallback speculatively.
Cap re-spawns at 2 per file. Past that, mark as needing human
review and move on.
@@ -549,12 +521,12 @@ Stop and write the final report when one of:
3. **Discovered a primitive gap that breaks 5+ Tier 2/Tier 3
tests.** Stop, propose where the new primitive should live in
`lib/`. Future session adds the primitive first, then resumes.
4. **Session budget hits ~2 new specs OR one new primitive
4. **Session budget hits ~1-2 new specs OR one new primitive
landing.** Stop, synthesize, leave the rest for the next
session.
5. **Category A read-side candidates all turn up empty after 2-3
distinct attempts.** Document the dead-end as a finding, pivot
to Category B or C if budget remains.
5. **Category A cwd schema doesn't resolve after 2-3 distinct
attempts.** Document the dead-end as a finding, pivot to
Category B or C if budget remains.
### What you should NOT do
@@ -563,8 +535,8 @@ Stop and write the final report when one of:
fallback.
- **Don't ship stubs.** If a runner can't actually assert what the
spec says, mark it as Tier 3 / blocked / primitive-gap and
don't write a placeholder. The cumulative sixteen "stop and
report" outcomes from sessions 1-9 were the right call — every
don't write a placeholder. The cumulative seventeen "stop and
report" outcomes from sessions 1-10 were the right call — every
one revealed a real constraint.
- **Don't break existing runners.** H01-H05 are the canaries.
- **Don't restructure `lib/`** beyond targeted additions.
@@ -585,11 +557,11 @@ Stop and write the final report when one of:
`start*`, `set*`, `write*`, `run*`, `openIn*`, `delete*`,
`cancel*`, `reset*`, `installPlugin`, `enablePlugin`. The
primitive doesn't enforce a read-only allowlist; the safety
property is that case-doc-anchored suffixes are read-side.
Session 9 reframed T33's invocation through the read-side
`list*` methods specifically because of this. T19/T20 (Category
A) need the same treatment — case-doc anchors at write-side
handlers must be reframed through read-side equivalents.
property is that case-doc-anchored suffixes are read-side OR
case-doc-anchored write-side suffixes are tested via REGISTRATION
ONLY (`waitForEipcChannels`), never invoked. T19/T20 ship
registration probes over write-side suffixes — that's the safe
pattern.
- **Don't bolt other compositors into `lib/input-niri.ts`.**
Sway / Hyprland / River each get their own per-compositor file
if a consumer surfaces.
@@ -600,8 +572,6 @@ Stop and write the final report when one of:
- **Don't preemptively build `CodeTab.activateTopTab()` /
`startNewSession()`.** Session 5 captured the AX anchors but
T36 Phase 2 (the only known consumer) was reclassified out.
T19/T20 read-side reframes may not even need them if the
read-side path is purely IPC-driven.
- **Don't add a main-side `invokeEipcChannel` fallback
speculatively.** Build it only if a concrete consumer needs to
invoke through a non-claude.ai webContents. Premature primitives
@@ -613,13 +583,13 @@ Stop and write the final report when one of:
### Final report format
```markdown
## Runner implementation summary (session 10)
## Runner implementation summary (session 11)
- Main-bet category: A | B | C
- Specs landed: N
- Primitives landed: N
- Reclassified mid-flight: N (with reasons)
- Coverage: was 70/76 (92%), now <NEW>/76 (<PCT>%)
- Coverage: was 72/76 (95%), now <NEW>/76 (<PCT>%)
- Typecheck: clean | <errors>
- KDE-W test run: <pass/skip/fail counts>
@@ -627,7 +597,7 @@ Stop and write the final report when one of:
| Cat | Test ID | File | Assertion shape | Status |
|---|---|---|---|---|
| A | T19c | T19c_*.spec.ts | … | ✓ pass / skip / fail |
| A | T21_runtime | T21_runtime.spec.ts | … | ✓ pass / skip / fail |
| ... |
## Notable findings
@@ -682,14 +652,14 @@ git diff --stat
- For eipc registry walking: `lib/eipc.ts` exports
`getEipcChannels` / `findEipcChannel` / `findEipcChannels` /
`waitForEipcChannel` / `waitForEipcChannels` against
`webContents.ipc._invokeHandlers`. See T22b / T31b / T33b /
T38b for end-to-end consumer patterns.
`webContents.ipc._invokeHandlers`. See T19 / T20 / T22b / T31b /
T33b / T38b for end-to-end consumer patterns.
- For eipc invocation: `lib/eipc.ts` exports `invokeEipcChannel`
(renderer-side wrapper at
`window['claude.<scope>'].<Iface>.<method>`). See T27 / T33c /
T35b / T37b for end-to-end consumer patterns. Only call read-
side suffixes; the primitive doesn't enforce a read-only
allowlist.
`window['claude.<scope>'].<Iface>.<method>`). See T19 / T20 /
T27 / T33c / T35b / T37b for end-to-end consumer patterns. Only
call read-side suffixes; the primitive doesn't enforce a read-
only allowlist.
- **For arg validator schema-rev (session 9 finding):** when
invocation rejects with `Argument "<name>" at position N ...
failed to pass validation`, grep the bundled `index.js` for the
@@ -698,6 +668,12 @@ git diff --stat
schema. See plan-doc session 9 status section for the byte
offsets of the two CustomPlugins validators (5013601 / 5018821)
as worked examples.
- **For session-scoped Tier 2 reframes (session 10 finding):**
`LocalSessions/getAll` is the foundational read-side surrogate.
Pattern: `args = []`, returns `Array<Session>`, the case-doc
connection is "this surface binds to a LocalSession; getAll
proves the LocalSessions impl object is reachable through the
renderer wrapper". T19 and T20 are the templates.
- **For asar fingerprints: ALWAYS grep the installed asar
first.** Build-reference is beautified; the bundle is
minified. Case-doc text may be the user-facing form, not the


@@ -18,6 +18,114 @@ work begins.
## Status (post-execution)
**Shipped session 10 (2 new specs, no primitive change):** T19 + T20
(Tier 2 reframes — `seedFromHost` + multi-suffix registration probe
over the case-doc-anchored write-side handlers + invocation of the
foundational read-side `LocalSessions/getAll` as the surrogate).
First runtime probes for both T19 (integrated terminal) and T20
(file pane) — neither had a Tier 1 fingerprint sibling because the
case-doc anchors are channel names + impl line numbers, not user-
facing literals. Coverage moved from 70/76 (92%) to 72/76 (95%).
Session 10 findings + reclassifications:
- **`claude.web/Launch` IS registered on the claude.ai webContents
with 25 handlers** — overturns session 7's per-interface map which
did not list Launch (it captured /epitaxy with cowork loaded; the
Launch interface was either lazy-registered after a navigation
not yet performed or the per-interface enumeration missed it). The
session 10 registry probe re-run on /epitaxy with an active session
saw all 25: `getLogs`, `stopServer`, `showPreview`, `hidePreview`,
`startFromConfig`, `getConfiguredServices`, `getAutoVerify`,
`setAutoVerify`, `deployPreview`, `destroyPreview`, `pickHtmlFile`,
`loadHtmlPreview`, `goBack`, `goForward`, `refreshPreview`,
`navigatePreview`, `getPreviewUrl`, `setPreviewColorScheme`,
`setPreviewViewport`, `clearPreviewViewport`, `capturePreviewScreenshot`,
`suggestDeployName`, `unpublishDeploy`, `toggleSelectionMode`,
`activeServers_$store$_getState`. T21's case-doc claim (dev server
preview pane) is now reachable as a Tier 2 reframe — not shipped this
session, deferred to next.
- **`claude.web/Launch` invocation is gated on a `cwd` argument.**
Smoke-test of `Launch/getConfiguredServices` and `Launch/getAutoVerify`
against the user's debugger-attached running Claude rejected with
`Argument "cwd" at position 0 to method "<name>" in interface
"Launch" failed to pass validation`. Schema-rev next session via the
rejection-message grep pattern (session 9 finding) — the validator
block sits ~50-200 chars before the throw site in the bundled
`index.js`. T21 ships once the cwd format is recovered.
- **`claude.operon/OperonBootstrap.ensure` registers eagerly on
claude.ai** — partially answers session 8's open question. The
registry probe surfaced 1 operon handler (`OperonBootstrap.ensure`)
on /epitaxy with the active Code session. The other 21 wrapper-
exposed operon interfaces (per session 8's `mainView.js` namespace
count) remain registry-unconfirmed; either they lazy-register on
operon-mode entry, or the `claude.operon` wrapper is exposed
without registration as session 8 hypothesized. Worth a follow-up
navigation probe (operon-mode URL TBD — would need to grep
`claude.ai/...` paths in the bundle for `operon`-keyed routes), but
the current finding is enough to stop calling operon
"registry-unconfirmed"; at least one handler IS registered.
- **`LocalSessions` registers 117 methods** (full list dumped to
`/tmp/eipc-full-methods.json` during smoke-test). Read-side methods
invocable cleanly with `args = []` as confirmed by smoke-test on the
user's debugger-attached running Claude:
- `getAll` → `Array<Session>` (length 1 on dev box's active /epitaxy
session)
- `getInstalledEditors` → `Array<EditorConfig>` (length 4 on dev box)
- `getDetectedProjects` → `Array<Project>` (length 0 on dev box, no
detected projects in the harness's CWD)
- `isVSCodeInstalled` → boolean (false on dev box)
- `getSSHConfigs` → `Array<SSHConfig>` (length 0 on dev box)
- `getTrustedSSHHosts` → `Array<Host>` (length 0 on dev box)
- `getDefaultEffort` → object (returns null on dev box)
- `getSupportedCommands` → `Array<Command>` (length 25 on dev box)
Several other read-sides DO require args:
- `getDefaultPermissionMode` rejects with `Argument "cwd" at position
0 ... failed to pass validation` — needs cwd
- `getSSHSupportedCommands` rejects with `Argument "config" at position
0 ... failed to pass validation` — needs an SSH config object
T19 and T20 use `getAll` as the foundational read-side surrogate
because both surfaces (terminal + file pane) bind to LocalSessions;
the session enumeration handler is what proves the LocalSessions
impl object is reachable through the renderer wrapper.
- **T19/T20 case-doc anchors are write-side; reframe is registration
+ foundational read-side invocation.** T19's anchors are all
`LocalSessions_$_*ShellPty*` (start/write/stop/resize/getBuffer);
T20's are `readSessionFile` (read-side but needs sessionId+path
args not constructible from a fresh isolation) + `writeSessionFile`
(write-side, would mutate user content if invoked). The strongest
non-destructive Tier 2 layer for both is registration probe over
the case-doc-anchored suffixes plus a single invocation of `getAll`.
Different shape from T33c (which invokes each case-doc-anchored
suffix because both `listMarketplaces`/`listAvailablePlugins` are
read-side); T19/T20 mirror T22b/T31b/T33b/T38b's registration
shape but add the `getAll` invocation for the impl-object
reachability assertion.
- **No primitive change.** `lib/eipc.ts`'s `waitForEipcChannels` +
`invokeEipcChannel` cover both new specs. The existing primitive
surface remains broad enough that consumer-driven additions are
the right next move, not fresh primitive builds.
- **Filename convention: `_runtime` suffix, no `b`/`c` letter.** T19
and T20 had no prior runners — these are the first siblings, so
the naming follows T26/T27 (single `_runtime` Tier 2 reframe with
no fingerprint predecessor) rather than T33b/T33c (numbered after
earlier siblings). `T19_runtime.spec.ts` / `T20_runtime.spec.ts`.
Tier 2 → Tier 2 candidates remaining for next session: **T21 dev
server preview** (NOW tractable — `claude.web/Launch` registers 25
handlers including read-side getters; needs `cwd` arg schema-rev via
the rejection-message grep pattern). **T11 plugin install** (currently
just a fingerprint — could promote to invocation if a read-side
plugin-listing handler is identified; `LocalPlugins/getPlugins` is
listed in the registry probe with 15 LocalPlugins handlers). **operon
scope navigation probe** (still on the table — the partial answer is
that OperonBootstrap registers eagerly, but the other 21 interfaces
would need an operon-mode navigation; URL form TBD). The primitive
surface remains broad enough that consumer-driven additions are the
right next move.
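The T19/T20 shape described above (registration probe over the case-doc-anchored suffixes, plus one `getAll` invocation) can be sketched as below. Every name in this block is a hypothetical stand-in: `missingSuffixes`, `probe`, `Invoker`, the fake channel-name prefix, and the fake invoker are illustrative only and do not reproduce the `lib/eipc.ts` signatures.

```typescript
// Hypothetical invoker type; the real primitive's signature may differ.
type Invoker = (suffix: string, args: unknown[]) => Promise<unknown>;

// T19 case-doc-anchored suffixes: checked for registration only,
// never invoked.
const T19_SUFFIXES = [
	'startShellPty',
	'writeShellPty',
	'stopShellPty',
	'resizeShellPty',
	'getShellPtyBuffer',
];

// Suffix-match each wanted name against the registered channel names
// and return the ones that are absent.
function missingSuffixes(registered: string[], wanted: string[]): string[] {
	return wanted.filter((s) => !registered.some((ch) => ch.endsWith(s)));
}

async function probe(registered: string[], invoke: Invoker) {
	const missing = missingSuffixes(registered, T19_SUFFIXES);
	// The only invocation: the stateless read-side surrogate.
	const sessions = await invoke('getAll', []);
	return { missing, getAllIsArray: Array.isArray(sessions) };
}

// Fake registry and invoker so the sketch runs without Electron.
const fakeRegistry = T19_SUFFIXES.map(
	(s) => `fake-uuid_$_LocalSessions_$_${s}`,
);
const fakeInvoke: Invoker = async (suffix) =>
	suffix === 'getAll' ? [] : null;

probe(fakeRegistry, fakeInvoke).then((result) => console.log(result));
```

Keeping the invocation list to the single read-side surrogate is what makes the probe non-destructive; the write-side suffixes only ever pass through the string comparison.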
---
**Shipped session 9 (1 new spec, no primitive change):** T33c (Tier 2
runtime invocation upgrade — `seedFromHost` + dual-handler invocation
of `claude.web/CustomPlugins/{listMarketplaces, listAvailablePlugins}`


@@ -7,7 +7,7 @@ architecture, decisions, and rationale.
## Status
Seventy specs wired (32 cross-env T-tests, 33 env-specific S-tests,
Seventy-two specs wired (34 cross-env T-tests, 33 env-specific S-tests,
5 H-prefix harness self-tests). See
[`docs/testing/runner-implementation-plan.md`](../../docs/testing/runner-implementation-plan.md)
for the tiered triage of remaining tests and the per-spec rationale
@@ -33,6 +33,8 @@ behind tier classification.
| [T16](../../docs/testing/cases/code-tab-foundations.md#t16--code-tab-loads) | After `seedFromHost` + `userLoaded`, `CodeTab.activate()` resolves and ≥1 compact pill renders (env pill = Code-body mounted) | L1 + AX-tree |
| [T17](../../docs/testing/cases/code-tab-foundations.md#t17--folder-picker-opens) | Code df-pill → env pill → Local → Select folder → Open folder triggers `dialog.showOpenDialog` (requires `CLAUDE_TEST_USE_HOST_CONFIG=1`) | L1 |
| [T18](../../docs/testing/cases/code-tab-foundations.md#t18--drag-and-drop-files-into-prompt) | Bundled `mainView.js` preload contains the path-resolution bridge fingerprints: `getPathForFile` (2× — property key + the `webUtils.getPathForFile(` call, both at case-doc :9267), `webUtils`, `filePickers`, and the `claudeAppSettings` `contextBridge.exposeInMainWorld` namespace (case-doc :9552) — pins the load-bearing wiring without faking OS-level XDND drag (xdotool can't put file URIs on the X11 selection; Wayland needs per-compositor IPC + libei) | file probe |
| [T19](../../docs/testing/cases/code-tab-foundations.md#t19--integrated-terminal) | After `seedFromHost` + `userLoaded`, the integrated-terminal eipc surface (`startShellPty`, `writeShellPty`, `stopShellPty`, `resizeShellPty`, `getShellPtyBuffer` — five-suffix presence probe) is registered on the claude.ai webContents AND the foundational `LocalSessions/getAll` returns array shape (Tier 2 reframe of the case-doc T19 case; case-doc anchors are write-side `startShellPty` etc. so reframe asserts the FULL terminal IPC surface registers + a stateless read-side surrogate is invocable) | L1 (eipc registry + invoke) |
| [T20](../../docs/testing/cases/code-tab-foundations.md#t20--file-pane-opens-and-saves) | After `seedFromHost` + `userLoaded`, the file-pane eipc surface (`readSessionFile`, `writeSessionFile`, `pickSessionFile` — three-suffix presence probe) is registered on the claude.ai webContents AND the foundational `LocalSessions/getAll` returns array shape (Tier 2 reframe of the case-doc T20 case; the case-doc's `readSessionFile` anchor is read-side but needs (sessionId, path) args not constructible from a fresh isolation, so the registration probe + foundational `getAll` invocation is the strongest non-destructive Tier 2 layer) | L1 (eipc registry + invoke) |
| [T22](../../docs/testing/cases/code-tab-workflow.md#t22--pr-monitoring-via-gh) | Bundled `index.js` contains `LocalSessions_$_getPrChecks` eipc channel name *and* `gh CLI not found in PATH` Linux-fallthrough throw site (Tier 1 fingerprint) | file probe |
| [T22b](../../docs/testing/cases/code-tab-workflow.md#t22--pr-monitoring-via-gh) | After `seedFromHost` + `userLoaded`, the `LocalSessions_$_getPrChecks` eipc handler is registered on the claude.ai webContents (`webContents.ipc._invokeHandlers` — Tier 2 runtime probe sibling of T22, strictly stronger than the bundle-string fingerprint) | L1 (eipc registry) |
| [T23](../../docs/testing/cases/code-tab-handoff.md#t23--desktop-notifications-fire) | Firing `new Notification({title})` from main reaches the session bus's `org.freedesktop.Notifications.Notify` (observed via `dbus-monitor`) | L1 + DBus subprocess |
@@ -112,14 +114,14 @@ window; `NiriIpcUnavailable` thrown off-Niri; consumed by S14), the
`lib/eipc.ts` registry walker (`getEipcChannels` /
`waitForEipcChannel` / `waitForEipcChannels` against
`webContents.ipc._invokeHandlers`; opaque on the UUID, suffix-matched
against case-doc anchors; consumed by T22b / T31b / T33b / T38b)
plus its session 8 invoke surface (`invokeEipcChannel` — calls a
registered handler through the renderer-side wrapper at
`window['claude.<scope>'].<Iface>.<method>`; consumed by T27 / T33c /
T35b / T37b) — and the `createIsolation({ seedFromHost: true })`
primitive that lets login-required tests run hermetically against a
copy of the host's signed-in auth state (T07, T16, T22b, T26, T27,
T31b, T33b, T33c, T35b, T37b, T38b).
against case-doc anchors; consumed by T19 / T20 / T22b / T31b / T33b /
T38b) plus its session 8 invoke surface (`invokeEipcChannel` — calls
a registered handler through the renderer-side wrapper at
`window['claude.<scope>'].<Iface>.<method>`; consumed by T19 / T20 /
T27 / T33c / T35b / T37b) — and the `createIsolation({ seedFromHost:
true })` primitive that lets login-required tests run hermetically
against a copy of the host's signed-in auth state (T07, T16, T19,
T20, T22b, T26, T27, T31b, T33b, T33c, T35b, T37b, T38b).
Note on eipc channels: the `LocalSessions_$_*` and `CustomPlugins_$_*`
channel names referenced in the case-doc Code anchors don't register
@@ -136,12 +138,21 @@ against the bundled channel-name strings; T22b / T31b / T33b / T38b
are the runtime registry-presence siblings (strictly stronger,
require `seedFromHost`). T27 / T33c / T35b / T37b go one step
further — they invoke the resolved handlers through the renderer-
side wrapper at `window['claude.<scope>'].<Iface>.<method>`, which
`mainView.js` exposes via `contextBridge.exposeInMainWorld` after a
top-frame + origin gate (`Qc()`: claude.ai / claude.com / preview.*
/ localhost). Calling through the wrapper carries an honest
`senderFrame` for the inlined `le()` / `Vi()` per-handler origin
gate, so the test surface matches real attack surface. T33c also
side wrapper at `window['claude.<scope>'].<Iface>.<method>`. T19 /
T20 are first-runtime-probe siblings of case-doc tests whose anchors
are write-side handlers (`startShellPty` / `writeSessionFile`); they
ship a five-suffix / three-suffix registration probe over the
case-doc-anchored write-side surface plus a single foundational
read-side `LocalSessions/getAll` invocation as the read-side
surrogate (case-doc connection: integrated terminal and file pane
both bind to LocalSessions; `getAll` proves the LocalSessions impl
object is reachable through the renderer wrapper). All wrapper
invocations use the wrapper exposed by `mainView.js` via
`contextBridge.exposeInMainWorld` after a top-frame + origin gate
(`Qc()`: claude.ai / claude.com / preview.* / localhost). Calling
through the wrapper carries an honest `senderFrame` for the inlined
`le()` / `Vi()` per-handler origin gate, so the test surface matches
real attack surface. T33c also
demonstrates the schema-rev path: when invocation rejects with
`Argument "<name>" at position N ... failed to pass validation`,
the verbatim rejection string is the cheapest grep target back to
@@ -149,7 +160,7 @@ the inline hand-rolled validator block (bundle bytes 5013601 /
5018821 for the two CustomPlugins methods). See `lib/eipc.ts` for
both surfaces, and
[`runner-implementation-plan.md`](../../docs/testing/runner-implementation-plan.md)
session 7 / 8 / 9 status sections for the findings.
session 7 / 8 / 9 / 10 status sections for the findings.
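As a rough illustration of the schema-rev path, the rejection string can be parsed into its fields before grepping the bundle. This sketch assumes the full message shape matches the `Launch` rejection quoted in the plan doc (`Argument "cwd" at position 0 to method "<name>" in interface "Launch" failed to pass validation`); other interfaces may word it differently. `parseRejection` and `ValidationRejection` are hypothetical names, not harness exports.

```typescript
// Hypothetical shape for the fields carried by a rejection message.
interface ValidationRejection {
	arg: string;
	position: number;
	method: string;
	iface: string;
}

// Assumed message format, based on the one rejection quoted verbatim
// in the plan doc.
const REJECTION =
	/Argument "([^"]+)" at position (\d+) to method "([^"]+)" in interface "([^"]+)" failed to pass validation/;

function parseRejection(message: string): ValidationRejection | null {
	const m = REJECTION.exec(message);
	if (!m) return null; // not a validator rejection in the assumed format
	return {
		arg: m[1],
		position: Number(m[2]),
		method: m[3],
		iface: m[4],
	};
}

const sample =
	'Argument "cwd" at position 0 to method "getAutoVerify" in interface "Launch" failed to pass validation';
const parsed = parseRejection(sample);
console.log(parsed);
```

The parsed `arg` and `method` values give two independent grep targets, which helps when the verbatim message is assembled from fragments in the minified bundle.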
Per-row pass/skip counts depend on which sweep runs against the row;
see `runner-implementation-plan.md` for tier classification and