mirror of
https://github.com/aaddrick/claude-desktop-debian.git
synced 2026-05-17 00:26:21 +03:00
docs(testing): session 9 plan/inventory + rotate session 10 prompt
Plan-doc Status section gains a session 9 block documenting the schema-rev finding (hand-rolled positional validators on the two CustomPlugins methods, byte offsets, minimal valid arg literal, two impl variants), the dual-investigation pattern (bundle grep + runtime closure inspection converged independently), and the rejection-message-grep schema-rev shortcut for future sessions.

README inventory bumps to 70 specs, adds the T33c row, threads T33c through the eipc-invoke consumer list and the seedFromHost consumer list, and surfaces the validator-rejection-grep pattern in the eipc note.

Followup-prompt rotated for session 10. Carries over the operon scope question from session 8 and adds the Launch scope question from session 9 (both have the "wrapper-exposed but registry-unconfirmed" shape, which feeds Category C). Promotes T19/T20 read-side reframes to Category A (case-doc anchors at write-side handlers; read-side equivalents need to be enumerated from the registry walker first).

Co-Authored-By: Claude <claude@anthropic.com>
@@ -1,138 +1,119 @@
|
||||
# test-harness runner implementation — session 9 prompt
|
||||
# test-harness runner implementation — session 10 prompt
|
||||
|
||||
This file is meant to be **copied verbatim into a fresh Claude Code
|
||||
session** as the initial user message. Don't paraphrase it; the
|
||||
orchestration depends on the exact directives below.
|
||||
|
||||
You're picking up after a runner-implementation session that landed 3
|
||||
new specs (T35b, T37b, T27) and 1 primitive extension
|
||||
(`invokeEipcChannel` on `lib/eipc.ts`). Coverage 66/76 (87%) → 69/76
|
||||
(91%). One commit on `docs/compat-matrix`:
|
||||
You're picking up after a runner-implementation session that landed 1
|
||||
new spec (T33c) by way of reverse-engineering the
|
||||
`CustomPlugins/listMarketplaces` arg validator. No primitive change.
|
||||
Coverage 69/76 (91%) → 70/76 (92%). One commit on `docs/compat-matrix`
|
||||
expected (SHA inserted after the test-harness commit lands — the user
|
||||
reviews and commits at the end of every session):
|
||||
|
||||
- `7ffd73a` — `test(harness): session 8 runners + invokeEipcChannel
|
||||
primitive` (3 new Tier 2 invocation probes — T35b / T37b paired with
|
||||
the existing T35 / T37 Tier 1 fingerprints, plus T27 as the case-doc
|
||||
Tier 2 reframe; new `invokeEipcChannel` API on `lib/eipc.ts` calls
|
||||
through the renderer-side wrapper at
|
||||
`window['claude.<scope>'].<Iface>.<method>`, opaque on the framing
|
||||
UUID, suffix-matched against case-doc anchors).
|
||||
|
||||
(SHA inserted after the test-harness commit lands — the user reviews
|
||||
and commits at the end of every session.)
|
||||
- TBD — `test(harness): session 9 T33c plugin browser invocation`
|
||||
(Tier 2 invocation upgrade of T33b; schema-rev surfaced the
|
||||
byte-identical hand-rolled validator on both `listMarketplaces` and
|
||||
`listAvailablePlugins`; minimal valid arg is `[[]]` — empty
|
||||
egressAllowedDomains, omit pluginContext; passes on KDE-W in 39.2s
|
||||
with array shape on both invocations).
|
||||
|
||||
The plan doc at
|
||||
[`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
|
||||
captures the tier classification and execution-time reclassifications.
|
||||
Its "Status (post-execution)" section is the source of truth for
|
||||
what's done and what's deferred — read **session 8** first, then
|
||||
**session 7**, then **session 6**, then **session 5**, then **session
|
||||
4**, then **session 3**, then **session 2**, then **session 1** sub-
|
||||
sections.
|
||||
what's done and what's deferred — read **session 9** first, then
|
||||
**session 8**, then **session 7**, then **session 6**, then **session
|
||||
5**, then **session 4**, then **session 3**, then **session 2**, then
|
||||
**session 1** sub-sections.
|
||||
|
||||
This session is a continuation, not a restart. Start by reading the
|
||||
plan doc's status sections.
|
||||
|
||||
### Big new findings from session 8
|
||||
### Big new findings from session 9
|
||||
|
||||
1. **eipc invocation works through the renderer-side wrapper.** The
|
||||
per-handler origin gate (`le(e)` / `Vi(e)` / `mm(e)`) is a
|
||||
structural duck-type check on `event.senderFrame.url` and
|
||||
`event.senderFrame.parent === null`, not an `instanceof Frame`
|
||||
check. Two viable paths exist: (a) main-side direct call with a
|
||||
synthesized event whose `senderFrame.url = 'https://claude.ai/'`,
|
||||
and (b) renderer-side wrapper at
|
||||
`window['claude.<scope>'].<Iface>.<method>(...args)` exposed by
|
||||
`mainView.js` via `contextBridge.exposeInMainWorld` after the
|
||||
`Qc()` exposure gate. Session 8 chose (b) for the primitive — it
|
||||
honors the gate honestly (no senderFrame spoofing) and aligns test
|
||||
surface with real attack surface. Approach (a) stays available
|
||||
for future scopes whose renderer-side wrapper isn't exposed
|
||||
(e.g. find_in_page / main_window webContents host
|
||||
`claude.settings/*` handlers in their per-wc registry but their
|
||||
renderers are at `file://` so the wrapper isn't there); not
|
||||
implemented in this session — no current consumer.
|
||||
2. **`mainView.js` exposes 9 wrapper namespaces, more than the
|
||||
registry-side count.** Session 7 catalogued
|
||||
`claude.settings`/`claude.web` plus `claude.app_internal` (small
|
||||
surface) on the per-wc registries. Session 8's renderer-side probe
|
||||
surfaced `claude.operon`, `claude.skills`, `claude.simulator`,
|
||||
`claude.officeAddin`, `claude.hybrid`, `claude.buddy`, plus
|
||||
`claudeAppBindings` / `claudeAppSettings`, totalling 9 distinct
|
||||
`window['claude.*']` namespaces. The operon scope is suspicious:
|
||||
wrapper exposes 22 interfaces but session 7's registry walk on
|
||||
`/epitaxy` and `/new` saw zero operon handlers registered on
|
||||
claude.ai. Either operon handlers register lazily on entering an
|
||||
operon-mode session, or the wrapper is exposed even when the
|
||||
handler isn't yet registered (in which case `invokeEipcChannel`
|
||||
would fail with "no handler registered with suffix"). Worth a
|
||||
one-liner probe before any operon-scope spec lands.
|
||||
3. **`invokeEipcChannel(inspector, suffix, args?, opts?)` is the new
|
||||
surface.** Suffix is the same case-doc-anchored input that
|
||||
`findEipcChannel` accepts (e.g. `MCP_$_getMcpServersConfig` or
|
||||
the fully-qualified
|
||||
`claude.settings_$_MCP_$_getMcpServersConfig`). Internally
|
||||
resolves the full suffix through `findEipcChannel`, splits on
|
||||
`_$_` to recover `[scope, iface, method]`, then
|
||||
`evalInRenderer(urlFilter, "window[scope][iface][method](...args)")`.
|
||||
Default `urlFilter` is `'claude.ai'`. Args are JSON-marshaled in;
|
||||
return value is JSON-deserialized via `evalInRenderer`'s
|
||||
`executeJavaScript` path. Read-by-default but not allowlist-
|
||||
enforced — the safety property is that consumers pass case-doc-
|
||||
anchored suffixes verbatim.
|
||||
4. **Renderer-eval errors are stringified.** When the underlying
|
||||
handler rejects (origin gate, arg validator, result validator),
|
||||
the error surface is `Error: Error invoking remote method
|
||||
'<framed-channel>': <inner-message>`. The framed channel name in
|
||||
the message lets consumers triage per-handler. Native exceptions
|
||||
get JSON-stringified through the inspector eval boundary; per-
|
||||
handler triage is intact but stack traces are lost on the renderer
|
||||
side.
|
||||
5. **The session 8 prompt's `le(i)` reference at `:68820` was off.**
|
||||
Approach 3's investigator flagged that `le` is at `:5045138` in
|
||||
this build; offset 68820 hits OpenTelemetry SemRes constants.
|
||||
Doesn't change the outcome (the gate's behavior is the same
|
||||
regardless of offset) but worth noting if a future probe takes a
|
||||
followup-prompt offset literally — always confirm offsets against
|
||||
the current bundle before relying on them.
|
||||
1. **Hand-rolled positional arg validators.** Both
   `claude.web/CustomPlugins/listMarketplaces` and `listAvailablePlugins`
   use byte-identical inline
   `Array.isArray(...) && r.every(a => typeof a === "string")` checks for
   `egressAllowedDomains: string[]` (arg 0, required) plus an optional
   `pluginContext` checked by a closed-over `sc(...)` requiring
   `mode: string`. The arg checks are NOT Zod; the result validator IS
   Zod and runs after the impl returns. Validator blocks sit at bytes
   5013601 / 5018821 in the bundled `index.js` (single-line minified
   bundle, ~15 MB; these are byte offsets, not line numbers). Minimal
   valid arg: `args = [[]]`. The empty allow-list is the safety
   property: if the underlying impl is the CLI-shelling variant, it
   forwards the list as the spawned subprocess's permitted domains.
2. **Two impl variants exist.** Both methods have a CLI-shelling impl
   (`runCommand(["plugin", ...], { timeout: 30s/60s, allowedDomains: A })`)
   AND a native impl (reads `knownMarketplacesFile` /
   `marketplacesDir` directly). Selection logic isn't called out in
   the registered handler's closure source; both variants return the
   same `Array<…>` shape on success. T33c's
   `Array.isArray(result) === true` assertion holds regardless of which
   is active. Test budget bumped to 180s to accommodate worst-case
   sequential CLI timeouts.
3. **Validator rejection messages are the cheapest grep target.** When
   `invokeEipcChannel` rejects with
   `Argument "<name>" at position N ... failed to pass validation`, the
   verbatim rejection string in the inline validator block is the entry
   point: a single grep on the literal error message resolves to the
   exact validator location in the bundle. Save this pattern for any
   future schema-rev session where invocation fails with a structured
   rejection.
4. **Bundle grep + runtime closure inspection converged independently.**
   Two parallel investigations (subagents read the bundle vs. read
   `Function.prototype.toString` of the registered handler via the
   debugger-attached running Claude on :9229) produced byte-identical
   validator literals and the same minimal arg shape. High confidence
   on the schema. Worth using the dual-approach pattern again when a
   future schema-rev needs cross-checking: both paths are cheap, and
   agreement between two independent reads makes a false positive
   very unlikely.
5. **`mainView.js` exposes 9 wrapper namespaces but only 5 currently
   have registry-confirmed handlers** on the claude.ai webContents.
   Carryover from session 8: `claude.operon` exposes 22 interfaces in
   the renderer wrapper but session 7's registry walk on `/epitaxy`
   and `/new` saw zero operon handlers registered. Either operon
   handlers register lazily on operon-mode entry, or the wrapper is
   exposed even when the handler isn't yet registered (in which case
   `invokeEipcChannel` would fail with "no handler registered with
   suffix"). The same uncertainty applies to `claude.web/Launch/*`
   (relevant for T21 dev server preview): wrapper present, registry
   unconfirmed. Worth a one-liner probe before any operon-scope or
   Launch-scope spec lands.
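
The validator shape described in finding 1 can be sketched as follows. This is a minimal reconstruction under stated assumptions, not the bundle's code: `validateListArgs`, `isStringArray`, `isPluginContext`, and the `Verdict` type are hypothetical names, and `isPluginContext` only stands in for the closed-over `sc(...)` check, which is assumed here to require nothing beyond `mode: string`. Treating `undefined` as "argument omitted" is also an assumption.

```typescript
// Hypothetical reconstruction of the positional arg checks described above.
// None of these names come from the bundle.
type Verdict = { ok: true } | { ok: false; name: string; position: number };

// Mirrors the inline `Array.isArray(...) && r.every(...)` check on arg 0.
const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((a) => typeof a === "string");

// Stand-in for the closed-over `sc(...)` check (assumed: only `mode: string`).
const isPluginContext = (v: unknown): boolean =>
  typeof v === "object" &&
  v !== null &&
  typeof (v as { mode?: unknown }).mode === "string";

function validateListArgs(args: unknown[]): Verdict {
  // Arg 0 (`egressAllowedDomains`) is required and must be a string array.
  if (!isStringArray(args[0])) {
    return { ok: false, name: "egressAllowedDomains", position: 0 };
  }
  // Arg 1 (`pluginContext`) is optional; it is only checked when supplied.
  if (args[1] !== undefined && !isPluginContext(args[1])) {
    return { ok: false, name: "pluginContext", position: 1 };
  }
  return { ok: true };
}
```

Under these assumptions `validateListArgs([[]])` passes, which matches the minimal valid arg `[[]]` (empty `egressAllowedDomains`, `pluginContext` omitted), while `validateListArgs([])` is rejected at position 0.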
|
||||
|
||||
### Authoritative reference
|
||||
|
||||
Read these in order before fanning out:
|
||||
|
||||
- [`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
|
||||
— tier classification + status section. Read **session 8**,
|
||||
**session 7**, **session 6**, **session 5**, **session 4**,
|
||||
**session 3**, **session 2**, then **session 1** "Status (post-
|
||||
execution)" sub-sections. The Tier-3 list (search for "## Tier 3")
|
||||
is the candidate pool for further reframes.
|
||||
— tier classification + status section. Read **session 9**,
|
||||
**session 8**, **session 7**, **session 6**, **session 5**,
|
||||
**session 4**, **session 3**, **session 2**, then **session 1**
|
||||
"Status (post-execution)" sub-sections. The Tier-3 list (search for
|
||||
"## Tier 3") is the candidate pool for further reframes.
|
||||
- [`tools/test-harness/README.md`](../../tools/test-harness/README.md)
|
||||
— runner conventions, the now-69-spec inventory, primitives in
|
||||
— runner conventions, the now-70-spec inventory, primitives in
|
||||
`lib/`, isolation defaults, the CDP-gate workaround, the eipc
|
||||
note (now updated to cover both registry walk and renderer-wrapper
|
||||
invocation).
|
||||
note (now updated to cover registry walk, renderer-wrapper
|
||||
invocation, AND the schema-rev pattern from session 9).
|
||||
- [`docs/testing/cases/README.md`](cases/README.md) — case-doc
|
||||
structure and the four anchor scopes.
|
||||
- [`tools/test-harness/src/lib/`](../../tools/test-harness/src/lib/)
|
||||
— the existing primitives. Notable session 8 addition: `eipc.ts`
|
||||
gained `invokeEipcChannel` (renderer-side wrapper invocation).
|
||||
— the existing primitives. No session 9 additions; surface remains
|
||||
the session 8 shape (`getEipcChannels` / `findEipcChannel` /
|
||||
`findEipcChannels` / `waitForEipcChannel` / `waitForEipcChannels` /
|
||||
`invokeEipcChannel` on `lib/eipc.ts`).
|
||||
- [`tools/test-harness/eipc-registry-probe.ts`](../../tools/test-harness/eipc-registry-probe.ts)
|
||||
— the session 7 read-only registry probe. Re-run against a
|
||||
debugger-attached Claude (`Developer → Enable Main Process
|
||||
Debugger` from the menu) to capture the current registry shape.
|
||||
- [`tools/test-harness/src/runners/`](../../tools/test-harness/src/runners/)
|
||||
— every existing spec is a template. Notable session 8 templates:
|
||||
- `T35b_mcp_config_runtime.spec.ts` — single-channel
|
||||
`waitForEipcChannel` + `invokeEipcChannel` shape with shape-
|
||||
describing diagnostic. Pattern for any future Tier 2 invocation
|
||||
asserting response is a non-array object.
|
||||
- `T37b_global_memory_runtime.spec.ts` — `string | null` assertion
|
||||
shape. Pattern for invocation probes whose response can hold
|
||||
user content (T37b never logs the body — only type + length —
|
||||
because account memory may be sensitive).
|
||||
- `T27_scheduled_tasks_runtime.spec.ts` — multi-suffix
|
||||
— every existing spec is a template. Notable session 9 template:
|
||||
- `T33c_plugin_browser_invocation.spec.ts` — multi-suffix
|
||||
`waitForEipcChannels` + per-suffix `invokeEipcChannel` loop with
|
||||
aggregated diagnostic attachment. Pattern for parallel-scope
|
||||
assertions (Cowork vs CCD).
|
||||
`args = [[]]` for both methods, `Array.isArray` shape assertion,
|
||||
180s budget for CLI-spawn worst case. Pattern for any Tier 2
|
||||
invocation upgrade where the validator requires a positional arg
|
||||
AND the impl may shell out to a subprocess.
|
||||
- [`docs/testing/cases/*.md`](cases/) — the spec each runner
|
||||
asserts. The **Code anchors:** field tells you exactly where
|
||||
upstream implements the feature.
|
||||
@@ -140,145 +121,151 @@ Read these in order before fanning out:
|
||||
### Tests in scope this session
|
||||
|
||||
**Realistic ceiling: ~2 new specs OR one investigation + one new
|
||||
spec landing.** Session 8 was at the upper end (3 specs + 1 primitive
|
||||
extension) because all three specs were near-identical shape and the
|
||||
primitive extension was small. This session's main bet involves
|
||||
reverse-engineering an arg schema (T33 Phase 2's
|
||||
`egressAllowedDomains`), which is variable in cost.
|
||||
spec landing.** Session 9 was at the lower end (1 spec + 0 primitives)
|
||||
because the schema-rev was the work; now that the validator pattern
|
||||
is documented, follow-on Tier 2 invocation upgrades that need similar
|
||||
schema work are cheaper. Session 8's upper end (3 specs + 1 primitive
|
||||
extension) was a near-identical-shape batch; session 10's main bet
|
||||
should aim for the lower-middle (2 specs OR 1 investigation + 1 spec).
|
||||
|
||||
**Category A (T33 Phase 2 invocation upgrade) is the cleanest
|
||||
single-session win available.** T33 ships as a Tier 1 fingerprint and
|
||||
T33b ships as a Tier 2 handler-registration probe (session 7); T33
|
||||
Phase 2 invokes the same handlers and asserts the response shape —
|
||||
the natural next rung. The blocker is arg validation:
|
||||
`listMarketplaces` failed during session 8's smoke test on a missing
|
||||
`egressAllowedDomains` arg, and the schema lives inside the main
|
||||
handler's validator (not the renderer wrapper). Investigation phase
|
||||
needs to either (a) reverse-engineer the schema from the bundle, or
|
||||
(b) capture a real renderer call's args via DevTools network panel /
|
||||
mainView.js inspection, or (c) drop to `listAvailablePlugins` if its
|
||||
schema is simpler.
|
||||
**Category B remains the natural next step** but its case-doc anchors
|
||||
point at write-side handlers — needs read-side reframes before
|
||||
shipping. Category C (operon / Launch exposure-vs-registration) is
|
||||
still on the table from session 8 and is the smallest-scope option.
|
||||
|
||||
Three categories — pick ONE as the main bet, treat the others as
|
||||
fallback if the main bet hits an early blocker:
|
||||
|
||||
| # | Tests | Source | Notes |
|---|---|---|---|
| **A** T33 Phase 2 (plugin browser invocation) | T33 Phase 2 (`listMarketplaces` + `listAvailablePlugins` invocation) | `T33b` template + `lib/eipc.ts` invokeEipcChannel | Investigate `egressAllowedDomains` schema first. Risk: schema may be deeply structured (origin allow-list, fetch timeout, etc.) requiring trial-and-error against bundle. If both methods turn up empty after schema-rev, ship a documentation-only `H07_plugin_browser_args_finding.spec.ts` capturing the dead-end. |
| **B** T19/T20/T21 Code-tab cluster | T19, T20, T21 | invokeEipcChannel + AX-tree click chains | T19 (integrated terminal) needs `LocalSessions/startShellPty`; T20 (file pane) needs `LocalSessions/writeSessionFile`; T21 (dev server preview) needs `claude.web/Launch/*`. Each combines invocation + AX click, so each spec is bigger work. T19/T20 anchors verified session 5 (Code-tab AX surface); T21 anchors not yet verified. |
| **A** T19/T20 read-side reframes | T19, T20 (read-side) | T33c template + `lib/eipc.ts` invokeEipcChannel | Case-doc anchors are write-side (`startShellPty`, `writeSessionFile`). Investigate read-side equivalents in the registry first: `LocalSessions_$_listSessions`, `LocalSessions_$_getSessionInfo`, `LocalSessions_$_readSessionFile` are candidates per session 7's per-interface map (117 LocalSessions handlers total). Schema-rev each before invocation; the validator-rejection-grep pattern from session 9 applies. Risk: read-side handlers may not have one-to-one case-doc anchor mapping; the spec body has to motivate why the read-side reframe asserts the same surface as the write-side case-doc claim. |
| **B** Launch scope + T21 | T21 (dev server preview) | exposure-vs-registration probe + new spec | Confirm `claude.web/Launch/*` handlers register on the claude.ai webContents at all (session 7 mapped 53 distinct interfaces; Launch wasn't in the surfaced list, so it could be lazy-register on `.claude/launch.json` presence, or exposed-but-not-registered). If registered, ship a Tier 2 reframe similar to T33c. If wrapper-only, document and pivot. |
| **C** operon scope exposure-vs-registration probe | n/a (investigation) | new probe, possibly small Tier 1 reframe | Confirm whether operon handlers register on claude.ai webContents at any point, or only on operon-mode entry. Outputs: either a Tier 2 reframe of an operon case-doc test, OR a deferral note explaining why operon scope can't be reached without an operon-mode session. Smaller scope than A or B. |
|
||||
|
||||
#### Category A — T33 Phase 2 invocation upgrade
|
||||
#### Category A — T19/T20 read-side reframes
|
||||
|
||||
The plan: extend the existing T33 / T33b coverage with invocation
|
||||
probes. T33 the Tier 1 fingerprint asserts the bundle contains the
|
||||
two channel name strings; T33b the Tier 2 handler-registration probe
|
||||
asserts both are registered on the claude.ai webContents at runtime;
|
||||
T33 Phase 2 would invoke each and assert the response shape.
|
||||
The plan: pick read-side `LocalSessions_$_*` getters that map to the
|
||||
write-side case-doc claims for T19 (integrated terminal) and T20
|
||||
(file pane), then ship Tier 2 invocation runners against each.
|
||||
|
||||
**Investigation phase first** — invocation has known unknowns:
|
||||
**Investigation phase first** — case-doc anchors are write-side,
|
||||
need read-side equivalents:
|
||||
|
||||
1. **`egressAllowedDomains` schema.** Session 8's smoke test against
|
||||
`CustomPlugins/listMarketplaces` failed with `Argument
|
||||
"egressAllowedDomains" at position 0 ... failed to pass
|
||||
validation`. The schema lives inside the main handler's validator
|
||||
(probably a Zod-style schema that wraps the impl). Approaches:
|
||||
- Grep the bundled minified `index.js` for
|
||||
`egressAllowedDomains` — should resolve to a schema-construction
|
||||
site near the `listMarketplaces` handler. The schema usually has
|
||||
an enumerable shape (object with field validators).
|
||||
- Reverse-engineer the validator's `.toString()` at runtime via
|
||||
`evalInMain` — pull out the handler closure source and look for
|
||||
a `z.object({...})` or similar.
|
||||
- Capture a real renderer call's args via DevTools (open the
|
||||
plugin browser in claude.ai while DevTools network panel is open
|
||||
and the Main Process Debugger is attached; the `ipcRenderer.invoke`
|
||||
args show in the inspector).
|
||||
- Drop to `listAvailablePlugins` first if its schema is simpler;
|
||||
the case-doc anchors both, so either is a valid Phase 2 target.
|
||||
2. **Response shape validation.** `listMarketplaces` returns a list of
|
||||
marketplace metadata objects. Case-doc anchor (`T33` in
|
||||
`docs/testing/cases/extensibility.md`) describes "browser populate
|
||||
flow" — should return a non-empty array on a configured-host run,
|
||||
or empty array on a fresh install. Either way, `Array.isArray` is
|
||||
the strongest assertion that doesn't depend on host state.
|
||||
3. **Side-effect risk.** Both `list*` handlers should be read-only;
|
||||
confirm by reading the handler source (search for `listMarketplaces`
|
||||
in bundled `index.js`).
|
||||
1. **Re-run `eipc-registry-probe.ts`** against a debugger-attached
   Claude. Filter for `LocalSessions_$_*` and look for `list*` /
   `get*` / `read*` patterns. Session 7 catalogued 117 LocalSessions
   handlers but only listed sample method names (4 per interface).
   The full list lives in `/tmp/eipc-registry-probe.json` from a
   re-run. Candidate read-sides:
   - For T19 (terminal): `LocalSessions_$_listSessions`,
     `LocalSessions_$_getSessionInfo`, `LocalSessions_$_readPty`?
   - For T20 (file pane): `LocalSessions_$_readSessionFile`,
     `LocalSessions_$_listSessionFiles`?
2. **Schema-rev each candidate** using the session 9 pattern:
   - First call: smoke test against the user's debugger-attached
     Claude with `args = []`. Capture the rejection error.
   - If the rejection error includes
     `Argument "<name>" at position N ... failed to pass validation`,
     grep the bundle for the literal rejection string to find the
     validator block.
   - If the call succeeds with `[]`, you don't need schema-rev; go
     straight to runner shape.
   - If the call succeeds but returns a non-array shape, decide
     whether the assertion shape is `Array.isArray` (T33c, T27) or
     `non-array object` (T35b) or `string | null` (T37b) based on
     what the read-side returns.
3. **Motivate the reframe** in the leading comment. T19's case-doc
   claim is "integrated terminal opens"; the read-side reframe is
   e.g. "the per-session listing handler is wired and returns an
   array; the terminal-spawn path consumes this list to attach to
   an existing PTY". The connection to the case-doc surface needs
   to be plausible, not just "this handler returns an array".
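
The triage in step 2 can be sketched as two small helpers. This is an illustrative sketch only: `parseRejection`, `classifyShape`, and the `Rejection` / `Shape` types are hypothetical names that do not exist in `lib/eipc.ts`, and the regex assumes the rejection text keeps the `Argument "<name>" at position N ... failed to pass validation` form quoted above, with arbitrary text in the elided middle.

```typescript
// Hypothetical helpers for triaging a smoke-test result; not part of lib/eipc.ts.
type Rejection = { argName: string; position: number };

// Pull the arg name and position out of a validator rejection, if present.
// The middle of the message is elided in the docs, so match it loosely.
function parseRejection(message: string): Rejection | null {
  const m =
    /Argument "([^"]+)" at position (\d+)[\s\S]*?failed to pass validation/.exec(
      message,
    );
  if (!m) return null;
  const [, argName = "", position = ""] = m;
  return { argName, position: Number(position) };
}

type Shape = "array" | "object" | "string" | "null" | "other";

// Classify a successful response so the runner can pick its assertion shape:
// "array" (T33c, T27), "object" (T35b), "string" or "null" (T37b).
function classifyShape(value: unknown): Shape {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  if (typeof value === "string") return "string";
  if (typeof value === "object") return "object";
  return "other";
}
```

A `null` from `parseRejection` means the failure was not a structured validator rejection (for example the "no handler registered with suffix" case), so the rejection-grep shortcut does not apply and the failure needs a different line of investigation.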
|
||||
|
||||
**Approaches to investigate (in order):**
|
||||
|
||||
1. **Bundle grep for the schema construction site.** Cheapest signal.
|
||||
The validator literal is usually inline near the handler.
|
||||
2. **Runtime closure inspection** — pull the handler's `.toString()`
|
||||
via `evalInMain` and look for the schema declaration. May surface
|
||||
a closure-local schema we can't reach, but worth the 5-minute try.
|
||||
3. **DevTools args capture** — last resort because it requires user
|
||||
interaction (open plugin browser in running Claude). Valuable if
|
||||
the schema is fully closure-local and not introspectable from main.
|
||||
1. **Re-run the registry probe against the user's running
   debugger-attached Claude.** Cheapest signal: it captures the full
   `LocalSessions_$_*` method list as seen from main. Mirror the
   existing probe; don't rewrite.
2. **Smoke-test candidate read-side suffixes** with `args = []`.
   Capture rejections. The validator-rejection grep pattern from
   session 9 resolves the schema cheaply.
3. **If a candidate's invocation succeeds**, draft a Tier 2 spec
   using T33c's shape (multi-suffix if both T19 and T20 read-sides
   land, single-suffix otherwise).
|
||||
|
||||
If Category A turns up empty after 2-3 distinct approaches, STOP AND
|
||||
REPORT. Don't keep digging — document what was tried, ship a
|
||||
"H07 documentation runner" if useful state surfaced, and pivot to
|
||||
Category C (smaller scope) or pause for user review.
|
||||
If Category A turns up empty after 2-3 distinct read-side candidates
|
||||
(none invoke cleanly with `args = []`, all require schema-rev that
|
||||
exceeds the session budget, OR the registry walk doesn't surface a
|
||||
plausible read-side equivalent), STOP AND REPORT. Don't keep
|
||||
digging — pivot to Category B or C.
|
||||
|
||||
If `listMarketplaces` invocation lands cleanly, batch
|
||||
`listAvailablePlugins` invocation as a sibling spec (or fold both
|
||||
into a single `T33c` runner if the case-doc anchor structure makes
|
||||
that natural). Cap at ~2 spec upgrades — don't try to land both if
|
||||
the first one surfaces an unexpected issue.
|
||||
#### Category B — Launch scope + T21
|
||||
|
||||
#### Category B — T19/T20/T21 Code-tab cluster
|
||||
The plan: confirm `claude.web/Launch/*` handlers register on the
|
||||
claude.ai webContents (session 7's per-interface map didn't list
|
||||
them; either lazy-register on `.claude/launch.json` presence, or
|
||||
exposed-but-not-registered).
|
||||
|
||||
Each needs both invocation against `claude.web/ClaudeCode/*` AND
|
||||
AX-tree click chains against rendered Code-tab surfaces. Session 5
|
||||
verified the Code-tab session-opener AX anchors (top-tab Code button,
|
||||
sidebar entries, recents items) but didn't ship a primitive — the
|
||||
anchors are in the plan-doc, ready for a consumer.
|
||||
1. **Re-run `eipc-registry-probe.ts`** filtering for
|
||||
`claude.web_$_Launch_$_*`. If non-empty, treat similarly to
|
||||
T33c: pick a read-side getter, schema-rev with the rejection-
|
||||
grep pattern, ship a Tier 2 invocation runner.
|
||||
2. **If empty**, navigate the running Claude to a project with
|
||||
`.claude/launch.json` and re-run. If still empty, document
|
||||
"Launch scope handlers register lazily on a path we can't
|
||||
construct from the harness" and defer.
|
||||
3. **If wrapper-exposed without registry-side handlers**,
|
||||
document as a known limitation alongside the operon finding.
|
||||
|
||||
T19 (integrated terminal) needs `LocalSessions_$_startShellPty` shape;
|
||||
T20 (file pane) needs `LocalSessions_$_writeSessionFile`; T21 (dev
|
||||
server preview) needs `claude.web/Launch/*` (Launch interface
|
||||
specifically).
|
||||
T21's case-doc claim is "dev server preview pane" — needs
|
||||
`.claude/launch.json` AND a real project to fully exercise. The
|
||||
Tier 2 reframe is "the Launch dispatch handler is registered AND
|
||||
returns the documented shape on a known fixture path"; needs more
|
||||
investigation than T19/T20.
|
||||
|
||||
Skip this category unless Category A's schema-rev turns up empty AND
|
||||
the cluster's `claude.web/Launch/*` AX-tree anchors are pre-verified
|
||||
(they aren't yet — T21 would need a debugger-on probe like session 5
|
||||
used for Code-tab anchors). T19 and T20 are more reachable than T21
|
||||
because their handlers are on `LocalSessions` (already-catalogued by
|
||||
session 7's registry walk).
|
||||
Skip this category unless Category A's read-side candidates turn
|
||||
up empty AND Category C is also unappealing.
|
||||
|
||||
#### Category C — operon scope exposure-vs-registration probe
|
||||
#### Category C — operon / Launch exposure-vs-registration probe
|
||||
|
||||
The plan: write a small read-only probe (mirror
|
||||
`eipc-registry-probe.ts`'s shape) that asks two questions:
|
||||
`eipc-registry-probe.ts`'s shape) that asks two questions for each
|
||||
of operon and Launch:
|
||||
|
||||
1. **At fresh launch + post-login**, are any operon handlers
|
||||
1. **At fresh launch + post-login**, are any operon / Launch handlers
|
||||
registered on the claude.ai webContents? Session 7's registry walk
|
||||
on `/epitaxy` and `/new` saw zero. Confirm — re-run the registry
|
||||
walker, filter by `scope === 'claude.operon'`, capture the count.
|
||||
2. **After navigating to an operon-mode URL** (whatever the URL shape
|
||||
is — TBD; check `claude.ai/...` paths in the bundle for
|
||||
`operon`-keyed routes), do operon handlers appear?
|
||||
on `/epitaxy` and `/new` didn't surface them in the per-interface
|
||||
summary (which lists every `(scope, iface)` pair). Confirm — re-
|
||||
run the registry walker, filter by `scope === 'claude.operon'` and
|
||||
`scope === 'claude.web' && iface.startsWith('Launch')`, capture
|
||||
the count.
|
||||
2. **After navigating to an operon-mode URL or a launch-config'd
|
||||
project**, do the missing handlers appear? Operon-mode URLs are
|
||||
TBD — search `claude.ai/...` paths in the bundle for `operon`-
|
||||
keyed routes. Launch-config navigation needs `.claude/launch.json`
|
||||
in the working folder.
|
||||
3. **Independently**, does the renderer-side wrapper expose
|
||||
`window['claude.operon']` regardless of registration status? (Yes
|
||||
per session 8 — confirm this is stable across navigation.)
|
||||
`window['claude.operon']` / `window['claude.web'].Launch`
|
||||
regardless of registration status? (Yes per session 8 for
|
||||
operon; un-confirmed for Launch.)
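
The filter in question 1 can be sketched as below. This is a hedged illustration, not the probe's code: `parseSuffix` and `countMissingScopes` are hypothetical names, the input is assumed to be a list of fully-qualified suffixes of the form `<scope>_$_<iface>_$_<method>` with any framing prefix already stripped, and the operon and Launch entries in the usage note are made-up placeholders.

```typescript
// Hypothetical helpers; the real registry walker may expose a different shape.
type Parsed = { scope: string; iface: string; method: string };

// Split a fully-qualified suffix such as
// `claude.settings_$_MCP_$_getMcpServersConfig` into its three parts.
function parseSuffix(suffix: string): Parsed | null {
  const parts = suffix.split("_$_");
  if (parts.length !== 3) return null;
  const [scope = "", iface = "", method = ""] = parts;
  return { scope, iface, method };
}

// Count handlers in the two scopes whose registration is still unconfirmed.
function countMissingScopes(suffixes: string[]): {
  operon: number;
  launch: number;
} {
  let operon = 0;
  let launch = 0;
  for (const s of suffixes) {
    const p = parseSuffix(s);
    if (!p) continue; // skip names that don't match the assumed form
    if (p.scope === "claude.operon") operon++;
    if (p.scope === "claude.web" && p.iface.startsWith("Launch")) launch++;
  }
  return { operon, launch };
}
```

For example, feeding it `claude.settings_$_MCP_$_getMcpServersConfig`, `claude.web_$_CustomPlugins_$_listMarketplaces`, and the placeholder entries `claude.web_$_Launch_$_example` and `claude.operon_$_Foo_$_bar` counts one operon handler and one Launch handler. A count of zero for both on a real registry dump would match what session 7 saw.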
|
||||
|
||||
Outputs:
|
||||
|
||||
- If operon handlers register on claude.ai eagerly: write a one-liner
|
||||
Tier 2 reframe spec for the highest-priority operon case-doc target
|
||||
(search `docs/testing/cases/` for any test mentioning operon).
|
||||
- If they register lazily on operon-mode entry: document the
|
||||
prerequisite in plan-doc Status section as a Tier 3 item ("requires
|
||||
operon-mode navigation primitive"), and don't ship a probe.
|
||||
- If operon / Launch handlers register on claude.ai eagerly: write a
|
||||
one-liner Tier 2 reframe spec for the highest-priority case-doc
|
||||
target.
|
||||
- If they register lazily on a navigation we can't easily construct:
|
||||
document the prerequisite in plan-doc Status section as a Tier 3
|
||||
item ("requires operon-mode navigation primitive" /
|
||||
"requires .claude/launch.json fixture"), and don't ship a probe.
|
||||
- If the wrapper is exposed without registered handlers: document as
|
||||
a known limitation of `invokeEipcChannel` (will fail with "no
|
||||
handler registered" even though `window['claude.operon']` is
|
||||
handler registered" even though `window['claude.<scope>']` is
|
||||
present).
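
The three outputs above amount to a small decision table. The sketch below encodes it under stated assumptions: the `ProbeObservation` fields and the `classifyProbe` name are hypothetical, and the `inconclusive` fallback is an addition for the case the list does not cover (no wrapper and no handlers).

```typescript
type ProbeOutcome =
	| 'tier2-reframe' // handlers register eagerly on claude.ai
	| 'tier3-prerequisite' // handlers need a navigation we can't build
	| 'known-limitation' // wrapper exposed, no handler registered
	| 'inconclusive'; // assumption: none of the listed cases applies

// Hypothetical summary of what a probe run observed.
interface ProbeObservation {
	registeredEagerly: boolean;
	registeredAfterNavigation: boolean;
	wrapperExposed: boolean;
}

// Map an observation onto the outputs listed above, checking the
// strongest evidence first.
function classifyProbe(o: ProbeObservation): ProbeOutcome {
	if (o.registeredEagerly) return 'tier2-reframe';
	if (o.registeredAfterNavigation) return 'tier3-prerequisite';
	if (o.wrapperExposed) return 'known-limitation';
	return 'inconclusive';
}
```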
|
||||
|
||||
This is a smaller-scope category — investigation + maybe one spec
|
||||
landing. Best fallback if Category A's schema-rev turns up empty.
|
||||
landing. Best fallback if Category A's read-side candidates turn up
|
||||
empty.
|
||||
|
||||
#### Cross-compositor focus-shifter expansion (NOT recommended this session)
|
||||
|
||||
@@ -297,12 +284,12 @@ section. Don't add it speculatively — wait for a real consumer.
|
||||
|
||||
### Constraints to respect (don't violate)
|
||||
|
||||
These are unchanged from sessions 1-8 and still load-bearing:
|
||||
These are unchanged from sessions 1-9 and still load-bearing:
|
||||
|
||||
- **Default isolation** unless the spec needs otherwise. Use
|
||||
`seedFromHost: true` for any test that depends on authenticated
|
||||
renderer state — never assume default isolation gets past
|
||||
`/login`. T16/T26/T22b/T27/T31b/T33b/T35b/T37b/T38b are the
|
||||
`/login`. T16/T26/T22b/T27/T31b/T33b/T33c/T35b/T37b/T38b are the
|
||||
templates.
|
||||
- **eipc handlers register on `webContents.ipc._invokeHandlers`,
|
||||
NOT global `ipcMain._invokeHandlers`.** Session 7 finding. Use
|
||||
@@ -317,6 +304,14 @@ These are unchanged from sessions 1-8 and still load-bearing:
|
||||
gate honestly. Main-side direct calls work but require spoofing
|
||||
`senderFrame.url`; reserved as a fallback for non-claude.ai
|
||||
webContents (no current consumer).
|
||||
- **For arg validator schema-rev: grep the rejection message
|
||||
literal first.** Session 9 finding. When `invokeEipcChannel`
|
||||
rejects with `Argument "<name>" at position N ... failed to pass
|
||||
validation`, that exact string lives inline in the validator
|
||||
block. One grep on the literal resolves the location; reading
|
||||
~2KB around it surfaces the full schema. Cheaper than runtime
|
||||
closure inspection in most cases (closure inspection is a good
|
||||
cross-check).
|
||||
- **`lib/input.ts` is X11-only.** Strict `XDG_SESSION_TYPE ===
|
||||
'x11'` gate. Wayland consumers must skip — don't try to bolt
|
||||
Wayland into the file.
|
||||
@@ -330,8 +325,10 @@ These are unchanged from sessions 1-8 and still load-bearing:
|
||||
- **Code-tab AX anchors stay in plan-doc until a consumer needs
|
||||
them.** Don't preemptively add `CodeTab.activateTopTab()` to
|
||||
`claudeai.ts` — session 5's anchors block out the work for
|
||||
whenever a future consumer surfaces. T19/T20 (Category B) would
|
||||
be that consumer.
|
||||
whenever a future consumer surfaces. T19/T20 read-side reframes
|
||||
may need them; pre-flight check before adding to `claudeai.ts`
|
||||
whether the read-side path actually exercises the AX surface or
|
||||
only the IPC.
|
||||
- **CDP auth gate is alive** — runtime SIGUSR1 attach via
|
||||
`app.attachInspector()`, never Playwright's `_electron.launch()`
|
||||
or `chromium.connectOverCDP()`.
|
||||
@@ -347,8 +344,8 @@ These are unchanged from sessions 1-8 and still load-bearing:
|
||||
errors and short-circuit; see S11 / S14 for the pattern.)
|
||||
- **Diagnostics on every run.** `testInfo.attach()` the artefacts.
|
||||
Single-shot JSON dumps for multi-state tests (S11, S14, S31,
|
||||
T22b, T27, T31b, T33b, T35b, T37b, T38b pattern) are cleaner
|
||||
than 5+ separate attachments.
|
||||
T22b, T27, T31b, T33b, T33c, T35b, T37b, T38b pattern) are
|
||||
cleaner than 5+ separate attachments.
|
||||
- **Tag with annotations.** `severity:` and `surface:` on every
|
||||
test so JUnit carries them through to matrix-regen.
|
||||
- **Tabs in TS, ~80-char wrap as the existing files do.** Match
|
||||
@@ -370,22 +367,27 @@ These are unchanged from sessions 1-8 and still load-bearing:
|
||||
invocation that returns user-account-scoped content. Memory bodies
|
||||
may contain personal or sensitive content; MCP server tokens may
|
||||
contain credentials; scheduled-task instructions may reference
|
||||
internal projects.
|
||||
internal projects; marketplace `pluginContext`-filtered listings
|
||||
may surface internal-org marketplace pointers (T33c's defensive
|
||||
default).
|
||||
|
||||
### Phases
|
||||
|
||||
#### Phase 0 — calibration
|
||||
|
||||
1. `cd tools/test-harness && npm run typecheck` — should pass.
|
||||
2. Read the plan doc's "Status (post-execution)" session 8 section,
|
||||
2. Read the plan doc's "Status (post-execution)" session 9 section,
|
||||
then read `lib/eipc.ts`'s `invokeEipcChannel` API +
|
||||
`T35b_mcp_config_runtime.spec.ts` leading comments. Confirm you
|
||||
understand the renderer-wrapper path vs main-side fallback.
|
||||
`T33c_plugin_browser_invocation.spec.ts` leading comments.
|
||||
Confirm you understand the multi-suffix invocation pattern, the
|
||||
schema-rev approach (rejection-message grep), and the 180s
|
||||
timeout budget.
|
||||
3. Pick ONE Category as the main bet. For Category A, plan the
|
||||
approach: (a) bundle grep for the schema, (b) runtime closure
|
||||
inspection, (c) DevTools args capture. List which approaches
|
||||
you'll try in what order, with the cap at 2-3 distinct approaches
|
||||
before STOP AND REPORT.
|
||||
approach: (a) re-run the registry probe to enumerate
|
||||
LocalSessions read-sides, (b) smoke-test candidates with `args =
|
||||
[]`, (c) schema-rev any rejections via bundle grep. List which
|
||||
approaches you'll try in what order, with the cap at 2-3
|
||||
distinct approaches before STOP AND REPORT.
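
The rejection-message grep mentioned in steps 2 and 3 can be sketched as a literal search that returns a reading window around each hit. The helper name `findLiteral` and the `LiteralHit` shape are hypothetical. Offsets here are string indices, which equal byte offsets only while the bundle is pure ASCII up to that point.

```typescript
interface LiteralHit {
	offset: number; // string index of the match, not a byte offset
	context: string; // surrounding text for manual reading
}

// Find every occurrence of `literal` in `bundle` and return roughly
// 2 * radius characters of context around each one.
function findLiteral(
	bundle: string,
	literal: string,
	radius = 1024,
): LiteralHit[] {
	const hits: LiteralHit[] = [];
	if (literal.length === 0) return hits;
	let offset = bundle.indexOf(literal);
	while (offset !== -1) {
		const start = Math.max(0, offset - radius);
		const end = Math.min(
			bundle.length,
			offset + literal.length + radius,
		);
		hits.push({ offset, context: bundle.slice(start, end) });
		offset = bundle.indexOf(literal, offset + literal.length);
	}
	return hits;
}
```

In practice the bundle text would come from reading the installed `index.js`, and the literal would be a fixed fragment of the rejection message such as `failed to pass validation`. The default radius of 1024 on each side approximates the "~2KB around it" guidance.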
|
||||
|
||||
If Phase 0 surfaces a problem (typecheck failing, primitives unclear,
|
||||
the chosen Category's prerequisites don't hold), stop and report.
|
||||
@@ -393,25 +395,29 @@ Don't fan out.
|
||||
|
||||
#### Phase 1 — fan-out batch
|
||||
|
||||
For Category A (T33 Phase 2 invocation):
|
||||
- Spawn ONE subagent per investigation approach — bundle grep,
|
||||
runtime closure inspection, DevTools args capture (if needed).
|
||||
Treat as exploratory; report findings before committing to a spec
|
||||
shape. The user's debugger-attached running Claude is a great
|
||||
target for verification probes (mirror session 7's
|
||||
`eipc-registry-probe.ts` shape).
|
||||
For Category A (T19/T20 read-side reframes):
|
||||
- Spawn ONE subagent per read-side candidate (or one per
|
||||
investigation approach if candidates aren't yet identified):
|
||||
registry-probe re-run, smoke-test of candidate suffixes, schema-
|
||||
rev of any rejections. Treat as exploratory; report findings
|
||||
before committing to a spec shape. The user's debugger-attached
|
||||
running Claude is a great target for verification probes (mirror
|
||||
session 7's `eipc-registry-probe.ts` shape and session 9's
|
||||
bundle-grep pattern).
|
||||
- Cap re-spawns at 2-3 distinct approaches; if all empty, STOP AND
|
||||
REPORT. Ship an `H07_plugin_browser_args_finding.spec.ts`
|
||||
documentation runner if useful state surfaces during the
|
||||
investigation.
|
||||
- If the schema is recoverable, second batch: ship `T33c` (or
|
||||
whatever the b-vs-c suffix convention dictates) invoking
|
||||
`listMarketplaces` and asserting array shape. Third batch (only
|
||||
if first lands clean): ship the `listAvailablePlugins` sibling.
|
||||
- Cap at ~2 specs total upgrade — don't try to land both if the
|
||||
first one surfaces an unexpected issue.
|
||||
REPORT. Pivot to Category B or C if budget remains.
|
||||
- If a candidate's schema is recoverable AND invocation lands
|
||||
cleanly with valid args, second batch: ship `T19c` /
|
||||
`T20c` (or whatever the file-naming convention dictates given
|
||||
the existing T19/T20 files don't exist yet — use `_runtime`
|
||||
suffix as session 7 did for T22b / T31b / T33b / T38b, OR
|
||||
`_invocation` suffix as session 9 did for T33c, depending on
|
||||
whether registration siblings T19b / T20b are also being
|
||||
shipped).
|
||||
- Cap at ~2 specs total — don't try to land both if the first one
|
||||
surfaces an unexpected issue.
|
||||
|
||||
For Category C (operon scope probe):
|
||||
For Category C (operon / Launch scope probe):
|
||||
- Single subagent writes the registry-walk probe modeled on
|
||||
`eipc-registry-probe.ts`. User runs it (or you run via the
|
||||
debugger if attached). Report findings; if a Tier 2 reframe is
|
||||
@@ -453,7 +459,7 @@ If the target isn't reasonable to implement (anchors don't resolve
|
||||
to anything assertable, the test depends on state you can't
|
||||
construct, the existing primitives don't cover the surface), DO
|
||||
NOT write a stub. Report under Open questions and stop. Sessions
|
||||
1-8 had cumulative ~15 "stop and report" outcomes that were the
|
||||
1-9 had a cumulative ~16 "stop and report" outcomes that were the
|
||||
right call (S20 deferral, T05 reshape, T07 needs seedFromHost,
|
||||
T08 needs setState('close'), S28 reclassification, T38 framing,
|
||||
session-3 eipc-registry finding, T37 fixture-readback deferral,
|
||||
@@ -461,7 +467,8 @@ S14 primitive-gap then primitive-build, T35/T36 Phase 2 deferrals,
|
||||
T18 Tier 1 reframe, T36 Phase 2 reclassification to Tier 3/4,
|
||||
session-6 lib/input-niri.ts shipped untested-on-niri, session-7
|
||||
per-wc IPC scope finding overturning the session-3 closure-local
|
||||
conclusion, session-8 renderer-wrapper-vs-main-side decision).
|
||||
conclusion, session-8 renderer-wrapper-vs-main-side decision,
|
||||
session-9 schema-rev cross-check via dual investigation).
|
||||
|
||||
Report shape (~150 words):
|
||||
## <TARGET> [runner | primitive | investigation]
|
||||
@@ -494,7 +501,7 @@ After fan-out returns:
|
||||
- Primitives landed (with API shape)
|
||||
- Specs deferred (with the per-test rationale)
|
||||
- Specs reclassified (Tier 3 → Tier 2, Tier 2 → Tier 1, etc.)
|
||||
- Updated coverage stat (was 69/76 = 91%, now N/76 = M%)
|
||||
- Updated coverage stat (was 70/76 = 92%, now N/76 = M%)
|
||||
6. Don't commit. The user reviews and commits.
|
||||
7. Rotate this prompt: rewrite
|
||||
`docs/testing/runner-implementation-followup-prompt.md` for
|
||||
@@ -502,7 +509,7 @@ After fan-out returns:
|
||||
|
||||
### Self-correction loop
|
||||
|
||||
Same as sessions 1-8:
|
||||
Same as sessions 1-9:
|
||||
|
||||
1. Subagent typecheck failure → re-spawn with explicit fix
|
||||
instruction.
|
||||
@@ -516,16 +523,17 @@ Same as sessions 1-8:
|
||||
passes because no handlers are registered) → re-examine the
|
||||
assertion shape. The lesson from sessions 3 and 7: verify the
|
||||
assertion is meaningful, not just that it passes.
|
||||
5. **Carry-over from session 5/6/7/8:** If pursuing Category A and
|
||||
the schema-rev / closure-inspection / DevTools approaches turn
|
||||
up empty after 2-3 approaches, STOP. Don't keep digging —
|
||||
document what was tried, ship the H07 documentation runner if
|
||||
it surfaces useful state, move to Category C.
|
||||
6. **NEW for session 9:** If Category A's invocation lands but
|
||||
the response shape doesn't match the case-doc claim (e.g.
|
||||
`listMarketplaces` returns something that isn't an array), re-
|
||||
examine the case-doc anchors before shipping the upgrade — the
|
||||
assertion shape might need adjustment, not the test target.
|
||||
5. **Carry-over from session 5/6/7/8/9:** If pursuing Category A
|
||||
and the read-side candidates turn up empty / require schema-rev
|
||||
that exceeds budget after 2-3 approaches, STOP. Don't keep
|
||||
digging — pivot to Category B or C. Document what was tried.
|
||||
6. **NEW for session 10:** If a Category A read-side reframe
|
||||
surfaces a "registered but uninvocable" pattern (handler is on
|
||||
the registry but the renderer-side wrapper isn't exposed for
|
||||
the relevant scope), that's the same shape as session 8's
|
||||
find_in_page / main_window observation — document it and
|
||||
defer rather than building the main-side fallback
|
||||
speculatively.
|
||||
|
||||
Cap re-spawns at 2 per file. Past that, mark as needing human
|
||||
review and move on.
|
||||
@@ -544,9 +552,9 @@ Stop and write the final report when one of:
|
||||
4. **Session budget hits ~2 new specs OR one new primitive
|
||||
landing.** Stop, synthesize, leave the rest for the next
|
||||
session.
|
||||
5. **Category A approaches all turn up empty after 2-3 distinct
|
||||
attempts.** Document the dead-end as a finding, ship H07 if
|
||||
useful, pivot to Category C if budget remains.
|
||||
5. **Category A read-side candidates all turn up empty after 2-3
|
||||
distinct attempts.** Document the dead-end as a finding, pivot
|
||||
to Category B or C if budget remains.
|
||||
|
||||
### What you should NOT do
|
||||
|
||||
@@ -555,8 +563,8 @@ Stop and write the final report when one of:
|
||||
fallback.
|
||||
- **Don't ship stubs.** If a runner can't actually assert what the
|
||||
spec says, mark it as Tier 3 / blocked / primitive-gap and
|
||||
don't write a placeholder. The cumulative fifteen "stop and
|
||||
report" outcomes from sessions 1-8 were the right call — every
|
||||
don't write a placeholder. The cumulative sixteen "stop and
|
||||
report" outcomes from sessions 1-9 were the right call — every
|
||||
one revealed a real constraint.
|
||||
- **Don't break existing runners.** H01-H05 are the canaries.
|
||||
- **Don't restructure `lib/`** beyond targeted additions.
|
||||
@@ -575,11 +583,13 @@ Stop and write the final report when one of:
|
||||
scope) instead.
|
||||
- **Don't call `invokeEipcChannel` for write-side handlers** —
|
||||
`start*`, `set*`, `write*`, `run*`, `openIn*`, `delete*`,
|
||||
`cancel*`, `reset*`. The primitive doesn't enforce a read-only
|
||||
allowlist; the safety property is that case-doc-anchored
|
||||
suffixes are read-side. If a case-doc anchor mentions a write-
|
||||
side suffix, that's a Tier 3 (real-account-write) test, not a
|
||||
Tier 2 invocation reframe.
|
||||
`cancel*`, `reset*`, `installPlugin`, `enablePlugin`. The
|
||||
primitive doesn't enforce a read-only allowlist; the safety
|
||||
property is that case-doc-anchored suffixes are read-side.
|
||||
Session 9 reframed T33's invocation through the read-side
|
||||
`list*` methods specifically because of this. T19/T20 (Category
|
||||
A) need the same treatment — case-doc anchors at write-side
|
||||
handlers must be reframed through read-side equivalents.
|
||||
- **Don't bolt other compositors into `lib/input-niri.ts`.**
|
||||
Sway / Hyprland / River each get their own per-compositor file
|
||||
if a consumer surfaces.
|
||||
@@ -590,7 +600,8 @@ Stop and write the final report when one of:
|
||||
- **Don't preemptively build `CodeTab.activateTopTab()` /
|
||||
`startNewSession()`.** Session 5 captured the AX anchors but
|
||||
T36 Phase 2 (the only known consumer) was reclassified out.
|
||||
T19/T20 (Category B) would be the legitimate consumer.
|
||||
T19/T20 read-side reframes may not even need them if the
|
||||
read-side path is purely IPC-driven.
|
||||
- **Don't add a main-side `invokeEipcChannel` fallback
|
||||
speculatively.** Build it only if a concrete consumer needs to
|
||||
invoke through a non-claude.ai webContents. Premature primitives
|
||||
@@ -602,13 +613,13 @@ Stop and write the final report when one of:
|
||||
### Final report format
|
||||
|
||||
```markdown
|
||||
## Runner implementation summary (session 9)
|
||||
## Runner implementation summary (session 10)
|
||||
|
||||
- Main-bet category: A | B | C
|
||||
- Specs landed: N
|
||||
- Primitives landed: N
|
||||
- Reclassified mid-flight: N (with reasons)
|
||||
- Coverage: was 69/76 (91%), now <NEW>/76 (<PCT>%)
|
||||
- Coverage: was 70/76 (92%), now <NEW>/76 (<PCT>%)
|
||||
- Typecheck: clean | <errors>
|
||||
- KDE-W test run: <pass/skip/fail counts>
|
||||
|
||||
@@ -616,7 +627,7 @@ Stop and write the final report when one of:
|
||||
|
||||
| Cat | Test ID | File | Assertion shape | Status |
|
||||
|---|---|---|---|---|
|
||||
| A | T33 Phase 2 | T33c_plugin_browser_invocation.spec.ts | … | ✓ pass / skip / fail |
|
||||
| A | T19c | T19c_*.spec.ts | … | ✓ pass / skip / fail |
|
||||
| ... |
|
||||
|
||||
## Notable findings
|
||||
@@ -675,9 +686,18 @@ git diff --stat
|
||||
T38b for end-to-end consumer patterns.
|
||||
- For eipc invocation: `lib/eipc.ts` exports `invokeEipcChannel`
|
||||
(renderer-side wrapper at
|
||||
`window['claude.<scope>'].<Iface>.<method>`). See T35b / T37b
|
||||
/ T27 for end-to-end consumer patterns. Only call read-side
|
||||
suffixes; the primitive doesn't enforce a read-only allowlist.
|
||||
`window['claude.<scope>'].<Iface>.<method>`). See T27 / T33c /
|
||||
T35b / T37b for end-to-end consumer patterns. Only call read-
|
||||
side suffixes; the primitive doesn't enforce a read-only
|
||||
allowlist.
|
||||
- **For arg validator schema-rev (session 9 finding):** when
|
||||
invocation rejects with `Argument "<name>" at position N ...
|
||||
failed to pass validation`, grep the bundled `index.js` for the
|
||||
literal rejection string. The validator block sits ~50-200 chars
|
||||
before that throw. Read ~2KB around it to capture the full
|
||||
schema. See plan-doc session 9 status section for the byte
|
||||
offsets of the two CustomPlugins validators (5013601 / 5018821)
|
||||
as worked examples.
|
||||
- **For asar fingerprints: ALWAYS grep the installed asar
|
||||
first.** Build-reference is beautified; the bundle is
|
||||
minified. Case-doc text may be the user-facing form, not the
|
||||
|
||||
@@ -18,6 +18,117 @@ work begins.
|
||||
|
||||
## Status (post-execution)
|
||||
|
||||
**Shipped session 9 (1 new spec, no primitive change):** T33c (Tier 2
|
||||
runtime invocation upgrade — `seedFromHost` + dual-handler invocation
|
||||
of `claude.web/CustomPlugins/{listMarketplaces, listAvailablePlugins}`
|
||||
through the renderer-side wrapper, asserting array shape on each).
|
||||
T33 (Tier 1 fingerprint, session 3) and T33b (Tier 2 handler-
|
||||
registration, session 7) already covered the bundle-string and
|
||||
registry-presence layers; T33c closes the chain by proving the impls
|
||||
are wired through and return the documented `Array<…>` shape. Coverage
|
||||
moved from 69/76 (91%) to 70/76 (92%). Passes on KDE-W in 39.2s
|
||||
(both impls returned arrays of length 0 on the dev box's host
|
||||
config).
|
||||
|
||||
Session 9 findings + reclassifications:
|
||||
|
||||
- **`CustomPlugins/listMarketplaces` and `listAvailablePlugins` use
|
||||
byte-identical hand-rolled arg validators** (NOT Zod for args; the
|
||||
result validator IS Zod, runs after the impl returns). Validator
|
||||
block at bytes 5013601 / 5018821 in the bundled `index.js`. Args
|
||||
are positional:
|
||||
- `[0] egressAllowedDomains: string[]` — required;
|
||||
`Array.isArray(r) && r.every(a => typeof a === "string")`. Empty
|
||||
array passes.
|
||||
- `[1] pluginContext: { mode: string, ...optional fields } |
|
||||
undefined` — optional. The closed-over `sc(...)` validator
|
||||
requires `mode: string`, with optional `workspacePath?`,
|
||||
`settingsLevel?`, `pluginSource?`, `marketplaceScope?`,
|
||||
`telemetryAttempt?: { attempt, maxAttempts }`.
|
||||
- **Minimal valid arg literal**: `args = [[]]`. Both methods
|
||||
accept this and treat the empty allow-list as the safety
|
||||
property — if the underlying impl is the CLI-shelling variant,
|
||||
the egress allow-list is forwarded as the spawned subprocess's
|
||||
permitted domains, so `[]` blocks any network attempt.
|
||||
- **Two impl variants exist in the bundle.** `A.listMarketplaces`
|
||||
has a CLI-shelling implementation (`runCommand(["plugin",
|
||||
"marketplace", "list", "--json"])` with timeout 30s) AND a native
|
||||
implementation (reads `knownMarketplacesFile` directly). Same for
|
||||
`listAvailablePlugins` (CLI: `["plugin", "list", "--json",
|
||||
"--available"]`, timeout 60s; native: scans `marketplacesDir`).
|
||||
The selection logic isn't called out in the closure source but
|
||||
both variants return the same `Array<…>` shape on success — the
|
||||
T33c assertion (`Array.isArray(result) === true`) holds for either
|
||||
impl. Test budget bumped to 180s to accommodate worst-case
|
||||
sequential CLI timeouts.
|
||||
- **Side-effect profile is acceptable for an automated runner.** No
|
||||
installs, no fs writes to user content, no state mutation. The CLI
|
||||
variant spawns a subprocess that emits log lines and may emit a
|
||||
Sentry capture on subprocess failure (e.g. `claude` CLI missing on
|
||||
PATH); the native variant performs a JSON file read. With the
|
||||
empty allow-list, no network egress from the spawned subprocess.
|
||||
Mirrors the read-only invariant T35b / T37b / T27 already rely on.
|
||||
- **Both schema-rev paths converged independently.** Bundle grep
|
||||
(static analysis of the minified `.vite/build/index.js`) and
|
||||
runtime closure inspection (Function.prototype.toString of the
|
||||
registered handler pulled from `webContents.ipc._invokeHandlers`
|
||||
via the user's debugger-attached running Claude on :9229)
|
||||
produced byte-identical validator literals and the same minimal
|
||||
arg shape. High confidence. Investigation budget: ~3 minutes
|
||||
bundle-grep, ~2.5 minutes runtime-closure (subagent traces in
|
||||
/tmp; cleaned up after the run).
|
||||
- **T33c filename convention follows T33/T33b.** Sessions 7 / 8
|
||||
established `_handler_registered` (Tier 1 fingerprint) /
|
||||
`_handler_runtime` (Tier 2 registration) suffixes for T33's
|
||||
paired runners. T33c, the invocation upgrade, is
|
||||
`T33c_plugin_browser_invocation.spec.ts` — keeps the case-doc
|
||||
pairing visible in `ls runners/`. Same pattern T35b / T37b /
|
||||
T27 implicitly used (no `_invocation` suffix needed because they
|
||||
ship as the first/only Tier 2 runner against their case-doc; T33c
|
||||
has T33 / T33b siblings to disambiguate against).
|
||||
- **Registry-side T33b assertion is preserved unchanged.** T33c
|
||||
calls `waitForEipcChannels` on the same suffix pair before
|
||||
invoking — surfaces "registered but uninvocable" cleanly if the
|
||||
wrapper-exposure gate flips (registration would still happen on
|
||||
the per-wc registry, only the renderer-side wrapper would be
|
||||
missing). Both T33b and T33c can keep co-existing; T33b is the
|
||||
fast-path Tier 2 sibling for sweeps that don't need the
|
||||
invocation cost.
|
||||
- **Session 8's smoke-test rejection cleanly identified the
|
||||
validator.** The session 8 prompt called out that
|
||||
`CustomPlugins/listMarketplaces` failed with `Argument
|
||||
"egressAllowedDomains" at position 0 ... failed to pass
|
||||
validation`. That error message is verbatim what the inline hand-
|
||||
rolled validator throws — the framed channel name carries through
|
||||
into the renderer-eval error surface, so the path from "invocation
|
||||
rejected by validator" to "exact validator location in the
|
||||
bundle" was a single grep on the literal error string. Worth
|
||||
noting for any future schema-rev session: the validator's own
|
||||
rejection messages are the cheapest grep target.
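
The positional arg schema described in the first finding can be restated as a small checker. This is a reconstruction for illustration, not the bundled validator: `checkListArgs` and `ArgCheck` are hypothetical names, only the required `mode: string` field of `pluginContext` is checked, and the result is returned as data so as not to guess the elided part of the real rejection message.

```typescript
type ArgCheck =
	| { ok: true }
	| { ok: false; position: number; name: string };

// Position 0: the check quoted in the finding above.
function isStringArray(v: unknown): v is string[] {
	return Array.isArray(v) && v.every((a) => typeof a === 'string');
}

// Position 1: only the required `mode: string` field is checked here;
// the optional fields listed above are ignored in this sketch.
function hasStringMode(v: unknown): boolean {
	if (typeof v !== 'object' || v === null) return false;
	return typeof (v as { mode?: unknown }).mode === 'string';
}

// Validate the positional args for listMarketplaces /
// listAvailablePlugins as described in the session 9 finding.
function checkListArgs(args: unknown[]): ArgCheck {
	if (!isStringArray(args[0])) {
		return { ok: false, position: 0, name: 'egressAllowedDomains' };
	}
	if (args[1] !== undefined && !hasStringMode(args[1])) {
		return { ok: false, position: 1, name: 'pluginContext' };
	}
	return { ok: true };
}
```

Under this reading, `checkListArgs([[]])` passes, which matches the minimal valid arg literal `args = [[]]`, while `checkListArgs([])` fails at position 0, the same position the session 8 smoke test was rejected at.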
|
||||
|
||||
Tier 2 → Tier 2 candidates remaining for next session: **T19 / T20
|
||||
Code-tab cluster** (each needs `claude.web/LocalSessions/*`
|
||||
invocation + AX-tree click chains; LocalSessions handlers verified
|
||||
present via T22b / T31b / T38b; AX anchors verified session 5).
|
||||
**T19** (integrated terminal) needs `LocalSessions_$_startShellPty`
|
||||
shape — but that's a write-side suffix (spawns a shell), so the
|
||||
read-side reframe would be e.g. `LocalSessions_$_listSessions` or a
|
||||
similar getter, not the case-doc anchor. **T20** (file pane) needs
|
||||
`LocalSessions_$_writeSessionFile` (also write-side); read-side
|
||||
sibling `LocalSessions_$_readSessionFile` could be the Tier 2 entry
|
||||
point. **T21** (dev server preview) needs `claude.web/Launch/*`
|
||||
which session 7's registry walk did NOT confirm — needs an
|
||||
exposure-vs-registration probe first (mirrors the operon scope
|
||||
finding from session 8). **operon scope exposure-vs-registration
|
||||
probe** is still on the table from session 8 — the session 9 budget
|
||||
went to T33c and didn't touch this. Primitive surface
|
||||
(`lib/electron-mocks.ts`, `lib/input.ts`, `lib/input-niri.ts`,
|
||||
`lib/eipc.ts` with read-and-invoke surfaces) remains broad enough
|
||||
that consumer-driven extensions are the right next move, not
|
||||
fresh primitive builds.
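
The write-side versus read-side distinction drawn above for T19 / T20 can be pre-checked mechanically before any invocation. The sketch below assumes the `Iface_$_method` channel framing seen in the case-doc anchors and uses the write-side prefixes named in the followup prompt. The helper names are hypothetical, and the camelCase boundary check is an added assumption so that a method like `settle` is not mistaken for a `set*` handler.

```typescript
// Prefixes and exact names the followup prompt lists as write-side.
const WRITE_PREFIXES = [
	'start', 'set', 'write', 'run',
	'openIn', 'delete', 'cancel', 'reset',
];
const WRITE_EXACT = ['installPlugin', 'enablePlugin'];

// Strip the `Iface_$_` framing, if present, to get the method name.
function methodOf(suffix: string): string {
	const idx = suffix.lastIndexOf('_$_');
	return idx === -1 ? suffix : suffix.slice(idx + 3);
}

// True when `method` is `prefix` exactly, or `prefix` followed by an
// uppercase letter (camelCase word boundary).
function hasCamelPrefix(method: string, prefix: string): boolean {
	if (!method.startsWith(prefix)) return false;
	const next = method.charAt(prefix.length);
	return next === '' || (next >= 'A' && next <= 'Z');
}

// Heuristic only: a false result does not prove a handler is safe.
function isWriteSide(suffix: string): boolean {
	const method = methodOf(suffix);
	if (WRITE_EXACT.some((name) => name === method)) return true;
	return WRITE_PREFIXES.some((p) => hasCamelPrefix(method, p));
}
```

With this check, `LocalSessions_$_startShellPty` and `LocalSessions_$_writeSessionFile` are flagged as write-side, while `LocalSessions_$_readSessionFile` is not. A name-based filter is a pre-flight aid, not a substitute for reading the handler.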
|
||||
|
||||
---
|
||||
|
||||
**Shipped session 8 (3 new specs + 1 primitive extension):** T35b, T37b,
|
||||
T27 (Tier 2 runtime invocations — `seedFromHost` + eipc-handler invoke
|
||||
through the renderer-side wrapper, strictly stronger than the Tier 1
|
||||
|
||||
@@ -7,7 +7,7 @@ architecture, decisions, and rationale.
|
||||
|
||||
## Status
|
||||
|
||||
Sixty-nine specs wired (31 cross-env T-tests, 33 env-specific S-tests,
|
||||
Seventy specs wired (32 cross-env T-tests, 33 env-specific S-tests,
|
||||
5 H-prefix harness self-tests). See
|
||||
[`docs/testing/runner-implementation-plan.md`](../../docs/testing/runner-implementation-plan.md)
|
||||
for the tiered triage of remaining tests and the per-spec rationale
|
||||
@@ -46,6 +46,7 @@ behind tier classification.
|
||||
| [T32](../../docs/testing/cases/code-tab-workflow.md#t32--slash-command-menu) | Bundled `index.js` contains `LocalSessions_$_getSupportedCommands` eipc channel + `slashCommands` schema field | file probe |
|
||||
| [T33](../../docs/testing/cases/extensibility.md#t33--plugin-browser) | Bundled `index.js` contains `CustomPlugins_$_listMarketplaces` and `CustomPlugins_$_listAvailablePlugins` eipc channel names (browser populate flow) | file probe |
|
||||
| [T33b](../../docs/testing/cases/extensibility.md#t33--plugin-browser) | After `seedFromHost` + `userLoaded`, both plugin-browser eipc handlers (`listMarketplaces`, `listAvailablePlugins`) are registered on the claude.ai webContents — load-bearing pair (Tier 2 runtime sibling of T33) | L1 (eipc registry) |
|
||||
| [T33c](../../docs/testing/cases/extensibility.md#t33--plugin-browser) | After `seedFromHost` + `userLoaded`, both plugin-browser eipc handlers (`listMarketplaces`, `listAvailablePlugins`) are callable through the renderer-side wrapper with `args = [[]]` (empty `egressAllowedDomains`), each returning array shape — Tier 2 invocation upgrade of T33b, strictly stronger than registration alone | L1 (eipc invoke) |
|
||||
| [T35](../../docs/testing/cases/extensibility.md#t35--mcp-server-config-picked-up) | Bundled `index.js` contains the four-needle MCP-config separation fingerprint: `claude_desktop_config.json` (chat-tab path), `.claude.json` + `.mcp.json` (Code-tab loaders), `"user","project","local"` (settingSources triple Code-session passes to the agent SDK) — pins per-tab separation without launch | file probe |
|
||||
| [T35b](../../docs/testing/cases/extensibility.md#t35--mcp-server-config-picked-up) | After `seedFromHost` + `userLoaded`, the `claude.settings/MCP/getMcpServersConfig` eipc handler is registered AND callable through the renderer-side wrapper, returning a non-array object (Tier 2 runtime sibling of T35, strictly stronger than the bundle-string fingerprint) | L1 (eipc invoke) |
|
||||
| [T36](../../docs/testing/cases/extensibility.md#t36--hooks-fire) | Bundled `index.js` contains the hooks runtime fingerprint: `hook_started` / `hook_progress` / `hook_response` (single-occurrence Verbose-transcript runtime emits) plus `PreToolUse` / `UserPromptSubmit` registry tokens — pins the runtime hook-fire path the case-doc Verbose-transcript claim hangs on | file probe |
|
||||
@@ -114,11 +115,11 @@ window; `NiriIpcUnavailable` thrown off-Niri; consumed by S14), the
|
||||
against case-doc anchors; consumed by T22b / T31b / T33b / T38b)
|
||||
plus its session 8 invoke surface (`invokeEipcChannel` — calls a
|
||||
registered handler through the renderer-side wrapper at
|
||||
`window['claude.<scope>'].<Iface>.<method>`; consumed by T27 / T35b /
|
||||
T37b) — and the `createIsolation({ seedFromHost: true })` primitive
|
||||
that lets login-required tests run hermetically against a copy of the
|
||||
host's signed-in auth state (T07, T16, T22b, T26, T27, T31b, T33b,
|
||||
T35b, T37b, T38b).
|
||||
`window['claude.<scope>'].<Iface>.<method>`; consumed by T27 / T33c /
|
||||
T35b / T37b) — and the `createIsolation({ seedFromHost: true })`
|
||||
primitive that lets login-required tests run hermetically against a
|
||||
copy of the host's signed-in auth state (T07, T16, T22b, T26, T27,
|
||||
T31b, T33b, T33c, T35b, T37b, T38b).
|
||||
|
||||
Note on eipc channels: the `LocalSessions_$_*` and `CustomPlugins_$_*`
|
||||
channel names referenced in the case-doc Code anchors don't register
|
||||
@@ -133,16 +134,22 @@ across builds at `c0eed8c9-…`); 117 `LocalSessions_*` + 16
|
||||
webContents. T22 / T31 / T33 / T38 ship as Tier 1 fingerprints
|
||||
against the bundled channel-name strings; T22b / T31b / T33b / T38b
|
||||
are the runtime registry-presence siblings (strictly stronger,
|
||||
require `seedFromHost`). T27 / T35b / T37b go one step further —
|
||||
they invoke the resolved handlers through the renderer-side wrapper
|
||||
at `window['claude.<scope>'].<Iface>.<method>`, which `mainView.js`
|
||||
exposes via `contextBridge.exposeInMainWorld` after a top-frame +
|
||||
origin gate (`Qc()`: claude.ai / claude.com / preview.* / localhost).
|
||||
Calling through the wrapper carries an honest `senderFrame` for the
|
||||
inlined `le()` / `Vi()` per-handler origin gate, so the test surface
|
||||
matches real attack surface. See `lib/eipc.ts` for both surfaces, and
|
||||
require `seedFromHost`). T27 / T33c / T35b / T37b go one step
|
||||
further — they invoke the resolved handlers through the renderer-
|
||||
side wrapper at `window['claude.<scope>'].<Iface>.<method>`, which
|
||||
`mainView.js` exposes via `contextBridge.exposeInMainWorld` after a
|
||||
top-frame + origin gate (`Qc()`: claude.ai / claude.com / preview.*
|
||||
/ localhost). Calling through the wrapper carries an honest
|
||||
`senderFrame` for the inlined `le()` / `Vi()` per-handler origin
|
||||
gate, so the test surface matches real attack surface. T33c also
|
||||
demonstrates the schema-rev path: when invocation rejects with
|
||||
`Argument "<name>" at position N ... failed to pass validation`,
|
||||
the verbatim rejection string is the cheapest grep target back to
|
||||
the inline hand-rolled validator block (bundle bytes 5013601 /
|
||||
5018821 for the two CustomPlugins methods). See `lib/eipc.ts` for
|
||||
both surfaces, and
|
||||
[`runner-implementation-plan.md`](../../docs/testing/runner-implementation-plan.md)
|
||||
session 7 / 8 status sections for the findings.
|
||||
session 7 / 8 / 9 status sections for the findings.
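
The wrapper path `window['claude.<scope>'].<Iface>.<method>` can be walked defensively so that a missing scope, interface, or method is reported separately. This is a sketch and not the `lib/eipc.ts` implementation: `resolveWrapper` and `WrapperLookup` are hypothetical names, and `root` stands in for the renderer's `window`.

```typescript
type WrapperLookup =
	| { found: true; fn: (...args: unknown[]) => unknown }
	| { found: false; missing: 'scope' | 'iface' | 'method' };

function isRecord(v: unknown): v is Record<string, unknown> {
	return typeof v === 'object' && v !== null;
}

// Walk root[scope][iface][method] and report the first missing level.
function resolveWrapper(
	root: Record<string, unknown>,
	scope: string,
	iface: string,
	method: string,
): WrapperLookup {
	const scopeObj = root[scope];
	if (!isRecord(scopeObj)) return { found: false, missing: 'scope' };
	const ifaceObj = scopeObj[iface];
	if (!isRecord(ifaceObj)) return { found: false, missing: 'iface' };
	const fn = ifaceObj[method];
	if (typeof fn !== 'function') {
		return { found: false, missing: 'method' };
	}
	return {
		found: true,
		fn: fn as (...args: unknown[]) => unknown,
	};
}
```

A successful lookup shows only that the wrapper is exposed. As the plan doc notes, invocation can still fail if no handler is registered behind it.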
|
||||
|
||||
Per-row pass/skip counts depend on which sweep runs against the row;
|
||||
see `runner-implementation-plan.md` for tier classification and
|
||||
|
||||