Compare commits

65 Commits

Author SHA1 Message Date
aaddrick
9528c25e95 test(harness): fix T10 by driving daemon respawn from a main-side eipc call
T10 was passing on older bundles where the cowork client retried the
VM-service connection on a polling cadence — every retry tick was an
implicit trigger for the patched cooldown-gated auto-launch. Post-
1.5354.0 the client opens a persistent socket at boot (zI/E$i happy
path → KSt) and routes every subsequent RPC through it, so steady
state has no traffic. After SIGKILL the persistent socket goes dead
but no client code is in flight, so kUe()'s catch branch never enters
and the daemon stays gone.
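
A minimal sketch of why a cooldown-gated respawn still needs a trigger, assuming the gate compares the current time against a last-spawn timestamp. `tryRespawn`, `SpawnState`, and `COOLDOWN_MS` are hypothetical names for illustration; only the 10s cooldown and the `_lastSpawn` marker come from this commit, and the patched code itself is not reproduced here.

```typescript
// Hypothetical model of a cooldown-gated respawn check; not the patched code.
const COOLDOWN_MS = 10_000; // the 10s cooldown named in this commit

interface SpawnState {
  lastSpawn: number; // timestamp (ms) of the most recent spawn attempt
}

// Returns true (and records the attempt) only once the cooldown has elapsed.
// Nothing calls this on its own: a failing RPC has to reach it first.
function tryRespawn(state: SpawnState, nowMs: number): boolean {
  if (nowMs - state.lastSpawn < COOLDOWN_MS) return false;
  state.lastSpawn = nowMs;
  return true;
}
```

Under the old polling client every retry tick would reach a check of this kind. With a persistent socket and no steady-state traffic nothing reaches it, so the gate never gets the chance to open.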

The case-doc claim is upheld by the production code; the patch is
correctly applied (`_lastSpawn` × 3 in installed asar, `_svcLaunched`
× 0). Only the test's trigger model was stale.

Three changes:

1. Wait for `userLoaded`, not `mainVisible`. The post-kill RPC has to
   land in a webContents whose URL matches `claude.ai`; pre-login
   `/login/...` URLs aren't reachable via that filter.

2. Phase 3 fires a daemon RPC each iteration. The renderer wrapper
   (`window['claude.web'].ClaudeVM.getRunningStatus`) was the obvious
   first try but was unreliable: 29/30 calls threw `Cannot find
   context with specified id` because the dead-daemon state forces a
   renderer re-render that invalidates the cached execution context.
   Switched to invoking the eipc handler from MAIN directly via
   `wc.ipc._invokeHandlers.get(channel)(fakeEvent)` with
   `senderFrame.url = 'https://claude.ai/'`. The handler still goes
   through zI/VsA/kUe, the dead socket still throws, the cooldown
   gate still opens, and the patched fork still fires — just without
   any renderer dependency. Three consecutive runs at 21.0s.

3. Budget bumped 20s → 30s. The 10s cooldown is a hard floor, and the
   daemon needs another second or two to bind the socket; 20s was on
   the edge.
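
The main-side call in change 2 can be sketched roughly as follows, using a plain `Map` as a stand-in for `wc.ipc._invokeHandlers`. The channel name, the gate function `isClaudeOrigin`, and the handler's return value are assumptions made for illustration; the real handler and gate live in the bundle and are not shown here.

```typescript
// Stand-in types: the real registry lives on wc.ipc._invokeHandlers in Electron.
type FakeEvent = { senderFrame: { url: string } };
type Handler = (event: FakeEvent, ...args: unknown[]) => unknown;

// Hypothetical origin gate: a structural check on the event and nothing more.
function isClaudeOrigin(event: FakeEvent): boolean {
  return event.senderFrame.url.startsWith("https://claude.ai/");
}

const handlers = new Map<string, Handler>();
// Hypothetical channel name and return shape, registered for the demo only.
handlers.set("ClaudeVM_$_getRunningStatus", (event) => {
  if (!isClaudeOrigin(event)) throw new Error("origin rejected");
  return { running: false };
});

// Look the handler up and call it with a synthesized event, no renderer needed.
function invokeFromMain(channel: string, url: string): unknown {
  const handler = handlers.get(channel);
  if (!handler) throw new Error(`no handler for ${channel}`);
  return handler({ senderFrame: { url } });
}
```

Because the event is built by hand, the call does not depend on any renderer execution context surviving the daemon kill.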

Telemetry now reports `rpcAttempts` / `rpcFailures` /
`globalDaemonPidFinal` (the patched `__coworkDaemonPid` global) so
future regressions can be diagnosed from the failure attachment alone.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-04 07:29:57 -04:00
aaddrick
d12c491470 test(harness): fix S25 by routing require through process.mainModule
Three bare `require('node:fs')` calls inside evalInMain bodies were
failing with `ReferenceError: require is not defined` on the bundled
Electron's main-process CDP eval scope — `require` isn't exposed as a
global there, only on the current module object. Adjacent calls in
the same blocks already used `process.mainModule.require('electron')`
correctly; the `node:fs` lines were the outliers.
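
A small sketch of the two eval-body forms, built as strings only. `mainRequireExpr` is a hypothetical helper, not part of the harness, and nothing here calls `evalInMain`; the point is just the difference in the text that gets evaluated.

```typescript
// Builds the expression text for an eval body; it does not evaluate anything.
function mainRequireExpr(moduleName: string): string {
  // JSON.stringify quotes and escapes the module name safely.
  return `process.mainModule.require(${JSON.stringify(moduleName)})`;
}

// The failing form: bare `require` is not a global in that eval scope.
const failingBody = "require('node:fs').existsSync('/tmp')";
// The working form routes through the main module object instead.
const workingBody = `${mainRequireExpr("node:fs")}.existsSync("/tmp")`;
```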

Doc comment on lib/inspector.ts:evalInMain captures the gotcha so the
next caller doesn't trip the same wire.

S25 verified: passes in 15.1s on KDE-W (CLAUDE_TEST_USE_HOST_CONFIG=1).

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-04 07:29:35 -04:00
aaddrick
0a1f8071e9 docs(testing): session 16 verify T17 seedFromHost + schema-rev for listRemotePluginsPage / listSkillFiles + flag orchestrator STOP for session 17
Final session of the sessions-13-to-16 autonomous orchestration run.

Verified session 15's T17 seedFromHost migration end-to-end against
the dev box: bare 60s Playwright timeout is GONE, seedFromHost clones
host config, waitForReady('userLoaded') resolves to a post-login URL
(https://claude.ai/epitaxy), dialog mock installs, and the session-14
CodeTab.activate({ timeout: 15_000 }) AX migration succeeds first try.
T17 reaches a NEW failure mode at the next chain step
(openFolderPicker after selectLocal — Select-folder pill doesn't
render on /epitaxy workspace route, likely needs /new context).
Classified as renderer-state-dependent, not openPill / clickMenuItem
loop — ruling out sessions 14-15's parked AX migration hypothesis
once and for all. Deferred for a future session (needs careful /new
navigation primitive).

Schema-rev resolved both deferred validators by bundle inspection of
app.asar (no smoke-test possible — T17's seedFromHost step killed the
debugger-attached leaked isolations as expected):

- CustomPlugins.listRemotePluginsPage(limit: number, offset: number)
- LocalPlugins.listSkillFiles(pluginId: string, skillName: string,
  pluginContext?: opaque)

Neither shipped as a Tier 2 invocation — listRemotePluginsPage is
not anchored in any case doc (T33 anchors listMarketplaces +
listAvailablePlugins, both already covered by T33b/T33c);
listSkillFiles is meaningful only with an installed plugin, which
needs Tier 3 destructive setup explicitly forbidden by the
constraints. Schemas captured in plan-doc as a deferred reframe.

Coverage stays at 74/76 (97%) — verification + investigation, no
spec landed.

Orchestration-level summary (sessions 13-16):
- Coverage start 74/76 (97%) → end 74/76 (97%) — NO net coverage
  gain across 4 sessions
- Net deliverables: 1 primitive (lib/ax.ts, session 13), 1 AX
  migration (activateTab + CodeTab.activate, session 14, fixed T16
  pre-existing-flake), 1 structural fix (T17 seedFromHost, session
  15, verified working session 16), 1 verification + 1 schema-rev
  investigation (session 16)
- Why coverage stalled: structural ceiling reached. Remaining 2
  specs need real claude.ai account write-side state which the
  harness can't construct without violating the Tier 3 destructive
  constraint.

Followup prompt rotated for session 17 with a STOP flag at the top —
session 17 will only run if the user manually triggers another
orchestration AND at least one of four preconditions holds (real
signed-in debugger-attached Claude, real-account write-side fixture,
renderer-drift event, or new IPC surface).

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-04 00:30:52 -04:00
aaddrick
14ccb61596 docs(testing): session 15 plan/inventory + rotate session 16 prompt
Plan-doc Status (post-execution): adds session 15 entry capturing
the T17 structural fix (legacy `CLAUDE_TEST_USE_HOST_CONFIG=1` →
`seedFromHost: true`), the RawElement import prune, the
debugger-attached-to-leaked-test-isolation discovery, the
`openPill` / `clickMenuItem` migration park decision, and the
"productivity signal is dimming — 3 consecutive sessions without
coverage gain" note for the orchestrator.

Followup prompt rotation: rewrites for session 16 with the new
PRIORITY (run T17 to verify the seedFromHost migration), the
upgraded Phase 0 calibration check (port-9229 attachment quality,
not just port status — must distinguish auth-bearing Claude from
leaked /login isolations via `evalInMain` webContents probe), the
narrowed category list (D-verify + C + STOP recommendation), and
the explicit STOP termination criterion if both D-verify and C
turn up empty.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-04 00:23:16 -04:00
aaddrick
af8a60bdb1 test(harness): session 15 migrate T17 to seedFromHost + prune unused RawElement import (no spec, coverage unchanged at 97%)
Session 15 investigation finding: T17's pre-existing 60s timeout
flake (hypothesised in sessions 13-14 to live in `openPill` /
`clickMenuItem` AX polling) was actually structural. The trace
showed a bare 60s Playwright spec timeout with NO `renderer-url`
attachment fired — meaning the test never reached line 49's
attach call, which means it never resolved
`waitForReady('userLoaded')` at line 40.

Root cause: T17 was the last spec on the legacy
`CLAUDE_TEST_USE_HOST_CONFIG=1` / `isolation: null` shape. Every
other auth-required spec (T07, T16, T19, T20, T21, T22b, T26, T27,
T31b, T33b/c, T35b, T37b, T38b) had moved to `seedFromHost: true`.
Without the env var (which CI / orchestration didn't set), T17
fell through to a fresh isolation with no auth, hit `/login`, and
`waitForUserLoaded`'s 90s default budget got preempted by
Playwright's 60s spec timeout (per `playwright.config.ts`).

Migration: rewrite T17 to use the canonical seedFromHost pattern
(mirroring T16 / T26): `createIsolation({ seedFromHost: true })`
with a clean skip path on host-config-unavailable, then
`launchClaude({ isolation })` and `waitForReady('userLoaded')` —
which now resolves cleanly within budget when host has signed-in
auth, or skips with a clear message when it doesn't.

Cleanup: prune unused `RawElement` re-export import from
`lib/claudeai.ts` per session 14's leftover hint (left over from
the migration that didn't end up needing the type re-export).

T17 not run this session because the dev box's running Electron
processes ambiguously include leaked test isolations and possibly
the user's real Claude — `seedFromHost` would kill both, deferred
to next session for verification with explicit user-Claude
disambiguation.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-04 00:23:05 -04:00
aaddrick
8b556f2997 docs(testing): session 14 plan/inventory + rotate session 15 prompt
Add session 14 status entry to runner-implementation-plan.md (call-
site migration + T16 fix verification + T17-stays-flaky verification).
Rotate the followup prompt for session 15: PRIORITY shape is T17
investigation + potential `openPill` / `clickMenuItem` migration if
the failure trace shows AX-polling-reachable cause; A / B / C
unchanged from session 14 (still need debugger).

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-04 00:11:59 -04:00
aaddrick
865c147916 test(harness): session 14 migrate activateTab to waitForAxNode (no spec, coverage unchanged at 97%)
Migrate `activateTab` from a one-shot AX snapshot to a `waitForAxNode`
poll, plus migrate `CodeTab.activate`'s post-click `retryUntil`-around-
`findCompactPills` loop to `waitForAxNodes`. Fixes T16's pre-existing
`no AX-tree button with accessibleName="Code" found` failure mode
documented in session 13 — verified by stashing the migration and re-
running T16 against the baseline (same failure), then restoring and
seeing T16 pass 3/3 in succession against the migrated form.

`activateTab` now takes an optional `{ timeout?: number }` parameter,
defaulting to 5000ms (matches `lib/ax.ts` defaults). `CodeTab.activate`
passes its own timeout (T16 supplies 15s) through to both the pre-
click click-budget and the post-click pill poll. The post-click
predicate is copy-pasted from `findCompactPills` (role: button +
hasPopup: menu + non-empty accessibleName + not a `^More options for `
row trigger) to keep the page-object free-standing.
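
The post-click predicate described above might look roughly like this. `AxNodeLike` is a hypothetical stand-in for the harness's `RawElement`, whose real field names may differ; the four conditions are the ones listed in this commit.

```typescript
// Hypothetical node shape; the harness's RawElement may carry different fields.
interface AxNodeLike {
  role: string;
  hasPopup?: string;
  accessibleName?: string;
}

// Row triggers are named "More options for <row>" and are not compact pills.
const ROW_TRIGGER = /^More options for /;

function isCompactPill(node: AxNodeLike): boolean {
  const name = node.accessibleName ?? "";
  return (
    node.role === "button" &&
    node.hasPopup === "menu" &&
    name.length > 0 &&
    !ROW_TRIGGER.test(name)
  );
}
```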

`findCompactPills` itself stays a one-shot snapshot — it has three
call-sites (the formerly-hand-rolled retry inside `CodeTab.activate`
that this commit migrates, plus T16's failure-diagnostic capture and
post-activate diagnostic that both want fail-fast snapshots). Pushing
retry latency into the helper itself would change the diagnostic
contract.

`openPill` and `clickMenuItem` not migrated this session — their
post-click stability gates plus per-iteration sleep budgets carry
T17-specific tuning that the followup prompt explicitly cautioned
against changing speculatively. T17 stays pre-existing-flaky on KDE-W;
verified that status by stashing the migration and re-running T17
(same 60s timeout — failure unchanged-by-migration).

Verification:
- npm run typecheck: clean
- H01 / H02 / H03 (canaries): pass
- T16: pass 3/3 (migration fixes the documented pre-existing failure)
- T17: still pre-existing-flaky (verified independent of migration)
- T26: pass (regression check — uses snapshotAx directly, not affected)

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-04 00:11:53 -04:00
aaddrick
113329f91f docs(testing): session 13 plan/inventory + rotate session 14 prompt
- runner-implementation-plan.md: session 13 status section
  (lib/ax.ts primitive shipped, no new spec, coverage stays at 74/76
  = 97% since primitive-only sessions don't move the spec count;
  Phase 0 found debugger detached on dev box which blocked Categories
  A/B/C; pivoted to the PRIORITY DOM unification primitive). Updated
  the "Primitive gaps to flag" entry — DOM/AX loading + traversal
  primitive moved from FLAGGED to LANDED with the consumer list and
  the deliberately-deferred shapes (waitForRenderedSurface registry,
  CSS-querySelector primitive).
- README.md: lib/ax.ts entry in the substrate-primitives note;
  session 13 consumer list (claudeai.ts page-objects + T26).
  Spec count unchanged at 74.
- runner-implementation-followup-prompt.md: rotated for session 14.
  Adds new Category D (call-site migration to waitForAxNode for
  flake reduction) as the PRIORITY shape — doesn't need the
  debugger, builds on session 13's primitive. Carries forward
  Categories A / B / C (still need debugger). Phase 0 must check
  port 9229 BEFORE picking a category. Reading order updated:
  session 13 first.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 23:57:00 -04:00
aaddrick
3d47f33ccb test(harness): session 13 lib/ax.ts AX substrate primitive (no spec, coverage unchanged at 97%)
Threshold-driven extraction of the AX-tree loading + traversal
substrate. `claudeai.ts` page-objects and `T26_routines_page_renders`
both carried inline copies of the same `snapshotAx` helper (T26's
even noted "premature abstraction at 1 consumer" — with two consumers
the threshold is met). Plus the user reports recurring AX-query flake.

Surface (`tools/test-harness/src/lib/ax.ts`):

- snapshotAx(inspector, opts) — single AX read with the stability
  gate. opts.fast skips the gate for inside-poll callers (matches
  the existing private-helper contract in claudeai.ts).
- waitForAxNode(inspector, predicate, opts) — repeatedly snapshot
  the tree and return the first matching RawElement, or null on
  timeout. Gates on stability once at the start (configurable),
  then iterates with fast: true. Built against the inline polling
  loops in CodeTab.activate, openPill, clickMenuItem, and T26's
  pre/post-click anchor scans — the existing call-sites are NOT
  migrated this session (per-spec retry budgets are tuned, changing
  them speculatively risks introducing flake).
- waitForAxNodes(inspector, predicate, opts) — same shape, returns
  every match. For consumers that want to enumerate.
- Re-exports: RawElement, AxNode, axTreeToSnapshot,
  waitForAxTreeStable from explore/walker.ts so consumers stay
  inside lib/ instead of reaching into explore/. Walker remains
  the source of truth for AX-snapshot construction; lib/ax.ts is
  the runner-facing alias.
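
The polling contract described for `waitForAxNode` (gate on stability once, then poll with `fast: true`, return the first match or null on timeout) can be sketched as below. This is not the `lib/ax.ts` implementation: `pollForNode`, `PollDeps`, and the injected `snapshot` / `now` / `sleep` functions are hypothetical, and the 100ms interval is an assumption.

```typescript
// Hypothetical dependencies, injected so the loop can run without a browser.
interface PollDeps<T> {
  snapshot: (fast: boolean) => Promise<T[]>; // one AX read; fast skips the gate
  now: () => number; // clock in ms
  sleep: (ms: number) => Promise<void>;
}

async function pollForNode<T>(
  deps: PollDeps<T>,
  predicate: (node: T) => boolean,
  timeoutMs = 5000,
  intervalMs = 100, // assumed polling interval
): Promise<T | null> {
  const deadline = deps.now() + timeoutMs;
  let fast = false; // the first read waits for stability, later reads skip it
  for (;;) {
    const nodes = await deps.snapshot(fast);
    const hit = nodes.find(predicate);
    if (hit !== undefined) return hit;
    if (deps.now() >= deadline) return null; // timed out without a match
    await deps.sleep(intervalMs);
    fast = true;
  }
}
```

Injecting the clock and sleep keeps the sketch testable with a fake timer; a real caller would presumably pass wall-clock equivalents.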

Refactors:

- claudeai.ts swaps its private snapshotAx for the shared one
  (5-line import change; call-sites unchanged).
- T26_routines_page_renders.spec.ts drops its inlined helper and
  imports from lib/ax.ts.

Phase 0 of session 13 found port 9229 detached (Claude was running
but Developer → Enable Main Process Debugger had not been clicked),
which blocked Categories A (operon-mode navigation probe) and C
(schema-rev for listRemotePluginsPage / listSkillFiles) — both need
runtime probing. Category B (Tier 3 read-only reframes) effectively
needed the debugger too. The PRIORITY-flagged DOM unification
primitive was tractable without it (pure static-analysis-driven
extraction), so session 13 pivoted there. Coverage stays at 74/76
(97%) since primitive-only sessions don't move the spec count.

What's NOT in lib/ax.ts:

- waitForRenderedSurface(client, surfaceKey) — the plan-doc proposal
  mentioned a named-surface registry but no consumer asks for it
  today; promote when a third consumer crystallizes with a specific
  surface in mind.
- CSS-querySelector primitive — T07's topbar poll is a different
  abstraction (DOM, not AX). No second consumer signal yet.
- Call-site retry budget changes — the per-spec budgets are tuned;
  speculative changes risk introducing flake. Migration to
  waitForAxNode is a future session's work.

Verification: typecheck clean; H01-H03 canaries pass; T26 passes
(21.1s on KDE-W); T11_runtime spot-check passes. Pre-existing T16 /
T17 / T07 / S25 / S29-S31 flake is unchanged on the baseline (verified
by stashing the session-13 changes and re-running T16).

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 23:56:47 -04:00
aaddrick
a8093a8e11 docs(testing): session 12 plan/inventory + rotate session 13 prompt
- runner-implementation-plan.md: session 12 status section (T11_runtime
  shipped, coverage 96% → 97%, dual-impl-object invocation pattern
  documented, full LocalPlugins/CustomPlugins method inventory). T11
  Tier 1 entry annotated with session-12 sibling reference. New
  "Primitive gaps" entry flagging the unified DOM/AX loading +
  traversal primitive proposal — user reports flake from tests not
  waiting long enough for DOM render; threshold for extraction is
  reached based on the 5+ AX-using specs each rolling their own
  retryUntil budget.
- README.md: T11_runtime row in inventory; eipc note extended with
  the cross-impl-object dual-invocation pattern; spec count 73 → 74.
- runner-implementation-followup-prompt.md: rotated for session 13.
  Carries forward the operon investigation, Tier 3 read-only reframe,
  and schema-rev categories; flags the DOM/AX loading-primitive build
  as the PRIORITY main bet (strictly higher impact than another
  reframe — flake reduction touches every existing AX-using spec).

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 23:20:00 -04:00
aaddrick
23285d3d5a test(harness): session 12 T11 plugin install runtime (1 new spec, 96% → 97% coverage)
Tier 2 reframe of T11 (plugin install — Anthropic & Partners). Sibling
to the existing T11_plugin_install_fingerprint Tier 1 spec; promotes
from "install code path strings are in the bundle" to "install
handlers register at runtime AND read-sides across two impl objects
return the documented array shapes".

Five-suffix registration probe over the install-flow handlers:
- CustomPlugins/installPlugin (case-doc anchor index.js:507181)
- CustomPlugins/uninstallPlugin (lifecycle complement)
- CustomPlugins/updatePlugin (lifecycle complement)
- CustomPlugins/listInstalledPlugins (also invoked)
- LocalPlugins/getPlugins (also invoked)

Plus first cross-impl-object dual invocation:
- CustomPlugins/listInstalledPlugins([[]]) → array (drives Manage
  plugins panel — empty `egressAllowedDomains` per T33c pattern)
- LocalPlugins/getPlugins([]) → array (reads
  ~/.claude/plugins/installed_plugins.json per case-doc :465822)

Strictly stronger than single-interface dual invocation when the
case-doc surface spans two impl objects — proves the install
plumbing crosses both intact. Mixed-arg-shape (one needs [[]],
another []) follows session 11's mixed-shape pattern.

Smoke-test against the user's debugger-attached running Claude
surfaced the full LocalPlugins (15 methods) + CustomPlugins (16
methods) inventory; 9 read-sides invocable cleanly, 2 still-
rejecting candidates flagged for session 13 schema-rev
(listRemotePluginsPage limit, listSkillFiles pluginId+skillName).

Passes on KDE-W in 28.8s (cold). H01-H04 canaries stay clean.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 23:19:49 -04:00
aaddrick
22bd68d5b2 docs(testing): session 11 plan/inventory + rotate session 12 prompt
Plan-doc gets a new "Shipped session 11" status section above
session 10's. Captures the T21 spec landed (commit 3ea677f), the
cwd-validator-is-typeof-string finding, the 30-callable-Launch-
members observation (5 wrapper-only `on*` event subscribers + 2
proxies don't show in `_invokeHandlers`), and the dual case-doc-
anchored read-side invocation pattern (distinct from T19/T20's
foundational-surrogate shape).

README inventory adds T21 row, bumps spec count from 72 to 73 (35
T-tests now).

Followup prompt rotates for session 12 — T11 plugin install
runtime upgrade becomes the main bet (currently a Tier 1
fingerprint; LocalPlugins registers 15 handlers per session 7's
probe). Operon-mode navigation probe stays as the smaller-scope
fallback. Constraints / phases / self-correction loop sections
unchanged from sessions 10-11; the per-session section just
swaps in the new findings.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 23:02:28 -04:00
aaddrick
3ea677f563 test(harness): session 11 T21 dev server preview runtime (1 new spec, 95% → 96% coverage)
Tier 2 reframe of the T21 case-doc claim "dev server preview pane
starts on Preview → Start". First runtime probe for T21 — no
fingerprint sibling shipped (case-doc anchors point at impl-side
function names, not user-facing literals).

Multi-suffix `waitForEipcChannels` over five case-doc-anchored
Launch suffixes (`getConfiguredServices`, `startFromConfig`,
`stopServer`, `getAutoVerify`, `capturePreviewScreenshot`) plus
dual `invokeEipcChannel` on the case-doc-anchored read-side
getters: `getConfiguredServices(cwd)` returns array, `getAutoVerify(cwd)`
returns boolean. cwd validator is `typeof cwd === 'string'` only —
smoke-tested against the debugger-attached running Claude (session
11 finding); empty / relative / non-existent paths all pass, only
null / undefined / object wraps reject.
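
A sketch of the observed validator behaviour, assuming the check is exactly `typeof cwd === 'string'` as reported above. `isValidCwd` is a hypothetical name and the sample inputs are illustrative; the bundled validator is not reproduced here.

```typescript
// Hypothetical reconstruction of the observed behaviour, not the bundled code.
function isValidCwd(cwd: unknown): cwd is string {
  return typeof cwd === "string";
}

// Empty, relative, and non-existent paths are all strings, so all pass.
const allAccepted = ["", "relative/dir", "/no/such/path"].every(isValidCwd);
// null, undefined, and an object wrapper are not strings, so none pass.
const anyAccepted = [null, undefined, { cwd: "/tmp" }].some(isValidCwd);
```

Since the check never touches the filesystem, passing `process.cwd()` is just a convenient always-valid string.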

Different shape from T19 / T20: those use `LocalSessions/getAll` as
a foundational read-side surrogate because their case-doc anchors
are write-side. T21's case-doc anchors include native read-side
handlers, so invocation lands on case-doc-anchored handlers
directly (mirrors T33c's dual-handler pattern). Mixed-shape dual
invocation (one returns array, another returns boolean) is fine —
each shape asserted independently.

Read-only by design — neither `getConfiguredServices` nor
`getAutoVerify` spawns subprocesses, mutates fs, or performs
network egress. cwd is `process.cwd()` (the test process's own
working directory).

Passes on KDE-W in 16.7s (cold) / 5.2s (warm follow-up).

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 23:02:17 -04:00
aaddrick
4c9a2ac951 docs(testing): session 10 plan/inventory + rotate session 11 prompt
- Plan-doc Status: session 10 sub-section (T19/T20 + Launch finding +
  operon partial answer + LocalSessions read-side enumeration).
- README inventory: T19/T20 rows; eipc primitive consumer lists
  (`waitForEipcChannels` and `invokeEipcChannel`) extended with T19/T20.
- Followup-prompt: session 11 candidates — Category A (T21 dev server
  preview, now tractable since Launch registers 25 handlers; needs cwd
  schema-rev), Category B (T11 plugin install runtime upgrade via
  LocalPlugins read-sides), Category C (operon-mode navigation probe).

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 22:40:36 -04:00
aaddrick
cd1ad67f9a test(harness): session 10 T19/T20 runtime probes (2 new specs, 92% → 95% coverage)
T19 (integrated terminal) + T20 (file pane) ship as Tier 2 reframes —
multi-suffix `waitForEipcChannels` over the case-doc-anchored write-side
eipc surfaces (PTY trio + buffer + resize for T19; readSessionFile +
writeSessionFile + pickSessionFile for T20) plus a single
`invokeEipcChannel('LocalSessions_$_getAll', [])` array-shape assertion
as the foundational read-side surrogate.

Both surfaces bind to LocalSessions; getAll proves the LocalSessions
impl object — the same `A` reference all 117 LocalSessions handlers
close over — is reachable through the renderer wrapper. Strictly
stronger than registration alone, since a half-applied refactor where
the registration block runs but the impl object is missing methods
would pass registration-only and fail invocation.
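
The registration-versus-invocation point can be illustrated with a toy registry. Everything here (`registerAll`, the `Impl` type, the simulated broken impl) is a hypothetical model of the failure mode described above, not the app's registration code.

```typescript
type Impl = Record<string, (() => unknown) | undefined>;

// Registration closes over the impl object but never touches its methods.
function registerAll(impl: Impl, names: string[]): Map<string, () => unknown> {
  const registry = new Map<string, () => unknown>();
  for (const name of names) {
    registry.set(name, () => {
      const method = impl[name];
      if (typeof method !== "function") throw new Error(`${name} missing on impl`);
      return method();
    });
  }
  return registry;
}

// Simulated half-applied refactor: getAll is registered but absent from impl.
const broken = registerAll({}, ["getAll"]);
const healthy = registerAll({ getAll: () => [] }, ["getAll"]);
```

A registration-only probe sees `getAll` in both registries; only calling it tells the two apart.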

Pass on KDE-W: T19 23.4s, T20 27.7s (~52.7s sequential).

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 22:40:26 -04:00
aaddrick
8dd4a3229c docs(testing): session 9 plan/inventory + rotate session 10 prompt
Plan-doc Status section gains a session 9 block documenting the
schema-rev finding (hand-rolled positional validators on the two
CustomPlugins methods, byte offsets, minimal valid arg literal,
two impl variants), the dual-investigation pattern (bundle grep +
runtime closure inspection converged independently), and the
rejection-message-grep schema-rev shortcut for future sessions.

README inventory bumps to 70 specs, adds the T33c row, threads T33c
through the eipc-invoke consumer list and the seedFromHost
consumer list, and surfaces the validator-rejection-grep pattern
in the eipc note.

Followup-prompt rotated for session 10. Carries over the operon
scope question from session 8 and adds the Launch scope question
from session 9 (both "wrapper-exposed but registry-unconfirmed"
shape — feeds Category C). Promotes T19/T20 read-side reframes to
Category A (case-doc anchors at write-side handlers; read-side
equivalents need to be enumerated from the registry walker first).

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 22:14:19 -04:00
aaddrick
6a3c8319e0 test(harness): session 9 T33c plugin browser invocation (1 new spec, 91% → 92% coverage)
Tier 2 invocation upgrade of T33b — calls both
`claude.web/CustomPlugins/{listMarketplaces, listAvailablePlugins}`
through the renderer-side wrapper at
`window['claude.web'].CustomPlugins.<method>` with `args = [[]]`
(empty `egressAllowedDomains`, omit optional `pluginContext`) and
asserts each response is an array. Strictly stronger than T33b's
registration-only check — proves the impls are wired through and
return the documented shape. Passes on KDE-W in 39.2s.

Schema-rev surfaced byte-identical hand-rolled positional validators
on both methods (bundle bytes 5013601 / 5018821): not Zod for args
(though Zod IS used for the result shape after the impl returns).
Required `string[]` for arg 0; empty array passes. Two impl variants
exist (CLI-shelling subprocess vs native file read); both return the
same array shape. Test budget 180s for worst-case sequential CLI
timeouts.
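
To make the `args = [[]]` shape concrete, here is a sketch of a positional validator that requires `string[]` at arg 0. `validateArgs` and its error message are hypothetical; the bundled validator was only inspected, not copied.

```typescript
// Hypothetical stand-in for the hand-rolled positional validator.
function validateArgs(args: unknown[]): string[] {
  const [domains] = args; // arg 0: egressAllowedDomains
  if (!Array.isArray(domains) || !domains.every((d) => typeof d === "string")) {
    throw new TypeError("arg 0 must be string[]");
  }
  return domains as string[];
}
```

In this sketch `validateArgs([[]])` passes because arg 0 is an empty array, while `validateArgs([])` throws because arg 0 is missing altogether.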

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 22:14:09 -04:00
aaddrick
0bbb54d1b4 docs(testing): session 8 plan/inventory + rotate session 9 prompt
Updates the plan doc's "Status (post-execution)" section with the
session 8 findings:
- eipc invocation tractable via two paths (main-side direct call with
  synthesized event vs renderer-side wrapper); chose renderer-side
  for the primitive because it honors the per-handler origin gate
  honestly.
- mainView.js exposes 9 window['claude.*'] wrapper namespaces, more
  than the registry-side scope count — operon flagged for an
  exposure-vs-registration check before any operon spec lands.
- invokeEipcChannel API shape, T35b/T37b/T27 assertion shapes, and
  the renderer-eval string-error surface documented.
- session 8 prompt's :68820 le() reference flagged as off (le is at
  :5045138 in this build).

Updates README inventory table to add T27, T35b, T37b rows (now
69-spec inventory: 31 cross-env T-tests, 33 env-specific S-tests,
5 H-prefix harness self-tests). Updates the lib/eipc.ts substrate
description to mention invokeEipcChannel and its wrapper-path
explanation.

Rotates the followup prompt for session 9 — main bet is T33 Phase 2
(plugin browser invocation, blocked on egressAllowedDomains schema
reverse-engineering); fallback categories are T19/T20/T21 Code-tab
cluster and the operon scope exposure-vs-registration probe.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-05-03 21:50:15 -04:00
aaddrick
7ffd73add1 test(harness): session 8 runners + invokeEipcChannel primitive (3 new specs + 1 primitive extension, 87% → 91% coverage)
Adds three Tier 2 invocation probes — T35b / T37b paired with the
existing T35 / T37 Tier 1 fingerprints (session 4), plus T27 as the
case-doc Tier 2 reframe of "Scheduled task fires and notifies" (no
prior fingerprint sibling, mirrors T26's no-fingerprint shape). All
three call eipc handlers through the renderer-side wrapper at
`window['claude.<scope>'].<Iface>.<method>` and assert the
documented response shape:

- T35b: `claude.settings/MCP/getMcpServersConfig` returns a
  non-array object (Record<string, MCPServerConfig>).
- T37b: `claude.web/CoworkMemory/readGlobalMemory` returns
  `string | null`.
- T27: both `claude.web/CoworkScheduledTasks` and
  `claude.web/CCDScheduledTasks` `getAllScheduledTasks` return
  arrays (parallel-scope assertion: Cowork = chat-side / Routines
  sidebar; CCD = Code-tab).
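
The three response shapes above could be checked with guards along these lines. The function names are hypothetical and the real specs may assert the shapes differently.

```typescript
// Non-array object, as asserted for getMcpServersConfig.
function isPlainRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// string | null, as asserted for readGlobalMemory.
function isStringOrNull(v: unknown): v is string | null {
  return v === null || typeof v === "string";
}

// Array, as asserted for both getAllScheduledTasks calls.
function isArrayShape(v: unknown): v is unknown[] {
  return Array.isArray(v);
}
```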

New `invokeEipcChannel(inspector, suffix, args?, opts?)` API on
`lib/eipc.ts` resolves the case-doc-anchored suffix through the
existing `findEipcChannel` walker, splits the full
`<scope>_$_<iface>_$_<method>` suffix to recover the wrapper path,
then calls through `evalInRenderer('claude.ai',
"window['claude.<scope>'].<Iface>.<method>(...args)")`. Renderer-
side rather than main-side direct-call because the per-handler
origin gates (`le()` / `Vi()` / `mm()` in the bundle) are
duck-typed structural checks that a fake event passes — but going
through the wrapper carries an honest `senderFrame` and aligns
test surface with real attack surface. Main-side direct call stays
available as a fallback for non-claude.ai webContents (no current
consumer).
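
The suffix-to-wrapper step can be sketched as a pure string transformation. This is not the `invokeEipcChannel` source: `splitSuffix` and `wrapperExpression` are hypothetical names, and the sketch assumes the scope segment already carries the full `claude.web` form and that arguments are JSON-serializable.

```typescript
interface WrapperPath {
  scope: string;
  iface: string;
  method: string;
}

// Split "<scope>_$_<iface>_$_<method>" into its three segments.
function splitSuffix(suffix: string): WrapperPath {
  const parts = suffix.split("_$_");
  if (parts.length !== 3 || parts.some((p) => p.length === 0)) {
    throw new Error(`expected <scope>_$_<iface>_$_<method>, got ${suffix}`);
  }
  const [scope, iface, method] = parts as [string, string, string];
  return { scope, iface, method };
}

// Build the renderer-side call expression; args are inlined as JSON literals.
function wrapperExpression(path: WrapperPath, args: unknown[]): string {
  const argList = args.map((a) => JSON.stringify(a)).join(", ");
  return `window[${JSON.stringify(path.scope)}].${path.iface}.${path.method}(${argList})`;
}
```

The resulting string is what would be handed to a renderer eval, so the call travels through the same wrapper a page script would use.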

Three parallel investigation subagents confirmed the gate semantics
empirically — see plan-doc session 8 status section for the
findings, the wrapper-namespace catalogue (9 `window['claude.*']`
namespaces), the `mainView.js:792`-onwards exposure-gate `Qc()`
behavior, and the operon-scope exposure-vs-registration question
flagged for session 9.

All three pass on KDE-W (Plasma 6 Wayland, XWayland) — T27 27.7s,
T35b 33.2s, T37b 25.8s, ~1.5m total sequential. `npm run
typecheck` clean.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-05-03 21:49:51 -04:00
aaddrick
0daceb1e30 docs(testing): session 7 plan/inventory + rotate session 8 prompt
Documents the session 7 eipc-registry finding and the four T*b runtime
probes:

- Plan-doc Status section gains a session 7 entry covering the
  per-WebContents IPC scope discovery, the cross-route stickiness
  finding, the build-stable framing UUID, the 53-distinct-interface
  map, and the bonus interfaces (CoworkMemory, MCP, CoworkScheduledTasks,
  ClaudeCode) that unlock T35 Phase 2 / T37 Phase 2 / T27 Tier 2
  reframe / T19/T20/T21 cluster for next session.

- README inventory adds T22b/T31b/T33b/T38b rows + lib/eipc.ts to
  the lib/ tree + the substrate paragraph. The trailing "Note on
  eipc channels" gets rewritten to reflect the per-wc finding
  (sessions 2-6 had it wrong; the registry IS reachable, just
  on `webContents.ipc._invokeHandlers` not global ipcMain).

- Session 8 followup prompt rotated. Main bet for session 8: extend
  lib/eipc.ts with `invokeEipcChannel` to unlock T35 Phase 2 as the
  canary, then T37 Phase 2 / T27 reframe if budget. Three approach
  hypotheses pre-listed: renderer-side via evalInRenderer,
  direct main-side handler call with synthesized event, hook the
  dispatcher's invoke-side. Cap at 2-3 attempts before STOP AND
  REPORT (carry-over from session 5/6/7 self-correction loop).

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 20:13:16 -04:00
aaddrick
b9697c2d1e test(harness): session 7 runners + eipc-registry primitive (4 new specs + 1 new primitive, 82% → 87% coverage)
Lands the eipc-registry exposer as Tier 2 runtime probe siblings of
session 3's Tier 1 fingerprints. Sessions 2-6 had marked the eipc
registry as closure-local — session 3 walked globalThis, found it
empty, and concluded the LocalSessions_$_* / CustomPlugins_$_* channels
weren't introspectable from main. Session 7 found the missing piece:
handlers DO go through Electron's stdlib IpcMainImpl, just on the
per-WebContents IPC scope (`webContents.ipc._invokeHandlers`,
Electron 17+) rather than the global ipcMain. Verified empirically
against a debugger-attached Claude — claude.ai webContents holds 490
handlers including all 117 LocalSessions + 16 CustomPlugins; global
ipcMain has the 3 chat-tab MCP-bridge handlers session 3 reported.

New primitive lib/eipc.ts (read-only by design):
- getEipcChannels — walks per-wc registries, filters by scope/iface
- findEipcChannel / findEipcChannels — case-doc-suffix lookup
- waitForEipcChannel / waitForEipcChannels — populate-on-init poll

The primitive treats the $eipc_message$_<UUID>_$_ framing prefix as
opaque: the UUID has been stable at c0eed8c9-… but the primitive does
not pin it, matching by case-doc-anchored suffix instead.
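
The suffix-anchored lookup described above could be sketched roughly as follows. This is an illustration only, not the code in lib/eipc.ts: `findChannelBySuffix`, the sample channel list, and the all-zero UUID are hypothetical, and the full channel layout is assumed from the `$eipc_message$_<UUID>_$_claude.web_$_<name>` form quoted elsewhere in this log.

```typescript
// Hypothetical sketch: match eipc channel names by their trailing
// interface/method suffix, ignoring the framing prefix and its UUID.
const FRAMING_PREFIX = '$eipc_message$_';

function findChannelBySuffix(
  channels: readonly string[],
  suffix: string,
): string | null {
  for (const channel of channels) {
    // Only framed eipc channels are candidates; plain channels are skipped.
    if (channel.startsWith(FRAMING_PREFIX) && channel.endsWith(suffix)) {
      return channel;
    }
  }
  return null;
}

// Example registry contents. The UUID is a placeholder, not the real one.
const exampleChannels = [
  'list-mcp-servers',
  '$eipc_message$_00000000-0000-0000-0000-000000000000_$_claude.web_$_LocalSessions_$_getPrChecks',
  '$eipc_message$_00000000-0000-0000-0000-000000000000_$_claude.web_$_LocalSessions_$_openInEditor',
];

const hit = findChannelBySuffix(exampleChannels, 'LocalSessions_$_openInEditor');
```

Because the UUID never appears in the comparison, a rotated UUID in a future bundle would not break a lookup written this way.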

Four new Tier 2 runtime probes paired with existing Tier 1 fingerprints
(T14a/T14b convention):
- T22b — LocalSessions_$_getPrChecks (PR monitoring)
- T31b — three-channel side-chat trio (load-bearing as a unit)
- T33b — two-channel plugin browser pair
- T38b — LocalSessions_$_openInEditor (Continue in IDE)

All four require seedFromHost (eipc handlers register on the claude.ai
webContents, which only exists post-login). Strictly stronger than
the bundle-string fingerprints — registry presence proves the upstream
code actually executed `e.ipc.handle(channel, fn)` during init, not
just that the constant is in the bundle.

All four pass on KDE-W (Plasma 6 Wayland, XWayland) — sequential
(workers: 1) at ~7.5s each, ~32s total.

Also adds tools/test-harness/eipc-registry-probe.ts as a re-runnable
read-only probe — connects to a debugger-attached Claude on port
9229, dumps per-wc IPC handler state with per-interface breakdown.
Useful when designing new probes or auditing for upstream drift.
Sibling of probe.ts (renderer-DOM) and grounding-probe.ts
(case-grounding).

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 20:13:00 -04:00
aaddrick
e038768daa docs(testing): session 6 plan/inventory + rotate session 7 prompt
Plan-doc Status (post-execution): session 6 section added at top
covering S14 + lib/input-niri.ts ship + the cross-compositor-files-
not-dispatcher reasoning + Category B (eipc-registry exposer)
carrying over to session 7 unattempted.

Untested-on-real-Niri caveats explicitly documented (Ok-wrapper
schema version, Claude app_id literal value, foot-on-PATH) so the
first Niri-row sweep knows what to confirm without re-deriving the
recon.

README inventory updated to 62 specs (24 cross-env T-tests, 33
env-specific S-tests, 5 H-prefix harness self-tests). S14 row added;
lib/input-niri.ts entry added to the substrate-primitives layout
block and to the lib/ paragraph that lists each primitive's
consumer specs.

Followup prompt rewritten for session 7. Main bet now shifts to:

- A: eipc-registry exposer (now the cleanest single-session win
  available — sessions 3-6 each kept punting because lower-risk
  work was on the table; with the obvious focus-shifter / mock-
  then-call substrate work landed, Category A is the only path
  forward to proper Tier 2 runtime probes for T22/T31/T33/T38
  AND unblocks T35 Phase 2 / T37 Phase 2). Three approaches
  documented for the inspector walk: module-level grep for
  registry exposers, hook-the-eipc-registration-site, patch-in-
  a-dev-only-exposer.
- B: T35 Phase 2 / T37 Phase 2 paired with Category A. Skip
  unless A lands first.
- C: Single-spec deferred items audit (S20 still open on #569;
  T34 OAuth round-trip; T36 Phase 2 reclassified out;
  cross-compositor S14 variants speculative without a consumer).

New constraints from session 6 documented in the prompt:

- lib/input-niri.ts stays Niri-only by design — strict
  XDG_CURRENT_DESKTOP === 'niri' gate. Sway / Hyprland / River
  consumers must skip or live in their own per-compositor files.
- Don't speculate on a lib/input-wayland.ts dispatcher.
  Per-compositor files until a second Wayland consumer lands.

Cumulative "stop and report" outcome count bumped to ~13 across
sessions 1-6 (added: session-6 lib/input-niri.ts shipped untested-
on-niri).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-05-03 19:19:45 -04:00
aaddrick
34e9077dd2 test(harness): session 6 runner + niri-native focus-shifter primitive (1 new spec + 1 new primitive, 80% → 82% coverage)
Coverage 61/76 → 62/76. One new spec + one new primitive land. Per
session 5 recon, the niri IPC contract is stable in --json mode and
the API sketch in plan-doc was directly implementable.

New primitive (lib/input-niri.ts):

Wayland-native focus-shifter sibling of lib/input.ts. Niri-only by
design — strict XDG_CURRENT_DESKTOP === 'niri' gate via
isNiriSession(). Exports mirror the X11 sibling's shape:

- focusOtherWindow(title): three-step chain — niri msg --json windows
  → app_id !== 'Claude' filter + title match → niri msg action
  focus-window --id <u64> → honest readback via getFocusedWindowId()
  using retryUntil(3s/100ms). The readback is load-bearing: niri's
  focus-window action exits 0 even when the compositor refuses
  activation; only the focused-window IPC is the honest answer
  (mirrors lib/input.ts's xprop verification reasoning).
- spawnMarkerWindow(title): backgrounded foot --title <T> -e sleep
  600 with detached:false (matches lib/input.ts's xterm pattern —
  parent-death cleanup beats the marginal robustness of detached
  spawn). 500ms grace before SIGKILL fallback.
- getFocusedWindowId(): parses niri msg --json focused-window to
  number | null (niri u64 IDs are numeric, unlike X11's hex strings).
- isNiriSession(): pure XDG_CURRENT_DESKTOP env check.
- NiriIpcUnavailable / FootUnavailable typed errors for clean
  testInfo.skip() integration in consumers.
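
As a rough illustration of the filter step in focusOtherWindow, the sketch below picks a target from an already-parsed window list. It is not the shipped code: `pickFocusTarget` is a hypothetical name, exact title equality is an assumption (the real match rule is not stated here), and the nullable `title` / `app_id` fields are an assumption about the JSON shape.

```typescript
// Subset of the fields the recon notes list for `niri msg --json windows`.
// Nullability of title/app_id is assumed for this sketch.
interface NiriWindow {
  id: number;
  title: string | null;
  app_id: string | null;
  is_focused: boolean;
}

// Hypothetical helper: choose the window to focus, never Claude's own.
function pickFocusTarget(
  windows: readonly NiriWindow[],
  title: string,
): number | null {
  const target = windows.find(
    (w) => w.app_id !== 'Claude' && w.title === title,
  );
  return target ? target.id : null;
}

const sampleWindows: NiriWindow[] = [
  { id: 1, title: 'marker', app_id: 'Claude', is_focused: true },
  { id: 2, title: 'marker', app_id: 'foot', is_focused: false },
];
const targetId = pickFocusTarget(sampleWindows, 'marker');
```

The returned ID would then be passed to `niri msg action focus-window --id <u64>`, with the focused-window readback as the real check.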

Defensive unwrapOk helper handles both the older
{Ok: {FocusedWindow: ...}} Result-style JSON envelope and newer
bare-payload responses; if a third niri version ships a different
shape, the parser falls through to null rather than crashing. The
app_id !== 'Claude' guard prevents the focus shift from accidentally
targeting Claude's own window.
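
A tolerant parser of the kind described above might look like the following sketch. The names `isRecord` and `parseFocusedWindowId` are hypothetical, and the "bare payload" shape (the window object itself, with a numeric `id`) is an assumption; only the `{Ok: {FocusedWindow: ...}}` envelope is taken from the text.

```typescript
// Hypothetical sketch of an envelope-tolerant focused-window parser.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

function parseFocusedWindowId(raw: string): number | null {
  let payload: unknown;
  try {
    payload = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  // Older shape: {"Ok": {"FocusedWindow": {...}}}. Peel each layer if present.
  if (isRecord(payload) && 'Ok' in payload) payload = payload['Ok'];
  if (isRecord(payload) && 'FocusedWindow' in payload) {
    payload = payload['FocusedWindow'];
  }
  // Assumed newer shape: the window object itself.
  if (!isRecord(payload)) return null;
  const id = payload['id'];
  // Any unrecognized shape falls through to null instead of throwing.
  return typeof id === 'number' ? id : null;
}
```

Returning `null` for unknown shapes keeps the caller's skip-versus-fail decision in one place rather than scattering try/catch blocks through the spec.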

Untested-on-real-Niri caveat: landed against session 5 recon notes,
not a live niri session. KDE-W typecheck + skip-via-row-gate confirms
the file is well-formed; the first real Niri sweep will confirm (a)
the Ok-wrapper unwrap covers the niri version on the row, (b)
Claude's literal app_id value is 'Claude', (c) foot is on the target
row's PATH.

Cross-compositor expansion deliberately not built — sway / hyprland /
river each have completely different IPCs and would each get their
own per-compositor file, not bolted into input-niri.ts. With S14 the
only consumer, a lib/input-wayland.ts dispatcher would be ceremony
(matches the threshold-driven extraction discipline of
lib/electron-mocks.ts and lib/input.ts).

New spec (S14):

S14 (Quick Entry shortcut fires from any focus on Niri) — Tier 2
known-failing detector. Near-clone of S11 with imports swapped to
lib/input-niri.js and the row gate flipped from ['GNOME-X', 'Ubu-X']
to ['Niri']. Same five-phase shape: setup → mainVisible ready →
foot marker spawn → focus loop with NiriIpcUnavailable /
FootUnavailable sticky-error short-circuits → Ctrl+Alt+Space press
+ assert popup.visible. Single-shot s14-diagnostics JSON attachment
mirrors S11's shape with activeWidBeforeFocus / activeWidAfterFocus
typed number | null per the niri u64 ID contract.

Currently a known-failing detector per case-doc S14 (Failed to call
BindShortcuts (error code 5) on Niri); same shape as S12's GNOME-W
--enable-features=GlobalShortcutsPortal detector — the spec encodes
the contract and will start passing on Niri rows once the upstream /
Chromium-side portal issue resolves, without any spec edit.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-05-03 19:19:21 -04:00
aaddrick
88f3bd5941 docs(testing): session 5 plan/inventory + rotate session 6 prompt
Plan-doc Status (post-execution): session 5 section added at top
covering T18 ship + the SessionStart-hook-fires-on-prompt-submit
finding (which reclassified T36 Phase 2 Tier 2 → Tier 3/4) + the
runtime-probe AX-anchor capture for the Code-tab session opener
(saved without shipping a primitive — T36 Phase 2 was the only
known consumer and it just left Tier 2) + the niri msg IPC recon
verdict (TRACTABLE; lib/input-niri.ts API sketch in place).

Load-bearing finding — SessionStart hook timing:

Session 4's plan-doc framed T36 Phase 2 as needing "a Code-tab
session opener the AX-tree walker hasn't been taught" — implying
the AX tree was the only blocker. Session 5 traced the
SessionStart-hook fire path through bundled index.js and found a
deeper blocker: the hook fires inside the agent SDK process once
it boots, and the agent process is spawned only when there's a
prompt to bind to. Call chain: Ys.startSession (:454743 general,
:489371 CCD) requires A.message; the session record stores it as
initialMessage (:489270); the agent is spawned via
DN({ prompt: k, options: v }) (:489514) only when there's a prompt
stream to bind to. createOrResumeSession (:489208) creates the
session record but doesn't spawn the agent. Conclusion: clicking
"New session" alone navigates to a fresh composer but doesn't boot
the agent. The hook fires only after first prompt submission,
which is a real-account write. T36 Phase 2 is therefore unmockable
without deep agent-SDK reverse-engineering.

Code-tab session-opener AX surface verified — anchors saved in
plan-doc rather than shipped to claudeai.ts (premature without a
load-bearing consumer):

- Top-tab Code button: button[name="Code"] under group[Mode]
  under complementary. Disambiguator from the prompt-mode
  tab[name="Code"] in tablist[name="Prompt categories"] (which
  is what T16's existing CodeTab.activate() clicks).
- Sidebar entries (Code mode active): button[name="New session
  ⌘N"], button[name="Routines"], button[name="Customize"],
  button[name="More navigation items"], plus
  button[name="Pinned"] / button[name="Recents"] section
  headings.
- Recents items: button[name="<status> <title>"] where status ∈
  {Idle, Ready, Needs input, Awaiting input}. Main-pane Welcome
  surface uses button[name="Open session <title>"] — either
  anchor would work for an openExistingSession(re) consumer.
- URL of Code-tab landing: /epitaxy.

niri msg IPC recon — TRACTABLE:

The wiki documents the --json output as a stable contract; the
plain-text output is not stable.
niri msg --json windows returns Vec<Window> with {id, title,
app_id, pid, workspace_id, is_focused, ...}; niri msg action
focus-window --id <u64> injects focus; niri msg --json
focused-window is the honest readback (the equivalent of xprop
_NET_ACTIVE_WINDOW for the X11 primitive). foot --title <T> -e
sleep 600 is the Wayland-native marker window. Cross-compositor
consideration: per-compositor files (lib/input-niri.ts,
lib/input-sway.ts, …) are cleaner than a unified abstraction.
sway / hyprland / river have totally different IPCs, so a
lib/input-wayland.ts dispatcher would just be a 10-line switch.
libei is the long-term answer but isn't widely deployed; don't
block S14 on it.

Session 6 prompt rewritten. Three categories with the guidance to
pick ONE as the main bet:

- A: lib/input-niri.ts + S14 runner. Recon-sketched API, IPC
  contract is stable. Cleanest single-session win — single
  primitive build + single consumer ready to ship.
- B: eipc-registry exposer (unchanged from sessions 4 / 5;
  closure-local in main; reverse-engineering remains
  unattempted). Same warning: session 3's inspector walk came up
  empty; needs a fresh approach.
- C: Single-spec deferred items audit. T35 Phase 2 / T37 Phase 2
  still blocked on closure-local readback (skip unless paired
  with Category B); T36 Phase 2 NO LONGER A CANDIDATE.

New constraints from session 5 documented in the prompt:

- lib/input.ts stays X11-only by design; if Category A ships,
  the niri variant goes in lib/input-niri.ts (sibling, NOT a
  Wayland catch-all — sway/hyprland/river have totally different
  IPCs).
- Don't speculate on a lib/input-wayland.ts dispatcher.
  Per-compositor files until a second consumer (Sway / Hyprland /
  River row) lands.
- Code-tab AX anchors stay in plan-doc until a consumer needs
  them. Don't preemptively add CodeTab.activateTopTab() to
  claudeai.ts — T36 Phase 2 was the only consumer and it's now
  Tier 3/4. Premature abstraction is wrong abstraction.
- T36 hooks-fire-on-prompt-submit added to the destructive Tier 3
  list (alongside T22 PR write, T27 scheduling, T29 worktree,
  T34 OAuth) — only read-only reframes are in scope.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-05-03 18:49:36 -04:00
aaddrick
d5e1edc11b test(harness): session 5 runner + drag-drop bridge fingerprint (1 new spec, 79% → 80% coverage)
Coverage 60/76 → 61/76. One new spec lands. No new primitives —
session 5 ran light because the runtime probe + bundled-source
trace consumed half the budget (load-bearing finding documented in
the docs commit that follows).

New spec:

- T18 (Drag-and-drop files into prompt) — Tier 1 / asar fingerprint
  against bundled mainView.js (first runner to read a non-index.js
  source — lib/asar.ts's readAsarFile already supports it). Four
  needles pin the preload-bridged path-resolution wiring: the
  property key `getPathForFile` + the `webUtils.getPathForFile(`
  call (both at case-doc :9267 — count 2× combined), `webUtils`
  (1×, :9267), `filePickers` (1×, :9267), `claudeAppSettings` (1×,
  :9552 — the contextBridge.exposeInMainWorld namespace the
  renderer accesses as window.claudeAppSettings). Per-needle
  occurrence counts attached as JSON for drift detection (mirrors
  T36's pattern). Bundle form matches case-doc form verbatim — no
  minified-vs-beautified gotcha (unlike T35's
  ~/.claude.json → .claude.json).
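
The per-needle occurrence counting mentioned above can be sketched as below. This is a minimal illustration, not the runner's code: `countNeedles` is a hypothetical name and `sampleBundle` is a toy string standing in for the real mainView.js text.

```typescript
// Count non-overlapping occurrences of each needle in a source string.
function countNeedles(
  source: string,
  needles: readonly string[],
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const needle of needles) {
    let count = 0;
    let index = source.indexOf(needle);
    // Guard against empty needles, which would otherwise loop forever.
    while (needle.length > 0 && index !== -1) {
      count += 1;
      index = source.indexOf(needle, index + needle.length);
    }
    counts[needle] = count;
  }
  return counts;
}

// Toy stand-in for bundle text; not the real mainView.js content.
const sampleBundle =
  'filePickers:{getPathForFile:e=>webUtils.getPathForFile(e)}';
const sampleCounts = countNeedles(sampleBundle, [
  'getPathForFile',
  'webUtils',
  'filePickers',
  'missing',
]);
```

Attaching a map like `sampleCounts` as JSON lets a later run show which needle's count drifted, rather than only that the fingerprint failed.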

Why Tier 1, not Tier 2/3:

A real OS-level drag-drop test needs to put file URIs on the
desktop's drag selection so Chromium's drop handler fires the
path-resolution bridge with a file payload. Both backends are
dead-ends with the primitives we have:

- X11: xdotool can simulate mouse motion + button press but
  cannot put file URIs on the X11 XDND selection. A simulated
  drag against a marker window arrives at Chromium as a mouse
  drag with no file payload — the bridge is never exercised. A
  real OS-level XDND test needs a custom XDND source app (heavy
  primitive build); deferred.
- Wayland: same shape — per-compositor IPC plus libei input
  injection. Same primitive gap.

Since the load-bearing surface is the bridge wiring (preload
expose + the webUtils.getPathForFile call), pinning the bundle
strings catches every regression that would matter to the
case-doc claim, without faking OS drag-drop. Same pattern as
T35/T36 from session 4: when Tier 2 readback isn't reachable,
ship the Tier 1 fingerprint against the actual load-bearing
strings.

README inventory updated to 61 specs (24 cross-env T-tests, 32
env-specific S-tests, 5 H-prefix harness self-tests). T18 row
added; the `app.asar content reads` footnote calls out that T18
reads mainView.js (every other asar-fingerprint runner reads
index.js).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-05-03 18:48:52 -04:00
aaddrick
9e561c0c49 docs(testing): session 4 plan/inventory + rotate session 5 prompt
Plan-doc Status (post-execution): session 4 section added at top
covering T35 / T36 / S11 ship + S14 primitive-gap deferral + the
lib/input.ts X11-only-by-design reasoning + the eipc-registry
exposer carrying over to session 5 unattempted.

Followup prompt rewritten for session 5. Three categories with the
guidance to pick ONE as the main bet:

- A: eipc-registry exposer (reverse-engineer the closure-local
  registry near :68816-:68820; high-risk-high-reward; would unblock
  T22/T31/T33/T38 Tier 2 runtime probes — currently Tier 1
  fingerprints).
- B: Code-tab session opener primitive in claudeai.ts (would unblock
  T11/T19/T20/T31/T32 full forms + T36 Phase 2 + T37 Phase 2). AX-
  tree teaching work; potentially multi-session.
- C: Single-spec deferred items audit (T18 X11 drag-drop, S14
  Wayland variant exploration, S20 once #569 lands).

New constraints from session 4 documented in the prompt:

- lib/input.ts is X11-only — strict XDG_SESSION_TYPE === 'x11' gate.
  Wayland-native focus injection goes in a sibling file, not bolted
  into the existing one.
- Always grep the installed asar before settling on a fingerprint
  string; case-doc text is sometimes the user-facing form (e.g.
  ~/.claude.json) rather than the bundle form (.claude.json: the
  minified bundle drops the path prefix and resolves home at use).
- Marker windows / sacrificial host processes always die in finally
  (S11 is the template).
- A single-shot diagnostic JSON dump (S11 / S31 pattern) is cleaner
  than many separate testInfo.attach() calls for multi-state tests.

New termination condition: if Category A's inspector walk turns up
empty after 2-3 distinct approaches, STOP — document the dead-end
as a finding, ship a documentation runner if it surfaces useful
state, pivot to B or C.

Cumulative "stop and report" outcome count bumped to ~10 across
sessions 1-4 (added: S14 primitive-gap, T35 Phase 2 deferral, T36
Phase 2 deferral).

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 18:15:12 -04:00
aaddrick
aa139be763 test(harness): session 4 runners + focus-shifter primitive (3 new specs, 75% → 79% coverage)
Coverage 57/76 → 60/76. Three new specs land plus one new primitive
(lib/input.ts focus-shifter). One case-doc spec (S14) explicitly NOT
shipped — documented as primitive-gap.

New specs:

- T35 (MCP server config picked up) — Tier 1 / Phase 1 fingerprint:
  four-needle asar probe pinning chat-tab vs Code-tab MCP separation
  (claude_desktop_config.json chat-tab path + .claude.json + .mcp.json
  Code-tab loaders + the "user","project","local" settingSources triple
  that the Code session passes to the agent SDK). Case-doc anchors
  :130821 /
  :176766 / :215418 / :489098. Phase 2 (fixture-then-readback)
  deferred — parsed MCP server state is closure-local, same blocker
  as T37b/S19/S28.
- T36 (Hooks fire) — Tier 1 / Phase 1 fingerprint: five-needle asar
  probe in T37's "single-occurrence high-signal anchor + registry
  tokens" shape — hook_started / hook_progress / hook_response (each
  1× at :493411, Verbose-transcript runtime emits) plus PreToolUse
  (17×, :455717) and UserPromptSubmit (4×, :455819) registry tokens.
  Per-needle occurrence counts attached for drift detection. Phase 2
  (settings.json fixture + Code-session marker readback) deferred —
  needs login + a Code-tab session opener the AX-tree walker hasn't
  been taught.
- S11 (Quick Entry shortcut from any focus) — Tier 2: spawn xterm
  marker via lib/input.ts:spawnMarkerWindow, focus it via
  focusOtherWindow (xdotool windowfocus + xprop _NET_ACTIVE_WINDOW
  verification), then fire Ctrl+Alt+Space via ydotool and assert
  popup is visible. Single-shot s11-diagnostics JSON attachment
  collects sessionEnv / markerTitle / active-WID before+after /
  popupState / openError / launcher-log tail. Marker xterm killed in
  finally before app.close.

Row-gate decision (load-bearing for S11):

S11's case-doc applies-to is "GNOME, Ubu" (W and X variants), but
the focus-shifter primitive is X11-only — strict
XDG_SESSION_TYPE === 'x11' gate. So the runner's row gate is
['GNOME-X', 'Ubu-X'] only. The case-doc's load-bearing concern is
the GNOME-W mutter XWayland key-grab regression (#404); that
regression CANNOT be detected here because there's no portable
focus-injection on native Wayland (each compositor exposes its own
IPC; libei isn't universally honored). What S11 catches: a
regression in the X11 path of the global shortcut on GNOME-X /
Ubu-X — a currently-passing detector unlike S12 which is
currently-failing.

S14 NOT shipped — primitive gap:

S14's only row gate is Niri (a Smithay-based Wayland compositor with
no XWayland), so
the focus-shifter primitive throws WaylandFocusUnavailable there;
any S14 runner consuming the new primitive would skip on every row
in its gate — the definition of a stub. Per "don't ship stubs",
S14 stays unshipped and is documented as needing Wayland-native
focus injection (Niri's `niri msg` IPC, or libei when broadly
available). The Tier 1 reframe (assert
--enable-features=GlobalShortcutsPortal in argv) is already covered
by S12.

New primitive (lib/input.ts):

X11-only by design. Strict XDG_SESSION_TYPE === 'x11' gate via
isX11Session() — single source of truth. xdotool windowfocus exits
0 even when the compositor refuses activation, so post-focus
verification via xprop _NET_ACTIVE_WINDOW readback is the honest
answer. Exports:

- WaylandFocusUnavailable / XdotoolUnavailable (typed errors so
  consumers can `instanceof` skip vs fail).
- isX11Session() — single-source-of-truth env check.
- getFocusedWindowId() — parses xprop output to lowercase
  0x-prefixed hex; returns null on Wayland or xprop failure.
- focusOtherWindow(title) — xdotool search --name + windowfocus,
  then retryUntil-poll _NET_ACTIVE_WINDOW for ~3s budget; throws
  on compositor refusal so S11/S14 see refusals as real failures
  rather than silent skips.
- spawnMarkerWindow(title) — backgrounded `xterm -e 'sleep 600'`
  with kill-with-grace lifecycle (SIGTERM + 500ms grace + SIGKILL
  fallback). Caller owns kill in finally.
- MarkerWindow interface for the spawn return shape.
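
For orientation, the xprop readback in getFocusedWindowId could be parsed along these lines. This is a sketch under assumptions, not the file's actual code: `parseActiveWindowId` is a hypothetical name, the input is assumed to be the text printed by `xprop -root _NET_ACTIVE_WINDOW`, and treating `0x0` as "no focused window" is a choice made for this example.

```typescript
// Hypothetical parser for `xprop -root _NET_ACTIVE_WINDOW` output, e.g.
// "_NET_ACTIVE_WINDOW(WINDOW): window id # 0x3a00007".
function parseActiveWindowId(xpropOutput: string): string | null {
  const match = /window id # (0x[0-9a-fA-F]+)/.exec(xpropOutput);
  // Normalize to lowercase so IDs compare equal across tools.
  const id = (match?.[1] ?? '').toLowerCase();
  // Assumption for this sketch: 0x0 means nothing is focused.
  if (id === '' || id === '0x0') return null;
  return id;
}

const activeId = parseActiveWindowId(
  '_NET_ACTIVE_WINDOW(WINDOW): window id # 0x3A00007',
);
```

Comparing the normalized ID before and after `xdotool windowfocus` is what turns the zero exit code into an honest pass or fail.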

Wayland-native focus injection is intentionally NOT in this file —
sibling file (lib/input-niri.ts or libei layer) when needed.

KDE-W: T35 ✓ pass (182ms), T36 ✓ pass (112ms), S11 ⊘ skipped
(row mismatch — KDE-W not in [GNOME-X, Ubu-X], expected).

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 18:14:49 -04:00
aaddrick
ee7b35ff86 docs(testing): session 3 plan/inventory + rotate session 4 prompt
Updates the post-execution status section with session 3's seven
shipped specs, the eipc-registry finding (corrects session 2's T38
assumption), and the four reclassifications (T22/T31/T33/T38 from
Tier 2 IPC probes to Tier 1 fingerprints). Captures the
authentication-state lesson too — launches that depend on
authenticated renderer state need createIsolation({ seedFromHost:
true }), even if the case-doc-shaped Tier 2 form looks hermetic on
paper.

README inventory grows from 50 to 57 specs and adds a note that
LocalSessions_$_* / CustomPlugins_$_* channels use a custom eipc
protocol, not Electron's standard ipcMain.handle() — so future
runners should anchor on channel-name strings (Tier 1) rather than
introspect _invokeHandlers (broken).

Followup prompt rewritten for session 4: focus-shifter primitive +
S11/S14, T35 MCP separation fingerprints (Phase 1) and optional
fixture-readback (Phase 2, may abort), and the eipc-registry
exposer as a flagged primitive gap.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 17:40:33 -04:00
aaddrick
549bf4281a test(harness): session 3 runners (7 new specs, 66% → 75% coverage)
Coverage 50/76 → 57/76. Seven new specs land + one session-2 carryover
(T38) reclassified after the eipc-registry finding below.

New specs:

- T22 (PR monitoring) — Tier 1 fingerprint: LocalSessions_$_getPrChecks
  eipc channel name + "gh CLI not found in PATH" Linux-fallthrough
  throw site (case-doc anchors :464281 / :464964 / :464368).
- T24 (Open in editor) — Tier 2 mock-then-call: installOpenExternalMock
  patches shell.openExternal from main, evalInMain calls it with a
  vscode://file/... URL, assert recorded call lists URL verbatim. No
  real editor launch (mock returns Promise<boolean>).
- T30 (Auto-archive cadence) — Tier 1 fingerprint: single regex
  anchoring 300*1e3, then 3600*1e3, then AutoArchiveEngine in order
  (≤200 / ≤3000 char proximity windows tuned to current bundle), plus
  ccAutoArchiveOnPrClose .includes() inside the captured window.
- T31 (Side chat) — Tier 1 fingerprint: side-chat eipc trio
  (startSideChat / sendSideChatMessage / stopSideChat).
- T32 (Slash menu) — Tier 1 fingerprint:
  LocalSessions_$_getSupportedCommands + slashCommands schema.
- T33 (Plugin browser) — Tier 1 fingerprint:
  CustomPlugins_$_listMarketplaces + listAvailablePlugins.
- T37 (CLAUDE.md memory) — Tier 1 fingerprint: high-signal
  "[GlobalMemory] Copied CLAUDE.md" log line + CLAUDE.md filename +
  CLAUDE_CONFIG_DIR env-var token. Fixture-readback form deferred —
  parsed-memory state is closure-local.
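
The T30 entry above describes a proximity-window regex. One possible reading of it is sketched below; the pattern is a reconstruction for illustration, not the regex in the spec, and it assumes the three anchors appear in the order 300*1e3, 3600*1e3, AutoArchiveEngine with gaps of at most 200 and 3000 characters. The function names are hypothetical.

```typescript
// Hypothetical reconstruction of a proximity-window fingerprint:
// 300*1e3, then 3600*1e3 within 200 chars, then AutoArchiveEngine
// within a further 3000 chars.
const AUTO_ARCHIVE_WINDOW =
  /300\*1e3[\s\S]{0,200}?3600\*1e3[\s\S]{0,3000}?AutoArchiveEngine/;

function matchAutoArchiveWindow(bundle: string): string | null {
  const match = AUTO_ARCHIVE_WINDOW.exec(bundle);
  return match ? match[0] : null;
}

// The flag must appear inside the captured window, not just in the bundle.
function hasAutoArchiveFingerprint(bundle: string): boolean {
  const captured = matchAutoArchiveWindow(bundle);
  return captured !== null && captured.includes('ccAutoArchiveOnPrClose');
}

// Toy input; not real bundle text.
const toyBundle =
  'a=300*1e3,b=3600*1e3;ccAutoArchiveOnPrClose;class AutoArchiveEngine{}';
const toyResult = hasAutoArchiveFingerprint(toyBundle);
```

Bounding the gaps keeps the regex from matching three unrelated occurrences scattered across a multi-megabyte bundle, at the cost of needing retuning if upstream moves the constants apart.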

eipc-registry finding (T38 reclassification):

Session 2's T38 used ipcMain._invokeHandlers introspection. A KDE-W
run revealed that this registry holds only three chat-tab MCP-bridge
handlers
(list-mcp-servers, connect-to-mcp-server, request-open-mcp-settings)
regardless of ready level (mainVisible / claudeAi / userLoaded) and
regardless of authentication state (default isolation vs.
seedFromHost: true verified via probe). The
$eipc_message$_<UUID>_$_claude.web_$_<name> protocol uses a closure-
local message-port registry not reachable from globalThis — same
gotcha as session 2's Sbn() (S28) and cE()/Tce() (S19).

T38 rewritten as a Tier 1 asar fingerprint anchoring on the
LocalSessions_$_openInEditor channel-name string in the bundle. T22,
T31, T33 (originally drafted with the same broken pattern) ship as
Tier 1 fingerprints from the start. T24 is unaffected — it patches
the stdlib Electron shell module from main, not the eipc layer.

KDE-W: 9/9 pass in 18.2s (7 new + T25 verifying the lib import-extract
didn't break it + T38 reclassified).

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 17:40:21 -04:00
aaddrick
ce2e5325d3 refactor(harness): extract electron-mocks.ts once T24 lands the third helper
Session 3 brings the third mock-then-call helper online
(installOpenExternalMock for shell.openExternal, mirroring
installShowItemInFolderMock and installOpenDialogMock). Threshold from
the session prompt was met — pull the three install/get pairs out of
lib/claudeai.ts into a dedicated lib/electron-mocks.ts. The mocks are
generic Electron module patches (dialog, shell), not claude.ai-domain,
so the new home keeps claudeai.ts focused on AX-tree page-objects.

T17, T25 imports updated to point at the new module. T24 (added in the
follow-up commit) imports from electron-mocks.ts directly.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 17:39:50 -04:00
aaddrick
86385848d0 docs(testing): session 2 plan/inventory + rotate session 3 prompt
- runner-implementation-plan.md: new "Status (post-execution)" sub-
  section for session 2 listing the 10 new specs and the four
  reclassification notes (S28 → Tier 1, T38 framing, T23 tool choice,
  S19 honest-stub note). Session 1 sub-section preserved verbatim
  below for comparison.
- README.md: 50-spec inventory (was 40), new T-rows (T10, T16, T23,
  T25, T26, T38) and S-rows (S10, S19, S25, S28) interleaved into
  the existing tables. Substrate-primitives paragraph extended with
  dbus-monitor, mock-then-call, ipcMain registry introspection,
  safeStorage round-trip, extraEnv precedence.
- runner-implementation-followup-prompt.md: rewritten for session 3
  — deferred items (T31, T32, S06, S11, S14), Tier 3 → Tier 2
  reframes (T22, T35, T37), asar fingerprint cleanups (T24, T30,
  T33), the focus-shifter primitive build, and the mock-then-call
  extension for T24 as an alternative to its asar form. Includes
  the "known mechanism-recipe table" cumulating sessions 1+2.
- runner-implementation-prompt.md: deleted (session 1's prompt,
  superseded by the followup that's been the rolling document
  since session 1 ended).

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 17:01:55 -04:00
aaddrick
fb5189fe45 test(harness): session 2 runners (10 new specs, 53% → 66% coverage)
Categories landed:
- B (seedFromHost-unlocked): T16 (Code tab loads), T26 (Routines page
  renders) — both promote Tier 3 → Tier 2 via the seedFromHost
  primitive shipped in session 1.
- A (Tier 2 single-launch deferred from session 1): T10 (Cowork daemon
  respawn after SIGKILL), S10 (KDE-W Quick Entry popup transparent),
  S25 (safeStorage round-trip across two launches with shared
  isolation handle).
- C (Tier 2 reframes): T23 (Notification reaches DBus via dbus-monitor
  subprocess), T25 (shell.showItemInFolder via mock-then-call —
  mirrors T17's installOpenDialogMock), T38 (openInEditor IPC handler
  registered probe via ipcMain._invokeHandlers), S19
  (CLAUDE_CONFIG_DIR extraEnv reaches main process).
- Tier 1 reclass: S28 (worktree permission classifier asar fingerprint
  — Sbn() is closure-local, not inspector-reachable).

Mechanism notes — see plan doc status section for full rationale:
- T23 uses dbus-monitor, not gdbus monitor (the latter only reports
  signals emitted by the monitored name's owner, not method calls
  made to it).
- T38 inspects ipcMain._invokeHandlers for handler registration; the
  channel name has the form $eipc_message$_<UUID>_$_claude.web_$_<name>
  with a build-stable UUID prefix, so the probe anchors on the suffix.
- T25 mock-then-call beats invoke-then-cleanup (no host file manager
  pop-up, stronger assertion).
- S25 compares decrypted plaintexts not ciphertexts (safeStorage on
  Linux uses random IVs).

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 17:01:42 -04:00
aaddrick
1f5702bc7b test(harness): add installShowItemInFolderMock for mock-then-call probes
Mirrors lib/claudeai.ts:installOpenDialogMock (used by T17). Replaces
electron.shell.showItemInFolder with a recording mock so Tier 2
reframe specs can assert "the IPC layer reaches the egress with the
right path" without firing the real DBus FileManager1 / xdg-open
dispatch on the host.

Idempotent (guarded by globalThis.__claudeAiShowItemMockInstalled),
matches the existing mock helper's call-recording shape, exports a
companion getShowItemInFolderCalls reader. Used by the rewritten T25
runner in the next commit.
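
The idempotent recording-mock shape described above can be illustrated with a stand-in object. Everything named here is hypothetical: `ShellLike` is a stub for the one method involved, and the flag and storage keys are invented for the example rather than copied from the helper.

```typescript
// Hypothetical stand-in for the subset of electron.shell used here.
interface ShellLike {
  showItemInFolder(fullPath: string): void;
}

const FLAG_KEY = '__exampleShowItemMockInstalled'; // illustrative flag name
const CALLS_KEY = '__exampleShowItemMockCalls'; // illustrative storage key

function installShowItemMock(shell: ShellLike): void {
  const g = globalThis as unknown as Record<string, unknown>;
  if (g[FLAG_KEY] === true) return; // already installed: keep the first mock
  const calls: string[] = [];
  g[CALLS_KEY] = calls;
  g[FLAG_KEY] = true;
  shell.showItemInFolder = (fullPath: string): void => {
    calls.push(fullPath); // record instead of opening a file manager
  };
}

// Companion reader: returns a copy so callers cannot mutate the log.
function getShowItemCalls(): string[] {
  const g = globalThis as unknown as Record<string, unknown>;
  const calls = g[CALLS_KEY];
  return Array.isArray(calls) ? (calls as string[]).slice() : [];
}
```

A spec would install the mock, trigger the code path under test, and then assert on the recorded paths, so no real file manager is ever launched.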

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 17:01:17 -04:00
aaddrick
11ab62afcd test(harness): Tier 2 runners (9 single-launch / hermetic-auth probes)
Single launchClaude() + inspector + Electron-API or window-state
assertion. Each runner asserts a contract that requires the app to
actually be running.

Specs landed:

- T05 — claude:// URL delivers via app.on('second-instance')
  (Tier 3 delivery probe: xdg-open fires the URL, the running app's
  hook captures it). Uses isolation: null because the SingletonLock
  collision must route to the same user-data dir.
- T06 — globalShortcut.isRegistered('Ctrl+Alt+Space') returns true
  after waitForReady('mainVisible')
- T07 — five topbar buttons render with non-zero rects. First spec
  to exercise createIsolation({ seedFromHost: true }), which kills the
  host Claude, copies the auth allowlist (Cookies, Local State, Local
  Storage, IndexedDB, etc.) into a per-test tmpdir, and runs
  hermetically against the signed-in account; the tmpdir is destroyed
  on close.
- T08 — MainWindow.setState('close') fires the wrapper's close
  interceptor; window hidden, proc still alive
- T09 — setLoginItemSettings({ openAtLogin }) writes/removes
  $XDG_CONFIG_HOME/autostart/claude-desktop.desktop
- T12 — app.getGPUFeatureStatus() returns populated object;
  reaching mainVisible proves the renderer didn't crash
- T14b — second invocation under same isolation exits cleanly via
  requestSingleInstanceLock early-return; primary pid stays alive
- S07 — under CLAUDE_HARNESS_USE_WAYLAND=1, spawned Electron has
  --ozone-platform=wayland on argv (skips when env unset)
- S17 — shell-path-worker overlays the user's login-shell PATH onto
  a deliberately-scrubbed env. Re-forks shellPathWorker.js via
  utilityProcess.fork + MessageChannelMain to observe the worker
  output directly (the main-process FX() merger only fills undefined
  keys, so reading process.env.PATH after a non-undefined override
  wouldn't observe the effect).
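
The S17 rationale above hinges on a merger that only fills undefined keys. A toy version makes the consequence concrete; `fillUndefined` and the sample values are hypothetical and are not the bundled FX() implementation.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical merger: copy a key from `overlay` only when `base`
// has no value for it. Existing values in `base` always win.
function fillUndefined(base: Env, overlay: Env): Env {
  const merged: Env = { ...base };
  for (const key of Object.keys(overlay)) {
    if (merged[key] === undefined) merged[key] = overlay[key];
  }
  return merged;
}

// A scrubbed PATH is still a defined value, so the overlay cannot replace it.
const scrubbedEnv: Env = { PATH: '/usr/bin', HOME: undefined };
const loginShellEnv: Env = {
  PATH: '/home/u/.local/bin:/usr/bin',
  HOME: '/home/u',
};
const mergedEnv = fillUndefined(scrubbedEnv, loginShellEnv);
```

With a merger shaped like this, `mergedEnv.PATH` stays at the scrubbed value, which is why the spec observes the worker's output directly instead of reading the merged environment.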

T05 originally planned as a Tier 2 isDefaultProtocolClient probe
but reshaped — that runtime call is a no-op in the harness because
ELECTRON_FORCE_IS_PACKAGED=true makes app.getName() resolve to
"Claude" (not "claude-desktop"), so the xdg-mime shellout fails
silently. Real registration is install-time via the .desktop file
MimeType= line. T05 ships as the delivery probe instead.

T07 originally deferred to Tier 3 ("topbar is React-rendered SPA")
but the harness's seedFromHost primitive (isolation.ts:37-44, never
exercised before this commit) lifts it back to Tier 2.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 14:42:32 -04:00
aaddrick
bebe83d194 test(harness): Tier 1 runners (16 file/spawn/argv probes)
Each runner is independent of the others and matches one case-doc
test ID. Pure file probes (asar fingerprints, source-tree grep) and
short-lived spawn probes; no app launch needed.

Specs landed:

- T02 — claude-desktop --doctor exit code is 0
- T11 — plugin install code path fingerprints (installPlugin log,
  installed_plugins.json) present in bundled index.js
- T13 — --doctor does not false-flag rpm/deb installs as
  missing-dpkg AppImage
- T14a — requestSingleInstanceLock + 'second-instance' strings in
  bundle (T14b runtime probe lands separately)
- S01 — AppImage launches without libfuse.so.2 complaint (skips
  cleanly on non-AppImage rows)
- S02 — no strict == equality against XDG_CURRENT_DESKTOP in
  launcher / patches (regression detector)
- S03 — dpkg-query Depends: field non-empty (currently fails as
  upstream-contract regression detector — deb.sh:185-197 emits no
  Depends: line)
- S04 — rpm -qR has at least one non-rpmlib(...) requirement
  (currently fails — rpm.sh:188 has AutoReqProv: no, no manual
  Requires:)
- S05 — doctor does not false-flag rpm-installed package
- S08 — KDE tray-rebuild fast-path (.setImage(...createFromPath...))
  injected by tray.sh:212-217
- S15 — AppImage --appimage-extract fallback exits 0; squashfs-root/
  AppRun --version runs without FUSE error
- S16 — AppImage mount(8) entry appears post-launch and clears
  within ~10s of close
- S21 — no handle-lid-switch / HandleLidSwitch strings in bundle
  (lid policy deferred to OS)
- S22 — new Set(["darwin","win32"]) computer-use platform gate
  present, no 2-element Set pairing linux (file-probe form)
- S26 — setFeedURL present + project suppression marker absent
  (currently fails — gated on #567 auto-update suppression patch)
- S27 — installed_plugins.json + homedir resolver present, no
  */plugins system paths in bundle

Three specs are intentional regression detectors — they ship "red"
today (S03, S04, S26) because the upstream contract isn't yet met.
Each error message names the upstream defect or issue so matrix-regen
surfaces them as actionable cells.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 14:42:04 -04:00
aaddrick
61245bcc81 test(harness): scaffolding for Tier 1/2 runner batch
- runDoctor() now returns {output, exitCode} so T02/T13/S05 can
  assert against the doctor exit code (was string-only, swallowed
  the code).
- MainWindow.setState() accepts 'close' and calls win.close() so T08
  exercises frame-fix-wrapper.js:178-185 (the close-to-tray
  interceptor) — distinct from 'hide' which would bypass the
  wrapper.
- Add docs/testing/runner-implementation-plan.md: tiered triage of
  the 61 missing runners with execution-time reclassifications
  (T05 → Tier 3 delivery, T07 → Tier 2 via seedFromHost, T14 split
  into a/b, S20 deferred via #569).
- Refresh T13/S05 case-doc anchors: scripts/doctor.sh:290-299 →
  :353-362 (file edited since the anchor was written).
- Update test-harness README status to reflect the post-batch spec
  inventory and link to the plan doc.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 14:41:35 -04:00
aaddrick
2ca35610ec docs(testing): runner-implementation prompt for next session
Counterpart to docs/testing/cases-grounding-prompt.md — a fan-out
prompt for the workstream of wiring runners against the 61 of 76
tests that don't have one yet.

Structured the same way as the grounding prompt: Phase 0 calibration,
Phase 1 triage subagent producing a tiered plan
(docs/testing/runner-implementation-plan.md), Phase 2/3 fan-out per
test in Tiers 1-2, Phase 4 synthesis. Tier 3 (renderer-heavy /
login-required) deferred to follow-up sessions; Tier 4 (CLI binary,
issue-gated, env-blocked) marked out of scope with reasons.

Constraints flag the known landmines: CDP gate workaround, the
BrowserWindow Proxy gotcha, default isolation + escape hatches,
ydotool prereqs, skipUnlessRow as the first line of every spec.
"Don't ship stubs" called out explicitly so a session that hits a
blocker reports it instead of leaving placeholder runners that pass
trivially.

Realistic next-session goal: 13-16 new runners (Tier 1 + as much
Tier 2 as fits), bumping coverage from 15/76 (20%) to ~30/76 (40%).
Future sessions handle the renderer-heavy Tier 3 once they have a
session-time budget and host claude.ai login.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 08:13:04 -04:00
aaddrick
4d29cf83fa docs(testing): document grounding sweep workflow + probe + Wayland mode
The action items from the last few sessions (case-doc grounding,
runtime probe, autoUpdater issue, Wayland-mode runs) needed pointers
across the testing docs so the next contributor isn't reverse-
engineering them from git log.

- docs/testing/README.md — bump date, surface grounding sweep + probe
  in the automation-status section, fix the test corpus snapshot
  (S-tests went from 28 to 37 since this was last counted).
- docs/testing/runbook.md — add "Grounding sweep" section (static
  pass + runtime pass) alongside the existing test sweep, document
  the Wayland-mode sweep recipe, link upstream-bump trigger to it.
- tools/test-harness/README.md — add grounding-probe.ts to the
  layout, a Run-section recipe, and a dedicated "Grounding probe"
  section explaining when to reach for it vs the static grep.
- docs/testing/cases/distribution.md — link S26 to issue #567
  (autoUpdater no-op tracking), now that the bug is filed.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 08:08:23 -04:00
aaddrick
af3c31b511 test(harness): CLAUDE_HARNESS_USE_WAYLAND for full-suite native Wayland runs
Adds a top-level harness flag that flips every launchClaude() spawn from
the default X11-via-XWayland backend to native Wayland, so the full
suite can run under Wayland with a single env var instead of per-spec
plumbing.

Implementation mirrors scripts/launcher-common.sh:132-139:
- Renames LAUNCHER_INJECTED_FLAGS to LAUNCHER_INJECTED_FLAGS_X11 and
  adds LAUNCHER_INJECTED_FLAGS_WAYLAND with the launcher's Wayland
  flag set (UseOzonePlatform, WaylandWindowDecorations, ozone-platform,
  wayland-ime, wayland-text-input-version=3).
- harnessUseWayland() reads CLAUDE_HARNESS_USE_WAYLAND.
- launchClaude() picks the flag set, adds CLAUDE_USE_WAYLAND=1 and
  GDK_BACKEND=wayland to the spawn env. Spread order keeps caller-
  supplied extraEnv winning, so a single test can still opt back to X11
  inside a Wayland-mode sweep.
- sweep.sh advertises the mode on stderr.
- README documents the var + the npm-test recipe.
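
The spread-order point above can be sketched roughly as follows. This
is an illustration, not the harness's code: `buildSpawnEnv`, its
parameters, and the use of GDK_BACKEND=x11 as the opt-back override are
hypothetical; only CLAUDE_USE_WAYLAND=1, GDK_BACKEND=wayland, and the
"extraEnv wins" rule come from this message.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: merges the base env, Wayland-mode defaults, and
// caller-supplied overrides. Later spreads win, so extraEnv goes last.
function buildSpawnEnv(baseEnv: Env, useWayland: boolean, extraEnv: Env = {}): Env {
  const waylandEnv: Env = useWayland
    ? { CLAUDE_USE_WAYLAND: '1', GDK_BACKEND: 'wayland' }
    : {};
  return { ...baseEnv, ...waylandEnv, ...extraEnv };
}

// A single test overriding one variable inside a Wayland-mode sweep
// (assumed override; the real opt-back mechanism may differ).
const mixedEnv = buildSpawnEnv({ PATH: '/usr/bin' }, true, { GDK_BACKEND: 'x11' });
console.log(mixedEnv['GDK_BACKEND'], mixedEnv['CLAUDE_USE_WAYLAND']);
```

Putting extraEnv last means no per-spec flag is needed: whatever the
caller passes simply shadows the mode defaults.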

Default unchanged: every runner still gets X11. The flag opts in.

Verification (live): CLAUDE_HARNESS_USE_WAYLAND=1 npx playwright test
src/runners/T17_folder_picker.spec.ts, then while the app is up confirm
--ozone-platform=wayland is on argv via /proc/<pid>/cmdline. The harness
spawns Electron directly (CDP-gate workaround at electron.ts:102), so
launcher-common.sh isn't sourced and
~/.cache/claude-desktop-debian/launcher.log is not written by harness
runs.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 08:02:27 -04:00
aaddrick
b3baa8ad8f docs(testing): extend case-doc template with anchor + drift conventions
Folds the conventions the grounding sweep landed into the README so
future authors and sweeps work from the same shape. Adds:

- **Code anchors:** field — `<file>:<line>` pointers to where the
  load-bearing claim is implemented.
- **Inventory anchor:** field — optional, for surfaces present in
  the v7 walker's idle capture.
- "Anchor scope" section codifying the four buckets (upstream code,
  wrapper, server-rendered SPA, CLI binary) and where to anchor each.
- "Drift markers" section codifying the Drifted / Missing / Ambiguous
  classifications the sweep already uses.

No content changes to existing case files — they already follow these
conventions in practice; the README now documents them.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 08:00:56 -04:00
aaddrick
ade75d748d docs(testing): drop branch-divergence caveats from T07/S13 anchors
Branch was rebased onto main; scripts/wco-shim.js + scripts/patches/
wco-shim.sh are now on this branch via PR #538. The "lives on main, not
yet on docs/compat-matrix" notes the grounding subagent added are no
longer accurate — anchors point at files that exist locally.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 07:57:50 -04:00
aaddrick
66d390ccec test(harness): grounding-probe round 2 — AX fingerprint, editor channels, SNI
Closes the bulk of the remaining gaps from the last cut:

- AX fingerprint of the current claude.ai webContents (role+name+
  hasPopup, reduced form). Stored once at the top level; per-test
  entries for T22/T26/T31/T32 reference it via { axFingerprintRef }.
  Captures whatever surface is on screen at probe time, so the user
  opens the slash menu / side chat / routines modal / PR toolbar
  before re-running to anchor those surfaces.

- Editor handoff IPC channels (T24/T38). Static anchor is `Mtt` at
  index.js:463902 — variable name is minified, so we match handlers
  by /external|editor|openIn/i name pattern instead. Sufficient to
  diff across upstream versions (renames will surface as removed
  channels with similar replacements).

- SNI / tray registration (T03). `findItemByPid()` from sni.ts
  attributes a registered StatusNotifierItem to our pid. dbus-next is loaded
  via dynamic import so non-DBus environments (CI containers without a
  session bus) still get a partial probe rather than a hard fail.
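
The name-pattern matching for editor handoff channels, and the
cross-version diff it enables, might look like the sketch below. The
regex is the one quoted above; `matchEditorChannels`, `diffChannels`,
and every channel name in the usage lines are hypothetical.

```typescript
// Pattern quoted in the commit message.
const EDITOR_CHANNEL_RE = /external|editor|openIn/i;

// Keep only handler names that look like editor/external-open channels.
function matchEditorChannels(channels: string[]): string[] {
  return channels.filter((c) => EDITOR_CHANNEL_RE.test(c)).sort();
}

// Diff two captures; a rename shows up as one removed plus one added name.
function diffChannels(before: string[], after: string[]): { removed: string[]; added: string[] } {
  return {
    removed: before.filter((c) => after.indexOf(c) === -1),
    added: after.filter((c) => before.indexOf(c) === -1),
  };
}

// Made-up channel names, for illustration only.
const oldCapture = matchEditorChannels(['app:openInEditor', 'tray:update', 'shell:openExternal']);
const newCapture = matchEditorChannels(['app:openInIde', 'tray:update', 'shell:openExternal']);
console.log(diffChannels(oldCapture, newCapture));
```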

Reduced gaps[] to just T39 (CLI surface, out-of-scope) and the
optional opt-outs (powerSaveBlocker without --include-synthetic;
empty AX fingerprint when claude.ai isn't loaded yet).

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 07:56:29 -04:00
aaddrick
5957c8212b test(harness): grounding-probe --launch + synthetic powerSaveBlocker
Two extensions to the grounding probe, each closing a gap I flagged on
the first cut:

- --launch: spins up a fresh isolated instance via launchClaude(),
  waits for 'mainVisible' (cheapest level that returns the inspector),
  captures, tears down. Default still attaches to an already-running
  app on port 9229; --launch is the self-contained / CI-usable path.

- --include-synthetic + S20 powerSaveBlocker probe: starts a blocker,
  reads isStarted, stops immediately. Brief inhibit (~ms). Read-only by
  default — synthetic state changes are opt-in. Doesn't verify the
  case-doc claim that keepAwakeEnabled toggles trigger this; that needs
  correlating settings IO with the `PhA` Set at index.js:241897, which
  depends on minified-name stability. Left to the next sweep.

Argv parser rewritten to handle bare flags (--launch, --include-synthetic)
alongside key/value pairs (--port 9229, --out PATH).
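
A minimal sketch of an argv parser with that shape, assuming a flag is
"bare" whenever the next token is missing or is itself a flag.
`parseArgs` and `ParsedArgs` are hypothetical names; the real parser in
grounding-probe.ts may differ.

```typescript
type ParsedArgs = Record<string, string | true>;

// A token starting with "--" is a key. If the next token exists and is
// not itself a flag, it is consumed as the value; otherwise the flag is
// recorded as `true`.
function parseArgs(argv: string[]): ParsedArgs {
  const out: ParsedArgs = {};
  for (let i = 0; i < argv.length; i++) {
    const tok = argv[i];
    if (tok === undefined || tok.slice(0, 2) !== '--') continue; // skip positionals
    const key = tok.slice(2);
    const next = argv[i + 1];
    if (next !== undefined && next.slice(0, 2) !== '--') {
      out[key] = next;
      i++; // skip the value we just consumed
    } else {
      out[key] = true;
    }
  }
  return out;
}

console.log(parseArgs(['--launch', '--port', '9229', '--out', '/tmp/probe.json']));
```

One consequence of this rule: a value that itself begins with `--`
cannot be passed, which is acceptable for the port/path options named
above.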

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 07:56:29 -04:00
aaddrick
cb20fde797 test(harness): add grounding-probe for runtime case verification
Static greps against the 546k-line beautified bundle have known blind
spots — lazy require()s, dynamic handler tables, conditional wiring.
This probe connects to a running Claude Desktop via the existing
InspectorClient (port 9229, opened by launchClaude's SIGUSR1 path) and
dumps runtime state keyed by test-ID into a JSON the next grounding
sweep can diff across upstream versions.

Captures:
- App metadata (version, isPackaged, ready state)
- Full IPC handler registry (invoke + on channels)
- WebContents inventory (URLs, types)
- globalShortcut.isRegistered() for known accelerators
- app.getLoginItemSettings() (autostart resolution)
- safeStorage availability + backend (libsecret on Linux)
- autoUpdater.getFeedURL() — empirical answer to the S26 structural-
  open claim that static analysis couldn't resolve
- Notification.isSupported()

Read-only / non-destructive; observes API state, never clicks UI or
fires shortcuts. Records explicit gaps[] for surfaces it can't reach
from idle (S20 powerSaveBlocker enumeration; T22/T31/T32 contextual
renderer surfaces; T39 CLI binary).

Run: cd tools/test-harness && npm run grounding-probe
Output: /tmp/grounding-probe.json (override with --out PATH)

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 07:56:29 -04:00
aaddrick
c76f7e62da docs(testing): ground cases against build-reference v1.5354.0
Static anchor sweep: each test in docs/testing/cases/*.md now points at
the upstream code (or wrapper script) backing its load-bearing claim,
so the next sweep can tell "Linux compat regression" apart from "case
doc drifted while we weren't looking."

- 75 tests across 10 files reviewed
- 63 grounded with code anchors (index.js:N, scripts/*.sh:N)
- 9 drifted Steps/Expected corrected against actual upstream behavior
- 2 marked Missing in build (S12 Wayland portal flag, S26 auto-update)
- 1 flagged Ambiguous (T39 /desktop is a CLI surface, not Electron asar)

Notable corrections:
- T05: scheme is claude://, not https:// (project never registers
  x-scheme-handler/https; old spec was always going to fail on Linux)
- T15: sign-in is in-app loadURL into mainView, not xdg-open handoff
- T18: drag-attach uses webUtils.getPathForFile, not text/uri-list MIME
- T20: file conflict check is sha256-based, not mtime-based
- T22: gh-install path is macOS/brew-only on Linux/Windows
- T30: PR-close auto-archive wait is ~5-6 min (5m setInterval + 30s
  startup + 1h non-terminal cooldown), not "~1 minute"
- T14: PR #536 is closed/docs-only — no in-tree multi-instance flag

Inventory anchors added for renderer-side surfaces present in the
idle-state v7 capture (T16 Code tab, T17 select-folder, T26 Routines,
T11/T33 plugin nav). Surfaces inside modals/popups (T22 toolbar, T25
Show-in-Files context menu, T31 side chat, T32 slash menu) are flagged
for re-capture with the surface open.

S26 finding worth follow-up: autoUpdater gate is structurally open on
Linux when packaged (lii() at index.js:508761-508774 returns true with
ELECTRON_FORCE_IS_PACKAGED=true from launcher-common.sh:249) — saved
from real download attempts only by Electron's Linux autoUpdater being
unimplemented.

T07/S13 reference WCO-shim files that exist on main (PR #538 merged
2026-05-01) but not on this branch (docs/compat-matrix forked earlier);
anchors point at main: with explicit caveats.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 07:56:29 -04:00
aaddrick
5ae25247ef docs(testing): queue cases grounding sweep against build-reference
Adds the implementation prompt for the next session: spawn one
subagent per file in docs/testing/cases/, have each one cross-check
its tests against the extracted Claude Desktop source under
build-reference/app-extracted/, and edit in place to add code
anchors / mark drift / flag missing features. Mirrors the
structure of the already-retired claudeai-lib-ax-migration-prompt.md
so the workflow is consistent.

Triggered by the AX migration validation surfacing how easily case
docs drift from upstream — the test author's "click X menu" can
silently diverge from upstream's actual labels two versions later,
and the failure looks like a Linux compat issue when it's really a
doc-vs-source drift.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 07:56:29 -04:00
aaddrick
e13660993b test(harness): drop auto-generated U01 sweep spec
The 90-test U01 sweep was wired against an account-specific v7
inventory snapshot; running it during routine sweeps fired noise
against unrelated drift. The spec is auto-generated from the v7
inventory via npm run gen:render-specs, so this is a soft delete —
regenerate any time a fresh inventory walk lands.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 07:56:29 -04:00
aaddrick
7715952c3f test(harness): migrate claudeai.ts page-objects to AX-tree substrate
Replace every CSS-shape walk in lib/claudeai.ts with AX-tree queries
sourced from Chromium's Accessibility.getFullAXTree. Discovery now
reads role + accessibleName + hasPopup from the same substrate the v7
walker uses, dropping the brittle button[aria-haspopup=menu] +
span.truncate.max-w-[Npx] coupling that was the recurring break point
on every upstream tailwind regen.

Substrate change:
- inspector.ts: surface AxValue + AxProperty types; explicit
  properties? on AxNode so consumers can read state tokens.
- walker.ts: export RawElement, add hasPopup field, populate via
  readHasPopup() reading node.properties[].name === 'hasPopup'.
- selfTest Case 10 covers menu / 'false' / absent values.

Page-object migration (lib/claudeai.ts):
- snapshotAx() helper gates on waitForAxTreeStable by default
  (post-userLoaded the first AX read can return ~4 nodes — see
  docs/learnings/test-harness-ax-tree-walker.md §1).
- Polling loops in openPill (post-click) + clickMenuItem gate once
  upfront, then poll with { fast: true } so per-iteration stability
  re-checks don't fight the menuitem-appear poll.
- activateTab matches role:'button' + literal accessibleName.
- findCompactPills filters by role:'button' + hasPopup === 'menu',
  drops cowork sidebar via /^More options for / exclusion. Drops
  CompactPill.maxW field (tailwind artifact, only ever in error
  messages).
- openPill / clickMenuItem use clickByBackendNodeId for the click
  path — same backend-id flow the walker uses.

Live probe (explore/probe-claudeai-ax.ts) confirmed the discrimination
shapes against the host renderer — found 49 buttons with hasPopup
(48 menu, 1 dialog), env pill 'Local' resolves under main >
region[Primary pane], 37 cowork sidebar triggers correctly excluded
by the row-more-options filter. Caught one bug along the way: CDP
exposes the property as 'hasPopup' (camelCase), not 'haspopup' — the
synthetic selfTest fixture used the wrong casing too, so both sides
agreed on the wrong contract until the live probe surfaced it.
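
The casing bug is easy to see in a small sketch of the property
lookup. The interfaces are local stand-ins for the CDP Accessibility
shapes, not the harness's own AxNode/AxProperty types, and treating
'false' the same as "absent" is an assumption about what readHasPopup
does.

```typescript
// Local stand-ins for the CDP shapes (AXNode.properties[] entries are
// { name, value: { type, value } }).
interface AxValue { type: string; value?: unknown }
interface AxProperty { name: string; value: AxValue }
interface AxNode { nodeId: string; properties?: AxProperty[] }

// Returns the hasPopup token ("menu", "dialog", ...) or null when the
// property is absent or explicitly 'false'. The name is matched
// case-sensitively: CDP reports 'hasPopup', not 'haspopup'.
function readHasPopup(node: AxNode): string | null {
  for (const prop of node.properties ?? []) {
    if (prop.name !== 'hasPopup') continue;
    const v = prop.value.value;
    return typeof v === 'string' && v !== 'false' ? v : null;
  }
  return null;
}

const menuButton: AxNode = {
  nodeId: '1',
  properties: [{ name: 'hasPopup', value: { type: 'token', value: 'menu' } }],
};
console.log(readHasPopup(menuButton));
```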

T17_folder_picker passes on KDE-W with CLAUDE_TEST_USE_HOST_CONFIG=1.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 07:56:29 -04:00
aaddrick
2f308c868c docs(testing): retire spent v7 handoff prompts, queue claudeai.ts AX migration
The three v7 handoff prompts (vocabulary scaffold, AX-tree
substrate migration, U-prefix runner wire-up) have all been
implemented and merged. Retire them — the design contract still
lives in fingerprint-v7-plan.md; the per-iteration prompts were
single-use scaffolding for fresh sessions.

Add claudeai-lib-ax-migration-prompt.md as the next-iteration
handoff: tools/test-harness/src/lib/claudeai.ts is still on the
old substrate (document.querySelector against minified-tailwind
shapes) and is the highest-payoff target for the v7 plan's "design
goal §2: Resilient to cosmetic drift". The prompt mirrors the
prior handoffs' structure (authoritative refs, code anchors,
phases, self-correction loop, termination conditions, final report
format) and scopes the spike at openPill before fanning out to
the rest of the file.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 07:56:29 -04:00
aaddrick
3ed5dfa84c test(harness): wire up U01 v7 sweep against fresh AX-tree inventory
U01 was a placeholder skipping with "v7 cutover — re-walk required";
the v7 walker has shipped a fresh inventory, so regenerate the spec
and land two resolver fixes the live sweep surfaced.

`findByFingerprint`: the strictness gate only consulted `kind`, so
entries with `kind: persistent` + `classification: instance` (the
post-walk persistent-collapse promotes degenerate-shaped fingerprints
when they appear on ≥3 surfaces) failed with "expected exactly one
match, got N". The fingerprint's own degenerate-shape claim should
win — defer to `classification === 'instance'` too.

`redrivePath`: the dangling `startUrl` parameter was the smoking
gun. After a prior test drilled into a deeper URL (e.g.
/settings/customize), `location.reload()` reloaded the deep URL
instead of returning to startUrl, and the next test's first
`clickById` saw a contaminated surface. Navigate to startUrl when
currentUrl has drifted; reload only when already at startUrl.
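
The decision described above reduces to a two-way choice. A sketch,
assuming exact string comparison of URLs (the real redrivePath may
normalize them first); `chooseRedriveAction` and `RedriveAction` are
hypothetical names.

```typescript
type RedriveAction =
  | { kind: 'navigate'; url: string }
  | { kind: 'reload' };

// Reload only when already at startUrl; otherwise navigate back to it so
// a prior test's deep URL cannot contaminate the next surface.
function chooseRedriveAction(currentUrl: string, startUrl: string): RedriveAction {
  return currentUrl === startUrl
    ? { kind: 'reload' }
    : { kind: 'navigate', url: startUrl };
}

console.log(chooseRedriveAction('https://claude.ai/settings/customize', 'https://claude.ai/'));
```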

Sweep results across three runs: 73/17 → 89/1 → 89/1, with the
single failure being non-deterministic (different test each sweep,
both consistent with focus-management transients and sidebar
virtualization documented in docs/learnings/test-harness-ax-tree-walker.md).

Generator gate inverted to make the safe-by-default path
(seedFromHost: true) trigger when the env var is unset, mirroring
H05's pattern but with the seed lifted from the host config.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 07:56:29 -04:00
aaddrick
5d7fda521f docs(testing): v7 fingerprint plan, AX-tree learnings, fresh inventory
Plan (docs/testing/fingerprint-v7-plan.md):
- Adds "Live-walk shakedown (post-Phase 2)" subsection enumerating
  the five real bugs the first end-to-end walks surfaced and their
  fixes (AX-stable gate, reload vs navigate, sibling-count list
  heuristic, two new instance shapes, threshold bump)
- Resolves three open questions with first-clean-walk data: CDP cost
  is not a bottleneck (817-node tree settles <1s), role overrides
  work as intended (Skip to content captured as link), no
  account-bound kind needed (existing pattern + heuristic + collapse
  cover the observed cases)
- Cross-references for walk-isolated.ts and clickByBackendNodeId

Learnings (docs/learnings/test-harness-ax-tree-walker.md):
- Five non-obvious AX-tree traps with symptoms + fixes:
  Accessibility.enable async lag, navigateTo no-op carrying state,
  claude.ai's flat dialog/complementary lists, per-row "More options
  for X" trigger needing its own shape, sidebar virtualization vs
  the lookup-failure threshold
- Closing note on driver choice (walk-isolated.ts over explore walk)

Prompts (docs/testing/fingerprint-v7-*-prompt.md):
- implementation-prompt: original v7 walker rewrite prompt
- ax-migration-prompt: DOM-walk -> AX-tree substrate migration prompt
- runners-prompt: NEW. Self-contained prompt for next session to wire
  U01 against the fresh inventory and iterate autonomously to a
  clean pass/drift/fail baseline

CLAUDE.md: link the new learnings doc

Inventory artifacts:
- ui-inventory.json + ui-inventory.meta.json: 90-entry inventory
  captured against claude.ai/epitaxy on app 1.5354.0 via
  walk-isolated.ts seedFromHost path. Marketplace dialog folded to
  single button-instance+704; cowork sidebar to button-instance+72;
  search history to option-instance+25
- ui-vocabulary.json: stable/suspect name corpus derived from prior
  walk
- ui-inventory-reconciliation.md: v6-era reconciliation notes
- ui-snapshots/{README.md,.gitkeep}: snapshots dir scaffold (JSON
  contents gitignored to avoid diff churn)

claudeai-ui-map.md: human-readable map of the inventory's reachable
surfaces

Matrix (docs/testing/matrix.md): U01 row added; entry-count phrasing
generalized so it doesn't go stale on each re-walk

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 07:56:29 -04:00
aaddrick
04cd879d11 test(harness): v7 fingerprint walker on AX-tree substrate
Switches the inventory walker from a renderer-side
document.querySelectorAll IIFE to Chromium's accessibility tree
(Accessibility.getFullAXTree over CDP). Account-portable element
identification via ariaPath + role + AX-computed name; click path
moves to backendDOMNodeId via DOM.resolveNode + Runtime.callFunctionOn.

Walker (explore/walker.ts):
- snapshotSurface consumes AX nodes via axTreeToSnapshot
- waitForAxTreeStable gates seed snapshot, post-navigation snapshot,
  and every snapshotSurface call (Accessibility.enable lag is async;
  first read on a cold load returns 4 nodes vs 800+ when settled)
- redrivePath uses location.reload() instead of navigateTo to discard
  any state prior drills left in the SPA (open dialog, expanded
  sidebar, scrolled focus)
- captureFingerprint's isListRowChild extended: button + group
  ancestors, plus a sibling-count fallback (>=15 same-role siblings)
  for claude.ai's flat marketplace dialogs and complementary sidebar
- step 3 (positional) skipped for list-row children so they collapse
  via step 4's instance shape
- MAX_CONSECUTIVE_LOOKUP_FAILURES bumped 25 -> 75 for sidebar
  virtualization noise (timeout counter still gates real wedges)
- RawElement / RawAncestor reshaped: tagName / role / ariaLabel /
  textContent / dataState / parentChainSignature / ancestorAriaLabel
  dropped; backendDOMNodeId added; accessibleName is sole name source
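
The sibling-count fallback in isListRowChild can be pictured with the
sketch below. Only the threshold (>=15 same-role siblings) comes from
this message; the function name, the `Sibling` shape, and counting the
element itself among its parent's children are assumptions.

```typescript
interface Sibling { role: string }

// Threshold quoted in the commit message.
const LIST_SIBLING_THRESHOLD = 15;

// `children` is assumed to be the parent's full child list, including
// the element under test. Flat containers (no list/listitem roles) are
// treated as lists when enough children share the element's role.
function looksLikeListRow(role: string, children: Sibling[]): boolean {
  let sameRole = 0;
  for (const child of children) {
    if (child.role === role) sameRole++;
  }
  return sameRole >= LIST_SIBLING_THRESHOLD;
}

const flatDialog: Sibling[] = [];
for (let i = 0; i < 20; i++) flatDialog.push({ role: 'button' });
console.log(looksLikeListRow('button', flatDialog));
```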

Inspector (src/lib/inspector.ts):
- AxNode interface published
- clickByBackendNodeId: DOM.resolveNode + Runtime.callFunctionOn
  (replaces selector-based click reconstruction)

Name classifier (src/lib/name-classifier.ts):
- cowork-session shape regex (Idle|Ready|Awaiting input|...)
- row-more-options shape regex (^More options for )

Isolation (src/lib/isolation.ts):
- seedFromHost option: kill host Claude, copy auth-relevant subset of
  ~/.config/Claude into per-launch tmpdir for U01 / H05

Driver (explore/walk-isolated.ts):
- Replaces explore walk for safe walks: launches Claude inside the
  test-harness isolation rather than mutating the host profile

Runners:
- H05_ui_drift_check.spec.ts (claude.ai UI drift detection)
- U01_ui_visibility.spec.ts (placeholder stub; regenerated post-walk)

Self-test fixtures rewritten as synthetic AxNode trees fed through
axTreeToSnapshot; existing 7 plan-example traces produce identical
idTailFromFingerprint outputs.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 07:56:29 -04:00
aaddrick
9e72ebb3e0 test(harness): negative validations, harness self-tests, claude.ai UI lib
Adds eighteen pieces of work across the harness, partitioned by file
so they don't conflict, dispatched in parallel and merged together.

== Negative validations on existing runners ==

T03 — assert exactly one SNI item is registered (not just presence),
plus toggle nativeTheme.themeSource and re-assert. Catches the
tray-rebuild-race regression where the destroy+recreate path would
briefly register a duplicate item before deregistering the old one
(see docs/learnings/tray-rebuild-race.md).

S29 — assert the popup BrowserWindow is reused across shortcut
presses, not re-constructed. Counts entries in __qeWindows matching
the popup selector after the first press AND after a second press —
both must equal 1. Catches a regression where lazy-create runs every
press instead of show()/hide() on a persisted Ko ref.

S30 — broadens the "no ghost respawn" delta into a full closeout-
leak panel. Three additional checks BEFORE the post-exit shortcut
press: no `cowork-vm-service` pids remain, the SNI item is
deregistered (connection gone), no leftover `SingletonLock`
symlink under the isolation's configDir. Existing post-shortcut
delta assertion preserved.

S32 — replaces the silent `.catch(() => {})` on waitForPopupClosed
with explicit popup-state-after-submit assertion. The stale-
isFocused short-circuit can also leave the popup visible (since
popup.hide() lives downstream of the skipped show()) — independent
regression detector from the main-window-visibility check.

S34 — adds focus-side assertion to what was a suppression-only
test. Upstream contract is `if (ut.isFullScreen()) { ut.focus();
ide(); }` — verify main is still fullscreen AND focused after the
shortcut. KDE-W/KDE-X hard-fail (focus is reliable on Plasma);
GNOME-W/Ubu-W soft-fixme (mutter routinely no-ops focus on
fullscreen surfaces).

S35 — three-launch shape: the existing two-launch position-memory
check plus an on-disk round-trip (read parsed config.json between
launches to confirm the save handler reached disk) plus a clear-
and-default check (delete the saved key, launch a third time,
assert the popup lands somewhere other than the cleared TARGET —
proves the test is reading the real store). Bumped per-test
timeout from 180_000 to 240_000.

== New harness self-tests (H-prefix) ==

Introduces an H-prefix convention for runners that validate the
harness's preconditions and the build pipeline's invariants —
distinct from T-tests (upstream test cases) and S-tests (doc-
spec entries). Cheap, fast, ground-truth what the other tests
assume.

H01 — CDP gate canary. Spawns bundled Electron with
`--remote-debugging-port=0` and no CLAUDE_CDP_AUTH; asserts exit
code 1 within 10s. If the gate is ever accidentally removed, this
fires before the rest of the L1 strategy silently weakens.

H02 — frame-fix-wrapper presence. Asserts both
`frame-fix-wrapper.js` and `frame-fix-entry.js` exist in app.asar,
the wrapper contains `Proxy(`, and `package.json#main` references
the entry. File probe — sub-second.

H03 — patch fingerprints. Manifest-based check for every
build-pipeline patch (KDE gate, frame-fix inject, tray
nativeTheme guard, cowork Linux daemon shutdown, claude-code
linux-arm64 branch). Catches silent build-orchestrator drift.

H04 — cowork daemon lifecycle. Baseline pgrep, launchClaude,
wait for daemon to spawn, app.close(), assert daemon is gone.
Soft-skips on rows where the daemon isn't gated to spawn (most
default builds today).

== claude.ai renderer UI domain wrapper ==

New `lib/claudeai.ts` centralizes renderer-DOM discovery for
claude.ai UI patterns. Same shape as `lib/quickentry.ts` —
domain class with discovery-by-shape, atom helpers, idempotent
mocks. Exports:

  - activateTab(name) — clicks Chat/Cowork/Code df-pill
  - installOpenDialogMock + getOpenDialogCalls — idempotent
    dialog.showOpenDialog mock + recorded calls
  - findCompactPills, openPill, clickMenuItem, pressEscape —
    atoms shared by future page objects
  - class CodeTab — activate(), openEnvPill(), selectLocal(),
    openFolderPicker() (full chain)

Discovery is by structural fingerprint, not Tailwind classes
(those rebuild). Probed against a live debugger to confirm:
df-pill is exactly 3 instances (Chat/Cowork/Code), compact-pill
distinguishes env pill (max-w-[200px]) from Select-folder pill
(max-w-[160px]) — same component shape, different label widths.

T17 refactored to use the new lib — went from ~470 lines of
inline DOM walking to ~70 lines of intent. When claude.ai
re-renders the Code tab, the fix is one file over, not per-spec.

== Library brittleness fixes ==

`lib/quickentry.ts`:
  - getStoredPosition rewritten to read configDir/Claude/config.json
    directly via electron-store's known JSON shape. Replaces a
    fragile globalThis-walk that matched any object with .get/.set
    returning a quickWindowPosition value.
  - LOGIN_URL_RE anchored: `^https?://[^/]+/(login|auth|sign[-_]?in)
    (?:[/?#]|$)`. Previous unanchored form would match
    /oauth/callback as still-on-login.
  - Dropped dead `skipTaskbar: false` field from
    getPopupRuntimeProps return shape (no caller used it; the
    hardcoded false was misleading).
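
The effect of anchoring LOGIN_URL_RE can be checked in isolation. The
anchored pattern is the one quoted above, joined across the line break;
`UNANCHORED_RE` is only a stand-in for the previous form, which this
message does not quote, and `isOnLogin` is a hypothetical wrapper.

```typescript
// Anchored form from this commit: the first path segment must be exactly
// login / auth / sign-in (with optional - or _), followed by / ? # or end.
const LOGIN_URL_RE = /^https?:\/\/[^\/]+\/(login|auth|sign[-_]?in)(?:[\/?#]|$)/;

// Stand-in for the earlier unanchored form (assumed, not quoted).
const UNANCHORED_RE = /(login|auth|sign[-_]?in)/;

function isOnLogin(url: string): boolean {
  return LOGIN_URL_RE.test(url);
}

const callbackUrl = 'https://claude.ai/oauth/callback';
console.log(isOnLogin(callbackUrl), UNANCHORED_RE.test(callbackUrl));
```

The unanchored stand-in matches the `auth` inside `oauth`, which is the
false "still on login" result the anchoring removes.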

`lib/inspector.ts`:
  - InspectorClient.close() is now idempotent — second close is a
    no-op. Both runners and electron.ts auto-close path can safely
    invoke it.
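
An idempotent close usually comes down to a guard flag. The class below
is a generic sketch of that pattern, not InspectorClient itself;
`ClosableClient` and `teardownCount` are hypothetical and exist only to
make the behavior observable.

```typescript
class ClosableClient {
  private closed = false;
  // Counts real teardowns so the no-op behavior can be observed.
  public teardownCount = 0;

  close(): void {
    if (this.closed) return; // second and later calls are no-ops
    this.closed = true;
    this.teardownCount++; // the socket/session teardown would go here
  }
}

const client = new ClosableClient();
client.close(); // e.g. a runner's inline close
client.close(); // e.g. an auto-close path running afterwards
console.log(client.teardownCount);
```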

`lib/electron.ts`:
  - ClaudeApp tracks the attached inspector internally; app.close()
    auto-closes it (existing inline inspector.close() calls in
    runners stay working idempotently).
  - Module-level activeLaunches set + signal handlers ensure
    Ctrl-C during a sweep kills tracked Electron pids and rms
    isolation tmpdirs before re-emitting the signal.
  - app.lastExitInfo: { code, signal } | null exposes non-zero
    exit info post-close. Runners can attach when nonzero;
    nothing breaks when ignored.

== Config + orchestrator ==

`playwright.config.ts`:
  - retries: process.env.CI ? 1 : 0 (one retry in CI to absorb
    compositor flake; local stays at 0 so flakes surface).
  - forbidOnly: !!process.env.CI prevents stray test.only from
    sneaking through CI.
  - /// <reference types="node" /> for `process.env` access (the
    file isn't covered by tsconfig.json's `src/**/*` include).
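
The two CI-dependent settings can be expressed as a pure function of
the environment, which makes the local/CI split easy to check. This is
a sketch: `runPolicy` and `RunPolicy` are hypothetical, and the real
config sets these fields inline in playwright.config.ts.

```typescript
interface RunPolicy { retries: number; forbidOnly: boolean }

// Mirrors `retries: process.env.CI ? 1 : 0` and
// `forbidOnly: !!process.env.CI`, with the env passed in explicitly.
function runPolicy(env: Record<string, string | undefined>): RunPolicy {
  const ci = env['CI'];
  return {
    retries: ci ? 1 : 0,
    forbidOnly: !!ci,
  };
}

console.log(runPolicy({ CI: 'true' }), runPolicy({}));
```

Note that an empty `CI=""` is falsy and behaves like a local run.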

`orchestrator/sweep.sh`:
  - Replaces the four `grep -oP ... | head -1` lines (which read
    only the first <testsuite> element) with a Node-based summary
    that sums tests/failures/errors/skipped across every suite.
  - Wrapped in `command -v node` guard with the legacy grep
    fallback retained inline.
  - Output line is byte-identical for downstream consumers.
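
Summing across every suite, rather than reading the first
`<testsuite>` element, can be sketched as below. This is an
illustration in TypeScript, not the script embedded in sweep.sh;
`sumJUnit` is a hypothetical name and the regex-based attribute
scraping assumes well-formed, double-quoted JUnit attributes.

```typescript
interface SuiteTotals { tests: number; failures: number; errors: number; skipped: number }

// Adds up the counters on every <testsuite ...> opening tag. The \b
// after "testsuite" keeps the <testsuites> wrapper from being counted.
function sumJUnit(xml: string): SuiteTotals {
  const totals: SuiteTotals = { tests: 0, failures: 0, errors: 0, skipped: 0 };
  const suiteTags = xml.match(/<testsuite\b[^>]*>/g) || [];
  for (const tag of suiteTags) {
    for (const key of ['tests', 'failures', 'errors', 'skipped'] as const) {
      const m = tag.match(new RegExp('\\b' + key + '="(\\d+)"'));
      if (m) totals[key] += Number(m[1]);
    }
  }
  return totals;
}

const sampleXml =
  '<testsuites tests="99">' +
  '<testsuite name="a" tests="3" failures="1" errors="0" skipped="1"></testsuite>' +
  '<testsuite name="b" tests="2" failures="0" errors="1" skipped="0"></testsuite>' +
  '</testsuites>';
console.log(sumJUnit(sampleXml));
```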

== Cleanup + docs ==

  - README.md status table updated: 20 specs, 13 pass on KDE-W,
    six skip cleanly per spec intent. T17 row reflects the new
    end-to-end click chain.
  - lib/claudeai.ts and probe.ts added to the Layout section.
  - Deleted _investigate_t17_urls.spec.ts (one-off diagnostic
    that confirmed T17's /login was a fresh-isolation auth
    miss, not a webContents race).
  - Kept probe.ts as the seed for the explore CLI in the
    upcoming UI-mapping plan.

== UI mapping plan ==

`docs/testing/claudeai-ui-mapping-plan.md` — executable plan
for systematically mapping claude.ai's renderer UI into reusable
test-harness abstractions. Three layers: shape-based atoms,
page objects per major surface, discovery tooling. Phase 1
(explore CLI with snapshot/diff) and Phase 2 (UI map markdown)
are independent and can run in parallel; Phase 5 (drift
detection H05) depends on Phase 1.

== Validation ==

KDE-W sweep: 13 pass, 6 cleanly skip, 0 fail. 2.7 min total.
T17 verified end-to-end via the env-pill chain after refactor.
npx tsc --noEmit clean across all changes.

---
Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
70% AI / 30% Human
Claude: dispatched five parallel agents per file partition (libs / runners batch 1 / runners batch 2 / new H-tests / config), wrote the claudeai.ts extraction agent brief informed by live-debugger probe evidence, drafted the UI mapping plan
Human: scoped which improvements to make, called out skip vs fail edges (S34 KDE-strict / GNOME-fixme), shared live-renderer DOM dumps that ground-truthed T17's click chain (Code df-pill → env pill → Local → Select folder → Open folder), validated each step
2026-05-03 07:56:29 -04:00
aaddrick
3d3653f51d test(harness): consolidate QE readiness waits behind waitForReady(level)
Six QE specs (S29-S35) hand-rolled six different shapes of "wait
until the app is ready" — some polled mainWin.getState().visible,
some additionally polled for any claude.ai webContents, some
chained waitForUserLoaded for the URL-past-/login signal. Each
spec started with a 10-20 line block of polling boilerplate.

Replaces those with a tiered helper on the ClaudeApp interface:

  app.waitForReady(level, opts?) → ReadyResultFor<level>

with four levels:
  - 'window'      — X11 window mapped (no inspector)
  - 'mainVisible' — main shell BrowserWindow.isVisible()
  - 'claudeAi'    — any claude.ai webContents reachable
  - 'userLoaded'  — claude.ai URL past /login (lHn() precondition)

Higher levels include all lower-level checks. Returns a
conditionally-typed shape per level so the inspector handle is
non-optional at 'mainVisible' or higher (no `inspector!` casts at
call sites). Single overall timeout (default 90_000ms) flows
across steps — slow startup eats from later steps' budget rather
than tripping a per-step deadline.
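
One way the per-level result type and the shared timeout budget could
look. The level names and `ReadyResultFor` come from this message; the
field layout, `InspectorHandle`, `includesLevel` and `makeBudget` are
assumptions made for the sketch.

```typescript
type ReadyLevel = 'window' | 'mainVisible' | 'claudeAi' | 'userLoaded';

// Placeholder for the real inspector client type (assumption).
interface InspectorHandle {
  evalInMain<T>(expr: string): Promise<T>;
}

// inspector is absent at 'window' and non-optional from 'mainVisible' up.
type ReadyResultFor<L extends ReadyLevel> =
  L extends 'window' ? { level: L } :
  L extends 'mainVisible' ? { level: L; inspector: InspectorHandle } :
  L extends 'claudeAi' ? { level: L; inspector: InspectorHandle; claudeAiUrl?: string } :
  { level: L; inspector: InspectorHandle; claudeAiUrl?: string; postLoginUrl?: string };

const LEVEL_ORDER: ReadyLevel[] = ['window', 'mainVisible', 'claudeAi', 'userLoaded'];

// Higher levels include every lower-level check.
function includesLevel(target: ReadyLevel, check: ReadyLevel): boolean {
  return LEVEL_ORDER.indexOf(check) <= LEVEL_ORDER.indexOf(target);
}

// One deadline shared by all steps: a slow early step shrinks what is left.
function makeBudget(totalMs: number, now: () => number = Date.now) {
  const deadline = now() + totalMs;
  return { remaining: (): number => Math.max(0, deadline - now()) };
}
```

Each step would then poll with `budget.remaining()` as its timeout
instead of a fixed per-step value.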

Hard-fail vs soft-fail split mirrors what the specs already did:

  - 'window' / 'mainVisible' throw on timeout — no spec today
    has a skip path for these, treat as hard regression.
  - 'claudeAi' / 'userLoaded' return with claudeAiUrl /
    postLoginUrl absent on timeout. Caller checks the field and
    testInfo.skip()s — the existing not-signed-in skip pattern
    in S31, S32, S35.

Migrations:

  S29, S30, S34   → 'mainVisible'
  S31, S32        → 'claudeAi'  (preserves the not-signed-in skip)
  S35 (×2 launch) → 'userLoaded' (preserves the skip on both)

Net -64 lines across the six specs (boilerplate gone) and +130
lines in lib/electron.ts (the helper + types). The trade is
worth it for the next QE-* runner — readiness becomes a single
named call instead of another bespoke poll.

Deliberately preserved:

  - openAndWaitReady's retry loop in lib/quickentry.ts. The
    lHn() race (build-reference index.js:515604) lives on a
    different timeline than the renderer URL — main-process
    user state can lag the URL change past /login. 'userLoaded'
    is necessary but not sufficient; the retry-on-shortcut path
    is the cheapest mitigation and stays.
  - S35's first-launch 3s sleep between userLoaded and the
    first openAndWaitReady. openAndWaitReady's retry would
    catch the race too, but eating one full attempt +
    retryDelayMs is slower than the upfront sleep on a test
    that already runs ~30s.

waitForUserLoaded stays exported from lib/quickentry.ts (lHn()
race domain knowledge belongs there) and is consumed by
electron.ts. No re-export to keep one canonical import path.

Validated on KDE-W: 10 passed, 5 cleanly skipped (S12/S32 row,
S36 single-monitor, S37 Linux-unreachable, T17 on /login),
2.1 minutes total. npm run typecheck clean.

---
Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <claude@anthropic.com>
60% AI / 40% Human
Claude: drafted the helper API, sorted out the conditional-type vs overload tradeoff, migrated the six specs, ran the validation sweep
Human: scoped which specs to migrate, defined the level semantics, called out openAndWaitReady's retry as untouchable, validated outcome
2026-05-03 07:56:29 -04:00
aaddrick
7d4b819a2d test(harness): land 10 Quick Entry closeout runners (S09-S37) on KDE-W
Wires up the remaining QE-* sweep runners from
docs/testing/quick-entry-closeout.md. Full sweep on KDE-W now runs
16 specs in ~2.2 min; 10 pass, 5 cleanly skip per spec intent
(S12/S32 row-gated to GNOME-W, S36 single-monitor, S37 unreachable
on Linux, T17 mid-air on selector tuning).

Specs landed:

- S09 — patch sanity (asar grep for the KDE-gate string). Pure file
  probe, no app launch, ~75ms.
- S12 — `--enable-features=GlobalShortcutsPortal` argv check.
  GNOME-W only. Currently a known-failing regression detector
  until the launcher patch lands; greens once #404 is closed.
- S29 — popup lazy-create from closed-to-tray. Verifies the popup
  webContents is null before the first shortcut, then opens.
- S30 — shortcut becomes a no-op after full app exit. Switched
  from "no leftover process" to a pgrep-pid-delta assertion; the
  spec's regression target is "no NEW pid spawned by the
  shortcut," not "zero leftovers" (renderer/zygote teardown is
  asynchronous, not what S30 is testing).
- S31 — pre-existing; updated to use openAndWaitReady().
- S32 — GNOME-W/Ubu-W variant of S31 with a main-reappears
  assertion that S31 explicitly avoids. Skips on KDE rows; will
  fail on GNOME-W until the stale-isFocused() patch is widened
  beyond the current KDE-only #406 gate.
- S33 — bundled Electron version. Reads from
  `electron/package.json` rather than running `electron --version`
  (the bundled binary auto-loads `resources/app.asar` so `--version`
  gets passed through as argv to Claude Desktop instead of being
  intercepted by Electron's flag parser).
- S34 — fullscreen main suppresses popup. Inverse-shape test:
  popup must NOT be visible within 3s of the shortcut.
- S35 — position memory across app restart. Two-launch test
  using a shared isolation handle so XDG_CONFIG_HOME persists
  across the restart. Heaviest runner (~30s).
- S36 — multi-monitor fallback. Skips with `-` on single-monitor
  hosts per the closeout spec; uses test.fixme() on multi-monitor
  hosts to surface the missing libvirt-detach orchestration as
  `?` (untested) rather than a misleading green.
- S37 — main-window destroy. Documented skip — unreachable on
  Linux per the close-to-tray override. Marked `-` on every
  Linux row in the matrix.
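
The pid-delta assertion described for S30 can be sketched as a set
difference over two `pgrep` captures. `parsePids` and `newPids` are
illustrative helpers; the runner's actual pgrep invocation is not
shown here.

```typescript
// Parse whitespace-separated pids as printed by pgrep (one per line).
function parsePids(out: string): Set<number> {
  const pids = new Set<number>();
  for (const token of out.split(/\s+/)) {
    if (/^\d+$/.test(token)) pids.add(Number(token));
  }
  return pids;
}

// Pids present after the shortcut that were not there before it.
function newPids(before: Set<number>, after: Set<number>): number[] {
  return Array.from(after).filter((pid) => !before.has(pid)).sort((a, b) => a - b);
}
```

Leftover pids that are still tearing down appear in both captures and
drop out of the delta, which matches the "no NEW pid" framing.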

Two race conditions surfaced and fixed during the bring-up:

1. **lHn() user-loaded race.** Upstream's shortcut handler
   (build-reference index.js:515604) checks `!user.isLoggedOut`
   AFTER ready-to-show and silently skips Ko.show() if the
   main-process user object hasn't populated yet. URL-changes-past-
   /login (visible in the renderer) precedes user-object population
   (in the main process). Mitigation: a new `openAndWaitReady()`
   helper that retries the shortcut up to 3 times with a
   per-attempt timeout. Used by S29-S32, S35.
2. **Main-visible-then-trigger race.** Triggering the shortcut
   immediately after the X11 window appears races the popup show()
   flow on first invocation. Mitigation: wait for
   `mainWin.getState().visible === true` before the first shortcut
   call. The same wait fixes the in-process case where lHn() was a
   non-issue.
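
The retry mitigation in item 1 could be sketched like this. The real
`openAndWaitReady()` lives in lib/quickentry.ts and is not reproduced
here; `retryUntilReady`, its option names and the default delay are
assumptions, and the per-attempt timeout is expected to live inside the
`attempt` callback.

```typescript
interface RetryOpts {
  attempts: number;
  retryDelayMs: number;
}

// Calls attempt() until it reports success; resolves with the attempt number.
async function retryUntilReady(
  attempt: () => Promise<boolean>,
  opts: RetryOpts = { attempts: 3, retryDelayMs: 500 },
): Promise<number> {
  for (let i = 1; i <= opts.attempts; i++) {
    if (await attempt()) return i;
    if (i < opts.attempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, opts.retryDelayMs));
    }
  }
  throw new Error(`not ready after ${opts.attempts} attempts`);
}
```

In the runner, `attempt` would fire the shortcut and then wait for the
popup to become visible within its own timeout.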

New harness primitive:

- `waitForUserLoaded(inspector, timeoutMs)` in lib/quickentry.ts —
  polls the claude.ai webContents URL until it's no longer on a
  /login or /auth path. The signal is necessary but not sufficient
  for the lHn() race (auth state has its own timeline), so the
  retry-loop in `openAndWaitReady()` does the actual heavy lifting.

README's Status table updated to list all 16 specs, layout
section adds the 10 new runner files.

---
Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <claude@anthropic.com>
35% AI / 65% Human
Claude: drafted runners + helpers, traced lHn() race through build-reference, debugged race conditions iteratively against the local install
Human: scoped batches, validated each runner outcome, drove the diagnostic-attachment + retry-vs-sleep tradeoff decisions
2026-05-03 07:56:29 -04:00
aaddrick
e92ca9895a test(harness): foundation for QE-* runners + S31 passing on KDE-W
Three prerequisites built before adding the closeout sweep runners:

- Per-test isolation default in launchClaude(). Fresh
  XDG_CONFIG_HOME / CLAUDE_CONFIG_DIR per launch via mkdtemp,
  cleaned up on close. Three modes: default (fresh), shared
  (pass an Isolation handle for restart-style tests like S35),
  null (host config — opt-in for tests that need real claude.ai
  auth via CLAUDE_TEST_USE_HOST_CONFIG).
- Row-skipping primitive (skipUnlessRow) so spec files declare
  applicability once and the orchestrator routes correctly. Maps
  to JUnit <skipped> → matrix `-`.
- Layered Critical/Should assertion pattern. Local signals stay
  local (popup-closed = isVisible() === false), network-coupled
  signals (chat URL nav) are tracked separately so a claude.ai
  hiccup doesn't fail a regression cell.

New libs:
- isolation.ts — per-test sandbox
- row.ts — skipUnlessRow / skipOnRow
- argv.ts — /proc/$pid/cmdline + flag-presence check (QE-6, S07,
  S12, future Wayland-default Smoke)
- asar.ts — in-place app.asar reads via @electron/asar (QE-19,
  future patch sanity for tray.sh / cowork.sh / etc.)
- quickentry.ts — domain wrapper. Single point of coupling to
  upstream's main-process structure for QE-* tests. Anchors on
  stable strings (loadFile path '.vite/renderer/quick_window/
  quick-window.html', IPC channel names, settings keys), not
  minified vars.
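
A small sketch of the argv.ts idea, assuming the raw contents of
`/proc/$pid/cmdline` have already been read into a string. The helper
names are illustrative, not the library's actual exports.

```typescript
// /proc/<pid>/cmdline separates arguments with NUL and usually ends with one.
function parseCmdline(raw: string): string[] {
  const trimmed = raw.endsWith('\0') ? raw.slice(0, -1) : raw;
  return trimmed.length === 0 ? [] : trimmed.split('\0');
}

// True for a bare flag or a flag=value form.
function hasFlag(argv: string[], flag: string): boolean {
  return argv.some((arg) => arg === flag || arg.startsWith(`${flag}=`));
}

// True when any --enable-features=... list contains the feature.
function featureEnabled(argv: string[], feature: string): boolean {
  const prefix = '--enable-features=';
  return argv
    .filter((arg) => arg.startsWith(prefix))
    .some((arg) => arg.slice(prefix.length).split(',').indexOf(feature) !== -1);
}
```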

S31 — Quick Entry submit reaches new chat from any main-window
state. Backs QE-7/8/9; passes on KDE-W in ~28s.

The interceptor pivot worth noting: scripts/frame-fix-wrapper.js
returns the electron module wrapped in a Proxy whose `get` trap
returns a closure-captured PatchedBrowserWindow. Constructor-level
wraps (`electron.BrowserWindow = Wrapped`) are silently bypassed —
writes succeed but reads ignore them. The reliable hook is at the
prototype-method level (loadFile / loadURL); captures every
instance regardless of subclass identity. Documented in
docs/learnings/test-harness-electron-hooks.md so the next
contributor doesn't re-discover the trap.
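
A self-contained illustration of the trap, using stand-in classes
rather than Electron itself. `RealWindow`, `PatchedWindow` and
`fakeElectron` are hypothetical; they only mimic the shape described
above (a `get` trap that always answers from a captured class).

```typescript
class RealWindow {
  loadFile(path: string): string {
    return `loaded:${path}`;
  }
}
class PatchedWindow extends RealWindow {}

// Stand-in for the wrapped module: reads of BrowserWindow never consult the target.
const fakeElectron = new Proxy({ BrowserWindow: RealWindow as typeof RealWindow }, {
  get(target, prop, receiver) {
    if (prop === 'BrowserWindow') return PatchedWindow;
    return Reflect.get(target, prop, receiver);
  },
});

// Constructor-level wrap: the write lands on the target, the read ignores it.
class Wrapped extends RealWindow {}
fakeElectron.BrowserWindow = Wrapped;
const constructorWrapBypassed = fakeElectron.BrowserWindow !== Wrapped;

// Prototype-level hook: every instance reaches it through the prototype chain.
const seenPaths: string[] = [];
const originalLoadFile = RealWindow.prototype.loadFile;
RealWindow.prototype.loadFile = function (this: RealWindow, path: string): string {
  seenPaths.push(path);
  return originalLoadFile.call(this, path);
};

const result = new fakeElectron.BrowserWindow().loadFile('quick-window.html');
```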

ydotool is a hard prerequisite for QE-* shortcut injection.
README's "Quick Entry runners" section walks through one-time
host setup (install + ydotoold systemd override for a
world-writable socket). sweep.sh fast-fails with a clear
diagnostic when the daemon isn't reachable.

What's left: ten more runners (S29/S30/S32/S33/S34/S35/S36/S37,
QE-6/19 patch sanity, QE-15/17/21 popup chrome). Each is a
~30-60-line recombination over the existing libs — see plan in
the closing message of this PR thread.

---
Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <claude@anthropic.com>
40% AI / 60% Human
Claude: drafted libs + runner, debugged the frame-fix-wrapper Proxy trap, wrote the learnings entry, ran S31 on bare-metal KDE-W
Human: scoped the prerequisites split, ran ydotool/ydotoold setup, validated the output, drove design tradeoffs (per-test isolation default, layered Critical/Should assertion, prototype-hook over constructor wrap)
2026-05-03 07:56:29 -04:00
aaddrick
bf9082067a docs(testing): add Quick Entry closeout sweep plan + S29-S37 case specs
Focused sweep plan for closing #393 / #404 / #370, anchored in upstream
design intent rather than user expectation (validated against
build-reference/.vite/build/index.js).

Adds nine functional test specs (S29-S37) covering Quick Entry popup
lifecycle, submit-flow reachability across main-window states, the
fullscreen edge case, position memory across restart, multi-monitor
fallback, and popup-survives-main-destroy behaviour. Each spec cites
specific upstream file:line evidence.

Refines ui/quick-entry.md rows with the same upstream evidence and adds
rows for popup lifecycle and main-window-destroy persistence. Submit
transition row now reflects "always a new chat session, never appended
to current" per index.js:515546.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 07:55:51 -04:00
aaddrick
c97d9eb64e docs(testing): update README + runbook for landed automation
The README's "Automation roadmap" section was written when the harness
didn't exist; it described automation in the future tense. Same for the
runbook's "Eventual automation" section ("runner: fields are
aspirational"). Both lied as of last week.

  README "Automation status" — points at tools/test-harness/, lists the
                               four wired runners (T01/T03/T04/T17),
                               links automation.md for architecture,
                               links runbook for invocation.
  runbook "Automated runs"   — sweep.sh invocation, output paths,
                               JUnit-to-matrix mapping, coexistence
                               with manual tests, brief on the
                               SIGUSR1 / runtime-attach path through
                               the CDP gate (with link to the long
                               writeup in automation.md).

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 07:55:51 -04:00
aaddrick
bfc0c0378e test(harness): runtime-attach inspector via SIGUSR1 unblocks L1
The CDP gate (lib/electron.ts) only matches --remote-debugging-port /
-pipe on argv. It doesn't check --inspect or runtime SIGUSR1 — which is
the same code path as the in-app Developer → Enable Main Process
Debugger menu item. Spotted by aaddrick.

So we spawn Electron clean (gate stays asleep), wait for the X11
window, then send SIGUSR1 to attach the Node inspector at runtime.
From there we get main-process JS evaluation, which reaches the
renderer via webContents.executeJavaScript() and supports main-process
mocks (dialog.showOpenDialog for T17).

What landed:

  src/lib/inspector.ts   — new. WebSocket Node-inspector client with
                           evalInMain<T>() and evalInRenderer<T>()
                           wrappers. Node 22+ built-in WebSocket; no
                           extra deps.
  src/lib/electron.ts    — adds app.attachInspector(timeoutMs) which
                           sends SIGUSR1 to the pid and waits for
                           port 9229 to answer.
  src/runners/T17        — re-enabled. Inspector attaches, dialog mock
                           installs, claude.ai webContents found,
                           Code-tab navigation click succeeds. Skips
                           with rich diagnostic if the folder-picker
                           click chain doesn't land — selector tuning
                           is iterate-as-needed work, not a blocker.

Two implementation gotchas captured in code comments:

  - BrowserWindow.getAllWindows() returns an empty array because
    frame-fix-wrapper substitutes the class and breaks the static
    registry. Use webContents.getAllWebContents() instead, which
    works correctly.
  - Runtime.evaluate's awaitPromise + returnByValue returns empty
    objects for awaited Promise resolutions. Workaround: IIFE returns
    JSON.stringify(value) and caller JSON.parses.
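
The stringify workaround from the second gotcha might look roughly like
this. `wrapForByValue` and `unwrapByValue` are hypothetical names; the
actual evalInMain / evalInRenderer wrappers are not shown in this
commit message.

```typescript
// Wrap an expression so only a JSON string crosses the inspector protocol.
// `?? null` keeps JSON.stringify from returning undefined for empty results.
function wrapForByValue(expression: string): string {
  return `(async () => JSON.stringify((await (${expression})) ?? null))()`;
}

// Caller side: parse the string back into a value of the expected type.
function unwrapByValue<T>(raw: string): T {
  return JSON.parse(raw) as T;
}
```

The wrapped string would be sent as the `expression` of
`Runtime.evaluate` with `awaitPromise` set, and the returned string
value handed to `unwrapByValue`.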

Sweep output:

  $ ./orchestrator/sweep.sh
  ✓  T01 — App launch (7.2s)
  ✓  T03 — Tray icon present (7.2s)
  ✓  T04 — Window decorations draw (7.1s)
  -  T17 — Folder picker opens
  3 passed, 1 skipped (44s)

Decision 1's escape-hatch reasoning (dogtail / AT-SPI) is no longer the
fallback for L1; it's only relevant for native dialogs the inspector
pattern can't reach. The three documented escape hatches under "The CDP
auth gate" can be retired — option (4), runtime-attach, is what we
actually use.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 07:55:51 -04:00
aaddrick
d5d7081b35 test(harness): pivot off CDP, ship 3 passing tests on KDE-W
Discovered the real blocker behind every failed Playwright launch: the
shipped index.pre.js has an authenticated-CDP gate.

  uF(process.argv) && !qL() && process.exit(1);

uF matches --remote-debugging-port / --remote-debugging-pipe on argv;
qL validates an ed25519-signed token in CLAUDE_CDP_AUTH (signed payload
${timestamp_ms}.${base64(userDataDir)}, 5-minute TTL) against a hardcoded
public key. Without a valid signature the app exits with code 1 right
after frame-fix-wrapper completes.

Both _electron.launch() and chromium.connectOverCDP() inject
--remote-debugging-port=0 and trigger the gate. The signing key is held
upstream; we can't forge tokens. CDP-driven L1 testing is blocked until
one of: (a) upstream issues a test/CI token, (b) we carry an
app-asar.sh patch that neutralizes the gate, or (c) we drive the
renderer via accessibility (dogtail / AT-SPI). All three are real
options; none belong in this commit.

What ships here, working today:

  T01 — App launch                 ✓ on KDE-W
  T03 — Tray icon present          ✓ on KDE-W (already was)
  T04 — Window decorations draw    ✓ on KDE-W (already was)
  T17 — Folder picker opens        - (skipped, awaits portal mock v2)

The harness now spawns Electron without any debug-port flags and
probes the running app externally — xprop for window state, dbus-next
for tray. T01 verifies "an X11 window with our pid appears within 15s
and its title matches /claude/i" rather than reading navigator.userAgent;
T03/T04 were external-probe tests already.

Sweep output:

  $ ROW=KDE-W ./orchestrator/sweep.sh
  Running 4 tests using 1 worker
    ✓  1 T01 — App launch (7.2s)
    ✓  2 T03 — Tray icon present (7.2s)
    ✓  3 T04 — Window decorations draw (7.1s)
    -  4 T17 — Folder picker opens
    1 skipped
    3 passed (22.9s)
  summary: tests=4 failures=0 errors=0 skipped=1

JUnit XML written, .tar.zst bundle created, exit 0.

The CDP auth gate finding is documented at docs/testing/automation.md
"The CDP auth gate" with the three escape hatches enumerated. Decision 1
and Decision 5 reopen for L1 once the project picks a path.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 07:55:51 -04:00
aaddrick
46f6dcdb9d test(harness): findings from first KDE-W run-through
Captures four real issues surfaced by trying to run T01 against the
installed claude-desktop on Nobara KDE-W, plus the fixes that landed.

Fixes that stuck:

1. Bypass the launcher script (/usr/bin/claude-desktop). It redirects
   Electron's stdout/stderr to
   ~/.cache/claude-desktop-debian/launcher.log, which means Playwright
   can't read the CDP advertisement on stderr. launchClaude now
   resolves the Electron binary + app.asar
   directly and spawns through Playwright. Override paths via
   CLAUDE_DESKTOP_ELECTRON / CLAUDE_DESKTOP_APP_ASAR env vars.

2. Inject the launcher's flags. Decision 6 (X11 default) is enforced
   in production via --disable-features=CustomTitlebar
   --ozone-platform=x11. Without these, Electron 41 hits a fatal
   Wayland communication error ("Broken pipe") on this build. Added
   as LAUNCHER_INJECTED_FLAGS.

3. Inject the launcher's env. ELECTRON_FORCE_IS_PACKAGED=true and
   ELECTRON_USE_SYSTEM_TITLE_BAR=1 mirror setup_electron_env(). The
   former makes app.isPackaged return true so resource resolution
   uses process.resourcesPath; the latter matches hybrid/native
   titlebar modes.

4. Pre-launch cleanup. Mirrors cleanup_orphaned_cowork_daemon +
   cleanup_stale_lock + cleanup_stale_cowork_socket in
   launcher-common.sh. Without it, a previous failed run leaves an
   orphaned cowork daemon and a stale SingletonLock that poison the
   next launch.

Also: dropped the xdotool dependency. wm.ts now finds the X11 window
by walking _NET_CLIENT_LIST + _NET_WM_PID via xprop only, which is
universally installed where xdotool isn't.
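
A sketch of the xprop-only lookup, working on captured output strings
so it stays independent of how xprop is spawned. The output formats
assumed in the comments are the ones xprop commonly prints; the helper
names are illustrative and are not taken from wm.ts.

```typescript
// Assumed input: "_NET_CLIENT_LIST(WINDOW): window id # 0x1e00003, 0x2a00007"
function parseClientList(out: string): string[] {
  return out.match(/0x[0-9a-fA-F]+/g) ?? [];
}

// Assumed input: "_NET_WM_PID(CARDINAL) = 12345"; null when the property is missing.
function parseWmPid(out: string): number | null {
  const m = out.match(/_NET_WM_PID\(CARDINAL\) = (\d+)/);
  return m ? Number(m[1]) : null;
}

// readPidProp stands in for running `xprop -id <id> _NET_WM_PID`.
function findWindowIdsForPid(
  clientListOut: string,
  readPidProp: (windowId: string) => string,
  pid: number,
): string[] {
  return parseClientList(clientListOut).filter((id) => parseWmPid(readPidProp(id)) === pid);
}
```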

Open finding documented in README "Known limitations":

  Playwright's _electron.launch() currently fails after Frame Fix
  completes — the Node-inspector ws disconnects (code 1006) before
  the renderer ever advertises its DevTools port. Standalone
  electron --inspect=0 ... app.asar runs cleanly with the same flags
  (Frame Fix → "Starting app" → window created), so the failure is
  specific to Playwright + Electron 41 + this build. Likely
  workarounds: (a) chromium.connectOverCDP() against externally-
  spawned Electron with fixed --remote-debugging-port; (b) skip L1
  entirely for T03/T04 (those don't need Playwright owning the
  process — just spawn via child_process and use dbus-next / xprop).

Type-check passes; orchestrator/sweep.sh runs cleanly. The four
.spec.ts files all discover via npx playwright test --list. The
blocker is the launch handshake, not the harness shape.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 07:55:51 -04:00
aaddrick
f8ba761c2e test(harness): scaffold first vertical slice — T01, T03, T04, T17
Adds the in-VM TS harness at tools/test-harness/ covering the four
tests that exercise every distinct shape of harness code:

- T01 — app launch (playwright-electron)
- T03 — tray icon present (dbus-next + StatusNotifierWatcher)
- T04 — window decorations draw (xprop + xdotool shell-out helpers)
- T17 — folder picker opens (Electron-level dialog intercept; v1)

Layout:

    tools/test-harness/
    ├── package.json / tsconfig / playwright.config
    ├── src/lib/         — electron, dbus, sni, wm, env, retry, diagnostics
    ├── src/runners/     — one .spec.ts per test ID
    └── orchestrator/sweep.sh

Per Decision 1 (single-language TS): every runner is .ts; OS tools
(xprop, xdotool, claude-desktop --doctor) are shelled out via
child_process and wrapped as typed TS helpers. dbus-next handles all
DBus introspection. No bash test scripts, no Python.

T17 is the shallow v1 — intercepts dialog.showOpenDialog at the
Electron main process via Playwright's app.evaluate() rather than
mocking the portal. Mocking org.freedesktop.portal.FileChooser via
dbus-next requires displacing the running portal service or running
under dbus-run-session, both intrusive enough to defer until signal
warrants it. The test file documents this and the upgrade path.

T04 uses xprop / xdotool which work on X11 native and KDE Wayland
(via XWayland — the project default per Decision 6). Native-Wayland
window-state queries are deferred.

Wires runner: fields into the four cases/*.md test specs.

Type-check passes; npx playwright test --list discovers all four.

Run with:
    cd tools/test-harness
    npm install
    ROW=KDE-W ./orchestrator/sweep.sh

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 07:55:51 -04:00
aaddrick
47de8bff7d docs(testing): unify on TS, capture decisions
Restructures automation.md from brainstorm-with-open-questions to
direction-with-residual-decisions. Eight calls captured in a Decisions
table near the top:

1. Single language (TypeScript). dbus-next replaces gdbus shell-outs;
   child_process wraps OS-tool invocations as typed TS helpers; portal
   mocking via dbus-next handles native-dialog tests. Python only as a
   last-resort escape hatch for AT-SPI cases that resist mocking.
2. Harness lives at tools/test-harness/.
3. Packer for imperative distro images + Nix flake for Hypr-N.
4. No CI infrastructure initially; harness invokable from CI but
   sweeps run from the dev box for the first ~20 tests.
5. Semantic locators only (getByRole/getByLabel/getByText). No
   proactive data-testid injection patch; escalate per-test if a
   selector proves unstable.
6. X11-default verification is Smoke; Wayland-native characterization
   is Should. Project keeps X11 default because portal coverage for
   GlobalShortcuts is uneven across compositors.
7. Last 10 greens + all reds, on main only. Capture --doctor /
   launcher log / screenshot every run.
8. JUnit lives as workflow-run artifacts. Matrix-regen reads latest
   run's bundle and PRs the matrix update.

T17 (folder picker) moves out of "manual forever" — portal mocking
covers the integration test cleanly. dogtail demoted to escape-hatch
status, only invoked if a specific test forces it.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 07:55:51 -04:00
aaddrick
28fc6e29a2 docs(testing): draft automation plan
Captures the brainstorm + research pass behind the eventual harness:
three-layer model (renderer / native / manual), why in-VM Playwright
beats orchestrator-driven CDP, toolchain choices per layer (playwright-
electron, dogtail/AT-SPI, ydotool→libei), anti-patterns to design
against from day one, and a suggested first vertical slice (KDE-W + T01).

Includes an Open questions section listing eight decisions still owed
before any of this becomes code — language split, harness location,
image-build tooling, CI execution model, data-testid injection, severity
for the Electron-Wayland-default tests, diagnostic retention, JUnit
output destination.

Sourced; not committed direction yet.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 07:55:51 -04:00
aaddrick
ff3dd3c64e docs(testing): add Linux compatibility test plan
Establish a manual test plan for the Linux fork at docs/testing/, structured
to support eventual automation.

Layout:
- README.md         orientation, severity tiers, smoke set (10 tests),
                    automation roadmap
- matrix.md         cross-env dashboard (T01-T39) + env-specific status
                    snapshots (S01-S28) + known-failures rollup
- runbook.md        VM setup, diagnostic-capture commands, sweep workflow,
                    severity guidance, how to add tests
- cases/            67 functional tests grouped by feature surface; every
                    test has standardized Severity / Steps / Expected /
                    Diagnostics on failure / References sections
- ui/               per-surface UI checklists (window chrome, tray,
                    sidebar, prompt, code-tab panes, settings, routines,
                    connectors/plugins, quick entry, notifications). Every
                    row is an interactive element with selector + expected
                    state.

Coverage:
- Historical project surfaces: app launch, doctor, tray, window
  decorations, hybrid topbar, Quick Entry, autostart, hide-to-tray,
  multi-instance.
- Upstream Claude Code Desktop surfaces (officially "Linux not supported"
  per code.claude.com/docs/en/desktop): Code tab, sign-in flow, folder
  picker, drag-drop, integrated terminal, file pane, preview pane, PR
  monitoring, scheduled tasks, connectors OAuth, plugin browser, MCP /
  hooks / CLAUDE.md memory, Dispatch handoff.
- Env-specific failure modes: Ubuntu/DEB, Fedora/RPM, Wayland-native
  (wlroots), KDE, GNOME (mutter XWayland key-grab), Omarchy, Niri,
  AppImage, .desktop env handling, idle-sleep / suspend, Computer Use
  (out-of-scope per upstream), auto-update vs apt/dnf, plugin/worktree
  storage.

Automation hooks:
- Stable T## / S## test IDs (won't move).
- Standardized test bodies — Steps and Diagnostics fields are
  scripted-runner-shaped.
- UI checklists are per-element tables — every row a candidate
  Playwright / xdotool / DBus assertion.
- Smoke set explicit in README — first 10 tests for automation.

Co-Authored-By: Claude <claude@anthropic.com>
2026-05-03 07:55:51 -04:00
162 changed files with 38105 additions and 0 deletions

4
.gitignore vendored

@@ -33,3 +33,7 @@ result-*
# Wrangler (Cloudflare Worker dev/deploy cache)
worker/.wrangler/
# UI snapshots — captured renderer state, intentionally ignored to avoid
# diff churn. See docs/testing/ui-snapshots/README.md.
docs/testing/ui-snapshots/*.json


@@ -15,6 +15,8 @@ The [`docs/learnings/`](docs/learnings/) directory contains hard-won technical k
- [`tray-rebuild-race.md`](docs/learnings/tray-rebuild-race.md) — why destroy + recreate on `nativeTheme` updates briefly duplicates the tray icon on KDE Plasma, and the in-place `setImage` + `setContextMenu` fast-path that avoids the SNI re-registration race
- [`mcp-double-spawn.md`](docs/learnings/mcp-double-spawn.md) — Stdio MCPs spawn 2× when chat and Code/Agent panels are both active, root cause in upstream session managers, MCP-author workaround
- [`linux-topbar-shim.md`](docs/learnings/linux-topbar-shim.md) — why claude.ai's in-app topbar is missing on Linux, the four gates that hide it, why the upstream `frame:false` + WCO config has unclickable buttons on X11 (Chromium-level implicit drag region), and the resolution: hybrid mode (system frame + UA-spoof shim → stacked layout, full button functionality)
- [`test-harness-electron-hooks.md`](docs/learnings/test-harness-electron-hooks.md) — why constructor-level `BrowserWindow` wraps are silently bypassed by `frame-fix-wrapper`'s Proxy, and the prototype-method hook pattern that works (used by the Quick Entry test runners)
- [`test-harness-ax-tree-walker.md`](docs/learnings/test-harness-ax-tree-walker.md) — five non-obvious traps in the v7 fingerprint walker after the AX-tree migration: AX-enable async lag, navigateTo-to-same-URL no-op, claude.ai's flat `dialog>button[]` lists, the `more options for X` per-row shape, and sidebar virtualization vs the lookup-failure threshold
## Code Style


@@ -0,0 +1,134 @@
# Test-harness AX-tree walker — non-obvious traps
Notes from the v6 → v7 fingerprint migration that switched
`tools/test-harness/explore/walker.ts` from a renderer-side
`document.querySelectorAll` IIFE to Chromium's accessibility tree
(`Accessibility.getFullAXTree` over CDP). All five gotchas below cost
a wasted live-walk to find; capturing them here so the next person
debugging a 0-entry inventory or a redrive cascade can skip the
discovery loop.

## 1. `Accessibility.enable` is async; the first `getFullAXTree` lies

Inspector clients call `target.debugger.sendCommand('Accessibility.enable')`
before the first `getFullAXTree`. Both calls return immediately, but
Chromium populates the AX tree asynchronously — the very first
read can return a tree containing only the `RootWebArea` and a
generic shell (4 nodes total) even when the DOM has hundreds of
interactive elements. The walker's existing `waitForStable` is a
DOM-mutation-quiescence observer with a 1.5s ceiling; on claude.ai's
SPA the DOM mutates constantly so `waitForStable` returns at the
ceiling without the AX tree ever catching up.
**Fix:** `waitForAxTreeStable` polls `getFullAXTree` until two
consecutive reads return the same node count. Called once before the
seed snapshot (with `minNodes: 20` to gate against the 4-node "still
loading" case), once after each `navigateTo` in `redrivePath`, and
baked into every `snapshotSurface` call (with `minNodes: 1` for the
post-click case where the tree is already populated).
**Symptom you'll see:** seed entries: 0. Walker exits with no
inventory. Stderr says `walker: AX tree settled at 4 nodes` (or
similar small number).
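
A minimal sketch of the polling rule described in the fix: two
consecutive reads with the same node count, gated by `minNodes`. The
`readCount` callback stands in for a `getFullAXTree` call, and the
option names other than `minNodes` are assumptions.

```typescript
interface StableOpts {
  minNodes: number;
  intervalMs: number;
  timeoutMs: number;
}

// Resolves with the node count once two consecutive reads agree.
async function waitForStableCount(
  readCount: () => Promise<number>,
  opts: StableOpts,
): Promise<number> {
  const deadline = Date.now() + opts.timeoutMs;
  let previous = -1;
  while (Date.now() < deadline) {
    const current = await readCount();
    // A repeated small count (the 4-node shell) is rejected by minNodes.
    if (current >= opts.minNodes && current === previous) return current;
    previous = current;
    await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs));
  }
  throw new Error(`AX tree did not settle within ${opts.timeoutMs}ms`);
}
```

With `minNodes: 20`, a sequence of counts such as 4, 4, 120, 350, 350
settles at 350 rather than at the early 4-node reading.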

## 2. `navigateTo(sameUrl)` is a no-op; redrives carry prior state

The walker's `navigateTo(url)` short-circuits when `currentUrl === url`
(per the original v6 implementation). Every BFS pop re-navigates
to `startUrl` to replay the recorded path against a clean state, but
when `currentUrl` already matches `startUrl` the navigation is
skipped. Anything a prior drill left behind — open dialog, expanded
sidebar, scrolled focus, route params — carries into the next
redrive's snapshots. `clickById` then suffix-matches the requested
fingerprint against a contaminated surface and silently fails to find
elements that were absolutely on the seed surface.
**Fix:** `redrivePath` uses `reloadPage(inspector)` (which evals
`location.reload()` in the renderer) instead of
`navigateTo(startUrl)`. The reload discards the React tree and forces
a fresh mount even when the URL matches.
**Symptom you'll see:** the first one or two BFS items succeed, then
every subsequent redrive fails with
`clickById: no element matches "<seed-id>" on current surface`. The
`<seed-id>` is a button you can verify with the DevTools console is
visibly present.
## 3. claude.ai uses flat `dialog>button[]` and `complementary>button[]`, not `role=list`
The v7 plan's `isListRowChild` check assumes list rows use ARIA list
semantics (`option/listitem` inside `listbox/list`). claude.ai
exposes the connect-apps marketplace as a `dialog` with ~80 plain
`button` children (no `list` wrapper) and the cowork sidebar as a
`complementary` landmark with ~70 plain `button` children. Without
the heuristic those buttons literal-match by name → each gets a
unique stable entry → the BFS queues each individually for drilling
→ inventory bloats from 32 to 442+ entries and most drills fail
because the per-row buttons are virtualized.
**Fix:** `isListRowChild` extended in two ways. (a) `LIST_ROW_ROLES`
includes `button`, `LIST_ANCESTOR_ROLES` includes `group`. (b) A
sibling-count fallback fires when `siblingTotal >= 15` regardless of
ancestor role; the threshold sits well above realistic toolbar sizes
(≤10) and well below the smallest claude.ai marketplace (~80). Step 3
(positional fallback) also gates on `!isListRowChild` so list rows
fall through to step 4's `instance` collapse instead of fragmenting
into per-index positionals that can't fold.
**Symptom you'll see:** dialog kind count balloons (>200). One surface
dominates the `surfaceBreakdown` query in the inventory. Each
marketplace card or sidebar row gets its own `kind: structural`
entry with a slugified product name in the id-tail.
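A rough sketch of the extended heuristic follows. The role sets and the threshold of 15 come from the fix described above; the `AxNodeSummary` shape, the function name `isListRowChildSketch`, and the choice to keep the row-role check ahead of the sibling-count fallback are assumptions made for illustration.

```ts
// Hypothetical sketch; the walker's real isListRowChild signature may differ.
const LIST_ROW_ROLES = new Set(['option', 'listitem', 'button']);
const LIST_ANCESTOR_ROLES = new Set(['listbox', 'list', 'group']);
const SIBLING_FALLBACK_THRESHOLD = 15;

interface AxNodeSummary {
  role: string;
  parentRole: string;
  // Assumed meaning: number of same-role siblings under the same parent.
  siblingTotal: number;
}

function isListRowChildSketch(node: AxNodeSummary): boolean {
  if (!LIST_ROW_ROLES.has(node.role)) return false;
  // (a) ARIA-style list semantics, extended with button rows and group ancestors.
  if (LIST_ANCESTOR_ROLES.has(node.parentRole)) return true;
  // (b) Sibling-count fallback, regardless of ancestor role.
  return node.siblingTotal >= SIBLING_FALLBACK_THRESHOLD;
}
```

Under these assumptions a `dialog` with ~80 plain buttons collapses as list rows, while a toolbar with a handful of buttons does not.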
## 4. The `more options for X` per-row trigger needs its own shape
Cowork sidebar rows have a "⋮" menu next to each session whose
aria-label is `More options for <session title>`. These don't match
the `cowork-session` shape (which gates on status prefix), so even
after `cowork-session` collapsed the session list, the sibling
"More options for" buttons still emitted individually. Same for any
future per-row action button claude.ai adds.
**Fix:** new `INSTANCE_SHAPES` entry `row-more-options` with regex
`/^More options for /` and matching pattern. Generic enough to cover
any per-row trigger that follows the `<verb> for <row title>` shape.
**Symptom you'll see:** after fixing (1)-(3), a fresh wave of
redrive failures all matching `more-options-for-X` slugs.
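As an illustration, an `INSTANCE_SHAPES`-style table with the new entry might look like the sketch below. Only the shape id `row-more-options` and the regex `/^More options for /` are taken from the text; the `InstanceShape` interface, the table name, and `matchInstanceShape` are hypothetical.

```ts
// Hypothetical sketch of a shape table; only the id and regex come from the fix above.
interface InstanceShape {
  id: string;
  namePattern: RegExp;
}

const INSTANCE_SHAPES_SKETCH: InstanceShape[] = [
  { id: 'row-more-options', namePattern: /^More options for / },
];

// Returns the id of the first shape whose pattern matches the accessible name.
function matchInstanceShape(accessibleName: string): string | null {
  for (const shape of INSTANCE_SHAPES_SKETCH) {
    if (shape.namePattern.test(accessibleName)) return shape.id;
  }
  return null;
}
```

Every per-row trigger then folds into one instance entry instead of one entry per session title.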
## 5. Sidebar virtualization causes structural redrive misses; bump the threshold
claude.ai's cowork sidebar appears to virtualize the session list:
each fresh page load exposes a slightly different subset of sessions
in the AX tree (subset, not just ordering — actually different
membership). The walker captures session N at seed time but on
redrive after `reloadPage` session N may not be in the tree. Each
miss counts toward `MAX_CONSECUTIVE_LOOKUP_FAILURES`, and a stretch
of 25+ consecutive cowork-row redrives can blow through the original
threshold without the renderer being meaningfully wedged.
**Fix:** threshold bumped 25 → 75. The timeout counter (still 5
strikes) gates against actual renderer hangs; the lookup-failure
counter is more about "discovered DOM has drifted from seed", and on
a virtualized list a generous threshold is correct. Subtree pruning
(already in place) keeps the bursts from compounding by dropping
queue items whose path shares the failed step's prefix.
**Symptom you'll see:** the walker aborts mid-walk with
`25 consecutive redrive lookup failures` and the failed ids all
share a common ariaPath prefix (`root.complementary.button-by-name.X`).
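The interaction between the consecutive-failure counter and subtree pruning can be sketched like this. The threshold of 75 is from the fix above; `LookupFailureCounter`, `pruneSubtree`, and the `QueueItem` shape are hypothetical stand-ins for whatever the walker actually uses.

```ts
// Hypothetical sketch; only the threshold value comes from the text above.
const MAX_CONSECUTIVE_LOOKUP_FAILURES = 75;

interface QueueItem {
  path: string[]; // recorded click path from the seed surface
}

class LookupFailureCounter {
  private consecutive = 0;

  recordSuccess(): void {
    this.consecutive = 0;
  }

  // Returns true when the walk should abort.
  recordFailure(): boolean {
    this.consecutive += 1;
    return this.consecutive >= MAX_CONSECUTIVE_LOOKUP_FAILURES;
  }
}

// Drops queued items whose path starts with the failed step's prefix.
function pruneSubtree(queue: QueueItem[], failedPrefix: string[]): QueueItem[] {
  return queue.filter(
    (item) => !failedPrefix.every((step, i) => item.path[i] === step),
  );
}
```

Pruning keeps one virtualized row from charging the counter once per descendant, so the generous threshold is only reached when many distinct rows drift.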
## Driver: prefer `walk-isolated.ts` over `explore walk`
`npm run explore:walk` connects to whatever Node inspector is on
:9229 — i.e. the host Claude Desktop the user is currently using.
That mutates the host profile (visited surfaces, navigation history,
route changes) and races with the human at the keyboard.
`tools/test-harness/explore/walk-isolated.ts` mirrors what H05 / U01
do: kills any running host instance, copies auth into a tmpdir
(`createIsolation({ seedFromHost: true })`), spawns a fresh Electron
with isolated `XDG_CONFIG_HOME`, attaches the inspector via
`SIGUSR1`, runs the walk, tears down. Same flag set as
`explore walk` plus `--no-seed` for the rare case you want a
fresh-sign-in run. Use it.


@@ -0,0 +1,99 @@
# Hooking Electron from the test harness
Why constructor-level `BrowserWindow` wraps don't work in this
codebase, and the prototype-method hook that does.
## TL;DR
The test harness attaches a Node inspector at runtime (see
[`docs/testing/automation.md`](../testing/automation.md#the-cdp-auth-gate-and-the-runtime-attach-workaround-that-beats-it))
and from there can evaluate arbitrary JS in the main process. To
observe BrowserWindow construction (e.g. find the Quick Entry popup
ref, capture construction-time options), the natural-feeling
approach is to wrap `electron.BrowserWindow`:
```js
const electron = process.mainModule.require('electron');
const Orig = electron.BrowserWindow;
electron.BrowserWindow = function (opts) {
  // record opts...
  return new Orig(opts);
};
```
**This is silently bypassed.** `scripts/frame-fix-wrapper.js`
returns the electron module wrapped in a `Proxy`; the Proxy's
`get` trap returns a closure-captured `PatchedBrowserWindow`
class. Reads of `electron.BrowserWindow` go through the trap and
always return `PatchedBrowserWindow`, regardless of what was
written to the underlying module. Writes succeed (Reflect.set on
the target) but reads ignore them. Upstream code calling
`new hA.BrowserWindow(opts)` constructs from `PatchedBrowserWindow`,
your wrap is never invoked, your registry stays empty.
The reliable hook is at the **prototype-method level**:
```js
const proto = electron.BrowserWindow.prototype;
const origLoadFile = proto.loadFile;
proto.loadFile = function (filePath, ...rest) {
  // every BrowserWindow instance reaches this, regardless of
  // which subclass constructed it
  return origLoadFile.call(this, filePath, ...rest);
};
```
This is what `tools/test-harness/src/lib/quickentry.ts:installInterceptor`
does.
## Why prototype-level works through the Proxy
`electron.BrowserWindow` returns `PatchedBrowserWindow`, which
`extends` the original `BrowserWindow` class. Both share the
underlying Electron-native prototype chain via `extends`. Setting
`PatchedBrowserWindow.prototype.loadFile = wrappedFn` shadows the
inherited method on every instance — `Patched`-constructed,
frame-fix-constructed, plain. There's no Proxy in front of
`PatchedBrowserWindow.prototype`, so the assignment sticks and is
visible to all subsequent `instance.loadFile(...)` calls.
`loadFile` and `loadURL` are reasonable identification points
because every BrowserWindow that displays content calls one of
them shortly after construction. The file path / URL is a stable
upstream-controlled string (no minification — these are file paths
to bundle assets), making it a durable identifier across releases.
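The bypass and the working hook can be reproduced without Electron. The sketch below is a stand-alone model, not the real `frame-fix-wrapper.js`: `BaseWindow`, `PatchedWindow`, `runProxyDemo`, and the file name `example.html` are all hypothetical stand-ins.

```ts
// Stand-ins for Electron's BrowserWindow and the frame-fix subclass (hypothetical).
class BaseWindow {
  loadFile(filePath: string): string {
    return `loaded ${filePath}`;
  }
}
class PatchedWindow extends BaseWindow {}

function runProxyDemo(): { constructorWrapCalls: number; seenPaths: string[] } {
  // Proxy whose get trap always hands back the closure-captured subclass.
  const electronLike = new Proxy<Record<string, unknown>>(
    { BrowserWindow: BaseWindow },
    {
      get(target, prop, receiver) {
        if (prop === 'BrowserWindow') return PatchedWindow;
        return Reflect.get(target, prop, receiver);
      },
    },
  );

  // Constructor-level wrap: the write lands on the target, but reads ignore it.
  let constructorWrapCalls = 0;
  electronLike.BrowserWindow = function wrapped(): void {
    constructorWrapCalls += 1;
  };

  // Prototype-level hook: assigned on the class the trap actually returns.
  const seenPaths: string[] = [];
  const Ctor = electronLike.BrowserWindow as typeof BaseWindow;
  const origLoadFile = Ctor.prototype.loadFile;
  Ctor.prototype.loadFile = function (this: BaseWindow, filePath: string): string {
    seenPaths.push(filePath);
    return origLoadFile.call(this, filePath);
  };

  new Ctor().loadFile('example.html');
  return { constructorWrapCalls, seenPaths };
}
```

In this model the constructor wrap is never invoked, while the prototype hook records the `loadFile` call, which mirrors the behavior described above.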
## Why constructor-level *can* work elsewhere
If frame-fix-wrapper is removed (or stops returning a Proxy), the
naïve constructor wrap would work. Watch for this: an upstream
fork that adopts `BaseWindow` over `BrowserWindow`, or a
build-time replacement of frame-fix-wrapper, would change the
hook surface. The prototype-method approach survives both.
## What can't be observed at the prototype level
Construction-time options (`transparent: true`, `frame: false`,
`skipTaskbar: true`, etc.) are consumed by the native side
during `super(options)` and not stored on the instance in a
reflective form. The harness reads runtime equivalents instead:
- `transparent` → `getBackgroundColor() === '#00000000'`
- `frame: false` → `getBounds().width === getContentBounds().width`
(frameless windows have equal frame and content bounds)
- `alwaysOnTop` → `isAlwaysOnTop()` (note: the popup sets this
via `setAlwaysOnTop()` *after* construction at
`index.js:515399`, so this is the only viable read regardless of
hook approach)
`skipTaskbar` has no public getter; if a test needs it, capture
it at the prototype level by hooking a method that takes the same
options shape, or accept that this signal is unobservable
post-construction.
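The runtime reads listed above could be gathered in one helper, roughly as follows. `WindowLike` is a structural subset of Electron's `BrowserWindow` (all four methods exist on the real class); `observeWindowTraits` and `ObservedWindowTraits` are hypothetical names, not the harness's API.

```ts
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Structural subset of BrowserWindow, so the sketch runs without Electron.
interface WindowLike {
  getBackgroundColor(): string;
  getBounds(): Rect;
  getContentBounds(): Rect;
  isAlwaysOnTop(): boolean;
}

interface ObservedWindowTraits {
  transparent: boolean;
  frameless: boolean;
  alwaysOnTop: boolean;
}

// Derives construction-time options from their runtime equivalents.
function observeWindowTraits(win: WindowLike): ObservedWindowTraits {
  return {
    transparent: win.getBackgroundColor() === '#00000000',
    frameless: win.getBounds().width === win.getContentBounds().width,
    alwaysOnTop: win.isAlwaysOnTop(),
  };
}
```

`skipTaskbar` is intentionally absent because, as noted above, it has no public getter.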
## See also
- [`tools/test-harness/src/lib/quickentry.ts`](../../tools/test-harness/src/lib/quickentry.ts) — `installInterceptor()` worked example
- [`scripts/frame-fix-wrapper.js`](../../scripts/frame-fix-wrapper.js) — the Proxy + closure
- [`tools/test-harness/src/lib/inspector.ts`](../../tools/test-harness/src/lib/inspector.ts) — how the harness gets main-process JS access in the first place
- [`docs/testing/automation.md`](../testing/automation.md) — overall harness architecture

112
docs/testing/README.md Normal file

@@ -0,0 +1,112 @@
# Linux Compatibility Testing
*Last updated: 2026-05-03*
This directory holds the manual test plan for the Linux fork of Claude Desktop. The structure is designed for human readers today and scripted runners tomorrow.
## Layout
| Folder / file | Purpose |
|---------------|---------|
| [`matrix.md`](./matrix.md) | **The dashboard.** Cross-environment results table + per-section env-specific status snapshots. Single source of truth for test status. |
| [`runbook.md`](./runbook.md) | How to run a sweep: VM setup, diagnostic capture, status update workflow, severity guidance. |
| [`cases/`](./cases/) | Functional test specs grouped by feature surface. Stable IDs: `T###` cross-env, `S###` env-specific. |
| [`ui/`](./ui/) | UI element inventory. Per-surface checklists — every interactive element with expected state. |
## Environment key
| Abbrev | Distro | DE | Display server |
|--------|--------|-----|----------------|
| KDE-W | Fedora 43 | KDE Plasma | Wayland |
| KDE-X | Fedora 43 | KDE Plasma | X11 |
| GNOME | Fedora 43 | GNOME | Wayland |
| Ubu | Ubuntu 24.04 | GNOME | Wayland |
| Sway | Fedora 43 | Sway | Wayland (wlroots) |
| i3 | Fedora 43 | i3 | X11 |
| Niri | Fedora 43 | Niri | Wayland (wlroots) |
| Hypr-O | OmarchyOS | Hyprland | Wayland (wlroots) |
| Hypr-N | NixOS | Hyprland | Wayland (wlroots) |
Status legend: `✓` pass · `✗` fail · `🔧` mitigated · `?` untested · `-` N/A
Cells include linked issue/PR numbers when relevant — e.g. `✗ #404` or `🔧 #406`. A bare `✗` means the failure is verified but no tracking issue is filed yet.
## Severity tiers
Each test is tagged with one of:
| Tier | Meaning | Sweep cadence |
|------|---------|---------------|
| **Smoke** | Release-gate. Must pass before any tag is cut. | Every release tag, on KDE-W + one wlroots row |
| **Critical** | Regression-blocker. Failure on any supported environment blocks the release. | Every release tag, on every active row |
| **Should** | Important but not blocking. Track as bugs, fix before next stable. | Quarterly + on demand |
| **Could** | Edge cases, nice-to-have. | On demand only |
## Smoke set
The minimum set that gates a release. Run on **KDE-W** (daily-driver) plus **Hypr-N** (clean wlroots). Sweep target: ~20 minutes.
| ID | Surface | One-line check |
|----|---------|----------------|
| [T01](./cases/launch.md#t01--app-launch) | Launch | App opens; main window renders within ~10s |
| [T03](./cases/tray-and-window-chrome.md#t03--tray-icon-present) | Tray | Tray icon appears; click toggles window |
| [T04](./cases/tray-and-window-chrome.md#t04--window-decorations-draw) | Window | OS-native frame draws and responds |
| [T05](./cases/shortcuts-and-input.md#t05--url-handler-opens-claudeai-links-in-app) | Input | `xdg-open https://claude.ai/...` opens in-app |
| [T07](./cases/tray-and-window-chrome.md#t07--in-app-topbar-renders--clickable) | Window | Hybrid topbar renders, every button clicks |
| [T08](./cases/tray-and-window-chrome.md#t08--hide-to-tray-on-close) | Window | Close button hides to tray, doesn't quit |
| [T11](./cases/extensibility.md#t11--plugin-install-anthropic--partners) | Extensibility | Anthropic & Partners plugin install completes |
| [T15](./cases/code-tab-foundations.md#t15--sign-in-completes-via-browser-handoff) | Auth | Sign-in completes via `xdg-open` browser handoff |
| [T16](./cases/code-tab-foundations.md#t16--code-tab-loads) | Code tab | Code tab loads (no 403, no blank screen) |
| [T17](./cases/code-tab-foundations.md#t17--folder-picker-opens) | Code tab | Folder picker opens via portal/native chooser |
## Test corpus snapshot
| Bucket | Count |
|--------|-------|
| Cross-environment functional (`T###`) | 39 |
| Environment-specific functional (`S###`) | 37 |
| UI surfaces inventoried | 10 |
| Total functional tests | 76 |
For detailed status by ID, see [`matrix.md`](./matrix.md).
## Automation status
Automation is partially landed. The harness lives at
[`tools/test-harness/`](../../tools/test-harness/) — nineteen Playwright
specs wired (T01, T03, T04, T17, S09, S12, S29-S37, plus four H-prefix
self-tests), thirteen passing on KDE-W and six skipping cleanly per
spec intent. See [`tools/test-harness/README.md`](../../tools/test-harness/README.md)
for the live status table, [`automation.md`](./automation.md) for
architectural decisions, and the SIGUSR1 / runtime-attach pattern that
bypasses the app's CDP auth gate.
### Grounding sweep + probe
Separate from the test sweep:
[`runbook.md` "Grounding sweep"](./runbook.md#grounding-sweep) covers
the workflow for verifying case docs themselves against the live
build on every upstream version bump — static anchor pass plus a
runtime probe ([`tools/test-harness/grounding-probe.ts`](../../tools/test-harness/grounding-probe.ts))
that captures IPC handler registry, accelerator state, autoUpdater
gate, AX-tree fingerprint, and other claims static analysis can't
disambiguate. Anchor and drift conventions live in
[`cases/README.md`](./cases/README.md#anchor-scope).
The structure remains automation-friendly for new tests:
1. **Stable test IDs.** `T01`-`T39` and `S01`-`S37` won't move. New tests append. Sequential, not semantic.
2. **Standardized test bodies.** Every functional test has `Severity`, `Steps`, `Expected`, `Diagnostics on failure`, and `References` sections. The Steps and Diagnostics fields are scripted-runner-shaped.
3. **Per-element UI checklists.** Each UI surface file lists interactive elements in a table — every row is a candidate `webContents.executeJavaScript` / `xprop` / DBus assertion.
4. **Severity-driven sweeps.** Tests with a `runner:` field execute via [`tools/test-harness/orchestrator/sweep.sh`](../../tools/test-harness/orchestrator/sweep.sh); JUnit XML lands in `results/results-${ROW}-${DATE}/junit.xml`. Tests without a `runner:` continue to run manually.
For tests that don't have a runner yet, status updates land in [`matrix.md`](./matrix.md) by hand after each manual sweep. For tests that do, the automation invocation is the source of truth — see [`runbook.md`](./runbook.md#automated-runs).
## Conventions
- **One PR per sweep result, not per cell change.** Bundle a full row update into a single commit titled `test: KDE-W sweep $(date +%F)`. Reduces matrix-merge noise.
- **Tested-version pin.** Every status update should mention the `claude-desktop` upstream version + the project version (`v1.3.x+claude...`) in the commit. Otherwise a `✓` from six months ago looks current.
- **Diagnostics on failure are mandatory.** Don't file `✗` without the captures listed in the test's `Diagnostics on failure` block. The runbook covers how to capture each.
- **Issue links go inline.** Status cells link directly to the relevant issue/PR.
See [`runbook.md`](./runbook.md) for the full mechanics.

440
docs/testing/automation.md Normal file

@@ -0,0 +1,440 @@
# Automation Plan
*Last updated: 2026-04-30*
> **Status:** Direction agreed; first vertical slice scaffolded at
> [`tools/test-harness/`](../../tools/test-harness/) covering T01, T03, T04,
> T17 on KDE-W. The [Decisions](#decisions) table captures the calls
> already made; [Still open](#still-open) is the short list of things
> genuinely undecided. This file will fold into [`README.md`](./README.md)
> and [`runbook.md`](./runbook.md) once the harness has run a few real
> sweeps.
The [`README.md`](./README.md) automation roadmap is one paragraph. This file
is the longer version — what shape the harness takes, which tools fit which
tests, which anti-patterns to design against, and what to build first.
## Why this exists
The 67 tests in [`cases/`](./cases/) plus the 10 surfaces in [`ui/`](./ui/)
already have stable IDs, standardized bodies, and per-element checklists. That
structure is unusually friendly to automation — but only if the harness is
shaped to match the corpus, rather than the other way around. Three things
make that non-trivial:
1. The tests aren't homogeneous. Some are pure-renderer (Code tab), some are
native-OS-level (tray, autostart, URL handler), some are visual/UX checks
that probably stay manual forever.
2. The matrix is nine environments, four display servers, and two package
formats. Input injection on Wayland is genuinely different from X11, and
X11 is the project's default backend (Wayland-native is opt-in until
portal coverage matures across compositors).
3. Many failures are environment-specific by construction (mutter XWayland
key-grab, BindShortcuts on Niri, Omarchy Ozone-Wayland env exports). A
single "run everything everywhere" harness will mis-skip those.
## Decisions
| # | Decision | Rationale |
|---|----------|-----------|
| 1 | **Single language: TypeScript.** Every runner is `.ts`; OS tools are shelled out via `child_process` and wrapped as TS helpers. Python only as a last-resort escape hatch for AT-SPI cases that resist portal mocking. | Playwright Electron is JS-native (post-Spectron); `dbus-next` covers DBus end-to-end; portal mocking removes the dogtail dependency for most native-dialog tests. Three-language overhead doesn't pay back. |
| 2 | **Harness location: `tools/test-harness/`.** Sibling to `scripts/`. | Keeps `docs/testing/` documentation-only; matches the project's existing `tools/` / `scripts/` split. |
| 3 | **VM images: Packer for imperative distros + Nix flake for `Hypr-N`.** | Packer builds golden snapshots that boot fast and rebuild as code; Nix flake handles NixOS natively without a second wrapper. Vagrant's per-boot provisioning model is the wrong tradeoff for hermetic per-test snapshots. |
| 4 | **No CI infrastructure initially.** Harness is invokable from CI (orchestrator is a bash script with `ROW`, `ARTIFACT`, `OUTPUT_DIR` env vars), but sweeps run manually from the dev box for the first ~20 tests. CI wrapper comes after there's signal on which tests are stable enough to run unattended. | Avoids weeks of GHA / nested-KVM debugging for tests that aren't ready to be unattended. The bash orchestrator is the same code either way. |
| 5 | **Selectors: semantic locators only (`getByRole`, `getByLabel`, `getByText`).** No CSS classes against minified renderer output. No proactive `data-testid` injection patch. Escalate per-test only when a specific test proves unstable: first ask upstream for a stable `data-testid`; only carry an `app-asar.sh` patch if upstream declines. | Building selector-injection infrastructure up front is a guess at where rot will happen. Modern React apps usually have enough ARIA roles and visible text for `getByRole`/`getByText` to be durable. Measure before patching. |
| 6 | **X11-default verification is Smoke. Wayland-native characterization is Should.** Add a Smoke test asserting the launcher log shows X11/XWayland selected on each row (the project's release-gate behavior). Add per-row Should tests characterizing what happens if Electron's default Wayland selection is allowed — these are informational, not release-gating. | The project chose X11 default because portal `GlobalShortcuts` coverage is patchy. The new Wayland-default tests exist to map that landscape, not to gate releases on it. |
| 7 | **Diagnostic retention: last 10 greens + all reds, on `main` only.** Captures `--doctor`, launcher log, screenshot every run. Reds retained indefinitely; greens rotate. | Cheap regression-bisect baseline; bounded storage; reds are the things you actually need to look at six weeks later. |
| 8 | **JUnit XML lives as workflow-run artifacts.** Each sweep run uploads `results-${ROW}-${DATE}.tar.zst` containing JUnit + diagnostic bundle. Default 90-day retention, extend to 365 if needed. The matrix-regen step downloads the latest run's artifacts and updates `matrix.md` in a PR. | Zero new infrastructure; GH provides storage, lifecycle, auth. If cross-run analytics later require longer history, promote to a separate `claude-desktop-debian-test-history` repo *then* — not before there's signal on what to keep. |
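Decision 7's retention rule (all reds, last 10 greens) is simple enough to sketch. The `SweepRun` shape and `selectRunsToKeep` are hypothetical; the sketch only shows the selection logic, not where diagnostics are stored.

```ts
// Hypothetical record of one sweep run's diagnostic bundle.
interface SweepRun {
  id: string;
  passed: boolean;
  finishedAt: number; // epoch milliseconds
}

// Keeps every red run plus the newest `greensToKeep` green runs.
function selectRunsToKeep(runs: SweepRun[], greensToKeep = 10): SweepRun[] {
  const newestFirst = [...runs].sort((a, b) => b.finishedAt - a.finishedAt);
  let greensKept = 0;
  return newestFirst.filter((run) => {
    if (!run.passed) return true;
    greensKept += 1;
    return greensKept <= greensToKeep;
  });
}
```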
## The three layers
Looking at the corpus, every test falls into one of three buckets, and each
bucket maps to a different shape of TS code (not a different language):
| Layer | What it covers | Implementation |
|-------|----------------|----------------|
| **L1 — Renderer** | Code tab, plugin install, settings, prompt area, slash menu, side chat, most of `ui/code-tab-panes.md`, `prompt-area.md`, `settings.md` | `playwright-electron` (`_electron.launch()`) directly |
| **L2 — Native / OS** | Tray (DBus), window decorations, URL handler (`xdg-open`), autostart, `--doctor`, multi-instance, hide-to-tray, native file picker (T17) | TS + `dbus-next` for DBus; `child_process` shell-outs wrapped as TS helpers (`xprop`, `wlr-randr`, `swaymsg`, `niri msg`, `pgrep`, `ydotool`); `dbus-next`-driven portal mocking for native-dialog tests |
| **L3 — Manual** | "Icon is crisp on HiDPI", drag-and-drop feel, T28 catch-up after suspend (real wall-clock), subjective UX checks | Human eyes; capture in [`runbook.md`](./runbook.md) sweep loop |
The `runner:` field [`README.md`](./README.md) hints at is the right unit.
One TS file per test under `tools/test-harness/runners/`, free to mix L1 and
L2 calls within a single test file. Tests without a `runner:` field stay
manual indefinitely — that's a feature, not a TODO.
## Architecture
```
host (orchestrator)            per-row VM (or Nobara host for KDE-W)
─────────────────────          ──────────────────────────────────────
tools/sweep.sh         ssh →   tools/test-harness/run.ts
                               ├── L1 runners (playwright-electron)
                               ├── L2 runners (dbus-next + shell-outs)
                               └── junit.xml + diagnostic bundle
tools/render-matrix.sh ← scp   /tmp/results-${ROW}-${DATE}.tar.zst
matrix.md (regenerated)
```
The orchestrator is dumb: copy artifact in, kick the harness, copy results
out. Per-row variation lives in `tools/test-images/${ROW}/` (Packer recipe +
cloud-init / autoinstall, or a Nix flake for `Hypr-N`). The harness inside
each VM is the same checked-in TS code, branched on `XDG_CURRENT_DESKTOP` /
`XDG_SESSION_TYPE` for env-specific helpers.
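Branching on the session environment might start from a small detector like the one below. It is a sketch under assumptions: `detectSession` and `SessionInfo` are hypothetical names, and the only behavior relied on is that `XDG_CURRENT_DESKTOP` is a colon-separated list and `XDG_SESSION_TYPE` is typically `wayland` or `x11`.

```ts
interface SessionInfo {
  desktops: string[]; // lower-cased entries of XDG_CURRENT_DESKTOP
  sessionType: 'wayland' | 'x11' | 'unknown';
}

// Takes the environment as a parameter so tests can pass a fake one.
function detectSession(env: Record<string, string | undefined>): SessionInfo {
  const desktops = (env.XDG_CURRENT_DESKTOP ?? '')
    .split(':')
    .map((entry) => entry.trim().toLowerCase())
    .filter((entry) => entry.length > 0);
  const raw = (env.XDG_SESSION_TYPE ?? '').toLowerCase();
  const sessionType = raw === 'wayland' || raw === 'x11' ? raw : 'unknown';
  return { desktops, sessionType };
}
```

A helper would call `detectSession(process.env)` and pick, for example, an `xprop`-based probe on `x11` and a compositor-specific one otherwise.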
Result format pivots on **JUnit XML** — well-trodden ground. Several actions
already exist that turn JUnit into Markdown summaries
([`junit-to-md`](https://github.com/davidahouse/junit-to-md), the
[Test Summary Action](https://github.com/marketplace/actions/junit-test-dashboard)).
The matrix-regen step is just "download artifact, merge per-row JUnit, render
cells, commit a PR."
### Why not drive Playwright over the wire?
The obvious sketch is "orchestrator on the host opens a CDP / DevTools port
on each VM and runs the whole suite from one place." It looks clean but has
real costs:
- CDP over network is fragile; port forwards are a constant footgun on
flaky links.
- Doesn't help with L2 at all — DBus calls, `xprop`, `pgrep`, file-system
probes still have to run in-VM.
- You'd end up maintaining two transports anyway, so the centralization
win evaporates.
In-VM Playwright via `_electron.launch()` is the [official Electron
recommendation](https://www.electronjs.org/docs/latest/tutorial/automated-testing)
since Spectron was archived in Feb 2022. No remote debug port needed; it
spawns Electron directly and gives you a context.
## Toolchain choices per layer
### L1 — `playwright-electron`
- Spawn via `_electron.launch({ args: ['main.js'] })` — no `--remote-debugging-port`.
- Gate `nodeIntegration: true` and `contextIsolation: false` behind
`process.env.CI === '1'` so tests get full main-process access without
weakening production security. (Electron docs explicitly recommend this
pattern.)
- **Locator policy: semantic only.** `getByRole`, `getByLabel`,
`getByText`, `getByPlaceholder`. No CSS selectors against minified class
names — they rot every upstream release. No `data-testid` infrastructure
built up front; if a specific test proves unstable, first ask upstream
for a stable `data-testid`, only carry an `app-asar.sh` patch as a last
resort.
- Use Playwright auto-wait. No fixed `sleep`s anywhere in the harness.
### L2 — `dbus-next` + wrapped shell-outs
The unifying observation: most of L2 is either DBus (which `dbus-next`
handles natively from TS) or short subprocess invocations of OS tools
(which `child_process.exec()` handles, wrapped as a typed TS helper). No
parallel bash test scripts; the test code reads as TS.
- **DBus everywhere it applies.**
[`dbus-next`](https://github.com/dbusjs/node-dbus-next) is actively
maintained, has TypeScript typings, and is designed for Linux desktop
integration. Replaces `gdbus call ...` invocations:
- Tray / SNI state queries (`org.kde.StatusNotifierWatcher`,
`org.freedesktop.DBus`).
- Portal availability checks (`org.freedesktop.portal.Desktop`).
- Suspend inhibitor inspection (`org.freedesktop.login1`).
- AT-SPI introspection where actually needed
(`org.a11y.atspi.*`).
- **Compositor / window-manager state via shell-out helpers.** No good
Node bindings exist for `xprop`, `wlr-randr`, `swaymsg`, `niri msg`,
but invoking them from `child_process.exec()` inside a TS helper is
perfectly fine, and the test code stays unified:
```ts
// tools/test-harness/lib/wm.ts
export async function listToplevels(): Promise<Toplevel[]> { ... }
```
Each helper is a thin typed wrapper; the test reads as TS, not
bash-with-extra-steps.
- **Native dialogs (T17 folder picker, etc.) via portal mocking.** The
`org.freedesktop.portal.FileChooser` interface is just DBus. For tests
that exercise the *integration* (does Claude make the right portal call
and handle the result?) — which is what T17 actually tests — register
a mock backend over `dbus-next`, intercept the call, return a canned
path. No real dialog ever renders. This is both faster and a more
honest unit of test than driving a real chooser.
- **AT-SPI escape hatch.** For the rare test where portal mocking isn't
enough (driving an *actual* GTK/Qt dialog tree), the fallback is a
small Python [`dogtail`](https://pypi.org/project/dogtail/) script
invoked via `child_process.exec()` — same shape as the other shell-out
helpers, just Python on the other end. Today, T17 is the only test
that might need this; portal mocking probably covers it. We adopt
Python only when a specific test forces it, not speculatively.
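As one possible shape for a typed shell-out helper, the sketch below wraps `xprop` and parses its `NAME(TYPE) = value` output lines. `parseXpropOutput` and `readWindowProps` are hypothetical names rather than the harness's actual helpers, and `execFile` is used here instead of `exec` to avoid shell quoting; that substitution is a choice made for this sketch.

```ts
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Parses `NAME(TYPE) = value` lines from xprop output into a name → raw value map.
function parseXpropOutput(stdout: string): Record<string, string> {
  const props: Record<string, string> = {};
  for (const line of stdout.split('\n')) {
    const match = /^([A-Za-z0-9_]+)\(([^)]*)\) = (.*)$/.exec(line);
    if (match) props[match[1]] = match[3];
  }
  return props;
}

// Runs `xprop -id <windowId> <names...>` and returns the parsed properties.
async function readWindowProps(
  windowId: string,
  names: string[],
): Promise<Record<string, string>> {
  const { stdout } = await execFileAsync('xprop', ['-id', windowId, ...names]);
  return parseXpropOutput(stdout);
}
```

Keeping the parser separate from the subprocess call means the parsing logic can be unit-tested without an X server.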
### Input injection — `ydotool` now, `libei` next
- [`ydotool`](https://github.com/ReimuNotMoe/ydotool) goes through
`/dev/uinput`, so it works on both X11 and Wayland. Needs root or a
`uinput` group; not a problem inside a test VM. Invoked via the same
`child_process` shell-out pattern — `tools/test-harness/lib/input.ts`.
- `ydotool` **cannot** trigger portal-grabbed shortcuts (T06, S11, S14).
That's a kernel-vs-compositor boundary issue, not a tool gap. Those
tests stay manual until libei is widely available.
- The future-correct path is
[`libei`](https://www.phoronix.com/news/LIBEI-Emulated-Input-Wayland) +
the `RemoteDesktop` portal via `libportal`. KDE, GNOME, and wlroots
are all moving there. Worth a roadmap note that the shortcut tests
have a path to automation — just not today.
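For the `ydotool` path, a helper in `lib/input.ts` might build the argument list for a key chord like this. The sketch assumes the `key <keycode>:<state>` syntax of ydotool 1.x with Linux input event codes; `ydotoolChordArgs` is a hypothetical name, and older ydotool releases use a different syntax.

```ts
// Linux input event codes used in the example: KEY_LEFTALT = 56, KEY_F4 = 62.
// Builds argv for `ydotool`: press each key in order, then release in reverse.
function ydotoolChordArgs(keycodes: number[]): string[] {
  const presses = keycodes.map((code) => `${code}:1`);
  const releases = [...keycodes].reverse().map((code) => `${code}:0`);
  return ['key', ...presses, ...releases];
}
```

For example, `ydotoolChordArgs([56, 62])` yields the arguments for Alt+F4, which a shell-out wrapper would pass to the `ydotool` binary.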
### VM lifecycle
- One image-build recipe per row in `tools/test-images/${ROW}/`. Packer
for the imperative distros (Fedora 43, Ubuntu 24.04, OmarchyOS, and
manual-install rows like i3 / Niri); Nix flake for `Hypr-N`.
- Rebuild nightly or per release-tag sweep — don't `apt update` /
`dnf update` inside a test run; mirrors hiccup, tests go red for the
wrong reason.
- Each test gets a hermetic `XDG_CONFIG_HOME` / `CLAUDE_CONFIG_DIR`
(S19 is already the test-isolation primitive). No shared state
between tests.
## The CDP auth gate (and the runtime-attach workaround that beats it)
*Discovered during the first KDE-W run-through; resolved by routing
through the in-app debugger menu's code path.*
The shipped `index.pre.js` contains an authenticated-CDP gate:
```js
uF(process.argv) && !qL() && process.exit(1);
```
`uF(argv)` matches **`--remote-debugging-port`** or
**`--remote-debugging-pipe`** on argv. `qL()` validates an ed25519-signed
token in `CLAUDE_CDP_AUTH` (signed payload
`${timestamp_ms}.${base64(userDataDir)}`, 5-minute TTL) against a hardcoded
public key. If the gate flag is on argv and a valid token isn't in env,
the app exits with code 1 right after `frame-fix-wrapper` completes. Both
Playwright's `_electron.launch()` and `chromium.connectOverCDP()` inject
`--remote-debugging-port=0` and trigger the gate. The signing key is held
upstream; we can't forge tokens.
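Read as plain code, the gate's two checks look roughly like the sketch below. This is a reading of the description above, not the upstream implementation: `hasGatedDebugFlag` and `isPayloadFresh` are hypothetical names, the exact matching rules of `uF` are assumed, and signature verification is deliberately left out.

```ts
const GATED_FLAGS = ['--remote-debugging-port', '--remote-debugging-pipe'];
const TOKEN_TTL_MS = 5 * 60 * 1000;

// Stand-in for uF(argv): true when a gated flag is present, with or without "=value".
function hasGatedDebugFlag(argv: string[]): boolean {
  return argv.some((arg) =>
    GATED_FLAGS.some((flag) => arg === flag || arg.startsWith(`${flag}=`)),
  );
}

// Checks only the timestamp half of a `${timestamp_ms}.${base64(userDataDir)}` payload.
function isPayloadFresh(payload: string, nowMs: number): boolean {
  const dot = payload.indexOf('.');
  if (dot <= 0) return false;
  const timestampMs = Number(payload.slice(0, dot));
  if (!Number.isFinite(timestampMs)) return false;
  const ageMs = nowMs - timestampMs;
  return ageMs >= 0 && ageMs <= TOKEN_TTL_MS;
}
```

The practical consequence for the harness is the first function: any launch path that puts one of these flags on argv trips the gate.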
**Crucially, the gate doesn't check `--inspect` or runtime SIGUSR1.** Those
trigger the **Node inspector**, not the Chrome remote-debugging port —
different surface. Notably, the in-app `Developer → Enable Main Process
Debugger` menu item *also* opens the Node inspector at runtime; that
menu's existence is the hint that this path is tolerated by upstream.
The harness uses this:
1. Spawn Electron with no debug-port flags. Gate stays asleep.
2. Wait for the X11 window to appear (signal that the app is up).
3. Send `SIGUSR1` to the main process pid. Same code path as the menu —
`inspector.open()` runs at runtime and the Node inspector starts on
port 9229.
4. Fetch `http://127.0.0.1:9229/json/list` and connect a WebSocket to
the first entry's `webSocketDebuggerUrl`.
5. Use `Runtime.evaluate` to run JS in the main process. From there:
- `webContents.getAllWebContents()` lists all live web contents
(including `https://claude.ai/...` once it loads into the
BrowserView).
- `webContents.executeJavaScript(...)` drives renderer-side DOM /
state queries.
- Main-process mocks (e.g. `dialog.showOpenDialog = ...` for T17) are
installed by direct assignment.
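Steps 3 to 5 can be sketched as below. This is not the contents of `inspector.ts`: `requestInspector`, `pickDebuggerUrl`, `discoverDebuggerUrl`, and `buildEvaluateMessage` are hypothetical names, a Node version with global `fetch` is assumed, and the WebSocket transport itself is omitted.

```ts
interface InspectorTarget {
  webSocketDebuggerUrl?: string;
}

// Step 3: ask the already-running main process to open the Node inspector.
function requestInspector(mainPid: number): void {
  process.kill(mainPid, 'SIGUSR1');
}

// Step 4: pick the WebSocket URL from the parsed /json/list response.
function pickDebuggerUrl(targets: InspectorTarget[]): string {
  const url = targets[0]?.webSocketDebuggerUrl;
  if (!url) throw new Error('inspector target list has no webSocketDebuggerUrl');
  return url;
}

async function discoverDebuggerUrl(port = 9229): Promise<string> {
  const response = await fetch(`http://127.0.0.1:${port}/json/list`);
  return pickDebuggerUrl((await response.json()) as InspectorTarget[]);
}

// Step 5: the JSON message sent over the WebSocket to evaluate in the main process.
function buildEvaluateMessage(id: number, expression: string): string {
  return JSON.stringify({
    id,
    method: 'Runtime.evaluate',
    params: { expression, awaitPromise: true, returnByValue: true },
  });
}
```

A caller would send the string from `buildEvaluateMessage` over the socket and match the reply by `id`.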
[`tools/test-harness/src/lib/inspector.ts`](../../tools/test-harness/src/lib/inspector.ts)
wraps this; [`tools/test-harness/src/lib/electron.ts`](../../tools/test-harness/src/lib/electron.ts)
exposes `app.attachInspector()` on the launched-app handle.
**Two implementation gotchas worth recording:**
- **`BrowserWindow.getAllWindows()` returns 0** because frame-fix-wrapper
substitutes the `BrowserWindow` class and the substitution breaks the
static registry. Use `webContents.getAllWebContents()` instead — that
registry stays intact and includes both the shell window and the
embedded claude.ai BrowserView.
- **`Runtime.evaluate` with `awaitPromise: true` + `returnByValue: true`
returns empty objects** for awaited Promise resolutions on this build's
V8. Workaround: have the IIFE return a `JSON.stringify(value)` and
`JSON.parse` on the caller side. `inspector.evalInMain<T>()` does this
internally so callers don't think about it.
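The stringify-then-parse workaround from the second gotcha might be expressed as a pair of helpers. `wrapForJsonReturn` and `parseJsonResult` are hypothetical names for illustration; the real `inspector.evalInMain<T>()` may structure this differently.

```ts
// Wraps an async function body so the evaluated expression resolves to a JSON string.
function wrapForJsonReturn(asyncBody: string): string {
  return `(async () => JSON.stringify(await (async () => { ${asyncBody} })()))()`;
}

// Parses the string that Runtime.evaluate hands back in result.value.
function parseJsonResult<T>(resultValue: unknown): T {
  if (typeof resultValue !== 'string') {
    // JSON.stringify(undefined) yields undefined, so a body with no return value lands here.
    throw new Error('expected a JSON string from Runtime.evaluate');
  }
  return JSON.parse(resultValue) as T;
}
```

Because the awaited value crossing the protocol boundary is always a string, the empty-object problem described above does not apply to it.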
**Status of the harness today:**
- **L2** — fully working (DBus, xprop). T03 / T04 pass.
- **L1 — T01** — passes via X11 window probe (no inspector needed).
- **L1 — T17 / similar** — framework works end-to-end (verified inspector
attach + dialog mock + webContents detection + Code-tab navigation
click). Selector tuning to match claude.ai's actual Code-tab UI is
ordinary iterate-as-needed work, not a blocker.
- **No `app-asar.sh` patch needed** to neutralize the gate. The
`dogtail`/AT-SPI escape hatch (Decision 1) is also no longer the
fallback for L1 — it's only relevant for native dialogs that the
inspector pattern can't reach.
## Notable shifts since the existing roadmap was written
These three changed the landscape in 2025 and the existing
[`README.md`](./README.md) Automation roadmap section predates them:
1. **Electron 38+ defaults to native Wayland.** [Electron 38 release
notes](https://www.electronjs.org/blog/electron-38-0) and the
[Wayland tech talk](https://www.electronjs.org/blog/tech-talk-wayland)
document this. Electron now has a Wayland CI job upstream. The project
keeps X11 as the default backend (Decision 6) because portal coverage
for `GlobalShortcuts` is uneven across compositors — the new tests
characterize what works where, not what to ship by default.
2. **Spectron is dead.** Archived Feb 2022; Playwright is the
[official recommendation](https://www.electronjs.org/blog/spectron-deprecation-notice).
No discussion needed about which framework — that's settled.
3. **`libei` is real and shipping.** KWin, mutter, and wlroots have all
moved. The shortcut-test gap (T06 / S11 / S14) is automatable in the
medium term, not "manual forever."
## Anti-patterns to design against
Pulled from the [Playwright flaky-test
checklist](https://testdino.com/blog/playwright-automation-checklist/),
the [Codepipes anti-patterns
catalogue](https://blog.codepipes.com/testing/software-testing-antipatterns.html),
and the [TestDevLab top 5
list](https://www.testdevlab.com/blog/5-test-automation-anti-patterns-and-how-to-avoid-them).
Designing the harness with these in mind from day one is much cheaper than
backing them out later:
| Anti-pattern | What it looks like | How to avoid in this project |
|---|---|---|
| Silent retry | Test passes on attempt 2; dashboard shows green; flake hidden | Log retry count to JUnit; `matrix.md` shows `✓*` for retried-pass; treat retried-pass as a Should-fix bug |
| Async-wait by `sleep` | `sleep 5` instead of `waitFor`; ICSE 2021 found ~45% of UI flakes here | No fixed sleeps in `tools/test-harness/`. Always poll a condition (window exists, log line, DBus name owned). Lint for `\bsleep\b` and `setTimeout` with literal numbers in test code |
| Mixing orchestration with verification | One test installs the package, launches, checks tray, asserts URL handler — five failure modes, one red cell | One test, one assertion class. Setup goes in shared fixtures, not test bodies |
| End-to-end as the only layer | All regressions caught at full-stack UI level | Keep `scripts/patches/*.sh` independently testable; add unit-level tests on patcher logic separately from the full-app sweep |
| Implementation-coupled selectors | `div.css-7xz92q` deep selectors against minified renderer classes | Decision 5: semantic locators only. If a selector proves unstable, first ask upstream for a stable `data-testid`; only carry an `app-asar.sh` patch as a last resort, per-test |
| Timing-sensitive assertions | "Within 500ms after click, X appears" | Time bounds are upper-bound sanity only. Use Playwright's auto-wait with a generous `timeout`; don't fight the framework |
| Hidden global state across tests | Test 4 fails because test 2 left `~/.config/Claude/SingletonLock` behind | Hermetic per-test `XDG_CONFIG_HOME` / `CLAUDE_CONFIG_DIR` (S19). Treat shared state as an isolation bug, not a known quirk |
| Long-lived VM state drift | Six-month-old snapshot has stale package mirrors; tests fail with 404s | Image rebuild as code (Packer / Nix flake); rebuild nightly or per release-tag. Never `apt update` mid-test |
| Treating skip as fail | wlroots-only test fails on KDE because it can't be skipped properly | `?` and `-` are first-class in [`matrix.md`](./matrix.md). Map JUnit `<skipped>` → `-`, `<error>` (harness broke) → `?`, and only `<failure>` → the fail mark |
| Diagnostics only on failure | Test goes red; capture fires; previous green run had no baseline to diff against | Decision 7: capture `--doctor`, launcher log, screenshot **on every run**. Last 10 greens + all reds on `main` |
| Network coupling | "Tray icon present" fails because Cloudflare hiccupped during sign-in | Tests that don't *need* network shouldn't touch it. Sign-in is one fixture; tray test runs on a pre-signed-in profile snapshot |
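
The "Async-wait by `sleep`" row calls for a lint on `\bsleep\b` and on `setTimeout` with literal numbers. A rough sketch of such a check follows; the function name and the exact `setTimeout` pattern are assumptions, and a real lint rule would more likely work on the AST than on raw lines.

```typescript
// Illustrative lint sketch; the function name and patterns are assumptions.

// Matches a bare `sleep` word (shell `sleep 5`, or a `sleep(...)` helper).
const SLEEP_WORD = /\bsleep\b/;
// Matches a setTimeout call whose last argument is a numeric literal.
const LITERAL_TIMEOUT = /\bsetTimeout\s*\(.*,\s*\d[\d_]*\s*\)/;

// Return the 1-based line numbers that contain a fixed wait.
function findFixedWaits(source: string): number[] {
  const hits: number[] = [];
  source.split("\n").forEach((line, index) => {
    if (SLEEP_WORD.test(line) || LITERAL_TIMEOUT.test(line)) {
      hits.push(index + 1);
    }
  });
  return hits;
}
```

A line-based regex will flag comments and strings too, which is acceptable for a first pass whose job is to make fixed waits visible in review.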
## What stays manual (for now)
These have no automation path that's worth the cost today, and it's more
honest to say so in the roadmap than to pretend they'll be automated
"soon":
- **T06 / S11 / S14** — global shortcut tests behind portal grabs. Path
exists (libei + RemoteDesktop portal) but compositor-side support is
patchy. Revisit when libei adoption broadens.
- **T15** — sign-in browser handoff. Needs a fixture account and an
upstream auth flow that won't necessarily welcome scripted login.
- **T28** — scheduled task catch-up after suspend. Real wall-clock event;
not worth simulating.
- **Anything in `ui/` tagged "looks right"** — HiDPI sharpness, theme
rendering, drag-feel. AT-SPI sees the tree, not the pixels.
T17 (folder picker) was previously in this list. Portal mocking via
`dbus-next` moves it into L2. If real-dialog testing turns out to be
necessary anyway, the dogtail escape hatch covers it.
The matrix already supports leaving these manual via the `?` / `-` /
existing-cell semantics — no schema change needed.
## Suggested first vertical slice
The smallest end-to-end that proves every architectural decision:
- **One row:** KDE-W (daily-driver host, no VM startup tax).
- **One test:** T01 — App launch.
- **Full pipeline:** orchestrator glue → harness entry → Playwright
`_electron.launch()` → JUnit XML → matrix-regen step → cell flips
from `?` to `✓` automatically.
That single slice forces every decision out into the open: harness
language (TS), JUnit emission, results-bundle layout, matrix-regen
rules, diagnostic-capture format. Resist building the orchestrator
before there's a passing test it can orchestrate. Once the slice is
real, adding tests 2-10 is mostly mechanical.
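
The matrix-regen step in that pipeline can be sketched as a small mapping from a JUnit outcome to a cell value, following the "Treating skip as fail" and "Silent retry" rows of the anti-pattern table. The type and function names are assumptions; the plain pass glyph is inferred from the `✓*` retried-pass mark, and the fail glyph is taken as a parameter rather than guessed.

```typescript
// Illustrative sketch of the matrix-regen rule; names are assumptions.

type JUnitOutcome = "passed" | "skipped" | "error" | "failure";

// Map one JUnit testcase outcome to a matrix cell.
// `failMark` is supplied by the caller because the glyph is not fixed here.
function matrixCell(
  outcome: JUnitOutcome,
  retries: number,
  failMark: string,
): string {
  switch (outcome) {
    case "skipped":
      return "-"; // test does not apply on this row
    case "error":
      return "?"; // harness broke, result unknown
    case "failure":
      return failMark; // the only outcome that counts as a real fail
    case "passed":
      return retries > 0 ? "✓*" : "✓"; // retried passes stay visible
  }
}
```

Keeping the retry count in the signature means a retried pass cannot quietly collapse into a plain pass.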
After T01: the next sensible additions are T03 (tray — exercises
`dbus-next` end-to-end), T04 (window decorations — exercises the
shell-out helper pattern), and T17 (folder picker — exercises portal
mocking). Those four runners cover every distinct shape of TS code in
the harness; everything else after them is a recombination.
## Still open
Most of the framing decisions are settled in the [Decisions](#decisions)
table. What remains:
1. **Owner assignments per row.** [`MEMORY.md`](https://github.com/aaddrick/claude-desktop-debian/blob/main/.claude/projects/-home-aaddrick-source-claude-desktop-debian/memory/MEMORY.md)
notes cowork → @RayCharlizard, nix → @typedrat. Hypr-N row is the
natural fit for @typedrat once the Nix flake exists. The other eight
rows: aaddrick by default, but worth asking the contributor base in a
discussion thread.
2. **AT-SPI escape-hatch trigger.** Decision 1 punts on Python until a
specific test forces it. T17 is the only candidate today, and portal
mocking probably covers it. If T17 actually needs real-dialog
automation, that's the first reopen.
3. **Selector rot rate.** Decision 5 starts with semantic locators and
measures. After ~20 tests on the renderer, revisit whether
`getByRole`/`getByText` is holding up or whether per-test
`data-testid` patches are warranted. No prediction; this is a
measure-and-decide.
4. **CI execution model.** Decision 4 punts on this entirely until the
harness has signal on which tests are stable. Reopen after the first
~20 tests have run from the dev box for a few weeks.
5. **Smoke-set Wayland-default test wording.** Decision 6 calls for a
Smoke test asserting X11/XWayland selection on each row, plus
per-row Should tests for Wayland characterization. The exact T-IDs
and case-file homes for those tests need to be drafted next time
`cases/` is touched.
## Sources
Background reading the recommendations draw on. Linked here so the
calls have receipts:
### Electron testing & Playwright
- [Electron — Automated Testing](https://www.electronjs.org/docs/latest/tutorial/automated-testing) — official tutorial, recommends Playwright
- [Electron — Spectron Deprecation Notice](https://www.electronjs.org/blog/spectron-deprecation-notice) — Feb 2022 archive
- [Playwright — Electron class](https://playwright.dev/docs/api/class-electron)
- [Playwright — ElectronApplication class](https://playwright.dev/docs/api/class-electronapplication)
- [Testing Electron apps with Playwright and GitHub Actions (Simon Willison)](https://til.simonwillison.net/electron/testing-electron-playwright)
- [`spaceagetv/electron-playwright-example`](https://github.com/spaceagetv/electron-playwright-example) — multi-window Playwright + Electron example
### DBus / TypeScript
- [`dbus-next` — actively-maintained Node DBus library with TS typings](https://github.com/dbusjs/node-dbus-next)
- [`dbus-next` on npm](https://www.npmjs.com/package/dbus-next)
### Wayland / X11 / input injection
- [Electron — Tech Talk: How Electron went Wayland-native](https://www.electronjs.org/blog/tech-talk-wayland)
- [Electron 38.0.0 release notes](https://www.electronjs.org/blog/electron-38-0)
- [PR #33355: fix calling X11 functions under Wayland](https://github.com/electron/electron/pull/33355)
- [LIBEI — Phoronix overview](https://www.phoronix.com/news/LIBEI-Emulated-Input-Wayland)
- [libei + RemoteDesktop portal — RustDesk discussion](https://github.com/rustdesk/rustdesk/discussions/4515)
- [`ydotool` README](https://github.com/ReimuNotMoe/ydotool)
- [`kwin-mcp` — KDE Plasma 6 Wayland automation tools](https://github.com/isac322/kwin-mcp)
### Portals / AT-SPI
- [XDG Desktop Portal — main repo](https://github.com/flatpak/xdg-desktop-portal)
- [`org.freedesktop.portal.FileChooser` interface XML](https://github.com/flatpak/xdg-desktop-portal/blob/main/data/org.freedesktop.portal.FileChooser.xml)
- [File Chooser portal documentation](https://flatpak.github.io/xdg-desktop-portal/docs/doc-org.freedesktop.portal.FileChooser.html)
- [`dogtail` on PyPI](https://pypi.org/project/dogtail/) — fallback only
- [Automation through Accessibility — Fedora Magazine](https://fedoramagazine.org/automation-through-accessibility/)
### Anti-patterns / flaky tests
- [Playwright automation checklist to reduce flaky tests (TestDino)](https://testdino.com/blog/playwright-automation-checklist/)
- [Flaky Tests: The Complete Guide to Detection & Prevention (TestDino)](https://testdino.com/blog/flaky-tests/)
- [5 Test Automation Anti-Patterns (TestDevLab)](https://www.testdevlab.com/blog/5-test-automation-anti-patterns-and-how-to-avoid-them)
- [Software Testing Anti-patterns (Codepipes)](https://blog.codepipes.com/testing/software-testing-antipatterns.html)
### JUnit XML reporting
- [`junit-to-md`](https://github.com/davidahouse/junit-to-md)
- [Test Summary GitHub Action](https://github.com/marketplace/actions/junit-test-dashboard)
- [Test Reporter](https://github.com/marketplace/actions/test-reporter)
### CI / VM matrix
- [Transient — QEMU CI wrapper](https://www.starlab.io/blog/simple-painless-application-testing-on-virtualized-hardwarenbsp)
- [`cirruslabs/tart` — VMs for CI automation](https://github.com/cirruslabs/tart)
---
*Once the first vertical slice (KDE-W + T01) ships, the relevant pieces of
this file fold into [`README.md`](./README.md) (Automation roadmap) and
[`runbook.md`](./runbook.md) (the harness invocation). Until then: working
notes that have crossed from brainstorm to plan.*


@@ -0,0 +1,347 @@
# docs/testing/cases grounding sweep — implementation prompt
This file is meant to be **copied verbatim into a fresh Claude Code
session** as the initial user message. Don't paraphrase it; the
orchestration depends on the exact directives below.
---
## Prompt to paste
You're picking up after the v7 walker, U01 wire-up, and the
`claudeai.ts` AX-tree migration all landed. The page-objects are
stable against the live renderer (T17_folder_picker passes on
KDE-W). The next workstream is **grounding the case docs in
`docs/testing/cases/` against actual upstream behavior**.
The cases were written from outside-in — observed user-visible
flows, expected outcomes, diagnostic captures. Many describe
behavior the test author *believed* exists in upstream Claude
Desktop, but no one has cross-checked each Step / Expected against
the actual extracted source. Your job is to spawn one subagent per
case file, have each one read the case + grep the build-reference
extract for the relevant feature, and report what's accurate, what's
stale, and what's missing — then make in-place adjustments to the
case files so each one is grounded in concrete code anchors before
the next sweep cycle.
### Authoritative reference
Read these in order. They're the substrate the subagents will pull
from.
- `docs/testing/cases/README.md` — the case-doc structure (severity,
surface, applies-to, steps, expected, diagnostics, references).
The "Standard test body" template at the bottom is the contract
every case currently follows.
- `docs/testing/matrix.md` — live Pass/Fail/Pending matrix per row.
Tells you which cases have a runner and which are still
human-execution-only.
- `build-reference/app-extracted/.vite/build/` — the extracted +
beautified Claude Desktop source. ~14 files; `index.js` is the
main process (~546k lines after beautification), `mainView.js` /
`mainWindow.js` / `quickWindow.js` are renderer preloads,
`coworkArtifact.js` is the cowork BrowserView preload,
`buddy.js` is the supervisor, etc. **This is the ground truth.**
- `tools/test-harness/src/runners/` — existing runners that *do*
have working selectors / event hooks. Sometimes the runner has
more accurate code anchors than the case doc.
- `CLAUDE.md` (project root) — project conventions, attribution
format, commit style. Don't violate.
### Case files in scope
Eleven files plus the README. One subagent per file:
| File | Tests covered |
|---|---|
| `code-tab-foundations.md` | T15-T20 |
| `code-tab-handoff.md` | T23-T25, T34, T38, T39 |
| `code-tab-workflow.md` | T21-T22, T29-T32 |
| `distribution.md` | S01-S05, S15, S16, S26 |
| `extensibility.md` | T11, T33, T35-T37, S27, S28 |
| `launch.md` | T01, T02, T13, T14 |
| `platform-integration.md` | T09, T10, T12, S17, S18, S22-S25 |
| `routines.md` | T26-T28, S19-S21 |
| `shortcuts-and-input.md` | T05, T06, S06-S14, S29-S37 |
| `tray-and-window-chrome.md` | T03, T04, T07, T08, S08, S13 |
### Why this iteration
Several cases have been silently bit-rotting against upstream
changes — a Step says "click the X menu" but X was renamed two
upstream versions ago, or an Expected references a behavior the
team shipped behind a feature flag that's now off by default. When
the sweep runs against a row that's stale, the failure looks like a
Linux compatibility issue but is actually a doc-vs-upstream drift.
Grounding the cases against the actual extracted source closes
that gap and makes future sweeps interpretable.
This isn't a one-time correctness pass — it's a cycle. After every
upstream version bump (`CLAUDE_DESKTOP_VERSION` rolls in
`scripts/setup/detect-host.sh`), the grounding can drift again.
Optimise for **leaving concrete code-anchor breadcrumbs** in each
case so the next grounding pass is fast.
### Repo conventions
- Tabs for indentation in code; markdown is space-indented as the
existing files do it.
- Markdown lines wrap at ~80 chars unless they're tables or links
that don't break naturally.
- Don't commit. The user reviews and commits.
- Don't run the host Claude Desktop. The user runs it. Read from
`build-reference/` instead — that's already extracted +
beautified specifically so you don't have to attach to a live
app to verify behavior.
### Code anchors
- `build-reference/app-extracted/.vite/build/index.js` — main
process. Every IPC channel registration, window-management
decision, app-lifecycle hook, tray-menu construction, autostart
toggle, dialog invocation, and protocol handler lives here.
- `build-reference/app-extracted/.vite/build/quickWindow.js`:
Quick Entry preload + window setup.
- `build-reference/app-extracted/.vite/build/mainWindow.js`:
main shell BrowserWindow preload (claude.ai is loaded into a
child BrowserView; this preload runs in the shell frame).
- `build-reference/app-extracted/.vite/build/mainView.js`:
preload running inside the claude.ai BrowserView itself.
- `build-reference/app-extracted/.vite/build/coworkArtifact.js`:
preload running inside cowork's iframe-shaped artifact view.
- `build-reference/app-extracted/.vite/build/buddy.js` — supervisor
process (the daemon that respawns the cowork worker; see
`docs/learnings/cowork-vm-daemon.md`).
- `build-reference/app-extracted/package.json` — declared main /
preloads, electron version, native deps. Quick reference for
whether a feature is wired up at all.
### Phases
#### Phase 0 — calibration
1. `cd tools/test-harness && npm run typecheck` — should pass; if
not, stop and report.
2. Read `docs/testing/cases/README.md` end-to-end and one full case
file (suggest `launch.md` — small, four tests, easy
surface-area). Confirm you understand the case-doc contract
before fanning out.
3. Pick T01 (App launch) as a calibration case. Manually grep
`build-reference/app-extracted/.vite/build/index.js` for the
launcher-log / backend-selection logic referenced in T01's
Expected. Confirm you can read the beautified source and locate
the relevant code. Report the anchor (`index.js:N-M`) so the
user knows the workflow is sound before you fan out.
If Phase 0 surfaces a problem (build-reference stale relative to
the case doc, calibration anchor not findable, README structure
unclear), stop and report. Don't fan out subagents against an
unverified workflow.
#### Phase 1 — fan-out
Spawn one subagent per case file (eleven total). Use
`subagent_type: 'general-purpose'`. Send them in **parallel**;
they're independent. Keep the prompt to each subagent
self-contained; the subagent has no context from this conversation.
Per-subagent prompt template (fill in the case file path):
```
You're grounding ONE test-case file in
docs/testing/cases/<FILE>.md against the extracted Claude Desktop
source at build-reference/app-extracted/.vite/build/.
Read these first:
- docs/testing/cases/README.md (case-doc contract)
- docs/testing/cases/<FILE>.md (your case file)
- CLAUDE.md (project conventions)
For each test in the file:
1. Read the test's Steps + Expected.
2. Identify the load-bearing claim — the upstream behavior the
test depends on (an IPC channel, a tray-menu item, a
dialog.showOpenDialog call, a globalShortcut.register, a
nativeTheme listener, etc.).
3. Grep build-reference/app-extracted/.vite/build/ for that claim.
Use ripgrep / grep -E. The code is beautified but keeps minified
variable names, so anchor on string literals, IPC channel names,
menu labels, event names, not variable identifiers.
4. Classify the result:
- **Grounded** — claim verified, anchor found. Append a
`**Code anchors:** <file>:<line>` line to the test body
directly under the existing References field.
- **Drifted** — feature exists but the case's Steps or Expected
don't match what's actually shipping. Edit the case to
match upstream behavior. Note what changed.
- **Missing** — feature isn't in the build at all (deprecated,
never shipped, behind unset flag). Mark the test with a
prepended block:
`> **⚠ Missing in build 1.5354.0** — <one-line note>. Re-verify after next upstream bump.`
- **Ambiguous** — claim could be one of several upstream code
paths and you can't disambiguate from the case alone. Don't
edit; report under "Open questions".
Per-test, prefer concrete code anchors over wordy explanations.
The next person reading this case should see exactly where
upstream implements the feature.
Constraints:
- Don't fabricate anchors. If you can't find it, mark Missing or
Ambiguous — never invent a `index.js:12345` reference.
- Don't restructure the case files. Keep the existing template
(Severity / Surface / Applies to / Issues / Steps / Expected /
Diagnostics / References). Only add code anchors and edit
Steps/Expected for drift.
- Don't expand scope. If you notice an unrelated bug or missing
test, note it under "Open questions" — don't fix it inline.
- Don't run the host Claude Desktop. Read from build-reference/
only.
Report shape (~300-500 words):
## <FILE>.md grounding
- Tests reviewed: N
- Grounded: N
- Drifted (edited): N (one-line per: <test-id> — <what changed>)
- Missing (marked): N (one-line per: <test-id> — <what's gone>)
- Ambiguous (flagged): N (one-line per: <test-id> — <why>)
### Code anchor highlights
- <test-id>: <file>:<line> — <what the anchor proves>
### Open questions
- ...
### Files touched
- docs/testing/cases/<FILE>.md
```
Keep the report tight. The orchestrator reads eleven of these and
synthesizes.
#### Phase 2 — synthesis
Once all eleven subagents return:
1. Aggregate per-classification counts across all files. Big
numbers in any column are signals:
- Lots of **Drifted** → upstream had a recent feature shuffle;
the team should know.
- Lots of **Missing** → either the case doc was written
speculatively or upstream removed features without telling.
- Lots of **Ambiguous** → the case-doc template needs a
"Implementation hint" field so future grounding has a
starting point.
2. Cross-check: did any subagent edit the same anchor differently?
(Unlikely since each owns one file, but worth a sanity pass.)
3. Check that `git diff docs/testing/cases/` matches what the
subagents reported. If a subagent claimed Drifted but didn't
write to disk, surface it.
4. Build the user-facing summary (see "Final report format" below).
Don't make the user re-read the eleven subagent reports — give
them the synthesised view + the per-file links.
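
For step 1 of the synthesis, a minimal sketch of the aggregation over the per-file reports. The `FileReport` shape mirrors the report template above but is an assumption about how the orchestrator would hold the numbers.

```typescript
// Illustrative sketch; the FileReport shape is an assumption based on the
// report template, not an existing type in the harness.

interface FileReport {
  file: string;
  grounded: number;
  drifted: number;
  missing: number;
  ambiguous: number;
}

type Totals = Omit<FileReport, "file">;

// Sum each classification across all subagent reports.
function aggregate(reports: FileReport[]): Totals {
  const totals: Totals = { grounded: 0, drifted: 0, missing: 0, ambiguous: 0 };
  for (const report of reports) {
    totals.grounded += report.grounded;
    totals.drifted += report.drifted;
    totals.missing += report.missing;
    totals.ambiguous += report.ambiguous;
  }
  return totals;
}
```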
### Self-correction loop
After Phase 1 returns:
1. If any subagent failed (no report, error, hit token limit),
re-spawn just that one with a tighter scope (e.g. "process
tests T15-T17 only, not the full file").
2. If a subagent's report claims edits but `git diff` shows no
changes, the subagent silently dropped the writes — re-spawn
with explicit instruction to use the Edit tool.
3. If two subagents flag the same upstream code path with
contradictory claims (one says Grounded, one says Missing),
re-read the source yourself and adjudicate.
Cap re-spawns at **2 per file** — past that, mark the file as
"needs human review" in the final report and move on.
### Termination conditions
Stop and write a final report when one of:
1. **All eleven files grounded.** Per-file classification counts +
diff stat. Done.
2. **Hit the re-spawn cap on 3+ files.** Stop, write up which
files are blocked, what each blocker looks like.
3. **Build-reference is stale.** If multiple subagents report
"Missing" against features the user knows shipped, the
extract may be out of date — verify the version
(`build-reference/app-extracted/package.json` `version` field
vs `CLAUDE_DESKTOP_VERSION` repo variable) before continuing.
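
The re-spawn cap and the second termination condition reduce to two small checks. This is a sketch under the assumption that the orchestrator tracks a re-spawn count per file; the names are hypothetical.

```typescript
// Illustrative sketch; constant and function names are assumptions.

const MAX_RESPAWNS_PER_FILE = 2;
const BLOCKED_FILE_LIMIT = 3;

type Verdict = "respawn" | "needs-human-review";

// Decide what to do after a subagent fails for a given file.
function afterFailure(respawnsSoFar: number): Verdict {
  return respawnsSoFar < MAX_RESPAWNS_PER_FILE
    ? "respawn"
    : "needs-human-review";
}

// Termination condition 2: the cap was hit on three or more files.
function shouldStopSweep(respawnsByFile: Record<string, number>): boolean {
  const blocked = Object.values(respawnsByFile).filter(
    (count) => count >= MAX_RESPAWNS_PER_FILE,
  ).length;
  return blocked >= BLOCKED_FILE_LIMIT;
}
```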
### What you should NOT do
- Don't commit. The user reviews everything.
- Don't restructure the case-doc template. Eleven files, one
shape — keep it that way.
- Don't add new tests. Grounding is a verify-and-anchor pass, not
a coverage expansion.
- Don't run the host Claude Desktop. The build-reference extract
exists specifically so you don't have to attach to a live app.
- Don't edit anything outside `docs/testing/cases/`. If you find
a runner discrepancy (case says "click X", runner clicks "Y"),
flag it under Open questions; don't edit the runner.
- Don't invent anchors. If the grep doesn't find the literal,
classify Missing or Ambiguous — never write a fictional
`index.js:12345` reference.
### Final report format
```markdown
## Cases grounding summary
- Files reviewed: 11 / 11
- Tests reviewed: N (sum across all files)
- Grounded: N (with code anchors added)
- Drifted (edited): N
- Missing (marked): N
- Ambiguous: N
- Files needing human review: N
## Per-file breakdown
| File | Reviewed | Grounded | Drifted | Missing | Ambiguous |
|---|---|---|---|---|---|
| code-tab-foundations.md | ... | ... | ... | ... | ... |
| ... | | | | | |
## Notable findings
- <test-id>: <one-line significance>
- ...
## Open questions
- ...
## Files touched
git status output (only docs/testing/cases/*.md should appear)
## Diff summary
git diff --stat docs/testing/cases/
```
### Operational notes
- Subagents are launched in parallel via a single message with
multiple Agent tool calls. Don't serialize them — Phase 1 takes
~15 minutes serial, ~3 minutes parallel.
- Each subagent's Edit calls land directly in the working tree.
No merge conflicts because each owns one file.
- The build-reference `index.js` is 546k lines. Subagents should
use `grep -nE` with anchored string literals, not full reads.
Recommended grep pattern style:
`grep -nE 'globalShortcut\.register\([^)]*' build-reference/app-extracted/.vite/build/index.js`
- If a subagent needs to verify a renderer-side claim (DOM event
flow, React component shape), the relevant preload is in
`mainView.js` / `mainWindow.js`. Don't grep `index.js` for
renderer-only behavior.
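
If the orchestrator wants to spot-check a reported anchor without shelling out, the `grep -nE` lookup can be approximated in TypeScript. This is a sketch only; `findAnchors` is a hypothetical helper, and subagents should still use grep against the real file.

```typescript
// Illustrative sketch of a grep -nE equivalent; the helper is hypothetical.

// Return `file:line` anchors for every line that matches the pattern.
function findAnchors(file: string, source: string, pattern: RegExp): string[] {
  // Drop the g/y flags so test() does not carry lastIndex between lines.
  const flags = pattern.flags.replace(/[gy]/g, "");
  const matcher = new RegExp(pattern.source, flags);
  const anchors: string[] = [];
  source.split("\n").forEach((line, index) => {
    if (matcher.test(line)) {
      anchors.push(`${file}:${index + 1}`);
    }
  });
  return anchors;
}
```

Line numbers are 1-based to match what `grep -n` prints, so the output can be pasted straight into a **Code anchors** field.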
Begin with Phase 0. Don't fan out until calibration succeeds.


@@ -0,0 +1,94 @@
# Functional Test Cases
Test specifications grouped by feature surface. For live status, see [`../matrix.md`](../matrix.md). For sweep workflow, see [`../runbook.md`](../runbook.md). For the UI element inventory, see [`../ui/`](../ui/).
## Files
| File | Surfaces covered | Tests |
|------|------------------|-------|
| [`launch.md`](./launch.md) | App startup, doctor, package detection, multi-instance | T01, T02, T13, T14 |
| [`tray-and-window-chrome.md`](./tray-and-window-chrome.md) | Tray icon, window decorations, hybrid topbar, hide-to-tray | T03, T04, T07, T08, S08, S13 |
| [`shortcuts-and-input.md`](./shortcuts-and-input.md) | URL handler, Quick Entry, global shortcuts | T05, T06, S06, S07, S09, S10, S11, S12, S14, S29, S30, S31, S32, S33, S34, S35, S36, S37 |
| [`code-tab-foundations.md`](./code-tab-foundations.md) | Sign-in, Code tab load, folder picker, drag-drop, terminal, file pane | T15, T16, T17, T18, T19, T20 |
| [`code-tab-workflow.md`](./code-tab-workflow.md) | Preview, PR monitor, worktrees, auto-archive, side chat, slash menu | T21, T22, T29, T30, T31, T32 |
| [`code-tab-handoff.md`](./code-tab-handoff.md) | Notifications, external editor, file manager, connector OAuth, IDE handoff | T23, T24, T25, T34, T38, T39 |
| [`routines.md`](./routines.md) | Scheduled tasks, catch-up runs, suspend inhibit, config dir | T26, T27, T28, S19, S20, S21 |
| [`extensibility.md`](./extensibility.md) | Plugins, MCP, hooks, CLAUDE.md memory, worktree storage | T11, T33, T35, T36, T37, S27, S28 |
| [`distribution.md`](./distribution.md) | DEB, RPM, AppImage, dependency pulls, auto-update | S01, S02, S03, S04, S05, S15, S16, S26 |
| [`platform-integration.md`](./platform-integration.md) | Autostart, Cowork, WebGL, PATH inheritance, Computer Use, Dispatch | T09, T10, T12, S17, S18, S22, S23, S24, S25 |
## Standard test body
Every test in this directory follows this structure:
```markdown
### T## — Title
**Severity:** Smoke | Critical | Should | Could
**Surface:** human-readable surface tag (e.g. "Code tab → Environment")
**Applies to:** All | <subset of rows>
**Issues:** linked issue/PR list, or `—`
**Steps:**
1. ...
2. ...
**Expected:** what should happen.
**Diagnostics on failure:** which captures to attach when filing. See [`../runbook.md#diagnostic-capture`](../runbook.md#diagnostic-capture).
**References:** docs links, learnings, related issues.
**Code anchors:** `<file>:<line>` pointers to the upstream code or
wrapper script that backs the load-bearing claim above. Added during
the grounding sweep — see "Anchor scope" for guidance on where
anchors can and can't land.
**Inventory anchor:** (optional) `<element-id>` from
[`../ui-inventory.json`](../ui-inventory.json) — only if the surface
shows up in the v7 walker's idle capture. For surfaces inside modals
or popups, append a sentence noting which click-chain opens them so
the next inventory regeneration can grab them.
```
The Steps and Diagnostics fields are written so they can later become
script entry points without a rewrite.
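
One way to keep the template honest is a check that each test body carries the required fields. The sketch below assumes the bold `**Field:**` labels shown in the template; the function name is hypothetical, and the optional **Code anchors** and **Inventory anchor** fields are deliberately left out of the required list.

```typescript
// Illustrative sketch; the helper name is an assumption.

const REQUIRED_FIELDS = [
  "Severity",
  "Surface",
  "Applies to",
  "Issues",
  "Steps",
  "Expected",
  "Diagnostics on failure",
  "References",
] as const;

// Return the required field labels that a test body does not contain.
function missingFields(body: string): string[] {
  return REQUIRED_FIELDS.filter((field) => !body.includes(`**${field}:**`));
}
```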
### Anchor scope
Where the load-bearing claim lives determines where the anchor goes:
- **Upstream code** — any file under
`build-reference/app-extracted/.vite/build/` (most often `index.js`,
the main process). Use `index.js:N` style anchors.
- **Our wrapper code** — `scripts/launcher-common.sh`, `scripts/doctor.sh`,
`scripts/patches/*.sh`, `scripts/frame-fix-wrapper.js`,
`scripts/wco-shim.js`. Use `<repo-relative-path>:N` style anchors.
- **Server-rendered (claude.ai SPA)** — anchorable only via the v7
walker inventory (`docs/testing/ui-inventory.json`) or a runtime
capture from `tools/test-harness/grounding-probe.ts`. Idle-state
inventory misses contextual surfaces (modals, popups, slash menus,
context menus, side panels) — note that explicitly.
- **Upstream `claude` CLI binary** — out of scope for this matrix
(e.g. T39 `/desktop` is a CLI slash-command, not in the Electron
asar). Mark as Ambiguous and link to a separate CLI matrix if one
exists.
If a claim spans multiple scopes (a wrapper script triggering
upstream behavior, e.g. T01's launcher-log + main-window-opens),
list all the anchors. The whole point is making the next sweep
faster — over-anchoring is fine, missing anchors is not.
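
A small classifier makes the first two scopes concrete. It is a sketch under the assumption that anchors are written as `<path>:<line>` with a repo-relative path; the names are hypothetical, and the two runtime-only scopes (claude.ai SPA, `claude` CLI) fall through to `"unknown"` because they have no file anchor.

```typescript
// Illustrative sketch; type and function names are assumptions.

type AnchorScope = "upstream" | "wrapper" | "unknown";

const UPSTREAM_PREFIX = "build-reference/app-extracted/.vite/build/";

// Classify a `<path>:<line>` or `<path>:<start>-<end>` anchor by its path.
function classifyAnchor(anchor: string): AnchorScope {
  const match = /^(.+):(\d+)(?:-\d+)?$/.exec(anchor);
  if (match === null) {
    return "unknown"; // no line number, so not a usable anchor
  }
  const path = match[1] ?? "";
  if (path.startsWith(UPSTREAM_PREFIX)) {
    return "upstream";
  }
  if (path.startsWith("scripts/")) {
    return "wrapper";
  }
  return "unknown";
}
```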
### Drift markers
When a sweep finds upstream behavior no longer matches the case:
- **Edited Steps/Expected** — fix the case in place, mention what
changed in the commit message. The case is the spec.
- **Missing in build X.Y.Z** — prepend a blockquote under the test
heading: `> **⚠ Missing in build 1.5354.0** — <one-line note>.
Re-verify after next upstream bump.` Use when the feature isn't
in the build at all (deprecated, behind unset flag, never shipped).
- **Ambiguous** — don't edit; flag in the sweep report. Use when
the load-bearing claim could be one of several candidate code
paths and static analysis can't disambiguate.


@@ -0,0 +1,197 @@
# Code Tab — Foundations
Tests covering Code-tab availability on Linux (officially unsupported per upstream docs), sign-in flow, folder picker, drag-and-drop, and the basic editing surfaces (terminal, file pane). See [`../matrix.md`](../matrix.md) for status.
## T15 — Sign-in completes in the embedded webview
> **Drift in build 1.5354.0** — Sign-in is an in-app `mainView.webContents.loadURL` flow, not an `xdg-open` browser handoff. Claude.ai/login renders inside the embedded BrowserView; the resulting `sessionKey` cookie is then exchanged at `${apiHost}/v1/oauth/${org}/authorize` with redirect URI `https://claude.ai/desktop/callback`. No system browser is involved.
**Severity:** Smoke
**Surface:** Auth / embedded webview
**Applies to:** All rows
**Issues:**
**Steps:**
1. Launch a fresh app instance (signed-out state).
2. Click **Sign in**. Observe claude.ai/login rendering inside the app.
3. Authenticate. Observe the in-app navigation completing back to the
workspace.
**Expected:** Sign-in stays inside the embedded webview (`will-navigate`
handler `Ihr` keeps `/login/` paths in-app). After auth the
`sessionKey` cookie is captured and silently exchanged for an OAuth
token via the `desktop/callback` redirect. Account dropdown populates;
no auth banner remains.
**Diagnostics on failure:** DevTools console for the `mainView`
BrowserView, network captures of the `/v1/oauth/{org}/authorize` and
`/v1/oauth/token` calls, launcher log, cookie jar inspection
(`sessionKey` on `.claude.ai`).
**References:** [Code tab auth troubleshooting](https://code.claude.com/docs/en/desktop#403-or-authentication-errors-in-the-code-tab)
**Code anchors:**
- `build-reference/app-extracted/.vite/build/index.js:141996` — desktop
OAuth redirect URI `https://claude.ai/desktop/callback`
- `build-reference/app-extracted/.vite/build/index.js:142431` — POST to
`${apiHost}/v1/oauth/${org}/authorize` with `Bearer ${sessionKey}`
- `build-reference/app-extracted/.vite/build/index.js:216565`
`Ihr` treats `/login/` paths as in-app (not external)
- `build-reference/app-extracted/.vite/build/index.js:141316`
`mainView.webContents.loadURL(...)` drives the embedded sign-in
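
The `/login/` rule in the Expected section can be sketched as a small predicate. This is a hypothetical stand-in for the minified `Ihr` handler, not its actual body: the function name, the `https://claude.ai` host check, and the handling of unparseable URLs are assumptions made for illustration. Only the "`/login/` paths stay in-app" rule comes from the test case.

```typescript
// Hypothetical predicate; the real logic lives in the minified `Ihr` handler.
// Assumption: only https://claude.ai URLs qualify, and only the /login/ rule
// is modelled here. Other in-app routes are out of scope for this sketch.
function isInAppLoginUrl(rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    // Unparseable input is treated as "not an in-app login URL".
    return false;
  }
  const isClaudeHost = url.protocol === "https:" && url.hostname === "claude.ai";
  return isClaudeHost && url.pathname.startsWith("/login/");
}
```

A runner could call a predicate like this on each URL observed during sign-in to assert that nothing under `/login/` was handed to the system browser.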
## T16 — Code tab loads
**Severity:** Smoke
**Surface:** Code tab — top-level UI
**Applies to:** All rows
**Issues:**
**Steps:**
1. After sign-in, click the **Code** tab at the top center.
2. Wait a few seconds.
**Expected:** Code tab renders the session UI (sidebar, prompt area, environment dropdown). Per upstream docs the Code tab is "not supported" on Linux — the patched build under this project should render the UI normally or surface a clear, actionable message. Not a blank screen, infinite spinner, or `Error 403: Forbidden`.
**Diagnostics on failure:** Screenshot, DevTools console, network captures (auth/feature-flag responses), launcher log, the active patch set in `scripts/patches/`.
**References:** [Use Claude Code Desktop](https://code.claude.com/docs/en/desktop), [Get started with the desktop app](https://code.claude.com/docs/en/desktop-quickstart)
**Code anchors:**
- `build-reference/app-extracted/.vite/build/index.js:525066` —
`sidebarMode === "code"` rewrites the BrowserView path to `/epitaxy`
- `build-reference/app-extracted/.vite/build/index.js:496066` — Code
deeplinks (`claude://code?...`) navigate to `/epitaxy?...`
- `build-reference/app-extracted/.vite/build/index.js:105273` — `IHi`
recognises `/epitaxy` and `/epitaxy/...` as the Code-tab path
- `build-reference/app-extracted/.vite/build/index.js:105346` —
`sidebarMode` enum contains `"code"`
**Inventory anchor:** `…tablist.tab-by-name.code` (role `tab`, label
`Code`) — confirms the Code tab is reachable from the new-chat tablist
in the captured idle state.
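
The path rules in the code anchors above can be approximated as follows. Both function names are hypothetical; the bundle's `IHi` and its deeplink handler are minified and may differ in detail. The sketch assumes that a Code deeplink is exactly `claude://code` with an optional query string.

```typescript
// Hypothetical equivalent of the `IHi` check: matches `/epitaxy` and
// `/epitaxy/...`, but not look-alikes such as `/epitaxyfoo`.
function isCodeTabPath(pathname: string): boolean {
  return pathname === "/epitaxy" || pathname.startsWith("/epitaxy/");
}

// Hypothetical deeplink rewrite: `claude://code?...` becomes `/epitaxy?...`.
// Returns null for anything that is not a Code deeplink.
function codeDeeplinkToPath(deeplink: string): string | null {
  const prefix = "claude://code";
  if (!deeplink.startsWith(prefix)) return null;
  const rest = deeplink.slice(prefix.length);
  // Assumption: only an optional query string may follow the prefix.
  if (rest !== "" && !rest.startsWith("?")) return null;
  return "/epitaxy" + rest;
}
```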
## T17 — Folder picker opens
**Severity:** Smoke
**Surface:** Code tab → Environment selection
**Applies to:** All rows
**Issues:**
**Runner:** [`tools/test-harness/src/runners/T17_folder_picker.spec.ts`](../../../tools/test-harness/src/runners/T17_folder_picker.spec.ts) — runtime-attach via SIGUSR1 + main-process `dialog.showOpenDialog` mock + `webContents.executeJavaScript` to drive the renderer. The click chain that reaches the folder-picker button still awaits selector tuning.
**Steps:**
1. In the Code tab, click the environment pill → **Local** → **Select folder**.
2. Choose a project directory.
**Expected:** Native file chooser opens. On Wayland sessions the chooser is `xdg-desktop-portal`-backed (verify with `busctl --user tree org.freedesktop.portal.Desktop`). On X11 sessions the GTK/Qt native picker fires. Selected path appears in the env pill.
**Diagnostics on failure:** `systemctl --user status xdg-desktop-portal`, `XDG_SESSION_TYPE`, the portal backend in use (`xdg-desktop-portal-kde`, `xdg-desktop-portal-gnome`, `xdg-desktop-portal-wlr`), launcher log.
**References:** [Local sessions](https://code.claude.com/docs/en/desktop#local-sessions)
**Code anchors:**
- `build-reference/app-extracted/.vite/build/index.js:66403` — IPC
channel `claude.web_FileSystem_browseFolder` (renderer → main)
- `build-reference/app-extracted/.vite/build/index.js:509188` —
`browseFolder` impl calls `dialog.showOpenDialog` with
`properties: ["openDirectory", "createDirectory"]`
- `build-reference/app-extracted/.vite/build/index.js:450534` —
`grantViaPicker` (Operon host-access folder grant) uses the same
`["openDirectory"]` shape
- `tools/test-harness/src/lib/claudeai.ts:122` — `installOpenDialogMock`
intercepts both `(opts)` and `(window, opts)` arities, matching the
call sites at index.js:509196 and :450534
**Inventory anchor:** `root.main.region.button-by-name.select-folder`
(role `button`, label `Select folder…`) — the persistent button the
T17 runner clicks before the dialog mock fires.
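
A minimal sketch of a two-arity dialog mock, in the spirit of `installOpenDialogMock`. This is not the harness code: `makeOpenDialogMock`, the trimmed option and result types, and the canned-path behaviour are assumptions for illustration. It relies only on the documented Electron shapes `showOpenDialog(options)` and `showOpenDialog(window, options)`.

```typescript
// Trimmed, hypothetical versions of Electron's option/result shapes.
interface OpenDialogOptions {
  properties?: string[];
  title?: string;
}
interface OpenDialogResult {
  canceled: boolean;
  filePaths: string[];
}

// Builds a stand-in for dialog.showOpenDialog that records the options it
// was called with and resolves with a canned folder path.
function makeOpenDialogMock(cannedPath: string) {
  const calls: OpenDialogOptions[] = [];
  async function showOpenDialog(...args: unknown[]): Promise<OpenDialogResult> {
    // With two arguments the first is the parent window, the second the options.
    const options = (args.length >= 2 ? args[1] : args[0]) as OpenDialogOptions | undefined;
    calls.push(options ?? {});
    return { canceled: false, filePaths: [cannedPath] };
  }
  return { showOpenDialog, calls };
}
```

Recording the options lets a runner assert on `properties` after the click chain fires, regardless of which call site triggered the dialog.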
## T18 — Drag-and-drop files into prompt
**Severity:** Critical
**Surface:** Code tab → Prompt area
**Applies to:** All rows
**Issues:**
**Steps:**
1. Open a Code-tab session.
2. From the system file manager, drag one or more files into the prompt area.
3. Repeat with multiple files at once.
**Expected:** Files attach to the prompt. The renderer resolves dropped
`File` objects to absolute paths via the preload-bridged
`claudeAppSettings.filePickers.getPathForFile` (Electron's
`webUtils.getPathForFile`). Multi-file drops attach each file. Works on
both Wayland and X11.
**Diagnostics on failure:** Screen recording, `wl-paste --list-types` (Wayland) or `xclip -selection clipboard -t TARGETS -o` (X11) during drag, DevTools console, launcher log.
**References:** [Add files and context](https://code.claude.com/docs/en/desktop#add-files-and-context-to-prompts)
**Code anchors:**
- `build-reference/app-extracted/.vite/build/mainView.js:9267` —
`filePickers.getPathForFile` wraps `webUtils.getPathForFile`
- `build-reference/app-extracted/.vite/build/mainView.js:9552` —
exposed to the renderer as `window.claudeAppSettings`
## T19 — Integrated terminal
**Severity:** Critical
**Surface:** Code tab → Terminal pane
**Applies to:** All rows
**Issues:**
**Steps:**
1. In a Code-tab session, press `` Ctrl+` `` (or open via the Views menu).
2. Confirm the terminal opens in the session's working directory.
3. Run `git status`, `npm --version`, `gh auth status`.
**Expected:** Terminal pane opens in the session's working directory, inherits the same `PATH` Claude sees. Standard commands run cleanly. Terminal pane is local-session-only per docs.
**Diagnostics on failure:** Terminal pane content, `echo $PATH` from inside the pane, `pwd`, the shell binary in use, launcher log.
**References:** [Run commands in the terminal](https://code.claude.com/docs/en/desktop#run-commands-in-the-terminal)
**Code anchors:**
- `build-reference/app-extracted/.vite/build/index.js:69135` — IPC
channel `claude.web_LocalSessions_startShellPty` (also
`resizeShellPty`, `writeShellPty` at :69184, :69210)
- `build-reference/app-extracted/.vite/build/index.js:486438` —
`startShellPty` body: spawns `node-pty` in
`n.worktreePath ?? n.cwd` with `TERM=xterm-256color`
- `build-reference/app-extracted/.vite/build/index.js:486463` —
`node-pty` dynamic import (optional dep, `package.json` line 100)
- `build-reference/app-extracted/.vite/build/index.js:259306` —
`shell-path-worker/shellPathWorker.js` resolves the user's interactive
PATH; `FX()` (line 259311) returns it for the spawned PTY env
## T20 — File pane opens and saves
**Severity:** Critical
**Surface:** Code tab → File pane
**Applies to:** All rows
**Issues:**
**Steps:**
1. In a Code-tab session, click a file path in chat or diff to open it in the file pane.
2. Make a small edit. Click **Save**.
3. Modify the file externally (e.g. `echo >> file`). Re-edit in the pane. Observe the on-disk-changed warning.
**Expected:** File opens in the editor pane. Edits write back to disk on Save. If the file changed on disk since opening, the pane shows the on-disk-changed warning and offers override or discard. (The conflict check is sha256-based, not mtime-based — `writeSessionFile` reads the current bytes, hashes them, and rejects with `Conflict` if the renderer-supplied `expectedHash` doesn't match.)
**Diagnostics on failure:** `sha256sum <file>` output (and stat mtime for cross-checking), launcher log, DevTools console, screen recording of the warning state.
**References:** [Open and edit files](https://code.claude.com/docs/en/desktop#open-and-edit-files)
**Code anchors:**
- `build-reference/app-extracted/.vite/build/index.js:68922` — IPC
channel `claude.web_LocalSessions_readSessionFile`
- `build-reference/app-extracted/.vite/build/index.js:69003` — IPC
channel `claude.web_LocalSessions_writeSessionFile` with
`expectedHash` argument at position 3
- `build-reference/app-extracted/.vite/build/index.js:492874` —
`readSessionFile` impl
- `build-reference/app-extracted/.vite/build/index.js:492954` —
`writeSessionFile` impl: sha256-hashes current on-disk bytes,
returns `{ status: nW.Conflict, currentHash }` when `expectedHash`
mismatches
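
The hash-based conflict check described for `writeSessionFile` can be illustrated with a pure function. This is a sketch under assumptions, not the bundled implementation: the names `sha256Hex` and `checkWrite`, the hex digest encoding, and the string status values are all hypothetical, and no disk I/O is performed.

```typescript
import { createHash } from "node:crypto";

type WriteCheck =
  | { status: "ok"; newHash: string }
  | { status: "conflict"; currentHash: string };

// Hex-encoded SHA-256 of the given content (the encoding is an assumption).
function sha256Hex(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

// Compares the hash of what is on disk now with the hash the editor saw
// when it opened the file. A mismatch means the file changed externally.
function checkWrite(currentOnDisk: string, expectedHash: string, nextContent: string): WriteCheck {
  const currentHash = sha256Hex(currentOnDisk);
  if (currentHash !== expectedHash) {
    return { status: "conflict", currentHash };
  }
  return { status: "ok", newHash: sha256Hex(nextContent) };
}
```

Because the check is content-based, touching a file without changing its bytes (which alters mtime) would not trigger the warning, which is why the diagnostics ask for `sha256sum` output first.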

@@ -0,0 +1,163 @@
# Code Tab — Handoffs to Other Apps
Tests covering desktop notifications, "Open in" external editor, "Show in Files" file manager, connector OAuth round-trips, IDE handoff, and graceful failure of the macOS/Windows-only `/desktop` CLI command. See [`../matrix.md`](../matrix.md) for status.
## T23 — Desktop notifications fire
**Severity:** Critical
**Surface:** Notifications (libnotify / XDG Notifications)
**Applies to:** All rows
**Issues:**
**Steps:**
1. Trigger each notification source: scheduled-task fire ([T27](./routines.md#t27--scheduled-task-fires-and-notifies)), CI completion ([T22](./code-tab-workflow.md#t22--pr-monitoring-via-gh)), Dispatch handoff ([S24](./platform-integration.md#s24--dispatch-spawned-code-session-appears-with-badge-and-notification)).
2. Observe each notification appears.
3. Click each — confirm it focuses the relevant session.
**Expected:** Notifications appear in the active DE's notification area (Plasma's notification daemon, Mako on wlroots, gnome-shell, etc.) and are clickable to focus the relevant session.
**Diagnostics on failure:** `gdbus call --session --dest=org.freedesktop.Notifications --object-path=/org/freedesktop/Notifications --method=org.freedesktop.DBus.Introspectable.Introspect`, `notify-send "test"` (sanity check daemon), launcher log, DE-specific notification logs.
**References:** [Scheduled tasks](https://code.claude.com/docs/en/desktop-scheduled-tasks), [Monitor pull request status](https://code.claude.com/docs/en/desktop#monitor-pull-request-status)
**Code anchors:** `build-reference/app-extracted/.vite/build/index.js:494456` (`new hA.Notification(r)` — backed by Electron's libnotify on Linux); `:495110` (`showNotification(title, body, tag, navigateTo)` dispatches Swift on macOS, Electron elsewhere); `:511174`, `:512738` (cu-lock / tool-permission notifications wire a click callback that navigates to `/local_sessions/{sessionId}` to focus the session).
## T24 — Open in external editor
**Severity:** Should
**Surface:** Code tab → Right-click → Open in
**Applies to:** All rows
**Issues:**
**Steps:**
1. Install at least one of: VS Code, Cursor, Zed, Windsurf (any install method —
flatpak, AppImage, distro package). Xcode is darwin-only and absent on Linux.
2. In the Code tab, right-click a file path → **Open in** → choose the editor.
3. Confirm the editor opens at that file.
**Expected:** Right-click → **Open in** launches the chosen editor with the file
path. Editor is invoked by URL scheme (`vscode://file/<path>`,
`cursor://file/<path>`, `zed://file/<path>`, `windsurf://file/<path>`) via
`shell.openExternal`, which delegates to `xdg-open`'s
`x-scheme-handler/<editor>` resolution rather than hard-coded paths.
**Diagnostics on failure:** `xdg-mime query default x-scheme-handler/vscode` (or
`cursor`/`zed`/`windsurf`), `desktop-file-validate` on the editor's `.desktop`
file, `xdg-open vscode://file/<path>` from terminal (sanity check), launcher
log.
**References:** [Open files in other apps](https://code.claude.com/docs/en/desktop#open-files-in-other-apps)
**Code anchors:** `build-reference/app-extracted/.vite/build/index.js:59076`
(editor enum: VSCode, Cursor, Zed, Windsurf, Xcode); `:463902` (`Mtt`
registry — `vscode://`, `cursor://`, `zed://`, `windsurf://`, `xcode://` with
darwin-only flag on Xcode); `:463956` (`getInstalledEditors` probes via
`app.getApplicationInfoForProtocol`); `:464011`
(`shell.openExternal('<scheme>://file/<encoded-path>:<line>')` — path is
URL-encoded but `/` separators are preserved); `:68816` IPC handler
`LocalSessions.openInEditor(path, editor, sshConfig, line)`.
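
The URL shape passed to `shell.openExternal` can be sketched as below. The function name and the per-segment `encodeURIComponent` strategy are assumptions; the anchor only states that the path is URL-encoded with `/` separators preserved and that an optional `:<line>` suffix follows. The sketch assumes an absolute POSIX path.

```typescript
type EditorScheme = "vscode" | "cursor" | "zed" | "windsurf";

// Builds `<scheme>://file/<encoded-path>[:<line>]`.
// Assumption: `absolutePath` starts with "/" so the result has exactly one
// slash between `file` and the first path segment.
function buildEditorUrl(scheme: EditorScheme, absolutePath: string, line?: number): string {
  // Encode each segment separately so the "/" separators survive.
  const encodedPath = absolutePath.split("/").map(encodeURIComponent).join("/");
  const suffix = line === undefined ? "" : `:${line}`;
  return `${scheme}://file${encodedPath}${suffix}`;
}
```

Pasting the resulting string into `xdg-open` from a terminal reproduces the handoff outside the app, which helps separate scheme-handler registration problems from app-side ones.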
## T25 — Show in Files / file manager
**Severity:** Should
**Surface:** Code tab → Right-click → Show in Files
**Applies to:** All rows
**Issues:**
**Steps:**
1. In the Code tab, right-click a file path → "Show in Files" (Linux equivalent of macOS "Show in Finder" / Windows "Show in Explorer").
2. Confirm the system file manager opens with the containing folder selected.
**Expected:** System file manager (Nautilus on GNOME, Dolphin on KDE, Thunar on Xfce, etc.) opens with the file pre-selected. Resolution respects `xdg-mime` defaults.
**Diagnostics on failure:** `xdg-mime query default inode/directory`, `xdg-open <dir>` from terminal, the menu label rendered (was it Linux-specific or stuck on "Show in Finder"?), launcher log.
**References:** [Open files in other apps](https://code.claude.com/docs/en/desktop#open-files-in-other-apps)
**Code anchors:** `build-reference/app-extracted/.vite/build/index.js:66652` IPC
handler `FileSystem.showInFolder(path)`; `:509431` impl thin-wraps
`hA.shell.showItemInFolder(Tc(path))`. Electron's `showItemInFolder` on Linux
falls back to `xdg-open` on the parent directory when no DBus FileManager1
service is present, so the file is rarely pre-selected on minimal DEs — only
the parent folder opens.
## T34 — Connector OAuth round-trip
**Severity:** Critical
**Surface:** Connectors → OAuth handoff
**Applies to:** All rows
**Issues:**
**Steps:**
1. In a Code-tab session, click **+** → **Connectors** → choose a service (Slack, GitHub, Linear, Notion, Google Calendar).
2. Step through the OAuth flow in the system browser.
3. Return to Claude Desktop and verify the connector appears in **Settings → Connectors**.
4. Use the connector in a prompt (e.g. "list my Slack channels").
**Expected:** Adding a connector launches the browser via `xdg-open`, OAuth callback hands control back to Claude Desktop, connector appears in Settings, and is usable in subsequent prompts.
**Diagnostics on failure:** `xdg-mime query default x-scheme-handler/https`, the callback URL scheme, network captures of OAuth redirect, launcher log, DevTools console.
**References:** [Connect external tools](https://code.claude.com/docs/en/desktop#connect-external-tools), [Connectors for everyday life](https://claude.com/blog/connectors-for-everyday-life)
**Code anchors:**
`build-reference/app-extracted/.vite/build/index.js:524819`
(`hA.app.setAsDefaultProtocolClient("claude")` — registers the `claude://`
deep-link scheme used by the OAuth callback); `:525026` mainWindow
`setWindowOpenHandler` routes external URLs through `MAA(url)`;
`:525102`–`:525135` (only `http:`/`https:`/`mailto:`/`tel:`/`sms:`/
`ms-(excel|powerpoint|word):` are forwarded to system handlers; everything
else is dropped); `:136233` `$a(url)` thin-wraps `hA.shell.openExternal(url)`
(this is the single egress point for browser handoff); `:159634`
`mcpSubmitOAuthCallbackUrl(serverName, callbackUrl)` and `:159651`
`claudeOAuthCallback(authorizationCode, state)` — IPC bridges that consume
the deep-link callback. See [`docs/learnings/plugin-install.md`](../../learnings/plugin-install.md)
for orgId/sessionKey cookie chain that gates connector listing.
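
The protocol allow-list described in the anchors can be expressed as a single check. This is a hypothetical reconstruction from the listed protocols, not the bundled `MAA` code; the function name and the treatment of unparseable URLs are assumptions.

```typescript
// Protocols the anchors list as forwarded to system handlers.
const FORWARDED_PROTOCOL = /^(https?|mailto|tel|sms|ms-(excel|powerpoint|word)):$/;

// Returns true when a URL would be handed to the system handler,
// false when it would be dropped.
function isForwardedToSystemHandler(rawUrl: string): boolean {
  try {
    return FORWARDED_PROTOCOL.test(new URL(rawUrl).protocol);
  } catch {
    // Assumption: URLs that fail to parse are dropped.
    return false;
  }
}
```

For this test the relevant case is `https:`: a connector's authorization URL passes the filter and reaches the browser, while the return trip arrives through the separately registered `claude://` scheme.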
## T38 — Continue in IDE
**Severity:** Should
**Surface:** Code tab → Continue in menu
**Applies to:** All rows
**Issues:**
**Steps:**
1. In a Code-tab session, click the IDE icon (bottom right of session toolbar) → **Continue in** → choose an IDE.
2. Confirm the IDE opens at the working directory.
**Expected:** Selected IDE opens the project at the current working directory. Resolution via `xdg-open` / `.desktop` files.
**Diagnostics on failure:** `xdg-open <project-dir>` sanity check, `xdg-mime query default x-scheme-handler/vscode` (or matching scheme for the chosen IDE), launcher log, the IDE's `.desktop` file.
**References:** [Continue in another surface](https://code.claude.com/docs/en/desktop#continue-in-another-surface)
**Code anchors:** Same IPC surface as [T24](#t24--open-in-external-editor) —
`build-reference/app-extracted/.vite/build/index.js:68816`
(`LocalSessions.openInEditor(path, editor, sshConfig, line)` accepts a
directory path the same way as a file path); `:463902` editor registry;
`:464011` `shell.openExternal('<scheme>://file/<cwd>')`. The "Continue in"
chooser UI is rendered server-side by claude.ai and not present in the local
asar — only the IPC bridge can be code-anchored.
## T39 — `/desktop` CLI handoff (graceful N/A)
> **Note** — This test exercises the upstream `claude` CLI binary, not the
> Electron app. The CLI ships separately from this packaging (out of
> `build-reference/`), so no anchor in `app-extracted/.vite/build/` exists for
> the slash-command handler. Re-verify behaviour against the CLI binary that
> ships with the upstream version under test (currently 1.5354.0).
**Severity:** Could
**Surface:** CLI `/desktop` command
**Applies to:** All rows (Linux equally)
**Issues:**
**Steps:**
1. In a CLI session, run `/desktop`.
2. Inspect exit code and output.
**Expected:** `/desktop` is documented as macOS/Windows-only. On Linux it must fail gracefully — print a clear "not supported on Linux" message and exit cleanly. No partial state transition, no panic, no corrupted session file.
**Diagnostics on failure:** Full CLI output, exit code, the session file before/after (`~/.claude/sessions/...`), strace if the CLI hangs.
**References:** [Coming from the CLI](https://code.claude.com/docs/en/desktop#coming-from-the-cli)

@@ -0,0 +1,151 @@
# Code Tab — Workflow Surfaces
Tests covering the dev-server preview pane, PR monitoring, worktree isolation, auto-archive, side chat, and the slash command menu. See [`../matrix.md`](../matrix.md) for status.
## T21 — Dev server preview pane
**Severity:** Should
**Surface:** Code tab → Preview pane
**Applies to:** All rows
**Issues:**
**Steps:**
1. In a Code-tab session, ensure `.claude/launch.json` is configured (or let auto-detect populate it).
2. Click **Preview** dropdown → **Start**.
3. Interact with the embedded browser. Verify auto-verify takes screenshots.
4. Stop the server from the dropdown.
**Expected:** Configured dev server starts. Embedded browser renders the running app. Auto-verify takes screenshots and inspects DOM. Stopping from the dropdown actually stops the process.
**Diagnostics on failure:** `lsof -i :<port>` to see the server, screenshot of preview pane state, `.claude/launch.json` content, launcher log, DevTools console.
**References:** [Preview your app](https://code.claude.com/docs/en/desktop#preview-your-app)
**Code anchors:**
- `build-reference/app-extracted/.vite/build/index.js:262175` — `Pae = "Claude Preview"` + `preview_*` MCP tool table (`preview_start`, `preview_stop`, `preview_list`, `preview_screenshot`, `preview_snapshot`, `preview_inspect`, `preview_click`, `preview_fill`, `preview_eval`, `preview_network`, `preview_resize`).
- `build-reference/app-extracted/.vite/build/index.js:259604` — `setAutoVerify()` and `parseLaunchJson()` (reads `.claude/launch.json`, honours `autoVerify` flag default-on).
- `build-reference/app-extracted/.vite/build/index.js:260015` — `capturePage()` / `captureViaCDP()` drive `preview_screenshot` against the embedded preview WebContents.
## T22 — PR monitoring via `gh`
**Severity:** Critical
**Surface:** Code tab → CI status bar
**Applies to:** All rows
**Issues:**
**Steps:**
1. Ensure `gh` is installed and authenticated (`gh auth status`).
2. In a Code-tab session, ask Claude to open a PR for a small change.
3. Observe the CI status bar. Toggle **Auto-fix** and **Auto-merge**.
4. Run a separate test on a row where `gh` is **not** installed — confirm the missing-`gh` prompt appears the first time a PR action is taken.
**Expected:** With `gh` present and authenticated, CI status bar surfaces in the session toolbar. Auto-fix and Auto-merge toggles work (auto-merge requires the corresponding GitHub repo setting). If `gh` is missing, the app surfaces a prompt directing the user to https://cli.github.com (auto-install via `installGh` only runs on macOS/brew; Linux returns an error string with the install URL).
**Diagnostics on failure:** `gh auth status`, `which gh`, launcher log, DevTools console, screenshot of status bar, the GitHub repo's "Allow auto-merge" setting.
**References:** [Monitor pull request status](https://code.claude.com/docs/en/desktop#monitor-pull-request-status)
**Code anchors:**
- `build-reference/app-extracted/.vite/build/index.js:464281` — `GitHubPrManager` (`prStateCache`, `prChecksCache`); `getPrChecks` at line 464964 fans out to `gh pr view`.
- `build-reference/app-extracted/.vite/build/index.js:464368` — `"gh CLI not found in PATH"` throw site that backs the missing-`gh` prompt.
- `build-reference/app-extracted/.vite/build/index.js:464480` — `installGh()`: macOS-only `brew install gh`; Linux/Windows return error pointing to https://cli.github.com.
- `build-reference/app-extracted/.vite/build/index.js:465019` — `autoMergeRequest { enabledAt }` GraphQL fragment; `enableAutoMerge` / `disableAutoMerge` at lines 465531 / 465556.
- `build-reference/app-extracted/.vite/build/index.js:534033` — `AutoFixEngine.handleSessionEvent` toggles on `autoFixEnabled` per session.
## T29 — Worktree isolation
**Severity:** Critical
**Surface:** Code tab → Sidebar (parallel sessions)
**Applies to:** All rows
**Issues:**
**Steps:**
1. In a Code-tab session against a Git project, open two new sessions in parallel via **+ New session**.
2. Make different edits in each session.
3. Confirm `<project-root>/.claude/worktrees/<branch>` exists for each.
4. Archive one session via the sidebar archive icon.
**Expected:** Each session creates an isolated worktree at `<project-root>/.claude/worktrees/<branch>` (or the dir configured in Settings → Claude Code → "Worktree location"). Edits in one session do not appear in another until committed. Archiving removes the worktree.
**Diagnostics on failure:** `git worktree list` from project root, `ls -la <project-root>/.claude/worktrees/`, launcher log.
**References:** [Work in parallel with sessions](https://code.claude.com/docs/en/desktop#work-in-parallel-with-sessions)
**Code anchors:**
- `build-reference/app-extracted/.vite/build/index.js:462835` — `getWorktreeParentDir()`: returns `<baseRepo>/.claude/worktrees`, or `<chillingSlothLocation.customPath>/<basename>` when overridden in Settings.
- `build-reference/app-extracted/.vite/build/index.js:462843` — `createWorktree()`: runs `git worktree add` with `core.longpaths=true` under the parent dir.
- `build-reference/app-extracted/.vite/build/index.js:463290` — `git worktree remove --force` invoked on archive (cleanup path).
- `build-reference/app-extracted/.vite/build/index.js:55231` — `chillingSlothLocation: "default"` settings key (Settings → "Worktree location").
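
The parent-directory rule attributed to `getWorktreeParentDir()` can be sketched as follows. The function name and signature here are hypothetical, and the sketch assumes that `<basename>` means the final path segment of the base repository, which the anchor does not state explicitly. POSIX-style paths are assumed.

```typescript
// Hypothetical sketch of the worktree parent-directory rule.
// Default: <baseRepo>/.claude/worktrees
// Override: <customPath>/<basename>, with basename assumed to be the repo
// directory name.
function worktreeParentDir(baseRepo: string, customPath?: string): string {
  const repo = baseRepo.replace(/\/+$/, ""); // drop trailing slashes
  if (customPath === undefined) {
    return `${repo}/.claude/worktrees`;
  }
  const basename = repo.slice(repo.lastIndexOf("/") + 1);
  return `${customPath.replace(/\/+$/, "")}/${basename}`;
}
```

When step 3 fails on a host with a custom "Worktree location", computing the expected directory this way shows where to point `ls -la` instead of the default path.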
## T30 — Auto-archive on PR merge
**Severity:** Should
**Surface:** Code tab → Sidebar
**Applies to:** All rows
**Issues:**
**Steps:**
1. In Settings → Claude Code, enable **Auto-archive on PR close** (`ccAutoArchiveOnPrClose`).
2. Open a PR from a local session. Merge or close it on GitHub.
3. Wait up to ~5–6 minutes (sweep runs every 5 minutes, with a 30s startup delay). Observe the sidebar.
**Expected:** Local session whose PR is `merged` or `closed` is archived from the sidebar on the next sweep tick (≤ ~5 min) after the merge/close event. Cached PR-state lookups have a 1-hour cooldown for sessions whose state isn't yet terminal. Remote and SSH sessions are not affected.
**Diagnostics on failure:** Screenshot of sidebar, `gh pr view <num>` output (confirming merge state), launcher log, settings file content (`ccAutoArchiveOnPrClose`).
**References:** [Work in parallel with sessions](https://code.claude.com/docs/en/desktop#work-in-parallel-with-sessions)
**Code anchors:**
- `build-reference/app-extracted/.vite/build/index.js:55269` — default `ccAutoArchiveOnPrClose: !1` setting.
- `build-reference/app-extracted/.vite/build/index.js:533517` — sweep cadence constants: `$3n = 300_000` ms (5 min interval), `W3n = 3_600_000` ms (1 h recheck cooldown), `Fst = 10` (concurrent batch size).
- `build-reference/app-extracted/.vite/build/index.js:533520` — `AutoArchiveEngine.start()` schedules the 5-min interval + 30s initial delay.
- `build-reference/app-extracted/.vite/build/index.js:533537` — `sweep()` gates on `Qi("ccAutoArchiveOnPrClose")` and archives sessions whose `prState` lowercases to `merged` or `closed` (`D3A` predicate at line 533607).
- `build-reference/app-extracted/.vite/build/index.js:533571` — `archiveSession(..., { cleanupWorktree: true })` removes the worktree alongside the archive.
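
The timing and the terminal-state predicate can be worked through in a few lines. The constant values come from the anchors above; the names, and the assumption that the first sweep fires 30 s after start with later sweeps every 5 minutes after that, are illustrative rather than confirmed by the bundle.

```typescript
const SWEEP_INTERVAL_MS = 300_000; // `$3n` in the bundle: 5 minutes
const STARTUP_DELAY_MS = 30_000; // initial delay before the first sweep

// Mirrors the described `D3A` rule: case-insensitive `merged` or `closed`.
function isTerminalPrState(prState: string): boolean {
  const state = prState.toLowerCase();
  return state === "merged" || state === "closed";
}

// Milliseconds until the next sweep, given time elapsed since engine start.
// Assumes sweeps at 30 s, 5 min 30 s, 10 min 30 s, and so on.
function msUntilNextSweep(elapsedSinceStartMs: number): number {
  if (elapsedSinceStartMs <= STARTUP_DELAY_MS) {
    return STARTUP_DELAY_MS - elapsedSinceStartMs;
  }
  const sinceFirstSweep = elapsedSinceStartMs - STARTUP_DELAY_MS;
  const remainder = sinceFirstSweep % SWEEP_INTERVAL_MS;
  return remainder === 0 ? 0 : SWEEP_INTERVAL_MS - remainder;
}
```

Under these assumptions a merge that lands just after a sweep waits almost the full five minutes, which is why the steps allow a little more than one interval before declaring failure.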
## T31 — Side chat opens
**Severity:** Should
**Surface:** Code tab → Side chat overlay
**Applies to:** All rows
**Issues:**
**Steps:**
1. In a Code-tab session, press `Ctrl+;` (or type `/btw` in the prompt).
2. Ask a question in the side chat. Confirm the side chat sees the main thread context.
3. Close the side chat. Confirm focus returns to the main session and the side chat content is not in the main thread.
**Expected:** Side chat opens, has access to main-thread context, but its replies do not appear in the main conversation. Closing returns focus.
**Diagnostics on failure:** Screenshot, launcher log, DevTools console.
**References:** [Ask a side question](https://code.claude.com/docs/en/desktop#ask-a-side-question-without-derailing-the-session)
**Code anchors:**
- `build-reference/app-extracted/.vite/build/index.js:487025` — side-chat system-prompt suffix: "You are running in a side chat — a lightweight fork… nothing you say here lands in the main transcript."
- `build-reference/app-extracted/.vite/build/index.js:487265` — `this.sideChats = new Map()` per-session fork registry.
- `build-reference/app-extracted/.vite/build/index.js:491658` — `startSideChat()` implementation; emits `side_chat_ready` / `side_chat_assistant` / `side_chat_turn_end` / `side_chat_closed` / `side_chat_error` events.
- `build-reference/app-extracted/.vite/build/mainView.js:7506` — preload IPC bridges: `startSideChat`, `sendSideChatMessage`, `stopSideChat` (the renderer SPA wires `Ctrl+;` / `/btw` to these — UI lives in claude.ai's remote bundle, not build-reference).
## T32 — Slash command menu
**Severity:** Should
**Surface:** Code tab → Prompt slash menu
**Applies to:** All rows
**Issues:**
**Steps:**
1. In a Code-tab session, type `/` in the prompt box.
2. Verify built-in commands, custom skills under `~/.claude/skills/`, project skills, and skills from installed plugins all appear.
3. Select an entry — confirm it inserts as a highlighted token.
**Expected:** Slash menu lists every available command/skill. Selection inserts the token correctly.
**Diagnostics on failure:** Screenshot of slash menu, `ls ~/.claude/skills/`, project `.claude/skills/`, installed plugin manifest, launcher log.
**References:** [Use skills](https://code.claude.com/docs/en/desktop#use-skills)
**Code anchors:**
- `build-reference/app-extracted/.vite/build/index.js:459463` — `getSupportedCommands({sessionId})` aggregates per-session `slashCommands` + cowork command registry (`p2()`) + built-ins (`Q_t`).
- `build-reference/app-extracted/.vite/build/index.js:332711` — `slashCommands: Di.array(Di.string()).optional()` schema field on the session record.
- `build-reference/app-extracted/.vite/build/index.js:377670` — `SkillManager` constructor: `skillDir = <agentDir>/.claude/skills`, `_discoverSkills()` walks project skills.
- `build-reference/app-extracted/.vite/build/index.js:444678` — private/public skill split under `<skillsRoot>/skills/{private,public}` for plugin-supplied skills.

@@ -0,0 +1,168 @@
# Distribution — DEB, RPM, AppImage
Tests covering Ubuntu/DEB-specific install behavior, Fedora/RPM-specific install behavior, AppImage fallback paths, and the auto-update interaction with system package managers. See [`../matrix.md`](../matrix.md) for status.
## S01 — AppImage launches without manual `libfuse2t64` install
**Severity:** Critical (for Ubuntu users)
**Surface:** AppImage runtime / FUSE
**Applies to:** Ubu (and any Ubuntu 24.04+ host)
**Issues:**
**Steps:**
1. Fresh Ubuntu 24.04 install with default packages only.
2. Download the project AppImage.
3. Make executable and run it.
**Expected:** AppImage runs without first installing `libfuse2t64`. Either the AppImage bundles its own FUSE shim, the `.desktop`/postinst declares the dep, or the launcher gives a clear error pointing at the package name.
**Currently:** Fails on Ubuntu 24.04 with `dlopen(): error loading libfuse.so.2`. Workaround: `sudo apt install libfuse2t64`. Not yet filed.
**Diagnostics on failure:** Full stderr from the AppImage launch, `ldd ./claude-desktop-*.AppImage`, `dpkg -l | grep -i fuse`.
**References:**
**Code anchors:** `scripts/packaging/appimage.sh:226` (downloads the upstream `appimagetool` AppImage as-is — no FUSE shim or static-mksquashfs bundling), `scripts/launcher-common.sh:64` (AppImage forces `--no-sandbox` "due to FUSE constraints"), `.github/workflows/test-artifacts.yml:47` (CI installs `libfuse2` before running the AppImage — i.e. the runtime hard-depends on libfuse2/libfuse2t64). No postinst dep declaration or user-facing FUSE error message exists.
## S02 — `XDG_CURRENT_DESKTOP=ubuntu:GNOME` doesn't break DE detection
**Severity:** Critical
**Surface:** DE detection / patch gate
**Applies to:** Ubu
**Issues:**
**Steps:**
1. On Ubuntu 24.04 (where `XDG_CURRENT_DESKTOP=ubuntu:GNOME`), launch the app.
2. Inspect launcher log for any DE-detection branches that should fire as GNOME.
3. Audit `scripts/launcher-common.sh` and any DE-gated patches for string-equality checks against `XDG_CURRENT_DESKTOP`.
**Expected:** DE-detection logic handles Ubuntu's colon-separated value. `contains "GNOME"` or splitting on `:` is the safe pattern; `== "GNOME"` would miss Ubuntu.
**Diagnostics on failure:** `echo $XDG_CURRENT_DESKTOP`, the relevant launcher.sh code path, launcher log, the patches that ran or didn't.
**References:** Surfaced via session-capture review.
**Code anchors:** `scripts/launcher-common.sh:35-44` (Niri auto-detect lowercases `XDG_CURRENT_DESKTOP` and uses `*niri*` glob — handles colon-separated values), `scripts/patches/quick-window.sh:34-35` and `:117-118` (KDE gate uses `.toLowerCase().includes("kde")` — substring, not equality), `scripts/doctor.sh:304` (purely informational `_info "Desktop: $desktop"`, no branching). No `==` equality checks against `XDG_CURRENT_DESKTOP` exist anywhere in shell or patched JS.
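
The colon-splitting pattern named in the Expected section, sketched for the patched-JS side. The helper name is hypothetical and no such function exists in the project; the existing gates use lowercase substring checks, which also handle Ubuntu's value.

```typescript
// Hypothetical helper: true when `name` is one of the colon-separated
// entries in XDG_CURRENT_DESKTOP, compared case-insensitively.
function desktopIncludes(xdgCurrentDesktop: string | undefined, name: string): boolean {
  if (!xdgCurrentDesktop) return false;
  const wanted = name.toLowerCase();
  return xdgCurrentDesktop
    .split(":")
    .map((entry) => entry.trim().toLowerCase())
    .includes(wanted);
}
```

With Ubuntu's `ubuntu:GNOME`, an exact comparison against `"GNOME"` fails while the split-based check succeeds. Unlike a substring test, it also would not match an unrelated entry that merely contains the name.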
## S03 — DEB install via APT pulls all required runtime deps
**Severity:** Critical
**Surface:** APT repository / dependency declarations
**Applies to:** Ubu (any DEB-based distro)
**Issues:** [`docs/learnings/apt-worker-architecture.md`](../../learnings/apt-worker-architecture.md)
**Steps:**
1. Add the project's APT repo per the README install instructions.
2. `sudo apt install claude-desktop` on a fresh container/VM.
3. Run `claude-desktop` — first launch should succeed with no further package installs.
**Expected:** All transitive runtime deps are declared in the package and pulled by APT. First launch succeeds without manual `apt install` of any extra package.
**Diagnostics on failure:** `apt-cache depends claude-desktop`, missing-library errors from the launcher, `ldd` against the binary.
**References:** [`docs/learnings/apt-worker-architecture.md`](../../learnings/apt-worker-architecture.md)
**Code anchors:** `scripts/packaging/deb.sh:185-197` (DEBIAN/control file — no `Depends:` field is emitted; relies on bundled Electron + the comment "No external dependencies are required at runtime" at line 183), `scripts/packaging/deb.sh:202-230` (postinst only sets chrome-sandbox suid, no dep-pull). Worker chain serving the package: `worker/src/worker.js:22-31` (`DEB_RE`) and `:33-43` (302 → GitHub Releases).
## S04 — RPM install via DNF pulls all required runtime deps
**Severity:** Critical
**Surface:** DNF repository / dependency declarations
**Applies to:** KDE-W, KDE-X, GNOME, Sway, i3, Niri (any RPM-based distro)
**Issues:** [`docs/learnings/apt-worker-architecture.md`](../../learnings/apt-worker-architecture.md) *(covers both APT and DNF)*
**Steps:**
1. Add the project's DNF repo per the README.
2. `sudo dnf install claude-desktop` on a fresh container/VM.
3. Run `claude-desktop` — first launch should succeed.
**Expected:** All transitive runtime deps are declared in the RPM and pulled by DNF. First launch succeeds with no further package installs.
**Diagnostics on failure:** `dnf repoquery --requires claude-desktop`, `rpm -qR claude-desktop`, launcher missing-library errors.
**References:** [`docs/learnings/apt-worker-architecture.md`](../../learnings/apt-worker-architecture.md)
**Code anchors:** `scripts/packaging/rpm.sh:188` (`AutoReqProv: no` — explicitly disables RPM's auto-dep generation; spec declares no `Requires:`), `scripts/packaging/rpm.sh:194-198` (strip + build-id disabled because Electron binaries don't tolerate them — bundled approach). Worker chain: `worker/src/worker.js:28-31` (`RPM_RE`).
## S05 — Doctor recognises dnf-installed package, doesn't false-flag as AppImage
**Severity:** Should
**Surface:** Doctor package-format detection
**Applies to:** KDE-W, KDE-X, GNOME, Sway, i3, Niri
**Issues:**
**Steps:**
1. On a Fedora/Nobara/RPM-based distro with claude-desktop installed via dnf, run `claude-desktop --doctor`.
2. Look for the install-method line.
**Expected:** Doctor detects rpm install (e.g. via `rpm -qf` against the binary path) and reports it cleanly. No `not found via dpkg (AppImage?)` warning.
**Currently:** Doctor's install-method check is gated on `command -v dpkg-query`, so on RPM-only hosts (no dpkg installed) the block is skipped entirely — no install-method line is printed. On hosts that have *both* `dpkg-query` and an rpm-installed `claude-desktop` (uncommon, e.g. mixed Debian + dnf), the misleading `claude-desktop not found via dpkg (AppImage?)` WARN does fire. Either way, no `rpm -qf` branch exists. Affects KDE-W, KDE-X, GNOME, Sway, i3, Niri rows ([T13](./launch.md#t13--doctor-reports-correct-package-format)). Not yet filed.
**Diagnostics on failure:** Full `--doctor` output, `rpm -qf $(which claude-desktop)`, the doctor source line that decides the format.
**References:** [T13](./launch.md#t13--doctor-reports-correct-package-format)
**Code anchors:** `scripts/doctor.sh:353-362` — install-method check is gated on `command -v dpkg-query`; only runs on Debian-family hosts. Falls through to `_warn 'claude-desktop not found via dpkg (AppImage?)'` only if `dpkg-query` is present but returns empty. On Fedora/RPM hosts (`dpkg-query` absent), the entire block is skipped and **no install-method line is printed at all** — neither the misleading WARN nor a correct `rpm -qf` PASS. The drift is "no detection" rather than "false-flag as AppImage" on dpkg-less systems.
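
The missing `rpm -qf` branch could look roughly like the sketch below. It is an illustration only, not the code in `scripts/doctor.sh`: the function name `detect_install_method` and the `unknown` fallback are assumptions, and the AppImage branch relies on the `APPIMAGE` variable that the AppImage runtime exports to the processes it launches.

```bash
# Hypothetical install-method probe with an rpm branch.
# Prints one of: deb, rpm, appimage, unknown.
detect_install_method() {
	bin_path="$1"
	if command -v dpkg-query >/dev/null 2>&1 &&
		[ -n "$(dpkg-query -W -f='${Version}' claude-desktop 2>/dev/null)" ]; then
		echo 'deb'
	elif command -v rpm >/dev/null 2>&1 &&
		rpm -qf "$bin_path" >/dev/null 2>&1; then
		# rpm -qf exits non-zero when no installed package owns the file.
		echo 'rpm'
	elif [ -n "${APPIMAGE:-}" ]; then
		echo 'appimage'
	else
		echo 'unknown'
	fi
}

# Usage:
#   detect_install_method "$(command -v claude-desktop)"
```

Because the rpm probe runs whenever the dpkg probe comes back empty, a host with both tools installed would report `rpm` instead of falling through to the AppImage warning.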
## S15 — AppImage extraction (`--appimage-extract`) works as documented fallback
**Severity:** Could
**Surface:** AppImage runtime / FUSE-less fallback
**Applies to:** Any AppImage row
**Issues:**
**Steps:**
1. On a host without FUSE, run `./claude-desktop-*.AppImage --appimage-extract`.
2. Inspect `squashfs-root/`.
3. Run `squashfs-root/AppRun`.
**Expected:** Extraction completes. `squashfs-root/AppRun` launches the app cleanly without FUSE.
**Diagnostics on failure:** Extraction stderr, `ls squashfs-root/`, AppRun stderr.
**References:** Linked from the runtime error message when FUSE is missing.
**Code anchors:** `scripts/packaging/appimage.sh:282` and `:312` (built with stock `appimagetool`, which always supports `--appimage-extract`), `scripts/packaging/appimage.sh:70-118` (`AppRun` script that lives at `squashfs-root/AppRun` after extraction). CI exercises this path: `tests/test-artifact-appimage.sh:36-44` and `.github/workflows/ci.yml:388` both run `--appimage-extract` and assert `squashfs-root/` exists.
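
The three steps can be chained into a single wrapper. This is a sketch under stated assumptions: `run_extracted` is a hypothetical name, the scratch directory comes from `mktemp -d`, and nothing cleans that directory up afterwards.

```bash
# Hypothetical wrapper: extract an AppImage into a scratch directory and
# run the extracted AppRun, for hosts without FUSE.
run_extracted() {
	appimage="$(realpath "$1")" || return 1
	shift
	workdir="$(mktemp -d)" || return 1
	# --appimage-extract unpacks into ./squashfs-root under the current dir.
	(cd "$workdir" && "$appimage" --appimage-extract >/dev/null) || return 1
	[ -x "$workdir/squashfs-root/AppRun" ] || return 1
	"$workdir/squashfs-root/AppRun" "$@"
}

# Usage:
#   run_extracted ./claude-desktop-*.AppImage
```

Extracting into a fresh directory keeps repeated runs from reusing a stale `squashfs-root/` left by an earlier build.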
## S16 — AppImage mount cleans up on app exit
**Severity:** Should
**Surface:** AppImage mount lifecycle
**Applies to:** Any AppImage row
**Issues:** [CLAUDE.md "Common Gotchas"](https://github.com/aaddrick/claude-desktop-debian/blob/main/CLAUDE.md)
**Steps:**
1. Launch the AppImage. Confirm `mount | grep claude` shows the mount.
2. Quit the app cleanly via tray → Quit (or `Ctrl+Q`).
3. Re-run `mount | grep claude` — mount should be gone.
**Expected:** AppImage's mount at `/tmp/.mount_claude*` is unmounted and the directory removed when all child Electron processes exit. Stale mounts after force-quit are handled by `pkill -9 -f "mount_claude"` per CLAUDE.md but should not be the common case.
**Diagnostics on failure:** `mount | grep claude` after exit, `ls -la /tmp/.mount_claude*`, `pgrep -af claude`, `journalctl -k -n 50` for mount errors.
**References:** [CLAUDE.md "Common Gotchas"](https://github.com/aaddrick/claude-desktop-debian/blob/main/CLAUDE.md)
**Code anchors:** Mount lifecycle is owned by upstream `appimagetool`'s runtime, not this repo — `scripts/packaging/appimage.sh:282`/`:312` invokes the stock tool with no custom AppRun-side cleanup. `CLAUDE.md:179-183` documents `pkill -9 -f "mount_claude"` as the manual recovery for stale mounts after force-quit. No project-side unmount handler exists; the test asserts upstream behavior, not ours.
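
The `mount | grep claude` check can be tightened so it only matches the AppImage mount points themselves. A minimal sketch, assuming the common `source on mountpoint type fstype (opts)` layout of `mount` output and a source name without spaces; `claude_mounts` is a hypothetical helper.

```bash
# Hypothetical helper: read `mount` output on stdin and print every mount
# point under /tmp/.mount_claude*.
claude_mounts() {
	awk '$2 == "on" && $3 ~ /^\/tmp\/\.mount_claude/ { print $3 }'
}

# Usage: should print nothing after a clean quit.
#   mount | claude_mounts
```

Matching on the mount-point field avoids false hits from unrelated lines that merely contain the word `claude`.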
## S26 — Auto-update is disabled when installed via `apt` / `dnf`
> **⚠ Missing in build 1.5354.0** — No project-side suppression of upstream auto-update exists; the launcher exports `ELECTRON_FORCE_IS_PACKAGED=true`, which causes upstream's `lii()` gate to return true on Linux and the auto-update tick loop to start. Suppression is "accidental" — it relies on Electron's built-in `autoUpdater` module being unimplemented on Linux (so `setFeedURL`/`checkForUpdates` throw, the `error` listener logs, and no download happens). Tracked at [#567](https://github.com/aaddrick/claude-desktop-debian/issues/567); re-verify after next upstream bump.
**Severity:** Critical
**Surface:** Auto-update path
**Applies to:** All DEB/RPM rows
**Issues:** [#567](https://github.com/aaddrick/claude-desktop-debian/issues/567)
**Steps:**
1. Install via APT or DNF.
2. Launch the app and let it sit for ~5 minutes.
3. Inspect launcher log + filesystem for any auto-update download attempt.
**Expected:** When installed via the project's APT or DNF repo, the in-app auto-update path is suppressed. The app does not download replacement binaries (which would race the package manager). Updates flow through `apt upgrade` / `dnf upgrade` only. AppImage installs may continue to self-update or punt to the user.
**Diagnostics on failure:** Launcher log, network captures (look for downloads from `releases.anthropic.com` or `api.anthropic.com/api/desktop/linux/...`), filesystem changes under `~/.config/Claude/`.
**References:** [`docs/learnings/apt-worker-architecture.md`](../../learnings/apt-worker-architecture.md)
**Code anchors:** `scripts/launcher-common.sh:249` (`export ELECTRON_FORCE_IS_PACKAGED=true` — makes upstream think it's installed); `build-reference/app-extracted/.vite/build/index.js:508761-508769` (upstream `lii()` returns `hA.app.isPackaged` on Linux — passes the gate); `:508554-508559` (only suppression hook is enterprise-policy `disableAutoUpdates`, no Linux/distro carve-out); `:508770-508774` (feed URL `https://api.anthropic.com/api/desktop/linux/<arch>/squirrel/update?...`); `:508800-508803` (calls `hA.autoUpdater.setFeedURL` + `.checkForUpdates()` unconditionally on Linux). No patch in `scripts/patches/*.sh` neutralizes the autoUpdater module or sets `disableAutoUpdates`. AppImage continues to ship update info: `scripts/packaging/appimage.sh:308-309` (`gh-releases-zsync` zsync metadata embedded for releases).


@@ -0,0 +1,153 @@
# Extensibility — Plugins, MCP, Hooks, Memory
Tests covering the Anthropic & Partners plugin install flow, the plugin browser, MCP server config, hooks, `CLAUDE.md` memory loading, and per-user storage of plugins/worktrees. See [`../matrix.md`](../matrix.md) for status.
## T11 — Plugin install (Anthropic & Partners)
**Severity:** Smoke
**Surface:** Plugin browser → install flow
**Applies to:** All rows
**Issues:** [`docs/learnings/plugin-install.md`](../../learnings/plugin-install.md)
**Steps:**
1. In a Code-tab session, click **+** → **Plugins** → **Add plugin**.
2. Find an Anthropic & Partners plugin. Click **Install**.
3. Verify it lands in **Manage plugins** and its skills appear in the slash menu.
4. Re-install the same plugin to verify idempotence.
**Expected:** Install completes end-to-end: gate logic accepts, backend endpoint responds, plugin appears in the plugin list. Re-install is idempotent.
**Diagnostics on failure:** DevTools network panel during install, launcher log, `~/.claude/plugins/` content, the gate-logic code path (see learnings doc).
**References:** [`docs/learnings/plugin-install.md`](../../learnings/plugin-install.md), [Install plugins](https://code.claude.com/docs/en/desktop#install-plugins)
**Code anchors:** `build-reference/app-extracted/.vite/build/index.js:507181` (`installPlugin` IPC + gate, with `pluginSource === "remote"` branch and CLI fallback); `:507193` log `[CustomPlugins] installPlugin: attempting remote API install`; `:465816` `dx()` returns `~/.claude/plugins`; `:465822` `installed_plugins.json` (idempotency record).
**Inventory anchor:** `…customize.main.navigation.button-by-name.add-plugin` (role `button`, label `Add plugin`); sibling `…button-by-name.browse-plugins` (label `Browse plugins`). Both are persistent in the Customize panel — anchors the entry-point click chain.
## T33 — Plugin browser
**Severity:** Should
**Surface:** Plugin browser UI
**Applies to:** All rows
**Issues:**
**Steps:**
1. Click **+** → **Plugins** → **Add plugin**.
2. Confirm entries from the official Anthropic marketplace appear.
3. Install a non-Anthropic plugin end-to-end.
4. Verify it shows in **Manage plugins** and contributes its skills to the slash menu.
**Expected:** Plugin browser opens, shows the marketplace, install completes. Installed plugins appear under Manage plugins and contribute to the slash menu.
**Diagnostics on failure:** Screenshot of plugin browser, network captures, launcher log, `~/.claude/plugins/` listing.
**References:** [Install plugins](https://code.claude.com/docs/en/desktop#install-plugins)
**Code anchors:** `build-reference/app-extracted/.vite/build/index.js:71392` (`CustomPlugins.listMarketplaces` IPC); `:71534` (`listAvailablePlugins` IPC); `:507176` (`listMarketplaces` main-process handler); `:496236` deep-link route `plugins/new` opens the browser surface.
**Inventory anchor:** `…customize.main.navigation.button-by-name.browse-plugins` (role `button`, label `Browse plugins`); sibling `…link-by-name.connectors` (role `link`, label `Connectors`). The browser surface itself (marketplace listings, install button) appears under a child dialog not captured at idle — re-capture with the dialog open to anchor those.
## T35 — MCP server config picked up
**Severity:** Critical
**Surface:** MCP / Code tab
**Applies to:** All rows
**Issues:**
**Steps:**
1. Add an MCP server to `~/.claude.json` or `<project>/.mcp.json`.
2. Open a Code-tab session against the project.
3. Type `/` in the prompt — verify MCP-provided tools appear in the slash menu (or invoke one directly).
4. Separately, confirm `claude_desktop_config.json` (Chat-tab MCP) is **not** picked up by Code tab.
**Expected:** MCP servers in `~/.claude.json` or `.mcp.json` start when a Code session opens. Tools appear in the slash menu, calls succeed end-to-end. `claude_desktop_config.json` is separate per upstream docs.
**Diagnostics on failure:** Server stderr (MCP servers log to stderr), `~/.claude.json` and `.mcp.json` content, launcher log, DevTools console for MCP wire errors.
**References:** [MCP servers: desktop chat app vs Claude Code](https://code.claude.com/docs/en/desktop#shared-configuration), [`docs/learnings/plugin-install.md`](../../learnings/plugin-install.md)
**Code anchors:** `build-reference/app-extracted/.vite/build/index.js:215418` (Code-tab loads `<project>/.mcp.json` per scanned dir); `:176766` reads `~/.claude.json`; `:489098` Code-session passes `settingSources: ["user", "project", "local"]` to the agent SDK; `:130821` `claude_desktop_config.json` is the chat-tab path constant (separate userData dir at `:130829` `kee()`), confirming the two trees do not overlap.
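
For step 1, a project-level config can be seeded with a small helper. This is a sketch, assuming the `mcpServers` object layout used by Claude Code project configs; the server name, command path and empty `args` are placeholders, and `write_mcp_config` is a hypothetical name.

```bash
# Hypothetical helper: write a one-server .mcp.json into a project dir.
# "example-server" and its command are placeholders, not a real server.
write_mcp_config() {
	project_dir="$1"
	cat >"$project_dir/.mcp.json" <<'EOF'
{
  "mcpServers": {
    "example-server": {
      "command": "/path/to/your-mcp-server",
      "args": []
    }
  }
}
EOF
}

# Usage:
#   write_mcp_config /path/to/project
```

The helper overwrites any existing `.mcp.json`, so it is best pointed at a throwaway test project.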
## T36 — Hooks fire
**Severity:** Critical
**Surface:** Hooks runtime
**Applies to:** All rows
**Issues:**
**Steps:**
1. Add a `SessionStart` hook in `~/.claude/settings.json` that writes a marker file.
2. Open a new Code-tab session.
3. Confirm the marker file exists.
4. Repeat with `PreToolUse` / `PostToolUse` hooks. Switch transcript view to Verbose to see the hook output.
**Expected:** Hooks defined in `~/.claude/settings.json` execute at the documented points. Hook output is visible in Verbose transcript mode. A failing hook surfaces a clear error rather than silently breaking the session.
**Diagnostics on failure:** Hook script stderr, marker file presence, launcher log, settings file content, Verbose transcript output.
**References:** [Shared configuration](https://code.claude.com/docs/en/desktop#shared-configuration)
**Code anchors:** `build-reference/app-extracted/.vite/build/index.js:489098` Code-session sets `settingSources: ["user", "project", "local"]` (agent SDK reads `~/.claude/settings.json` hooks from this); `:455717` built-in `PreToolUse` hooks registry the runtime extends; `:455819` `UserPromptSubmit`; `:465680` `PostToolUse`; `:465754` `Stop`; `:493411` runtime emits `hook_started` / `hook_progress` / `hook_response` for `SessionStart` (Verbose transcript path).
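
The marker-file hook from step 1 can be generated rather than typed by hand. A minimal sketch, assuming the nested `hooks` → event → `hooks` command layout of Claude Code settings; `session_start_hook_json` is a hypothetical helper, and it does no JSON escaping, so the marker path must not contain quotes or backslashes.

```bash
# Hypothetical helper: print a settings fragment whose SessionStart hook
# touches the given marker file. Merge it into ~/.claude/settings.json by
# hand; this function deliberately writes nothing to disk.
session_start_hook_json() {
	marker="$1"
	cat <<EOF
{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          { "type": "command", "command": "touch '$marker'" }
        ]
      }
    ]
  }
}
EOF
}

# Usage:
#   session_start_hook_json /tmp/claude-hook-marker
```

Printing to stdout instead of editing the settings file keeps the sketch from clobbering hooks that are already configured.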
## T37 — `CLAUDE.md` memory loads
**Severity:** Critical
**Surface:** Memory / Code tab session prompt
**Applies to:** All rows
**Issues:**
**Steps:**
1. Confirm a project `CLAUDE.md` exists at the working folder.
2. Confirm `~/.claude/CLAUDE.md` exists with at least one identifying token.
3. Open a Code-tab session against the project.
4. Ask Claude "what's in your CLAUDE.md" — verify the response matches on-disk content.
5. Edit `CLAUDE.md`. Start a new session — verify the new content is loaded.
**Expected:** Project `CLAUDE.md` and `CLAUDE.local.md` at the working folder, plus `~/.claude/CLAUDE.md`, are loaded into the session's system prompt. Edits take effect on the next session start.
**Diagnostics on failure:** `cat CLAUDE.md` and `cat ~/.claude/CLAUDE.md` outputs, launcher log, system-prompt dump if accessible (Verbose transcript may show it).
**References:** [Shared configuration](https://code.claude.com/docs/en/desktop#shared-configuration)
**Code anchors:** `build-reference/app-extracted/.vite/build/index.js:259691` working-dir scan reads `CLAUDE.md` and `.claude/CLAUDE.md`; `:455188` global account memory `zhA(accountId, orgId)` is copied to the per-session `.claude/CLAUDE.md` at session start (`[GlobalMemory] Copied CLAUDE.md`); `:283107` `cE()` resolves `CLAUDE_CONFIG_DIR` or `~/.claude`, the dir whose `CLAUDE.md` the agent SDK loads via `settingSources: ["user", ...]` (see T36 anchor at `:489098`).
## S27 — Plugins install per-user, not into system paths
**Severity:** Should
**Surface:** Plugin storage
**Applies to:** All rows
**Issues:**
**Steps:**
1. As a non-root user, install a plugin via the desktop plugin browser.
2. Inspect `~/.claude/plugins/` for the install.
3. Verify nothing was written under `/usr` or other system-managed trees (`find /usr -newer /tmp/marker -name '*claude*' 2>/dev/null` after `touch /tmp/marker; install plugin`).
**Expected:** Plugins land under `~/.claude/plugins/` (or the equivalent per-user dir). Never under `/usr`. Non-root install/enable/disable works without `sudo`.
**Diagnostics on failure:** `find / -name '*<plugin-name>*' 2>/dev/null`, install logs, launcher log.
**References:** [Install plugins](https://code.claude.com/docs/en/desktop#install-plugins)
**Code anchors:** `build-reference/app-extracted/.vite/build/index.js:283107` `cE()` resolves the config root to `CLAUDE_CONFIG_DIR` or `~/.claude` — never `/usr`; `:465815` `dx()` returns `<cE()>/plugins`; `:465821`/`:465824`/`:465827` `installed_plugins.json`, `known_marketplaces.json`, `marketplaces/` all sit under `dx()`. No system-path writes in the install path.
## S28 — Worktree creation surfaces clear error on read-only mounts
**Severity:** Could
**Surface:** Worktree creation on read-only filesystem
**Applies to:** All rows (NixOS users hit this most often)
**Issues:**
**Steps:**
1. Place a project on a read-only mount (e.g. squashfs, NFS read-only export, `mount -o ro` bind).
2. Open a Code-tab session against it.
3. Try to start a parallel session that needs a worktree.
**Expected:** Worktree creation fails with a clear error pointing at the read-only mount. No silent loss of work, no writes to a wrong directory, no parent-repo corruption.
**Diagnostics on failure:** `mount | grep <project-path>`, `git worktree add` direct invocation (does it fail the same way?), launcher log, screenshot of error dialog.
**References:** [Work in parallel with sessions](https://code.claude.com/docs/en/desktop#work-in-parallel-with-sessions)
**Code anchors:** `build-reference/app-extracted/.vite/build/index.js:462841` worktree parent dir is `<repo>/.claude/worktrees` (or `chillingSlothLocation.customPath` override at `:462836`); `:462928` `git worktree add` failure path returns `null` after `R.error("Failed to create git worktree: …")`; `:462760` `Sbn()` classifies "Permission denied" / "Access is denied" / "could not lock config file" as `"permission-denied"` (the read-only-mount taxonomy bucket).
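
The three message fragments listed for `Sbn()` can be mirrored in a small classifier to check which bucket a given `git worktree add` error would land in. This is an illustration of the taxonomy as described in the anchor, not the upstream code; `classify_worktree_error` and the `other` bucket are assumptions.

```bash
# Hypothetical classifier mirroring the three fragments named above.
classify_worktree_error() {
	case "$1" in
		*'Permission denied'* | *'Access is denied'* | *'could not lock config file'*)
			echo 'permission-denied'
			;;
		*)
			echo 'other'
			;;
	esac
}

# Usage:
#   classify_worktree_error "$(git worktree add ../wt 2>&1)"
```

Errors from a read-only filesystem typically carry the text `Read-only file system` rather than `Permission denied`, so it is worth running the direct `git worktree add` diagnostic through a check like this to see whether the message matches any of the three fragments.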


@@ -0,0 +1,77 @@
# Launch & Process Lifecycle
Tests covering app startup, the `--doctor` health check, package-format detection, and multi-instance behavior. See [`../matrix.md`](../matrix.md) for status.
## T01 — App launch
**Severity:** Smoke
**Surface:** App startup
**Applies to:** All rows
**Issues:**
**Runner:** [`tools/test-harness/src/runners/T01_app_launch.spec.ts`](../../../tools/test-harness/src/runners/T01_app_launch.spec.ts)
**Steps:**
1. From a clean session, run `claude-desktop` (deb/rpm) or launch the AppImage.
2. Wait up to 10 seconds.
**Expected:** Main window opens within ~10s. No error toast, no crash. The launcher log at `~/.cache/claude-desktop-debian/launcher.log` shows the expected backend selection (`Using X11 backend via XWayland` on Wayland sessions, or native Wayland when forced).
**Diagnostics on failure:** Launcher log, `--doctor` output, session env (`XDG_SESSION_TYPE`, `XDG_CURRENT_DESKTOP`), `dmesg | tail -50`, any crash report under `~/.config/Claude/logs/`.
**References:**
**Code anchors:** `scripts/launcher-common.sh:98` (X11-via-XWayland log line), `scripts/launcher-common.sh:102` (native-Wayland log line), `build-reference/app-extracted/.vite/build/index.js:524875` (`app.on("ready")` registration), `build-reference/app-extracted/.vite/build/index.js:524881-524931` (main `BrowserWindow` factory `Ori()`: `titleBarStyle`, mainWindow.js preload, initial `show`).
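
The backend-selection part of the expectation can be checked mechanically against the launcher log. A minimal sketch: `log_shows_xwayland` is a hypothetical helper, and it only looks for the XWayland line quoted above, since the exact native-Wayland wording is not reproduced in this document.

```bash
# Hypothetical helper: succeed if the launcher log records the
# X11-via-XWayland backend selection.
log_shows_xwayland() {
	log="${1:-$HOME/.cache/claude-desktop-debian/launcher.log}"
	# -F matches the string literally; -q suppresses output.
	grep -qF 'Using X11 backend via XWayland' "$log"
}

# Usage:
#   log_shows_xwayland && echo 'XWayland backend selected'
```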
## T02 — Doctor health check
**Severity:** Critical
**Surface:** CLI / `--doctor`
**Applies to:** All rows
**Issues:** [PR #538](https://github.com/aaddrick/claude-desktop-debian/pull/538)
**Steps:**
1. Run `claude-desktop --doctor`.
2. Inspect exit code (`echo $?`) and stdout/stderr.
**Expected:** Exits 0. All checks PASS or report expected WARN. No FAIL checks. Doctor currently reports display-server, menu-bar mode, Electron path/version, Chrome sandbox perms, SingletonLock, MCP config, Node.js, desktop entry, disk space, and a Cowork section — it does **not** surface the resolved titlebar style. See also [T13](#t13--doctor-reports-correct-package-format) for the package-format detection slice.
**Diagnostics on failure:** Full `--doctor` output, the install path being inspected (`which claude-desktop`), package metadata (`dpkg -S` / `rpm -qf` against the binary).
**References:** [PR #538](https://github.com/aaddrick/claude-desktop-debian/pull/538)
**Code anchors:** `scripts/doctor.sh:280` (`run_doctor` entry point), `scripts/doctor.sh:301-319` (display-server check), `scripts/doctor.sh:401-417` (SingletonLock check), `scripts/doctor.sh:744-753` (exit-code summary).
## T13 — Doctor reports correct package format
**Severity:** Should
**Surface:** CLI / `--doctor`
**Applies to:** All rows (currently `✗` on every Fedora row — see [S05](./distribution.md#s05--doctor-recognises-dnf-installed-package-doesnt-false-flag-as-appimage))
**Issues:** *(no issue filed; surfaced via session-capture review)*
**Steps:**
1. Install via the relevant package manager (`apt` / `dnf`) or AppImage.
2. Run `claude-desktop --doctor` and look for the install-method line.
**Expected:** Doctor identifies the install method correctly. On RPM-based distros (Fedora, Nobara) it does **not** report `not found via dpkg (AppImage?)`; that warning currently false-flags a dnf install on any host that also has `dpkg-query`, while dpkg-less hosts get no install-method line at all. On DEB-based distros it does not assume AppImage when dpkg returns the package metadata.
**Diagnostics on failure:** `dpkg -S $(which claude-desktop)`, `rpm -qf $(which claude-desktop)`, full `--doctor` output, the line of doctor source that decides the format.
**References:** [S05](./distribution.md#s05--doctor-recognises-dnf-installed-package-doesnt-false-flag-as-appimage)
**Code anchors:** `scripts/doctor.sh:353-362` — version probe is dpkg-only (`dpkg-query -W -f='${Version}' claude-desktop`); on RPM/AppImage hosts that lack `dpkg-query` the block is skipped, but on a Fedora host that *does* have `dpkg-query` installed (e.g. for cross-distro tooling) the `_warn 'claude-desktop not found via dpkg (AppImage?)'` branch fires for any dnf-installed copy. There is no corresponding `rpm -qf` / `rpm -q claude-desktop` branch.
## T14 — Multi-instance behavior
**Severity:** Critical
**Surface:** App lifecycle
**Applies to:** All rows
**Issues:** [PR #536](https://github.com/aaddrick/claude-desktop-debian/pull/536) (closed, docs-only — no in-tree opt-in flag)
**Steps:**
1. Launch `claude-desktop`. Wait for the main window.
2. Launch `claude-desktop` again from another terminal or `.desktop` invocation.
3. Optionally: follow the manual `--user-data-dir` recipe sketched in PR #536 (separate Electron `userData` per profile so each gets its own `SingletonLock` — note the PR was closed, the recipe is not shipped in-tree).
**Expected:** Second invocation focuses the existing window — no new process. The launcher's `cleanup_stale_lock` removes a `SingletonLock` whose owning PID is no longer running. With separate `--user-data-dir` per profile (manual workaround, not an in-tree feature), each profile runs an independent Electron instance.
**Diagnostics on failure:** `pgrep -af claude-desktop`, `ls -la ~/.config/Claude/SingletonLock`, launcher log, any "another instance is running" dialog text.
**References:** [PR #536](https://github.com/aaddrick/claude-desktop-debian/pull/536)
**Code anchors:** `build-reference/app-extracted/.vite/build/index.js:525162-525173` (`requestSingleInstanceLock()` + `app.on("second-instance", ...)` — shows existing window, restores if minimized, focuses), `build-reference/app-extracted/.vite/build/index.js:525204-525207` (early-return on lost lock at `app.on("ready")`), `scripts/launcher-common.sh:187-208` (`cleanup_stale_lock` — drops a `SingletonLock` symlink whose `hostname-PID` target points at a dead PID).
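
The stale-lock rule described for `cleanup_stale_lock` can be sketched as a predicate. This is a hypothetical re-implementation for illustration; the shipped logic in `scripts/launcher-common.sh` may differ, and `lock_is_stale` is not a name from this repo.

```bash
# Hypothetical predicate: succeed if the lock is a symlink whose
# "hostname-PID" target names a PID that is no longer running.
lock_is_stale() {
	lock="$1"
	[ -L "$lock" ] || return 1 # not a symlink: nothing to judge
	target="$(readlink "$lock")"
	pid="${target##*-}" # "hostname-PID" -> "PID"
	case "$pid" in
		'' | *[!0-9]*) return 1 ;; # unparseable target: leave it alone
	esac
	# kill -0 sends no signal; it only tests whether the PID can be
	# signalled. A live process owned by another user also fails here.
	! kill -0 "$pid" 2>/dev/null
}

# Usage:
#   lock_is_stale ~/.config/Claude/SingletonLock && echo 'stale lock'
```

The sketch ignores the hostname half of the target, so it is only meaningful for a lock created on the same machine.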


@@ -0,0 +1,282 @@
# Platform Integration
Tests covering autostart, Cowork integration, WebGL graceful degradation, `.desktop`-launch env inheritance, encrypted env-var storage, the macOS/Windows-only Computer Use feature, and Dispatch session pairing. See [`../matrix.md`](../matrix.md) for status.
## T09 — AutoStart via XDG
**Severity:** Critical
**Surface:** XDG Autostart
**Applies to:** All rows
**Issues:** [PR #450](https://github.com/aaddrick/claude-desktop-debian/pull/450)
**Steps:**
1. In Settings, toggle "Open at Login" / "Start at boot" ON.
2. Inspect `~/.config/autostart/` for a `.desktop` entry.
3. Logout/login. Verify app launches automatically.
4. Toggle OFF. Verify the autostart entry is removed.
**Expected:** Toggling ON creates a `~/.config/autostart/*.desktop` entry that is XDG-spec compliant (not a custom systemd unit or shell hook). After login, app launches automatically. Toggling OFF removes the entry.
**Diagnostics on failure:** `ls -la ~/.config/autostart/`, content of the .desktop file, `desktop-file-validate` on it, launcher log.
**References:** [PR #450](https://github.com/aaddrick/claude-desktop-debian/pull/450)
**Code anchors:**
- `scripts/frame-fix-wrapper.js:376` — XDG Autostart shim
intercepting `app.{get,set}LoginItemSettings` (writes/removes
`$XDG_CONFIG_HOME/autostart/claude-desktop.desktop`).
- `scripts/frame-fix-wrapper.js:429`: `buildAutostartContent()`
emits the spec-compliant `[Desktop Entry]` block.
- `build-reference/app-extracted/.vite/build/index.js:524205`
upstream `isStartupOnLoginEnabled` / `setStartupOnLoginEnabled` IPC
surface that the wrapper interposes on.
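
For orientation, the file the shim manages can be approximated in shell. This is a sketch only: the `Name` and `Exec` values are assumptions, the real content comes from `buildAutostartContent()`, and both function names are hypothetical.

```bash
# Hypothetical helper: resolve the autostart entry path, honouring
# XDG_CONFIG_HOME with the spec's ~/.config fallback.
autostart_path() {
	printf '%s\n' "${XDG_CONFIG_HOME:-$HOME/.config}/autostart/claude-desktop.desktop"
}

# Hypothetical helper: print a minimal Desktop Entry block. Name and Exec
# are placeholders, not the values the wrapper actually writes.
autostart_entry() {
	cat <<'EOF'
[Desktop Entry]
Type=Application
Name=Claude
Exec=claude-desktop
EOF
}

# Usage: compare the sketch against what the toggle produced.
#   autostart_entry | diff - "$(autostart_path)"
```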
## T10 — Cowork integration
**Severity:** Should
**Surface:** Cowork tab + VM daemon
**Applies to:** All rows
**Issues:** [`docs/learnings/cowork-vm-daemon.md`](../../learnings/cowork-vm-daemon.md)
**Steps:**
1. Sign into the app. Open the Cowork tab.
2. Confirm Cowork-specific UI renders (ghost icon in topbar, Cowork menus).
3. Trigger a Cowork action that needs the VM daemon.
4. Kill the VM daemon process; verify it respawns within the documented timeout.
**Expected:** Cowork features render. VM daemon spawns when needed, files are visible, daemon respawns within the documented timeout if it crashes.
**Diagnostics on failure:** `pgrep -af cowork`, daemon logs, launcher log, the respawn-logic code path (see learnings doc).
**References:** [`docs/learnings/cowork-vm-daemon.md`](../../learnings/cowork-vm-daemon.md)
**Code anchors:**
- `build-reference/app-extracted/.vite/build/index.js:143371`
upstream's Windows named-pipe path (`\\.\pipe\cowork-vm-service`)
that `scripts/patches/cowork.sh` Patch 1 rewrites to
`$XDG_RUNTIME_DIR/cowork-vm-service.sock`.
- `build-reference/app-extracted/.vite/build/index.js:143453`
`kUe()` retry loop (5 attempts, 1 s gap) that the auto-launch
injection from Patch 6 piggybacks on after the rewrite.
- `scripts/patches/cowork.sh:244` — Patch 6 (auto-launch + stdio
pipe + 10 s rate-limited respawn — issue #408).
- `scripts/patches/cowork.sh:365` — Patch 6b (extends the
reinstall-delete list with `sessiondata.img` / `rootfs.img.zst`
so a wedged daemon can self-recover).
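
Step 4 amounts to polling for the daemon after the kill. A minimal sketch of that loop, assuming a one-second poll interval; `wait_until` is a hypothetical helper, and the `pgrep -f cowork` pattern in the usage line is borrowed from the diagnostics above rather than from the daemon's actual process name.

```bash
# Hypothetical polling helper: retry a command once per second until it
# succeeds or the budget (in whole seconds) runs out.
wait_until() {
	budget="$1"
	shift
	elapsed=0
	while [ "$elapsed" -lt "$budget" ]; do
		"$@" && return 0
		sleep 1
		elapsed=$((elapsed + 1))
	done
	return 1
}

# Usage: allow headroom above the 10 s respawn rate limit.
#   wait_until 30 pgrep -f cowork >/dev/null || echo 'daemon did not respawn'
```

The budget needs to sit well above the 10 s rate limit, because the respawn cannot start before that cooldown has elapsed.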
## T12 — WebGL warn-only
**Severity:** Could
**Surface:** Chromium GPU diagnostics
**Applies to:** All rows (especially VM rows and hybrid-GPU laptops)
**Issues:**
**Steps:**
1. Launch the app. Open DevTools → navigate to `chrome://gpu`.
2. Inspect WebGL1/WebGL2 status.
3. Use the app for ~5 minutes — exercise UI, sidebar, settings.
**Expected:** WebGL1/2 may report as blocklisted (typical on virtio-gpu in VMs and on hybrid GPU laptops). This is informational. UI continues to render without graphical glitches; no feature is broken by the blocklist.
**Diagnostics on failure:** `chrome://gpu` full content, screenshot of any visual glitch, `glxinfo | head -20` (X11) or `eglinfo` (Wayland), `lspci -k | grep -A2 VGA`.
**References:**
**Code anchors:**
- `build-reference/app-extracted/.vite/build/index.js:524809`
`app.disableHardwareAcceleration()` is gated on the user-toggleable
`isHardwareAccelerationDisabled` setting; upstream does not pass
`--ignore-gpu-blocklist` or `--use-gl=*`, so chrome://gpu reflects
Chromium's stock blocklist behaviour.
- `build-reference/app-extracted/.vite/build/index.js:500571`
the only `webgl:!1` override is scoped to the feedback popup
(`in-memory-feedback` partition); main UI does not disable WebGL.
## S17 — App launched from `.desktop` inherits shell `PATH`
**Severity:** Critical
**Surface:** `.desktop`-launch env handling
**Applies to:** All rows
**Issues:**
**Steps:**
1. Configure `~/.bashrc` (or `~/.zshrc`) with `export PATH="$HOME/.custom-bin:$PATH"` and a custom binary in that dir.
2. Launch the app via dmenu/krunner/GNOME Activities/Plasma launcher (i.e. **not** from a terminal).
3. Open a Code-tab terminal pane. Run `which <custom-binary>`.
4. Repeat for `npm`, `node`, `git`, `gh`.
**Expected:** Code session can find tools defined in the user's shell profile, even when the app was launched non-interactively. Either the launcher script sources the user's shell profile, or the app reads `~/.bashrc` / `~/.zshrc` to extract `PATH` the way macOS does.
**Diagnostics on failure:** `echo $PATH` from inside the integrated terminal, the env passed to the app process (`cat /proc/$(pgrep -f electron)/environ | tr '\0' '\n' | grep PATH`), launcher log.
**References:** [Local sessions](https://code.claude.com/docs/en/desktop#local-sessions), [Session not finding installed tools](https://code.claude.com/docs/en/desktop#session-not-finding-installed-tools)
**Code anchors:**
- `build-reference/app-extracted/.vite/build/index.js:259300`
`SLr()` resolves the bundled `shell-path-worker/shellPathWorker.js`.
- `build-reference/app-extracted/.vite/build/index.js:259349`
`NLr()` forks it via `utilityProcess.fork`; on success
`FX()` (line 259311) merges the extracted env into `process.env`.
- `build-reference/app-extracted/.vite/build/shell-path-worker/shellPathWorker.js:205`
`extractPathFromShell()` runs the user's login shell (`-l -i`)
and parses the printed `$PATH` between sentinels (mac-style env
inheritance now applied on Linux too).
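
The sentinel technique described for `extractPathFromShell()` can be sketched in shell to show why the sentinels matter: rc files may print banners, and the parser has to ignore them. This is not the upstream worker code; the sentinel string and both function names are assumptions.

```bash
# Hypothetical sentinel; any string unlikely to appear in PATH works.
SENTINEL='__CLAUDE_PATH__'

# Pure parser: print whatever sits between the two sentinels on stdin.
parse_sentinel_path() {
	sed -n "s/.*${SENTINEL}\(.*\)${SENTINEL}.*/\1/p"
}

# Run the user's shell as a login + interactive shell so profile and rc
# files are sourced, then print PATH wrapped in sentinels.
extract_path_from_shell() {
	"${SHELL:-/bin/sh}" -l -i -c \
		"printf '%s%s%s\n' '$SENTINEL' \"\$PATH\" '$SENTINEL'" 2>/dev/null |
		parse_sentinel_path
}

# Usage: compare with `echo $PATH` inside the integrated terminal.
#   extract_path_from_shell
```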
## S18 — Local environment editor persists across reboot
**Severity:** Should
**Surface:** Local env editor / encrypted store
**Applies to:** All rows
**Issues:**
**Steps:**
1. Open the local environment editor. Add `TEST_VAR=hello`.
2. Restart the app — verify variable is still there.
3. Reboot the host. Sign back in. Verify variable is still there.
**Expected:** Variables saved via the local environment editor (per-app, encrypted) survive a logout/login cycle and a full reboot. On Linux this implies the encrypted store is wired to libsecret / kwallet / gnome-keyring and unlocks at session start.
**Diagnostics on failure:** `secret-tool search` (libsecret), `kwallet5-query` (KDE), `seahorse` UI inspection (GNOME), launcher log, the env-editor IPC call.
**References:** [Local sessions](https://code.claude.com/docs/en/desktop#local-sessions)
**Code anchors:**
- `build-reference/app-extracted/.vite/build/index.js:259251`
`I2t = new K_({ name: "ccd-environment-config", ... })` electron-store
backing file (`~/.config/Claude/ccd-environment-config.json`).
- `build-reference/app-extracted/.vite/build/index.js:259253`
`hLr()` writes via `safeStorage.encryptString` (libsecret on Linux).
- `build-reference/app-extracted/.vite/build/index.js:259268`
`J1()` decrypts on read; bails to `{}` if `safeStorage` reports
encryption unavailable (no keyring backend running).
- `build-reference/app-extracted/.vite/build/index.js:70782`
`LocalSessionEnvironment.save` IPC entry that calls into `hLr`.
## S22 — Computer-use toggle is absent or visibly disabled on Linux
**Severity:** Should
**Surface:** Settings → Desktop app → General
**Applies to:** All rows
**Issues:**
**Steps:**
1. Open Settings → Desktop app → General.
2. Look for the "Computer use" toggle.
**Expected:** Toggle either does not render on Linux, or renders as a disabled control with a clear "not supported on Linux" hint. Must not appear functional and silently fail (e.g. flip on but never produce screen-control behavior).
**Diagnostics on failure:** Screenshot of the Settings page, DevTools inspection of the toggle DOM (is it conditionally hidden? disabled? always-rendered?), launcher log.
**References:** [Let Claude use your computer](https://code.claude.com/docs/en/desktop#let-claude-use-your-computer), [Dispatch and computer use](https://claude.com/blog/dispatch-and-computer-use)
**Code anchors:**
- `build-reference/app-extracted/.vite/build/index.js:240557`
`qDA = new Set(["darwin", "win32"])` excludes Linux from the
computer-use platform set.
- `build-reference/app-extracted/.vite/build/index.js:241190`
`TF()` (the master enable check) short-circuits to `false` when
`qDA.has(process.platform)` is false, so toggling
`chicagoEnabled` on Linux can't activate the feature.
- `build-reference/app-extracted/.vite/build/index.js:242387`
`tvr()` returns `{ status: "unsupported", reason: "Computer use
is not available on this platform", unsupportedCode:
"unsupported_platform" }` for the Settings UI — confirms the
toggle should render with a platform-unavailable hint, not silent
failure.
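
The gate these anchors describe can be sketched as below. This is a minimal illustration, not the bundled code: `COMPUTER_USE_PLATFORMS`, `isComputerUseEnabled`, and `computerUseStatus` are hypothetical names standing in for the minified `qDA`, `TF()`, and `tvr()`, and the shape of the supported branch is an assumption.

```javascript
// Hypothetical stand-in for the minified `qDA` platform set.
const COMPUTER_USE_PLATFORMS = new Set(["darwin", "win32"]);

// Stand-in for `TF()`: the platform check wins over the user setting.
function isComputerUseEnabled(platform, chicagoEnabled) {
  if (!COMPUTER_USE_PLATFORMS.has(platform)) return false;
  return Boolean(chicagoEnabled);
}

// Stand-in for `tvr()`: the status the Settings UI would receive.
function computerUseStatus(platform) {
  if (!COMPUTER_USE_PLATFORMS.has(platform)) {
    return {
      status: "unsupported",
      reason: "Computer use is not available on this platform",
      unsupportedCode: "unsupported_platform",
    };
  }
  // Assumption: the supported branch is not shown in the anchors.
  return { status: "supported" };
}
```

Under this sketch, flipping the setting on Linux changes nothing: `isComputerUseEnabled("linux", true)` stays `false`, which is the behavior the toggle's disabled state should make visible.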
## S23 — Dispatch-spawned sessions don't soft-lock on a never-approvable computer-use prompt
**Severity:** Critical (for Dispatch users)
**Surface:** Dispatch session lifecycle on Linux
**Applies to:** All rows with Dispatch enabled
**Issues:**
**Steps:**
1. From a paired phone, dispatch a task that would invoke computer use.
2. Observe the Code-tab session that spawns on the desktop.
3. Try to interact with other parts of the app.
**Expected:** Permission prompt times out or denies cleanly rather than hanging the session indefinitely. User can continue interacting with the rest of the app.
**Diagnostics on failure:** Screenshot of session state, launcher log, sidebar state (is the Dispatch session blocking the whole sidebar?), `pgrep -af claude`.
**References:** [Sessions from Dispatch](https://code.claude.com/docs/en/desktop#sessions-from-dispatch)
**Code anchors:**
- `build-reference/app-extracted/.vite/build/index.js:512789`
`tool_permission_request` notification handler explicitly skips
`toolName.startsWith("computer:")`, so the desktop never queues a
user-facing prompt for computer-use tool calls (which couldn't run
on Linux anyway — see S22).
- `build-reference/app-extracted/.vite/build/index.js:241190`
`TF()` gates computer-use execution off entirely on Linux, so a
Dispatch-spawned session that requests it should hit the upstream
"Set up computer use" remote-client setup card
(`index.js:330114`) rather than block on a desktop prompt.
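
A rough sketch of the skip rule in the `tool_permission_request` handler, assuming a simple in-memory prompt queue. `shouldQueueDesktopPrompt`, `handleToolPermissionRequest`, and the example tool names are hypothetical; only the `computer:` prefix check comes from the anchor.

```javascript
// Computer-use tool calls never get a desktop prompt.
function shouldQueueDesktopPrompt(toolName) {
  return !toolName.startsWith("computer:");
}

// Returns true if a user-facing prompt was queued, false if it was skipped.
function handleToolPermissionRequest(queue, request) {
  if (!shouldQueueDesktopPrompt(request.toolName)) return false;
  queue.push(request);
  return true;
}
```

If a `computer:`-prefixed request ever lands in the desktop prompt queue on Linux, that is the soft-lock precondition this case is checking for.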
## S24 — Dispatch-spawned Code session appears with badge and notification
**Severity:** Critical
**Surface:** Dispatch handoff
**Applies to:** All rows with Dispatch enabled
**Issues:**
**Steps:**
1. From a paired phone, dispatch a task that routes to Code (e.g. "fix this bug").
2. Observe the desktop sidebar.
3. Confirm a desktop notification fires.
4. Open the session and confirm 30-min approval expiry per upstream docs.
**Expected:** Dispatch task creates a sidebar entry tagged **Dispatch**, posts a desktop notification, and lands ready for review. App-permission approvals on this session expire after 30 minutes per upstream docs.
**Diagnostics on failure:** Screenshot of sidebar (badge present?), notification daemon state, launcher log, the Dispatch pairing config under `~/.config/Claude/`.
**References:** [Sessions from Dispatch](https://code.claude.com/docs/en/desktop#sessions-from-dispatch), [Dispatch and computer use](https://claude.com/blog/dispatch-and-computer-use)
**Code anchors:**
- `build-reference/app-extracted/.vite/build/index.js:144561`
`Sd = "dispatch_child"` session-type constant.
- `build-reference/app-extracted/.vite/build/index.js:512200`
`onRemoteSessionStart` IPC routes a Dispatch-initiated child
session into the local sidebar via `dispatchOnRemoteSessionStart`.
- `build-reference/app-extracted/.vite/build/index.js:285621`
`notifyDispatchParentIfNeeded()` posts the
`Task "<title>" <state>` meta-notification when the dispatch
child finishes (lands the result in the parent thread's
notification queue).
- `build-reference/app-extracted/.vite/build/index.js:285954`
`kind:"dispatch_child"` is the sidebar badge tag.
## S25 — Mobile pairing survives Linux session restart
**Severity:** Should
**Surface:** Dispatch pairing persistence
**Applies to:** All rows with Dispatch enabled
**Issues:**
**Steps:**
1. Pair the desktop with a phone.
2. Quit the app fully. Re-launch.
3. Try a Dispatch task. Verify pairing still works without re-pairing.
4. Log out of the Linux desktop session and log back in. Re-test.
**Expected:** Pairing remains active across app restart and logout/login. Pairing token is stored under `~/.config/Claude/` (or wherever the secure store lives) and survives.
**Diagnostics on failure:** `ls -la ~/.config/Claude/`, secret-store inspection, launcher log, pairing-flow IPC.
**References:** [Sessions from Dispatch](https://code.claude.com/docs/en/desktop#sessions-from-dispatch)
**Code anchors:**
- `build-reference/app-extracted/.vite/build/index.js:511984`
`ZEe = "coworkTrustedDeviceToken"` electron-store key for the
trusted-device token.
- `build-reference/app-extracted/.vite/build/index.js:511989`
`oYn()` writes the token via `safeStorage.encryptString` (libsecret
on Linux); `aYn()` (`:512003`) decrypts on read.
- `build-reference/app-extracted/.vite/build/index.js:512022`
`gYn()` re-enrolls via `POST /api/auth/trusted_devices` only when
there's no cached token, so a successful pair survives restart.
- `build-reference/app-extracted/.vite/build/index.js:330229`
`_5r = "bridge-state.json"` (per-org/account bridge state under
`~/.config/Claude/bridge-state.json`); `JF()`/`X0A()` at `:330230`
read/locate it.
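
The "re-enroll only when there is no cached token" behavior can be sketched as follows. This is a simplified model under stated assumptions: `store` is any Map-like cache, `enroll` stands in for the `POST /api/auth/trusted_devices` call and is synchronous here for brevity, and the `safeStorage` encryption step is left out entirely.

```javascript
const TOKEN_KEY = "coworkTrustedDeviceToken";

// Returns the cached token when present; otherwise enrolls once and caches.
function getTrustedDeviceToken(store, enroll) {
  const cached = store.get(TOKEN_KEY);
  if (cached) return cached; // restart path: no enrollment call
  const token = enroll();    // first-pair path
  store.set(TOKEN_KEY, token);
  return token;
}
```

In this model a restart that finds the store intact never calls `enroll`. If decryption fails and the read comes back empty, the same code path would re-enroll, which would show up to the user as a lost pairing.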


@@ -0,0 +1,125 @@
# Routines & Scheduled Tasks
Tests covering the Routines page, scheduled task firing, catch-up runs after suspend, and the suspend-inhibit toggle. See [`../matrix.md`](../matrix.md) for status.
## T26 — Routines page renders
**Severity:** Critical
**Surface:** Routines page
**Applies to:** All rows
**Issues:**
**Steps:**
1. Sign into the app, open the Code tab.
2. Click **Routines** in the sidebar.
3. Click **New routine** → **Local**.
**Expected:** Routines list opens. New-routine form shows all schedule presets (Manual, Hourly, Daily, Weekdays, Weekly), permission-mode picker, model picker, working-folder picker, and worktree toggle.
**Diagnostics on failure:** Screenshot of the Routines page (or the failure state), DevTools console output, launcher log, network captures of the routines API call (`mitmproxy` or DevTools network panel).
**References:** [Schedule recurring tasks](https://code.claude.com/docs/en/desktop-scheduled-tasks)
**Code anchors:** `build-reference/app-extracted/.vite/build/index.js:507710` (create payload — `permissionMode`, `model`, `userSelectedFolders`, `useWorktree`, `cronExpression`, `fireAt`); `build-reference/app-extracted/.vite/build/index.js:280299` (`@hourly: "0 * * * *"` preset)
**Inventory anchors:** `root.complementary.button-by-name.routines` (sidebar entry); `root.complementary.button-by-name.routines.main.region.button-by-name.new-routine` (form trigger); siblings `…button-by-name.all`, `…button-by-name.calendar` (list-view tabs). Preset list (Hourly/Daily/etc.) lives inside the New-routine modal and is not in the idle-state inventory — re-capture with the modal open to anchor.
## T27 — Scheduled task fires and notifies
**Severity:** Critical
**Surface:** Routines runtime + libnotify
**Applies to:** All rows
**Issues:**
**Steps:**
1. Create a Manual task with a simple instruction (e.g. "echo hello").
2. Click **Run now**. Observe.
3. Optionally: create an Hourly task and verify across the next hour boundary.
**Expected:** A fresh session starts, appears in the **Scheduled** section of the sidebar, and posts a desktop notification when it begins. Subsequent runs respect the deterministic offset described in upstream docs.
**Diagnostics on failure:** Launcher log, screenshot of sidebar, `gdbus call --session --dest=org.freedesktop.Notifications --object-path=/org/freedesktop/Notifications --method=org.freedesktop.DBus.Introspectable.Introspect` (verify daemon present), task SKILL.md content under `~/.claude/scheduled-tasks/<task-name>/`.
**References:** [How scheduled tasks run](https://code.claude.com/docs/en/desktop-scheduled-tasks#how-scheduled-tasks-run)
**Code anchors:** `build-reference/app-extracted/.vite/build/index.js:282332` (`runNow(A)` — manual dispatch); `build-reference/app-extracted/.vite/build/index.js:512837` (`Rc.showNotification(...,scheduled-${l},...)` — desktop notification on completion); `build-reference/app-extracted/.vite/build/index.js:282654` (`getJitterSecondsForTask` — deterministic per-task offset via `v2r(A, n*60)`, capped by `dispatchJitterMaxMinutes` default 10)
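
A deterministic per-task offset of the kind described for `getJitterSecondsForTask` can be sketched like this. The hash is an assumption (a plain 32-bit string hash); the bundled `v2r` may compute something different. Only the idea of mapping a task id into a fixed range of `maxMinutes * 60` seconds comes from the anchor.

```javascript
// Assumption: simple 32-bit string hash, not necessarily what `v2r` does.
function hashString(s) {
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    h = (Math.imul(h, 31) + s.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Maps a task id to a stable offset in [0, maxMinutes * 60) seconds.
function jitterSecondsForTask(taskId, maxMinutes = 10) {
  const rangeSeconds = maxMinutes * 60;
  if (rangeSeconds <= 0) return 0;
  return hashString(taskId) % rangeSeconds;
}
```

Because the offset depends only on the task id, an Hourly task should fire at the same minute-and-second offset on every run, which is what step 3 can check across consecutive hours.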
## T28 — Scheduled task catch-up after suspend
**Severity:** Should
**Surface:** Routines runtime / wake-from-suspend
**Applies to:** All rows
**Issues:**
**Steps:**
1. Create an Hourly task.
2. Suspend the host (`systemctl suspend`).
3. Wait past at least one hourly slot. Wake the host.
4. Observe whether a catch-up run starts.
**Expected:** Exactly one catch-up run for the most recently missed slot (older missed slots are discarded). Notification announces the catch-up. Missed runs older than seven days are not retried.
**Diagnostics on failure:** Task history in the routines detail page, launcher log, `journalctl --since="-1 day" | grep -i suspend`.
**References:** [Missed runs](https://code.claude.com/docs/en/desktop-scheduled-tasks#missed-runs)
**Code anchors:** `build-reference/app-extracted/.vite/build/index.js:281695` (`R2r` — walks back from now, capped at `10080 * 60 * 1e3` ms = 7 days, returns at most one missed slot, dedupes by `IfA` bucket-key); `build-reference/app-extracted/.vite/build/index.js:281942` (`scheduledTaskPostWakeDelayMs` default 60000 ms — gates dispatch after `powerMonitor.on("resume")`); `build-reference/app-extracted/.vite/build/index.js:282569` (catch-up branch: `c ? 0 : this.getJitterSecondsForTask(o.id)` — missed-slot dispatch skips jitter)
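
The "at most one missed slot, no older than seven days" rule can be sketched for a fixed-interval schedule as below. This is an illustration only: `latestMissedSlot` is a hypothetical name, slots are assumed to be aligned to multiples of the interval since the epoch, and the bucket-key dedupe and post-wake delay are not modeled.

```javascript
const HOUR_MS = 60 * 60 * 1000;
const MAX_LOOKBACK_MS = 10080 * 60 * 1000; // 7 days, as in the anchor

// Returns the most recent slot after lastRunMs and within the lookback
// window, or null when nothing qualifies. Older missed slots are dropped.
function latestMissedSlot(lastRunMs, nowMs, intervalMs) {
  const slot = Math.floor(nowMs / intervalMs) * intervalMs;
  if (slot <= lastRunMs) return null;                 // already ran for this slot
  if (slot < nowMs - MAX_LOOKBACK_MS) return null;    // too old to retry
  return slot;
}
```

For an Hourly task that last ran at hour 0 and wakes at hour 3.5, the sketch returns only the hour-3 slot. Hours 1 and 2 are discarded, matching the expected single catch-up run.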
## S19 — `CLAUDE_CONFIG_DIR` redirects scheduled-task storage
**Severity:** Could
**Surface:** Config dir env var
**Applies to:** All rows
**Issues:**
**Steps:**
1. In the local environment editor, set `CLAUDE_CONFIG_DIR=/some/other/path`.
2. Restart the app.
3. Create a scheduled task. Inspect filesystem.
**Expected:** Tasks resolve under `${CLAUDE_CONFIG_DIR}/scheduled-tasks/<task-name>/SKILL.md` rather than `~/.claude/scheduled-tasks/`. Pre-existing tasks under the old path are not silently dropped.
**Diagnostics on failure:** `ls -la ${CLAUDE_CONFIG_DIR}/scheduled-tasks/` and `~/.claude/scheduled-tasks/`, launcher log, env dump.
**References:** [Manage scheduled tasks](https://code.claude.com/docs/en/desktop-scheduled-tasks#manage-scheduled-tasks)
**Code anchors:** `build-reference/app-extracted/.vite/build/index.js:283108` (`cE()` — resolves `process.env.CLAUDE_CONFIG_DIR ?? ~/.claude`, handles `~` prefix); `build-reference/app-extracted/.vite/build/index.js:283118` (`Tce()` — returns `${cE()}/scheduled-tasks`); `build-reference/app-extracted/.vite/build/index.js:488317` and `:509032` (call sites passing `taskFilesDir: Tce()` into the scheduled-tasks substrate)
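
The resolution order in these anchors can be sketched as a pure function. `resolveConfigDir` and `scheduledTasksDir` are hypothetical stand-ins for `cE()` and `Tce()`; `env` and `home` are passed in so the sketch has no OS dependency, and the exact `~` handling is an assumption.

```javascript
// Stand-in for `cE()`: env override first, then ~/.claude.
function resolveConfigDir(env, home) {
  const raw = env.CLAUDE_CONFIG_DIR ?? `${home}/.claude`;
  if (raw === "~") return home;
  if (raw.startsWith("~/")) return `${home}/${raw.slice(2)}`;
  return raw;
}

// Stand-in for `Tce()`.
function scheduledTasksDir(env, home) {
  return `${resolveConfigDir(env, home)}/scheduled-tasks`;
}
```

Note that `??` only falls back on `null` or `undefined`, so in this sketch an empty `CLAUDE_CONFIG_DIR=""` is taken literally rather than treated as unset. That edge case is worth trying by hand when running this test.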
## S20 — "Keep computer awake" inhibits idle suspend
**Severity:** Should
**Surface:** Suspend inhibitor
**Applies to:** All rows
**Issues:**
**Steps:**
1. Open Settings → Desktop app → General → "Keep computer awake". Toggle ON.
2. Run `systemd-inhibit --list`. Look for a Claude-owned lock whose `WHAT` column reads `idle:sleep`.
3. Toggle OFF. Re-run `systemd-inhibit --list` — lock should be gone.
**Expected:** Toggling ON takes an inhibitor lock equivalent to `systemd-inhibit --what=idle:sleep` (or issues the `org.freedesktop.PowerManagement.Inhibit` DBus call). Toggling OFF releases the lock.
**Diagnostics on failure:** `systemd-inhibit --list` before/after, `busctl --user tree org.freedesktop.PowerManagement` (if the path uses that backend), launcher log, the relevant settings IPC call.
**References:** [How scheduled tasks run](https://code.claude.com/docs/en/desktop-scheduled-tasks#how-scheduled-tasks-run)
**Code anchors:** `build-reference/app-extracted/.vite/build/index.js:241897` (`hA.powerSaveBlocker.start("prevent-app-suspension")` — single block call, ref-counted by `PhA` Set); `build-reference/app-extracted/.vite/build/index.js:241905` (`hA.powerSaveBlocker.stop(BP)` when last claim drops); `build-reference/app-extracted/.vite/build/index.js:241909` (settings binding: `PHe = "keepAwakeEnabled"`); `build-reference/app-extracted/.vite/build/index.js:241914` (`vy.on("keepAwakeEnabled", YHe)` — toggle observer)
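
The ref-counted blocker pattern these anchors describe can be sketched as follows. `createKeepAwake` and the claim names are hypothetical; `blocker` is any object with `start(type)` returning an id and `stop(id)`, which is the shape of Electron's `powerSaveBlocker`.

```javascript
function createKeepAwake(blocker) {
  const claims = new Set(); // plays the role of the `PhA` Set
  let blockerId = null;
  return {
    claim(owner) {
      claims.add(owner);
      // Only the first claim starts the underlying block.
      if (blockerId === null) {
        blockerId = blocker.start("prevent-app-suspension");
      }
    },
    release(owner) {
      claims.delete(owner);
      // Only dropping the last claim stops it.
      if (claims.size === 0 && blockerId !== null) {
        blocker.stop(blockerId);
        blockerId = null;
      }
    },
    isActive: () => blockerId !== null,
  };
}
```

One consequence for this test: if anything else in the app holds a claim (for example a running scheduled task), toggling the setting OFF would not release the lock in this model, so check `systemd-inhibit --list` with the app otherwise idle.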
## S21 — Lid-close still suspends per OS policy
**Severity:** Critical
**Surface:** Suspend inhibitor scope
**Applies to:** All rows (laptop hosts)
**Issues:**
**Steps:**
1. With "Keep computer awake" ON, close the laptop lid.
2. Observe whether the machine suspends.
**Expected:** Machine still suspends per logind's `HandleLidSwitch=suspend`. The inhibit lock taken in [S20](#s20--keep-computer-awake-inhibits-idle-suspend) targets `idle:sleep`, not `handle-lid-switch`, so lid-close behavior is unaffected.
**Diagnostics on failure:** `loginctl show-session --property=HandleLidSwitch`, `journalctl --since="-5 minutes"`, the actual `--what=` flags on the Claude-owned inhibitor.
**References:** [How scheduled tasks run](https://code.claude.com/docs/en/desktop-scheduled-tasks#how-scheduled-tasks-run)
**Code anchors:** `build-reference/app-extracted/.vite/build/index.js:241897` (only `"prevent-app-suspension"` is passed to `powerSaveBlocker.start` — Electron maps this to `idle:sleep`); no `handle-lid-switch` / `HandleLidSwitch` token anywhere in `index.js` (verified via `grep -nE 'lid|HandleLidSwitch|handle-lid' index.js`)


@@ -0,0 +1,365 @@
# Shortcuts & Input
Tests covering URL handling, the Quick Entry global shortcut, and DE-specific shortcut/input failure modes. See [`../matrix.md`](../matrix.md) for status.
## T05 — `claude://` URL handler opens links in-app
**Severity:** Smoke
**Surface:** URL handler / xdg-open
**Applies to:** All rows
**Issues:**
**Steps:**
1. With Claude Desktop running, in another app run `xdg-open 'claude://chat/new?q=hello'` (or click a `claude://` link in a browser/terminal).
2. Observe.
**Expected:** Link is delivered to the running Claude Desktop process — no new browser tab, no crash, no error dialog. (Upstream's `claudeURLHandler` only accepts the `claude:`, `claude-dev:`, `claude-nest:`, `claude-nest-dev:`, `claude-nest-prod:` schemes; bare `https://claude.ai/...` clicks route through the user's default browser, not Claude Desktop. The `.desktop` file registers `MimeType=x-scheme-handler/claude` only, matching the upstream contract.)
**Diagnostics on failure:** `xdg-mime query default x-scheme-handler/claude`, the registered `.desktop` file content, launcher log, app crash report (if any), `coredumpctl list claude-desktop` (if subprocess died — see [S06](#s06--url-handler-doesnt-segfault-on-native-wayland)).
**References:** upstream `index.js:495996-496009` (`bEe()` protocol filter), `index.js:524819` (`setAsDefaultProtocolClient("claude")`), `index.js:525140-525148` (macOS `open-url`), `index.js:525162-525172` (Linux/Win `second-instance` argv path), project `scripts/packaging/{deb,rpm,appimage}.sh` (MimeType registration).
**Code anchors:** build-reference/app-extracted/.vite/build/index.js:495996, 524819, 525140, 525162
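
The scheme filter and the `second-instance` argv path can be sketched together. `isAcceptedClaudeUrl` and `findClaudeUrl` are hypothetical names; the accepted-scheme list is the one quoted in **Expected**, and how the bundled `bEe()` actually parses URLs is not shown in the anchors.

```javascript
const ACCEPTED_PROTOCOLS = new Set([
  "claude:",
  "claude-dev:",
  "claude-nest:",
  "claude-nest-dev:",
  "claude-nest-prod:",
]);

// True only for parseable URLs whose scheme is in the accepted set.
function isAcceptedClaudeUrl(raw) {
  try {
    return ACCEPTED_PROTOCOLS.has(new URL(raw).protocol);
  } catch {
    return false; // not a URL at all (binary path, flag, etc.)
  }
}

// On Linux the link arrives in the second instance's argv; take the first match.
function findClaudeUrl(argv) {
  return argv.find(isAcceptedClaudeUrl) ?? null;
}
```

With this filter an `https://claude.ai/...` argument is ignored, which is consistent with bare web links going to the default browser instead of the app.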
## T06 — Quick Entry global shortcut (unfocused)
**Severity:** Critical
**Surface:** Global shortcut / Electron globalShortcut
**Applies to:** All rows
**Issues:** [#404](https://github.com/aaddrick/claude-desktop-debian/issues/404), [#393](https://github.com/aaddrick/claude-desktop-debian/issues/393), [PR #406](https://github.com/aaddrick/claude-desktop-debian/pull/406), [PR #102](https://github.com/aaddrick/claude-desktop-debian/pull/102), [PR #153](https://github.com/aaddrick/claude-desktop-debian/pull/153)
**Steps:**
1. Launch app, focus another application (browser, terminal).
2. Press the configured Quick Entry shortcut (default `Ctrl+Alt+Space`).
3. Type a prompt and submit.
4. Repeat from a different virtual desktop / workspace.
**Expected:** Quick Entry prompt opens regardless of focused app or workspace. Shortcut is globally registered, not focus-bound. Submitting creates a new session and shows it in the main window.
**Diagnostics on failure:** Launcher log (look for `Using X11 backend via XWayland (for global hotkey support)` or portal-shortcut markers), `XDG_SESSION_TYPE`, `XDG_CURRENT_DESKTOP`, output of `gdbus call --session --dest=org.freedesktop.portal.Desktop --object-path=/org/freedesktop/portal/desktop --method=org.freedesktop.DBus.Introspectable.Introspect`, the active patch set in `scripts/patches/`.
**References:** [#404](https://github.com/aaddrick/claude-desktop-debian/issues/404), [#393](https://github.com/aaddrick/claude-desktop-debian/issues/393), [PR #406](https://github.com/aaddrick/claude-desktop-debian/pull/406)
**Code anchors:** build-reference/app-extracted/.vite/build/index.js:499376 (`ort` default accelerator: `"Ctrl+Alt+Space"` non-mac, `"Alt+Space"` on mac), 499416 (`globalShortcut.register`), 525287-525290 (Quick Entry trigger callback registered against `Pw.QUICK_ENTRY`).
## S06 — URL handler doesn't segfault on native Wayland
**Severity:** Critical (for wlroots rows)
**Surface:** URL handler subprocess
**Applies to:** Sway, Niri, Hypr-O, Hypr-N (any native-Wayland session)
**Issues:**
**Steps:**
1. Launch the app on a native Wayland session (no XWayland forcing).
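2. From another app, click a `claude://` link or run `xdg-open 'claude://chat/new?q=hello'` (as in T05; bare `https://claude.ai/...` links route to the default browser and do not reach the URL handler).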
**Expected:** Link opens in-app cleanly. No `Failed to connect to Wayland display` errors followed by a SIGSEGV from the URL handler subprocess.
**Diagnostics on failure:** `coredumpctl info claude-desktop`, `WAYLAND_DISPLAY` env in the subprocess (if capturable via `strace -f -e execve`), launcher log, full env dump.
**Currently:** Sway capture shows `Failed to connect to Wayland display: No such file or directory (2)` followed by `Segmentation fault` from the URL handler subprocess. The main app process keeps running; the URL handler dies. Not yet filed.
**References:**
**Code anchors:** build-reference/app-extracted/.vite/build/index.js:495996 (`bEe()` URL handler), 525140-525148 (`open-url` macOS), 525162-525172 (`second-instance` argv path on Linux); project `scripts/launcher-common.sh:96-99` (`--ozone-platform=x11` default), `scripts/launcher-common.sh:41-44` (Niri force-native-Wayland).
## S07 — `CLAUDE_USE_WAYLAND=1` opt-in path works without crashing
**Severity:** Should
**Surface:** Native Wayland mode
**Applies to:** Sway, Niri, Hypr-O, Hypr-N
**Issues:** [PR #228](https://github.com/aaddrick/claude-desktop-debian/pull/228), [PR #232](https://github.com/aaddrick/claude-desktop-debian/pull/232)
**Steps:**
1. Set `CLAUDE_USE_WAYLAND=1`. Launch the app.
2. Use the app for ~5 minutes — open chats, switch tabs, exercise basic flows.
**Expected:** App forces native Wayland (no XWayland), continues to render and respond. Previously broken paths in PR #228 still hold.
**Diagnostics on failure:** Launcher log (confirm Wayland mode active), `--doctor`, full env dump, screenshot of any crash dialog.
**References:** [PR #228](https://github.com/aaddrick/claude-desktop-debian/pull/228), [PR #232](https://github.com/aaddrick/claude-desktop-debian/pull/232)
**Code anchors:** project `scripts/launcher-common.sh:28-29` (`CLAUDE_USE_WAYLAND=1` opt-out of XWayland), 100-111 (native-Wayland Electron flags: `UseOzonePlatform,WaylandWindowDecorations`, `--ozone-platform=wayland`, `--enable-wayland-ime`, `--wayland-text-input-version=3`, `GDK_BACKEND=wayland`).
## S09 — Quick window patch runs only on KDE (post-#406 gate)
**Severity:** Critical
**Surface:** Patch gate
**Applies to:** All rows (verifies the gate, not the feature)
**Issues:** [PR #406](https://github.com/aaddrick/claude-desktop-debian/pull/406), [#393](https://github.com/aaddrick/claude-desktop-debian/issues/393)
**Steps:**
1. On a KDE row, launch the app. Inspect launcher log for quick-window-patch markers.
2. On a non-KDE row, launch the app. Inspect launcher log — the markers should be absent.
**Expected:** On KDE sessions the quick-window patch is applied (Quick Entry uses the patched code path). On non-KDE sessions the patch is **not** applied, preventing the [#393](https://github.com/aaddrick/claude-desktop-debian/issues/393) regression on GNOME etc.
**Diagnostics on failure:** Launcher log, `XDG_CURRENT_DESKTOP`, the patch-gate code path in `scripts/patches/`.
**References:** [PR #406](https://github.com/aaddrick/claude-desktop-debian/pull/406), [#393](https://github.com/aaddrick/claude-desktop-debian/issues/393)
**Code anchors:** project `scripts/patches/quick-window.sh:32-42` (KDE-gated `blur()` insertion), 115-125 (KDE-gated focus/visibility check replacement); upstream sites the patch rewrites are around `index.js:515374-515471` (Quick Entry popup construction + handlers).
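
A KDE gate of this kind usually reduces to a check on `XDG_CURRENT_DESKTOP`. The sketch below is an assumption about the shape of that check, not a copy of what `quick-window.sh` injects; `isKdeSession` is a hypothetical name.

```javascript
// XDG_CURRENT_DESKTOP is a colon-separated list, e.g. "ubuntu:GNOME".
function isKdeSession(env) {
  const desktops = (env.XDG_CURRENT_DESKTOP ?? "").split(":");
  return desktops.some((d) => d.trim().toUpperCase() === "KDE");
}
```

Matching on list entries rather than a substring avoids false positives from desktop names that merely contain the letters "KDE". When the gate misbehaves, compare the row's actual `XDG_CURRENT_DESKTOP` value against whatever comparison the patch performs.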
## S10 — Quick Entry popup is transparent (no opaque square frame)
**Severity:** Should
**Surface:** Quick Entry window (KDE Wayland)
**Applies to:** KDE-W
**Issues:** [#370](https://github.com/aaddrick/claude-desktop-debian/issues/370), [#223](https://github.com/aaddrick/claude-desktop-debian/issues/223), [PR #244](https://github.com/aaddrick/claude-desktop-debian/pull/244)
**Steps:**
1. On KDE Plasma Wayland, invoke Quick Entry.
2. Observe the popup background.
**Expected:** Quick Entry popup renders with a transparent background — no opaque square frame visible behind the rounded prompt UI.
**Diagnostics on failure:** Screenshot, KDE compositor settings (`kreadconfig5 --file kwinrc --group Compositing --key Backend`), launcher log, BrowserWindow construction args.
**References:** [#370](https://github.com/aaddrick/claude-desktop-debian/issues/370) (current open report), [#223](https://github.com/aaddrick/claude-desktop-debian/issues/223) (closed predecessor), [PR #244](https://github.com/aaddrick/claude-desktop-debian/pull/244)
**Code anchors:** build-reference/app-extracted/.vite/build/index.js:515380 (`transparent: !0`), 515383 (`backgroundColor: "#00000000"`), 515381 (`frame: !1`), 515377 (`skipTaskbar: !0`).
## S11 — Quick Entry shortcut fires from any focus on Wayland (mutter XWayland key-grab)
**Severity:** Critical (for GNOME users)
**Surface:** Global shortcut on GNOME mutter
**Applies to:** GNOME, Ubu
**Issues:** [#404](https://github.com/aaddrick/claude-desktop-debian/issues/404), [PR #406](https://github.com/aaddrick/claude-desktop-debian/pull/406)
**Steps:**
1. On GNOME/mutter Wayland, launch the app.
2. Focus another application; press the Quick Entry shortcut.
3. Repeat from another virtual desktop.
**Expected:** Shortcut fires regardless of focused app or workspace.
**Diagnostics on failure:** Launcher log (note `Using X11 backend via XWayland (for global hotkey support)`), `XDG_CURRENT_DESKTOP`, mutter version (`gnome-shell --version`), the active patch set.
**Currently:** Fedora 43 GNOME Wayland reproduces [#404](https://github.com/aaddrick/claude-desktop-debian/issues/404) — mutter doesn't honour the XWayland-side key grab, so the shortcut is focus-bound. On Ubuntu 24.04 GNOME, the [PR #406](https://github.com/aaddrick/claude-desktop-debian/pull/406) KDE-only gate prevents the regressing patch from running, leaving the older (working) code path active — hence `🔧` on Ubu. The unsolved fix path is [S12](#s12----enable-featuresglobalshortcutsportal-launcher-flag-wired-up-for-gnome-wayland).
**References:** [#404](https://github.com/aaddrick/claude-desktop-debian/issues/404), [PR #406](https://github.com/aaddrick/claude-desktop-debian/pull/406)
**Code anchors:** project `scripts/launcher-common.sh:96-99` (XWayland-default `--ozone-platform=x11`); upstream `index.js:499416` (`globalShortcut.register`).
## S12 — `--enable-features=GlobalShortcutsPortal` launcher flag wired up for GNOME Wayland
**Severity:** Critical
**Surface:** Launcher flag wiring
**Applies to:** GNOME, Ubu (any GNOME Wayland)
**Issues:** [#404](https://github.com/aaddrick/claude-desktop-debian/issues/404)
**Steps:**
1. On GNOME Wayland, launch the app.
2. Inspect the Electron command line via `pgrep -af claude-desktop` — look for `--enable-features=GlobalShortcutsPortal`.
3. Test Quick Entry shortcut from unfocused state (see [T06](#t06--quick-entry-global-shortcut-unfocused)).
**Expected:** Launcher detects GNOME Wayland and appends `--enable-features=GlobalShortcutsPortal` to Electron's argv, routing global shortcuts through XDG Desktop Portal instead of X11 key grabs. Once wired, [#404](https://github.com/aaddrick/claude-desktop-debian/issues/404) is closeable.
**Diagnostics on failure:** Full process argv (`cat /proc/$(pgrep -of electron)/cmdline | tr '\0' ' '`; `-o` selects the oldest matching process, since Electron spawns several), launcher log, `XDG_CURRENT_DESKTOP`.
**Currently:** Not yet implemented. Tracking under [#404](https://github.com/aaddrick/claude-desktop-debian/issues/404).
> **⚠ Missing in build 1.5354.0** — `--enable-features=GlobalShortcutsPortal` is not appended by `scripts/launcher-common.sh` for any GNOME Wayland variant. Re-verify after next upstream bump and after #404 lands.
**References:** [#404](https://github.com/aaddrick/claude-desktop-debian/issues/404)
**Code anchors:** project `scripts/launcher-common.sh:59-112` (`build_electron_args` — no `GlobalShortcutsPortal` branch present).
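
Step 2's argv check can be automated with a small parser. This is a sketch with hypothetical helper names (`parseCmdline`, `hasEnabledFeature`); it relies only on two facts: `/proc/<pid>/cmdline` separates arguments with NUL bytes, and `--enable-features=` takes a comma-separated list.

```javascript
// Splits the raw contents of /proc/<pid>/cmdline into arguments.
function parseCmdline(raw) {
  return raw.split("\0").filter((arg) => arg.length > 0);
}

// True if any --enable-features=... argument lists the feature.
function hasEnabledFeature(args, feature) {
  const prefix = "--enable-features=";
  return args
    .filter((arg) => arg.startsWith(prefix))
    .flatMap((arg) => arg.slice(prefix.length).split(","))
    .includes(feature);
}
```

Checking list membership matters because the launcher's native-Wayland branch already passes other features; a plain substring match for the exact flag string would miss `GlobalShortcutsPortal` when it is appended to an existing list.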
## S14 — Global shortcuts via XDG portal work on Niri
**Severity:** Critical (for Niri users)
**Surface:** XDG Desktop Portal `BindShortcuts`
**Applies to:** Niri
**Issues:**
**Steps:**
1. On Niri, launch the app (the launcher special-cases Niri to native Wayland + portal).
2. Configure the Quick Entry shortcut.
3. Observe portal interaction in launcher log.
**Expected:** `BindShortcuts` succeeds. Configured Quick Entry shortcut is registered and fires.
**Diagnostics on failure:** Launcher log capture of the `BindShortcuts` call, `busctl --user tree org.freedesktop.portal.Desktop`, Niri version, full env.
**Currently:** `Failed to call BindShortcuts (error code 5)` — portal global shortcuts fail on Niri. Different root cause from [#404](https://github.com/aaddrick/claude-desktop-debian/issues/404), same user-visible symptom (Quick Entry shortcut doesn't fire). Not yet filed.
**References:**
**Code anchors:** project `scripts/launcher-common.sh:41-44` (Niri force-native-Wayland branch); upstream `index.js:499416` (`globalShortcut.register`, which on native Wayland routes through Electron's `xdg-desktop-portal` `BindShortcuts` path inside Chromium).
## S29 — Quick Entry popup is created lazily on first shortcut press (closed-to-tray sanity)
**Severity:** Critical
**Surface:** Quick Entry popup lifecycle
**Applies to:** All rows
**Issues:** [#393](https://github.com/aaddrick/claude-desktop-debian/issues/393)
**Steps:**
1. Launch app, wait for main window to appear, hide-to-tray (close via X — see [T08](./tray-and-window-chrome.md#t08--hide-to-tray-on-close)).
2. Confirm no Claude window is mapped (e.g. `wmctrl -l | grep -i claude` returns empty on X11; `swaymsg -t get_tree` for Wayland equivalents).
3. Press the Quick Entry shortcut.
4. Type `hello`, press Enter.
**Expected:** Popup appears even though no Claude window was mapped before the keypress. Upstream constructs the popup `BrowserWindow` lazily on first shortcut invocation (`if (!Ko || ...) Ko = new BrowserWindow(...)` near `index.js:515375`), so the popup does not need a pre-existing main window. New chat session is created and reachable on submit.
**Diagnostics on failure:** Launcher log, `~/.config/Claude/logs/`, `XDG_CURRENT_DESKTOP`, screenshot of empty desktop after shortcut press.
**References:** [#393](https://github.com/aaddrick/claude-desktop-debian/issues/393), upstream `index.js:515375-515397`
**Code anchors:** build-reference/app-extracted/.vite/build/index.js:515374 (`if (!Ko ...) Ko = new BrowserWindow(...)` lazy construction guard), 515394 (`preload: ".vite/build/quickWindow.js"`), 515438 (`Ko.loadFile(".vite/renderer/quick_window/quick-window.html")`).
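
The lazy-construction guard can be sketched without Electron. `makeLazyPopup` is a hypothetical wrapper and `createWindow` stands in for `new BrowserWindow(...)`. The second half of the guard is elided in the source, so the `isDestroyed()` check here is an assumption.

```javascript
function makeLazyPopup(createWindow) {
  let popup = null; // plays the role of `Ko`
  return function getPopup() {
    // Assumption: a destroyed window is treated like a missing one.
    if (!popup || popup.isDestroyed()) popup = createWindow();
    return popup;
  };
}
```

Nothing in this pattern depends on a main window being mapped, which is why the popup should still appear from the tray-only state in step 3.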
## S30 — Quick Entry shortcut becomes a no-op after full app exit
**Severity:** Should
**Surface:** Global shortcut unregistration
**Applies to:** All rows
**Issues:**
**Steps:**
1. Launch app. Confirm Quick Entry shortcut works (popup opens).
2. Quit Claude Desktop fully via tray → Quit (or `pkill -f app.asar`). Confirm no `electron` processes for the app remain.
3. Press the Quick Entry shortcut.
**Expected:** No popup appears. No error dialog. No zombie process. Electron unregisters the global shortcut on app exit; the shortcut becomes a system-level no-op.
**Diagnostics on failure:** `pgrep -af app.asar` output, `journalctl --user -e -n 100`, OS-level shortcut bindings (`gsettings list-recursively | grep -i shortcut`).
**References:** upstream `index.js:499416` (registration site)
**Code anchors:** build-reference/app-extracted/.vite/build/index.js:499398-499428 (`nG()` register/unregister wrapper — passing `null` accelerator unregisters), 499416 (`hA.globalShortcut.register`), 499403 (`hA.globalShortcut.unregister`).
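
A register/unregister wrapper in the spirit of `nG()` might look like the sketch below. `createShortcutBinding` is a hypothetical name, and the details beyond "a `null` accelerator unregisters" are assumptions. `shortcuts` is any object with `register(accelerator, callback)` returning a boolean and `unregister(accelerator)`, matching Electron's `globalShortcut`.

```javascript
function createShortcutBinding(shortcuts, callback) {
  let current = null;
  return function setAccelerator(accelerator) {
    // Drop the previous binding before applying a new one.
    if (current !== null) {
      shortcuts.unregister(current);
      current = null;
    }
    if (accelerator === null) return false; // null means "unbind only"
    const ok = shortcuts.register(accelerator, callback);
    if (ok) current = accelerator;
    return ok;
  };
}
```

After a full exit there is no process left to hold the registration, so the keypress should fall through to whatever the desktop environment does with an unbound key.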
## S31 — Quick Entry submit makes the new chat reachable from any main-window state
**Severity:** Critical
**Surface:** Submit → main window show
**Applies to:** All rows
**Issues:** [#393](https://github.com/aaddrick/claude-desktop-debian/issues/393), [PR #406](https://github.com/aaddrick/claude-desktop-debian/pull/406)
**Steps:**
1. For each main-window state: (a) visible-and-focused, (b) minimized, (c) hidden-to-tray, (d) on a different workspace, (e) closed via X (project's hide-to-tray override).
2. Set the state, then invoke Quick Entry, type `hello`, submit.
3. Record what happens to the main window: auto-restored, requires tray click, came to current workspace, stayed on its own workspace.
**Expected:** The new chat session is **reachable** from each starting state. Acceptance is "user can reach the new chat" — not "main window auto-restored." Upstream calls `mainWin.show()` + `mainWin.focus()` only (`index.js:515566, 515599`), with no `restore()`, no `setVisibleOnAllWorkspaces()`, no `moveTop()`. Whether `show()` un-minimizes or migrates workspaces is purely compositor-dependent. The failure case is "new chat created but the user has no way to surface it" — that's a regression. Anything that reaches the chat (even via a tray click) is upstream-acceptable.
**Diagnostics on failure:** `~/.config/Claude/logs/`, screenshot at each state, output of `wmctrl -l` (X11) or `swaymsg -t get_tree` (sway), launcher log.
**Currently:** On non-KDE rows, the post-#406 KDE-only patch gate leaves the upstream code path (`isFocused()` short-circuit) active. Andrej730's #393 GNOME repro shows the stale-`isFocused()` bug can still suppress `show()` in tray-only state. See [S32](#s32--quick-entry-submit-on-gnome-mutter-doesnt-trip-electron-stale-isfocused).
**References:** [#393](https://github.com/aaddrick/claude-desktop-debian/issues/393), upstream `index.js:515566, 515599, 105164-105171`
**Code anchors:** build-reference/app-extracted/.vite/build/index.js:515567 (`h1() || ut.show(), ut.focus()` in `gHn()` existing-chat path), 515598-515599 (`h1() || ut.show(), ut.focus()` in `ynt()` new-chat path), 105164-105171 (`h1()` returns `ut.isFocused() || mainView.webContents.isFocused()`).
## S32 — Quick Entry submit on GNOME mutter doesn't trip Electron stale-`isFocused()`
**Severity:** Critical (for GNOME users)
**Surface:** Electron `BrowserWindow.isFocused()` on Linux
**Applies to:** GNOME, Ubu
**Issues:** [#393](https://github.com/aaddrick/claude-desktop-debian/issues/393)
**Steps:**
1. On GNOME Wayland, launch the app, then close to tray.
2. Confirm the app is in tray-only state (no window mapped, no Dash entry, no taskbar entry).
3. Invoke Quick Entry, type `hello`, submit.
4. Repeat after re-pinning the app to the Dash and reproducing the tray-only state from there.
**Expected:** Submit produces a reachable new chat session in both Dash-pinned and not-pinned cases. **The Dash distinction is empirical, not code-driven** — upstream has no notion of Dash presence. The underlying failure mode is Electron's `BrowserWindow.isFocused()` returning stale-true on Linux mutter, which causes upstream's `h1() || ut.show()` short-circuit (`index.js:515566`) to skip `show()`. Andrej730 traced this on #393.
**Diagnostics on failure:** Bundled `index.js` h1() body (extract via `npx asar extract`); add temporary logging in `h1()` per Andrej730's diff in #393 if reproducing locally; `gnome-shell --version`; `~/.config/Claude/logs/`.
**Currently:** Open. The KDE-only gate from PR #406 leaves this path unfixed on GNOME. Resolution requires either (a) widening the patch to all DEs by dropping the `isFocused()` fallback in the patched code, or (b) waiting for an upstream Electron fix to `isFocused()` on Linux.
**References:** [#393](https://github.com/aaddrick/claude-desktop-debian/issues/393) (Andrej730's diagnosis with `eU()` logging output)
**Code anchors:** build-reference/app-extracted/.vite/build/index.js:105164-105171 (`h1()` body — the exact short-circuit Andrej730 instrumented), 515567 + 515598 (the two `h1() || ut.show()` call sites the suppression hits).
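The failure mode above can be illustrated with a minimal sketch. It assumes a hypothetical `WindowLike` interface standing in for the subset of `BrowserWindow` involved, and it simplifies `h1()` to a single `isFocused()` check (the upstream body also consults `mainView.webContents.isFocused()`). It is not the upstream or patched code, only a model of why a stale-true focus report leaves the window unmapped.

```typescript
// Hypothetical stand-in for the subset of BrowserWindow used here.
interface WindowLike {
	isFocused(): boolean;
	show(): void;
	focus(): void;
}

// Same shape as upstream's `h1() || ut.show(), ut.focus()`:
// show() only runs when the focus check reports false.
function surfaceWithShortCircuit(win: WindowLike): void {
	if (!win.isFocused()) win.show();
	win.focus();
}

// Option (a) from "Currently": no isFocused() gate at all.
function surfaceUnconditionally(win: WindowLike): void {
	win.show();
	win.focus();
}

// Fake window that is hidden but still reports focused (stale-true).
function makeStaleWindow(): { win: WindowLike; state: { visible: boolean } } {
	const state = { visible: false };
	const win: WindowLike = {
		isFocused: () => true,
		show: () => { state.visible = true; },
		focus: () => {},
	};
	return { win, state };
}

const gatedCase = makeStaleWindow();
surfaceWithShortCircuit(gatedCase.win);

const ungatedCase = makeStaleWindow();
surfaceUnconditionally(ungatedCase.win);
```

In this model the gated call leaves `gatedCase.state.visible` false, which corresponds to the unreachable new chat, while the ungated call maps the window.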
## S33 — Quick Entry transparent rendering tracked against bundled Electron version
**Severity:** Should
**Surface:** Bundled Electron version
**Applies to:** All rows (relevant where #370 reproduces)
**Issues:** [#370](https://github.com/aaddrick/claude-desktop-debian/issues/370)
**Steps:**
1. After install, capture the Electron version bundled with the app: extract `app.asar.unpacked` and run the bundled Electron with `--version`, or read it from the bundled binary's metadata.
2. Record the version in [`../matrix.md`](../matrix.md) per row, alongside the [S10](#s10--quick-entry-popup-is-transparent-no-opaque-square-frame) status.
**Expected:** Captured version is recorded. If the version is **41.0.4 through 41.x.y** and S10 fails, the upstream electron/electron#50213 regression hypothesis (per @noctuum's bisect on #370) holds and the issue is blocked on upstream. If the version is **41.0.3 or earlier** and S10 fails, the bisect is wrong — investigate. If the version is **a later release that includes a CSD-rendering fix** and S10 still fails, the upstream-regression hypothesis is also wrong.
**Diagnostics on failure:** Output of the version capture command, link to electron/electron#50213, the BrowserWindow construction args from the bundled `index.js`.
**Currently:** Per @noctuum's bisect, 41.0.4 introduced the regression. No upstream fix shipped as of last check.
**References:** [#370](https://github.com/aaddrick/claude-desktop-debian/issues/370), upstream `index.js:515380, 515383` (already sets `transparent: true` and `backgroundColor: "#00000000"`)
**Code anchors:** build-reference/app-extracted/.vite/build/index.js:515380 (`transparent: !0`), 515383 (`backgroundColor: "#00000000"`), 515374-515397 (popup `BrowserWindow` construction args block, including `frame: !1`, `hasShadow: Zr`, `type: Zr ? "panel" : void 0`).
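The decision table in **Expected** could be encoded roughly as follows. This is a sketch under stated assumptions: the `Verdict` labels and function names are hypothetical, and the only inputs are the captured version string and whether S10 failed. A release later than 41.x still needs a manual check for whether it includes a CSD-rendering fix, so the sketch only flags that case.

```typescript
type Verdict =
	| 'not-applicable'       // S10 passes, nothing to classify
	| 'bisect-wrong'         // 41.0.3 or earlier, yet S10 fails
	| 'blocked-on-upstream'  // 41.0.4 through 41.x.y and S10 fails
	| 'needs-fix-check';     // later release: check for a CSD fix

type Triple = [number, number, number];

// Accepts both '41.0.4' and the 'v41.0.4' form.
function parseElectronVersion(raw: string): Triple {
	const m = /^v?(\d+)\.(\d+)\.(\d+)/.exec(raw.trim());
	if (!m) throw new Error(`unparseable Electron version: ${raw}`);
	return [Number(m[1]), Number(m[2]), Number(m[3])];
}

function compareVersions(a: Triple, b: Triple): number {
	return a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
}

function classifyS33(rawVersion: string, s10Fails: boolean): Verdict {
	if (!s10Fails) return 'not-applicable';
	const v = parseElectronVersion(rawVersion);
	if (compareVersions(v, [41, 0, 4]) < 0) return 'bisect-wrong';
	if (v[0] === 41) return 'blocked-on-upstream';
	return 'needs-fix-check';
}
```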
## S34 — Quick Entry shortcut focuses fullscreen main window instead of showing popup
**Severity:** Should
**Surface:** Shortcut behavior on fullscreen main
**Applies to:** All rows
**Issues:**
**Steps:**
1. Launch app. Put the main window into native fullscreen (F11 or platform equivalent).
2. Press the Quick Entry shortcut.
**Expected:** Popup does **not** appear. Main window receives focus and `ide()` runs (upstream behavior at `index.js:525287-525290`). This is intentional upstream UX — assumes the user wants to interact with the existing fullscreen Claude rather than overlay a popup on it.
**Diagnostics on failure:** Screenshot, launcher log, confirm fullscreen state via `wmctrl -l -G` / Wayland equivalent.
**References:** upstream `index.js:525287-525290`
**Code anchors:** build-reference/app-extracted/.vite/build/index.js:525287-525290 (Quick Entry callback: `ut && !ut.isDestroyed() && ut.isFullScreen() ? (ut.focus(), ide()) : Yri()`), 515234-515241 (`ide()`: `show()` + `focus()` + `webContents.send(TEe.cmdK)` for the cmd-K dispatch).
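The branch in the Quick Entry callback reduces to a three-part condition. The sketch below restates it with a hypothetical `MainWindowLike` interface and neutral action labels; it does not model what `ide()` or `Yri()` do internally.

```typescript
// Hypothetical stand-in for the main BrowserWindow (`ut` upstream).
interface MainWindowLike {
	isDestroyed(): boolean;
	isFullScreen(): boolean;
}

type ShortcutAction = 'focus-main' | 'fall-through';

// Mirrors `ut && !ut.isDestroyed() && ut.isFullScreen() ? ... : ...`.
function quickEntryAction(main: MainWindowLike | null): ShortcutAction {
	if (main && !main.isDestroyed() && main.isFullScreen()) {
		return 'focus-main';
	}
	return 'fall-through';
}
```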
## S35 — Quick Entry popup position is persisted across invocations and across app restarts
**Severity:** Should
**Surface:** Popup placement memory
**Applies to:** All rows
**Issues:**
**Steps:**
1. Launch app. Invoke Quick Entry. Note the popup position (record monitor + coordinates if possible — e.g. `xdotool getactivewindow getwindowgeometry` on X11).
2. Dismiss (Esc). Re-invoke. Position should be unchanged across this dismiss/re-invoke cycle.
3. Quit Claude Desktop fully (`pkill -f app.asar`). Re-launch. Invoke Quick Entry.
4. Confirm position matches the pre-restart capture.
**Expected:** Popup reappears at the same monitor + position before and after a full app restart. Upstream persists position via `an.get("quickWindowPosition")` (`index.js:515491-515526`), keyed on monitor label + resolution.
**Diagnostics on failure:** Captured coordinates pre/post-restart, content of any persisted settings file (project's settings storage location varies by OS).
**References:** upstream `index.js:515491-515526`
**Code anchors:** build-reference/app-extracted/.vite/build/index.js:515444-515461 (`Ko.on("hide", …)` persists `quickWindowPosition` via `an.set(...)`), 515491-515521 (`aHn()` resolves saved monitor by `label + bounds.width + bounds.height`, falling back to label-only or proportional placement), 515489 (`Ko.setPosition(...)` after show).
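The monitor lookup described in the anchors (exact match on label plus resolution, then label-only) might look like the sketch below. The `SavedMonitor` shape and the function name are assumptions for illustration; the actual persisted format of `quickWindowPosition` is not documented here, and the proportional-placement fallback is left out.

```typescript
// Hypothetical shapes; the real persisted record may differ.
interface DisplayLike {
	label: string;
	bounds: { x: number; y: number; width: number; height: number };
}

interface SavedMonitor {
	label: string;
	width: number;
	height: number;
}

// Prefer an exact label + resolution match, then fall back to label only.
// Returns undefined when the saved monitor is not connected at all.
function resolveSavedDisplay(
	saved: SavedMonitor,
	displays: DisplayLike[],
): DisplayLike | undefined {
	const exact = displays.find((d) =>
		d.label === saved.label
		&& d.bounds.width === saved.width
		&& d.bounds.height === saved.height);
	if (exact) return exact;
	return displays.find((d) => d.label === saved.label);
}
```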
## S36 — Quick Entry popup falls back to primary display when saved monitor is gone
**Severity:** Smoke
**Surface:** Multi-monitor placement
**Applies to:** All rows with a multi-monitor capable host
**Issues:**
**Steps:**
1. **Multi-monitor required.** With an external monitor connected, invoke Quick Entry on the external monitor. Trigger position persistence (per [S35](#s35--quick-entry-popup-position-is-persisted-across-invocations-and-across-app-restarts)).
2. Disconnect the external monitor (libvirt: detach the second display device; bare metal: unplug).
3. Invoke Quick Entry.
**Expected:** Popup appears on the primary display, not at off-screen coordinates. Upstream falls back to `cHn()` when the saved monitor is no longer present (`index.js:515502`).
**Diagnostics on failure:** `xrandr` (X11) / `wlr-randr` (wlroots) output before and after disconnect, captured popup coordinates, screenshot.
**Skip when:** Single-monitor VM or host. Not part of the [§ Mandatory matrix](../quick-entry-closeout.md#mandatory-matrix); skip with `-` in the dashboard.
**References:** upstream `index.js:515502`
**Code anchors:** build-reference/app-extracted/.vite/build/index.js:515502 (`return cHn();` early-return when no saved position), 515523-515527 (`cHn()` centres popup on `screen.getPrimaryDisplay()` workArea), 515514-515515 (`label`-only match fallback before primary-display fallback).
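For checking captured coordinates during this test, a small helper along these lines may be handy. It is a sketch only: `centreInWorkArea` approximates what the anchors say `cHn()` does (centre on the primary display's work area), and `isOnSomeDisplay` is a hypothetical check for the off-screen failure case. Neither is taken from upstream.

```typescript
interface RectSketch { x: number; y: number; width: number; height: number }

// Centre a popup of the given size inside a display work area.
function centreInWorkArea(
	workArea: RectSketch,
	popup: { width: number; height: number },
): { x: number; y: number } {
	return {
		x: Math.round(workArea.x + (workArea.width - popup.width) / 2),
		y: Math.round(workArea.y + (workArea.height - popup.height) / 2),
	};
}

// True when the point lies inside at least one connected display.
function isOnSomeDisplay(
	point: { x: number; y: number },
	displays: RectSketch[],
): boolean {
	return displays.some((d) =>
		point.x >= d.x && point.x < d.x + d.width
		&& point.y >= d.y && point.y < d.y + d.height);
}
```

A popup position recorded on the external monitor (for example `x: 2500`) should fail `isOnSomeDisplay` once only the primary display remains, which is the condition this test guards against.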
## S37 — Quick Entry popup remains functional after main window destroy
**Severity:** Should
**Surface:** Popup lifecycle independence from main window
**Applies to:** All rows (where reachable)
**Issues:**
**Steps:**
1. Launch app, focus main window.
2. **Trigger main window destroy without quitting the app.** On this project, the X-button hide-to-tray override means the standard close path does **not** destroy `ut`. Reach the destroy path via one of:
- DevTools console on the main window: `require('electron').remote.getCurrentWindow().destroy()` (if `remote` is exposed; not guaranteed).
- A debug build with the hide-to-tray override removed.
- Skip and mark `-` if unreachable.
3. After destroy: invoke Quick Entry, type `hello`, submit.
**Expected:** Popup appears and accepts input. Upstream's `!ut || ut.isDestroyed()` guard at `index.js:515595` skips the show/focus block without crashing. The new chat is created in the data layer; whether it has a window to surface in is a separate question (upstream contract is "popup itself does not crash").
**Diagnostics on failure:** Crash dump, `~/.config/Claude/logs/`, sequence of actions taken to reach the destroy path.
**Currently:** Likely unreachable on Linux without a debug build, because the project's hide-to-tray override of the X button keeps the standard close path from destroying the window. Mark `-` (N/A) on rows where the destroy path can't be triggered.
**References:** upstream `index.js:515595`
**Code anchors:** build-reference/app-extracted/.vite/build/index.js:515595-515602 (`setTimeout(() => { !ut || ut.isDestroyed() || (h1() || ut.show(), ut.focus(), Qe == null || Qe.webContents.focus(), iri()); }, 0)` — guard skips show/focus block on destroy without throwing); 515547 (companion guard in `nde()` chat-id submit path: `else if (ut && !ut.isDestroyed())`).
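The guard's contract ("skip the show/focus block, do not throw") can be modelled as below. This is a sketch with a hypothetical `DestroyableWindow` interface; the fake in the usage lines throws from `show()` to mimic a destroyed window, which is the hazard the guard avoids.

```typescript
// Hypothetical stand-in for the main window (`ut` upstream).
interface DestroyableWindow {
	isDestroyed(): boolean;
	show(): void;
	focus(): void;
}

// Mirrors `!ut || ut.isDestroyed() || (...show/focus...)`.
function surfaceAfterSubmit(
	main: DestroyableWindow | null,
): 'skipped' | 'surfaced' {
	if (!main || main.isDestroyed()) return 'skipped';
	main.show();
	main.focus();
	return 'surfaced';
}

// Fake destroyed window: touching it would throw.
const destroyedMain: DestroyableWindow = {
	isDestroyed: () => true,
	show: () => { throw new Error('Object has been destroyed'); },
	focus: () => { throw new Error('Object has been destroyed'); },
};

const destroyedOutcome = surfaceAfterSubmit(destroyedMain);
```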


@@ -0,0 +1,123 @@
# Tray & Window Chrome
Tests covering the tray icon, OS-native window decorations, the hybrid in-app topbar (PR #538), and hide-to-tray on close. See [`../matrix.md`](../matrix.md) for status.
## T03 — Tray icon present
**Severity:** Smoke
**Surface:** System tray / SNI
**Applies to:** All rows
**Issues:**
**Runner:** [`tools/test-harness/src/runners/T03_tray_icon_present.spec.ts`](../../../tools/test-harness/src/runners/T03_tray_icon_present.spec.ts) — registration only (left-click toggle + theme-switch in-place rebuild are v2)
**Steps:**
1. Launch the app. Wait a few seconds.
2. Locate the tray icon in the system tray / status area.
3. Right-click → confirm standard menu (Show, Quit, etc.). Left-click → confirm window toggles.
4. Switch the system theme between light and dark; observe the tray icon update.
**Expected:** Tray icon appears within a few seconds of app launch. Right-click exposes the standard menu. Left-click toggles main window visibility. Theme changes update the icon in place without spawning a duplicate.
**Diagnostics on failure:** `RegisteredStatusNotifierItems` from the SNI watcher (see [runbook](../runbook.md#tray--dbus-state-kde)), the tray daemon process for the DE (Plasma's `plasmashell`, GNOME's `gnome-shell` + AppIndicator extension state, etc.), launcher log.
**References:** [`docs/learnings/tray-rebuild-race.md`](../../learnings/tray-rebuild-race.md)
**Code anchors:** `build-reference/app-extracted/.vite/build/index.js:525627` (`vy.on("menuBarEnabled", () => { Sde() })` — re-entry), `index.js:525631-525673` (`function Sde()` — tray construction), `index.js:525645` (`new hA.Tray(hA.nativeImage.createFromPath(t))`), `index.js:525646` (`qh.on("click", () => void Yri())` — left-click handler), `index.js:525653` (`qh.setContextMenu(mnt())` — Linux right-click via context menu), `index.js:515150-515169` (`function mnt()` — Show App + Quit menu items), `index.js:525623` (`hA.nativeTheme.on("updated", ...)` — theme-change re-entry).
## T04 — Window decorations draw
**Severity:** Smoke
**Surface:** Window chrome
**Applies to:** All rows
**Issues:** [PR #127](https://github.com/aaddrick/claude-desktop-debian/pull/127), [PR #538](https://github.com/aaddrick/claude-desktop-debian/pull/538)
**Runner:** [`tools/test-harness/src/runners/T04_window_decorations.spec.ts`](../../../tools/test-harness/src/runners/T04_window_decorations.spec.ts) — X11 / XWayland only (checks `_NET_FRAME_EXTENTS`); native-Wayland window-state queries are deferred
**Steps:**
1. Launch the app.
2. Confirm window has a working OS-native frame: close, minimize, maximize render and respond.
3. Resize via window edges.
**Expected:** Frame is drawn by the DE/compositor (not the app). All controls render and respond. Resize works.
**Diagnostics on failure:** `xprop _NET_WM_WINDOW_TYPE` (X11) / `swaymsg -t get_tree` or compositor-equivalent (Wayland), launcher log line for `frame:` setting, screenshot.
**References:** [PR #127](https://github.com/aaddrick/claude-desktop-debian/pull/127), [PR #538](https://github.com/aaddrick/claude-desktop-debian/pull/538) (hybrid mode keeps native frame), [`docs/learnings/linux-topbar-shim.md`](../../learnings/linux-topbar-shim.md)
**Code anchors:** Upstream factory passes `titleBarStyle: "hidden"` and `titleBarOverlay: ys` (Windows-only flag) to `BrowserWindow` at `build-reference/app-extracted/.vite/build/index.js:524892-524909` (`Ori()`). On Linux the wrapper at `scripts/frame-fix-wrapper.js:122` overrides to `options.frame = true` and at `scripts/frame-fix-wrapper.js:129-130` deletes the macOS-only `titleBarStyle` / `titleBarOverlay` so the DE draws the frame. (Hybrid-mode plumbing — `CLAUDE_TITLEBAR_STYLE` resolution and the `native`/`hybrid`/`hidden` branches — lives on `main` per PR #538; the docs/compat-matrix branch's `frame-fix-wrapper.js` carries only the unconditional `frame:true` patch, which is sufficient for T04's "frame draws" assertion.)
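The unconditional `frame:true` override described above amounts to a small option rewrite. The sketch below assumes a hypothetical `FrameOptionsSketch` type and returns a patched copy rather than mutating in place, so it is an illustration of the effect and not the contents of `frame-fix-wrapper.js`.

```typescript
// Hypothetical subset of the BrowserWindow constructor options.
interface FrameOptionsSketch {
	frame?: boolean;
	titleBarStyle?: string;
	titleBarOverlay?: unknown;
	[key: string]: unknown;
}

// On Linux: force a native frame and drop the title-bar options so the
// DE/compositor draws the decorations. Other platforms pass through.
function applyLinuxFrameFix(
	options: FrameOptionsSketch,
	platform: string,
): FrameOptionsSketch {
	if (platform !== 'linux') return options;
	const patched: FrameOptionsSketch = { ...options, frame: true };
	delete patched.titleBarStyle;
	delete patched.titleBarOverlay;
	return patched;
}
```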
## T07 — In-app topbar renders + clickable
**Severity:** Smoke
**Surface:** In-app topbar (hybrid mode)
**Applies to:** All rows on PR #538 builds
**Issues:** [PR #538](https://github.com/aaddrick/claude-desktop-debian/pull/538), [PR #127](https://github.com/aaddrick/claude-desktop-debian/pull/127)
**Steps:**
1. Launch a PR #538 build.
2. Observe the in-app topbar below the OS frame.
3. Click each of: hamburger menu, sidebar toggle, search, back, forward, Cowork ghost.
**Expected:** Every topbar button listed in step 3 renders below the native frame. Each responds to mouse clicks (no implicit drag region capturing the events). If any single button fails to render or click, the test is `✗`; note which one in the linked issue.
**Diagnostics on failure:** Screenshot, env (`OZONE_PLATFORM`, `ELECTRON_OZONE_PLATFORM_HINT`, `GDK_BACKEND`, `QT_QPA_PLATFORM`, `MOZ_ENABLE_WAYLAND`, `SDL_VIDEODRIVER`), launcher log, DevTools `document.querySelector('.topbar')` HTML if accessible.
**References:** [PR #538](https://github.com/aaddrick/claude-desktop-debian/pull/538), [PR #127](https://github.com/aaddrick/claude-desktop-debian/pull/127), [`docs/learnings/linux-topbar-shim.md`](../../learnings/linux-topbar-shim.md)
**Code anchors:** UA-spoof shim source `scripts/wco-shim.js` (lines 1-30 module guard / `CLAUDE_TITLEBAR_STYLE != 'native'` gate, lines 184-191 `navigator.userAgent` redefinition matching `/(win32|win64|windows|wince)/i`, lines 52-53 `CONTROLS_WIDTH=140` / `TITLEBAR_HEIGHT=40`); injection orchestrator `scripts/patches/wco-shim.sh` (`patch_wco_shim()` prepends shim source to `mainView.js`); hybrid-mode wrapper branch `scripts/frame-fix-wrapper.js:62-70` (`VALID_TITLEBAR_STYLES`, default `hybrid`) and `:152-240` (per-mode `frame` / `titleBarStyle` handling).
## T08 — Hide-to-tray on close
**Severity:** Smoke
**Surface:** Window lifecycle
**Applies to:** All rows
**Issues:** [PR #451](https://github.com/aaddrick/claude-desktop-debian/pull/451)
**Steps:**
1. Launch the app. Click the window close (X) button.
2. Confirm app process is still running (`pgrep -af claude-desktop`).
3. Click the tray icon (or invoke Quick Entry) → window restores.
4. Quit explicitly via tray menu or `Ctrl+Q`.
**Expected:** Close button hides main window to tray, doesn't quit. App keeps running. Tray-click restores. Explicit Quit ends the process.
**Diagnostics on failure:** `pgrep -af claude-desktop` after close, launcher log, screenshot of any dialog.
**References:** [PR #451](https://github.com/aaddrick/claude-desktop-debian/pull/451)
**Code anchors:** Upstream Linux quit-on-last-close at `build-reference/app-extracted/.vite/build/index.js:525550-525552` (`hA.app.on("window-all-closed", () => { Zr || Ap() })`; `Zr` is darwin). Wrapper interception at `scripts/frame-fix-wrapper.js:178-185` (`this.on('close', e => { if (!result.app._quittingIntentionally && !this.isDestroyed()) { e.preventDefault(); this.hide() } })`) and `scripts/frame-fix-wrapper.js:370-374` (`app.on('before-quit', () => { app._quittingIntentionally = true })`, which arms the bypass for tray-Quit / `Ctrl+Q` / SIGTERM). `CLOSE_TO_TRAY` gate (Linux + `CLAUDE_QUIT_ON_CLOSE !== '1'`) at `scripts/frame-fix-wrapper.js:49-51`. Tray Quit menu item `mnt()` `click: rde` at `index.js:515166`; `function rde()` at `index.js:515306-515308` calls `Ap(!1)`.
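The interaction between the gate, the `close` handler, and the `before-quit` flag can be sketched as two pure functions. The names `closeToTrayEnabled`, `onWindowClose`, `CloseEventLike`, and `ClosableWindow` are hypothetical; only the conditions are taken from the anchors above.

```typescript
interface CloseEventLike { preventDefault(): void }
interface ClosableWindow { isDestroyed(): boolean; hide(): void }

// Gate: Linux, unless the user opted out via CLAUDE_QUIT_ON_CLOSE=1.
function closeToTrayEnabled(
	platform: string,
	env: Record<string, string | undefined>,
): boolean {
	return platform === 'linux' && env['CLAUDE_QUIT_ON_CLOSE'] !== '1';
}

// Close handler: hide instead of closing unless an explicit quit
// (tray Quit, Ctrl+Q, SIGTERM) has already set the bypass flag.
function onWindowClose(
	event: CloseEventLike,
	win: ClosableWindow,
	quittingIntentionally: boolean,
): 'hidden' | 'allowed' {
	if (!quittingIntentionally && !win.isDestroyed()) {
		event.preventDefault();
		win.hide();
		return 'hidden';
	}
	return 'allowed';
}
```

Step 1 of the test exercises the `'hidden'` branch; step 4 exercises `'allowed'`, because `before-quit` fires before the window's `close` event.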
## S08 — Tray icon doesn't duplicate after `nativeTheme` update
**Severity:** Should
**Surface:** Tray (KDE)
**Applies to:** KDE-W, KDE-X
**Issues:** [`docs/learnings/tray-rebuild-race.md`](../../learnings/tray-rebuild-race.md)
**Steps:**
1. Launch the app on KDE.
2. Toggle system theme (light ↔ dark).
3. Observe the tray for ~10 seconds.
**Expected:** Tray icon updates in place via `setImage` + `setContextMenu`. SNI service stays registered — no de-register / re-register churn that would leave a duplicate icon visible until KDE garbage-collects.
**Diagnostics on failure:** SNI watcher state before/after theme switch (see [runbook](../runbook.md#tray--dbus-state-kde)), launcher log, `journalctl --user -u plasma-plasmashell -n 50`.
**References:** [`docs/learnings/tray-rebuild-race.md`](../../learnings/tray-rebuild-race.md). Mitigated upstream — the in-place fast-path is the current behavior.
**Code anchors:** Upstream destroy+recreate slow-path at `build-reference/app-extracted/.vite/build/index.js:525643` (`qh && (qh.destroy(), (qh = null))`) followed immediately by `new hA.Tray(...)` at `:525645` and `setContextMenu(mnt())` at `:525653` — the SNI re-register that races on KDE. Fast-path injection in `scripts/patches/tray.sh` `patch_tray_inplace_update()` (lines 95-231): extracts `tray_var` / `menu_func` / `path_var` / `enabled_var` dynamically, then injects `if (TRAY && ENABLED !== false) { TRAY.setImage(EL.nativeImage.createFromPath(PATH)); process.platform !== "darwin" && TRAY.setContextMenu(MENU()); return }` before the destroy block. Idempotency marker at `tray.sh:174-180` keys on the post-rename `setImage(...nativeImage.createFromPath(PATH_VAR))` literal. Mutex + 250 ms DBus settle delay (the prior mitigation, kept for the legitimate slow-path entries) at `tray.sh:48-60`.
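The difference between the fast path and the slow path can be sketched with a fake tray that counts registrations. Everything here is illustrative: `TrayLike`, `refreshTray`, and the icon file names are hypothetical, and returning `null` when the tray is disabled is an assumption about the slow path rather than something the anchors state.

```typescript
interface TrayLike {
	setImage(iconPath: string): void;
	setContextMenu(menu: string[]): void;
	destroy(): void;
}

function refreshTray(
	current: TrayLike | null,
	enabled: boolean,
	iconPath: string,
	buildMenu: () => string[],
	createTray: (iconPath: string) => TrayLike,
): TrayLike | null {
	// Fast path: update in place, so the SNI item stays registered.
	if (current && enabled) {
		current.setImage(iconPath);
		current.setContextMenu(buildMenu());
		return current;
	}
	// Slow path: destroy + recreate (the sequence that races on KDE).
	current?.destroy();
	if (!enabled) return null; // assumption: disabled means no tray
	const created = createTray(iconPath);
	created.setContextMenu(buildMenu());
	return created;
}

// Fake factory that records how often a tray is created or destroyed.
const trayLog = { created: 0, destroyed: 0, lastIcon: '' };
function makeFakeTray(iconPath: string): TrayLike {
	trayLog.created += 1;
	trayLog.lastIcon = iconPath;
	return {
		setImage: (p) => { trayLog.lastIcon = p; },
		setContextMenu: () => {},
		destroy: () => { trayLog.destroyed += 1; },
	};
}

const trayMenu = (): string[] => ['Show App', 'Quit'];
const firstTray = refreshTray(null, true, 'light.png', trayMenu, makeFakeTray);
const sameTray = refreshTray(firstTray, true, 'dark.png', trayMenu, makeFakeTray);
```

After the simulated theme switch, `trayLog.created` stays at 1 and `trayLog.destroyed` at 0, which is the "no de-register / re-register churn" property the test expects.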
## S13 — Hybrid topbar shim survives Omarchy's Ozone-Wayland env exports
**Severity:** Critical (for Omarchy users)
**Surface:** In-app topbar (hybrid mode) under Omarchy env
**Applies to:** Hypr-O
**Issues:** [PR #538](https://github.com/aaddrick/claude-desktop-debian/pull/538)
**Steps:**
1. On OmarchyOS, export Omarchy's session-wide env (`ELECTRON_OZONE_PLATFORM_HINT=wayland`, `OZONE_PLATFORM=wayland`, `GDK_BACKEND=wayland,x11,*`, `QT_QPA_PLATFORM=wayland;xcb`, `MOZ_ENABLE_WAYLAND=1`, `SDL_VIDEODRIVER=wayland,x11`).
2. Launch a PR #538 build.
3. Click each of the five topbar buttons.
**Expected:** The hybrid-mode topbar shim (`scripts/wco-shim.js`) loads in time to spoof the UA before claude.ai's `isWindows()` check fires. All five topbar buttons render and click.
**Diagnostics on failure:** Full session env, launcher log, `--doctor`, screenshot, video (per @lukedev45's bug report on PR #538), DevTools console for shim-load errors.
**Currently:** Reproduces partial render on OmarchyOS Hyprland per [@lukedev45](https://github.com/lukedev45)'s video on [PR #538](https://github.com/aaddrick/claude-desktop-debian/pull/538). @aaddrick attempted local repro on KDE Plasma + Wayland with the same env vars and could not reproduce; root cause TBD pending diagnostic capture from a broken run.
**References:** [PR #538](https://github.com/aaddrick/claude-desktop-debian/pull/538), [`docs/learnings/linux-topbar-shim.md`](../../learnings/linux-topbar-shim.md)
**Code anchors:** Shim is inlined at the top of `mainView.js` (the BrowserView preload), not loaded via `require` — see the rationale at `scripts/patches/wco-shim.sh:23-40` ("Sandboxed preloads can only require a fixed allowlist of modules…"). The injection prepends `scripts/wco-shim.js` source at the start of `app.asar.contents/.vite/build/mainView.js` so the UA override fires before the bundle's `isWindows()` regex (`/(win32|win64|windows|wince)/i`) ever runs in the page main world (`scripts/wco-shim.js:184-191`). The shim's IIFE no-ops on non-Linux at `wco-shim.js:29` and on `CLAUDE_TITLEBAR_STYLE === 'native'` at `wco-shim.js:30-32`, so the only env-export interaction with `OZONE_PLATFORM` etc. is via Chromium's own platform plumbing — none of those exports are read by the shim itself, which makes the partial-render repro on Omarchy mysterious to static analysis.
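As a rough illustration of the UA-override idea, the sketch below redefines `userAgent` on a navigator-like object so that the quoted `isWindows()` regex matches. The appended `Windows` token, the `NavigatorLike` type, and the function name are assumptions; the real shim's redefinition in `scripts/wco-shim.js` may differ in how it builds the spoofed string.

```typescript
// The regex quoted above for the bundle's isWindows() check.
const WINDOWS_UA_PATTERN = /(win32|win64|windows|wince)/i;

interface NavigatorLike { userAgent: string }

// Redefine userAgent as a getter whose value satisfies the pattern.
// No-op when the original string already matches.
function spoofWindowsUserAgent(nav: NavigatorLike): void {
	const original = nav.userAgent;
	if (WINDOWS_UA_PATTERN.test(original)) return;
	Object.defineProperty(nav, 'userAgent', {
		get: () => `${original} Windows`, // hypothetical suffix
		configurable: true,
	});
}

const fakeNavigator: NavigatorLike = {
	userAgent: 'Mozilla/5.0 (X11; Linux x86_64)',
};
const matchedBefore = WINDOWS_UA_PATTERN.test(fakeNavigator.userAgent);
spoofWindowsUserAgent(fakeNavigator);
const matchedAfter = WINDOWS_UA_PATTERN.test(fakeNavigator.userAgent);
```

The ordering concern in **Expected** follows from this: the override only helps if it runs before the page evaluates the regex.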

View File

@@ -0,0 +1,322 @@
# lib/claudeai.ts AX-tree migration — implementation prompt
This file is meant to be **copied verbatim into a fresh Claude Code
session** as the initial user message. Don't paraphrase it; the
self-correction loop depends on the exact directives below.
---
## Prompt to paste
You're picking up after the v7 fingerprint walker + U01 wire-up
landed. Walker, resolver, and U01 are all on the AX-tree substrate.
The page-object library `tools/test-harness/src/lib/claudeai.ts` is
still on the old substrate — `document.querySelector` against
minified-tailwind class shapes (`button[aria-haspopup="menu"]` +
`span.truncate.max-w-[Npx]`) — and that's where every claude.ai UI
spec couples to upstream's React DOM. Your job is to migrate the
brittle CSS-shape walks in `claudeai.ts` to AX-tree resolution using
the v7 walker primitives, run the H/S spec families that consume
them, and iterate until those specs pass without DOM-shape coupling.
### Authoritative reference
Read these in order. They contain the design, the gotchas, and the
runtime contract — the prompt below assumes them as background.
- `docs/testing/fingerprint-v7-plan.md` — design contract for the v7
fingerprint, kind-strictness matrix, resolver fallback chain. Skim
the "Capture algorithm" and "Resolver / fallback chain" sections;
the migration consumes the same primitives.
- `docs/learnings/test-harness-ax-tree-walker.md` — the five
non-obvious AX-tree traps (AX-enable async lag, navigateTo no-op,
flat dialog>button[] lists, more-options shape, sidebar
virtualization). All apply here too — `lib/claudeai.ts` calls run
inside the same renderer the walker drives.
- `tools/test-harness/src/lib/claudeai.ts` — the migration target.
~340 lines, eight functions plus two classes (`CodeTab`,
`LocalEnvPill`). Every public function is a discovery walk against
`evalInRenderer` with `document.querySelectorAll`.
### Why this iteration
Per the v7 plan's design goal §2 "Resilient to cosmetic drift" —
upstream regenerates tailwind class signatures on rebuild
(`max-w-[Npx]`, `df-pill`-style atoms), so `claudeai.ts`'s CSS-shape
walks break on any minor UI rebuild even when the AX-computed role
and accessible name are stable. The U01 wire-up confirmed the AX
tree is a usable substrate end-to-end (~7s/test, 89/90 stable across
two consecutive sweeps). Pulling `claudeai.ts` onto the same
substrate eliminates the recurring "tailwind regen breaks H05/S31
again" failure mode.
Acceptance per the plan: H05 + S29-S37 + T-prefix specs that consume
`claudeai.ts` keep passing on the same account, with zero new
flakes. Migration is mechanical (replace the eval-string walks with
AX-tree queries) and the existing tests are the contract.
### Repo conventions
- Tabs for indentation, lines under 80 chars, single quotes for
literals, TypeScript strict mode (`tools/test-harness/tsconfig.json`
enforces it).
- Comments only when the WHY is non-obvious — write the `because:`
clause, not the `that:` clause.
- No backward-compatibility shims. If a function's signature needs
to change, change every caller. Don't keep both code paths.
- Don't commit. The user reviews and commits.
### Code anchors
- `tools/test-harness/explore/walker.ts` — exports the primitives
you'll consume:
- `findByFingerprint(inspector, fingerprint, kind)` — full
resolver with strictness gating + relaxed-scope fallback.
Overkill for one-shot lookups against the live renderer.
- `queryAccessibleTree(elements, query)` — pure filter, used at
capture and resolve time. Takes a `RawElement[]` snapshot and
an `AxQuery` (ariaPath + leaf criteria). What you'll likely
wrap.
- `axTreeToSnapshot(nodes)` — converts CDP `AxNode[]` to the
walker's `RawElement[]` shape. Drops ignored nodes.
- `walkLandmarkAncestors(raw)` — emits the AriaStep[] for an
element. Useful if a method needs to disambiguate by landmark.
- `waitForAxTreeStable(inspector, opts)` — gating primitive used
by walker + U01. Use `{ minNodes: 1, timeoutMs: 10000 }` for
post-click reads (matches `snapshotSurface`'s default).
- `tools/test-harness/src/lib/inspector.ts`: `getAccessibleTree`
fetches the raw CDP tree filtered to the claude.ai webContents.
- `tools/test-harness/src/lib/claudeai.ts` — the migration target.
Read the file-header comment first; it documents the discovery
strategy you're replacing.
- `tools/test-harness/src/runners/H05_ui_drift_check.spec.ts`,
`S31_quick_entry_submit_reaches_new_chat.spec.ts`,
`S32_quick_entry_submit_gnome_stale_isfocused.spec.ts` — primary
consumers of the methods being migrated.
### Phases
#### Phase A — spike on one method
1. `cd tools/test-harness && npm run typecheck` — must pass before
doing anything.
2. Pick `openPill(inspector, labelPattern, opts)` as the spike.
It's the most CSS-shape-coupled method and exercises the
menu-render polling pattern the rest of `claudeai.ts` reuses.
3. Replace its body with an AX-tree query:
- Fetch the AX tree (`inspector.getAccessibleTree('claude.ai')`),
convert via `axTreeToSnapshot`.
- Filter to elements with `computedRole === 'button'` and
accessibleName matching `labelPattern`.
- For each candidate, compute its parent landmark via
`walkLandmarkAncestors`. The compact-pill discriminator —
"has a `span.truncate.max-w-[Npx]` child" — needs an AX
analogue. Most likely: parent is `toolbar` / `group` and the
element has `aria-haspopup === 'menu'` (exposed in AX as
`hasPopup` property; check whether `RawElement` carries it
and extend if needed).
- Click via `inspector.clickByBackendNodeId(raw.backendDOMNodeId)`.
- Poll for menu items via AX role match (`menuitem`,
`menuitemradio`, `menuitemcheckbox`).
4. Run H05 against your branch (`./node_modules/.bin/playwright
test src/runners/H05_ui_drift_check.spec.ts`). H05 doesn't
directly call `openPill` but exercises the same renderer state;
if H05 regresses your AX walk is wrong.
5. Run S31 (`./node_modules/.bin/playwright test
src/runners/S31_quick_entry_submit_reaches_new_chat.spec.ts`).
This calls `openPill` indirectly via `CodeTab.activate` →
`findCompactPills`.
6. If both pass, the AX substrate works for at least one method.
Commit the shape mentally (don't `git commit` — the user does
that). If either fails, the spike is in trouble; re-read the
AX-tree learnings doc for traps you missed and fix the
primitive before expanding.
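For orientation, the filter half of step 3 could be sketched as
below. This is a hedged illustration, not the walker's API:
`AxElementSketch` is a hypothetical subset of `RawElement`, the
`hasPopup` field is assumed to have been added, and the helper names
are invented. It covers only the pure filtering, not the fetch,
click, or polling.

```typescript
// Hypothetical subset of RawElement; the real shape is in walker.ts.
interface AxElementSketch {
	backendDOMNodeId: number;
	computedRole: string;
	accessibleName: string;
	hasPopup?: string; // assumed AX reflection of aria-haspopup
}

// Reset lastIndex so a /g pattern does not carry state across calls.
function nameMatches(pattern: RegExp, name: string): boolean {
	pattern.lastIndex = 0;
	return pattern.test(name);
}

// Buttons that open a menu and whose accessible name matches.
function findPillCandidates(
	elements: AxElementSketch[],
	labelPattern: RegExp,
): AxElementSketch[] {
	return elements.filter((el) =>
		el.computedRole === 'button'
		&& el.hasPopup === 'menu'
		&& nameMatches(labelPattern, el.accessibleName));
}

const MENU_ITEM_ROLES: ReadonlySet<string> = new Set([
	'menuitem', 'menuitemradio', 'menuitemcheckbox',
]);

// Menu entries by AX role, optionally narrowed by accessible name.
function findMenuItems(
	elements: AxElementSketch[],
	textPattern?: RegExp,
): AxElementSketch[] {
	return elements.filter((el) =>
		MENU_ITEM_ROLES.has(el.computedRole)
		&& (!textPattern || nameMatches(textPattern, el.accessibleName)));
}
```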
#### Phase B — migrate the rest
For each remaining function in `claudeai.ts`, port the discovery
walk to AX:
- `activateTab(inspector, name)` — `button` with
`accessibleName === name` under root or banner landmark. Existing
`aria-label="X"` selector → AX `name` literal match.
- `findCompactPills(inspector)` — list of buttons with
`hasPopup === 'menu'` AND inner `span.truncate.max-w-[…]` text
child. AX equivalent: button role + hasPopup + a child
`genericContainer` (or whatever AX exposes for `<span>`) carrying
the visible text. Returns `{text, maxW, expanded}` today —
`maxW` is a tailwind artifact and should be dropped from the AX
shape (callers don't use it for matching, just for diagnostics;
keep a placeholder or remove from the type).
- `clickMenuItem(inspector, textPattern, opts)` — element with
role in `{menuitem, menuitemradio, menuitemcheckbox}` and
accessibleName matching `textPattern`. The CSS attribute selector
has an AX direct equivalent.
- `pressEscape(inspector)` — keep as-is. It's a keydown dispatch,
not a discovery walk.
- `CodeTab.activate(opts)` — calls `activateTab` + polls
`findCompactPills`. Migrates by transitivity.
- `LocalEnvPill` — read its body to enumerate callers.
After each migration:
1. `npm run typecheck` — must pass.
2. `npx tsx explore/walker.ts` — selfTest must pass (you may have
touched walker.ts to expose new primitives).
3. Run the affected spec(s).
#### Phase C — full sweep
1. Run all H/S/T runners that consume `claudeai.ts`:
- H05 (UI drift)
- S31 (Code-tab submit)
- S32 (GNOME stale isFocused)
- any T-prefix that uses `installOpenDialogMock` or `pressEscape`
2. Tally pass/fail. The post-migration baseline must equal the
pre-migration baseline, modulo flakes characterized in
`docs/learnings/test-harness-ax-tree-walker.md`.
Cap iterations at **5 sweep cycles** total (spike + 4 fix-rerun
cycles) — past that, stop and report.
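The baseline comparison in step 2 can be made mechanical with a
small helper. The sketch below is an assumption-laden illustration:
the `Outcome` type, the record-of-spec-names input shape, and the
`findRegressions` name are all hypothetical, and "known flakes" is
whatever set you derive from the learnings doc.

```typescript
type Outcome = 'pass' | 'fail';

// Specs that passed before the migration, do not pass after it, and
// are not on the known-flake list. An empty result means the
// post-migration baseline equals the pre-migration one, modulo flakes.
function findRegressions(
	before: Record<string, Outcome>,
	after: Record<string, Outcome>,
	knownFlakes: ReadonlySet<string>,
): string[] {
	return Object.keys(before)
		.filter((spec) =>
			before[spec] === 'pass'
			&& after[spec] !== 'pass'
			&& !knownFlakes.has(spec))
		.sort();
}
```

A spec missing from the post-migration run counts as a regression
here, which seems the safer default for a sweep report.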
##### Failure classes
1. **AX-shape mismatch.** Element has the CSS shape the old code
relied on but a different AX role/name than expected. Fix:
probe the AX tree for the actual shape (use
`inspector.getAccessibleTree('claude.ai')` interactively from a
one-shot script), update the AX query.
2. **Missing AX property exposure.** `hasPopup`, `expanded`, etc.
may not be in `RawElement` today (the walker only reads role,
name, ancestors, sibling info). Extend `RawElement` and
`axTreeToSnapshot` to expose what the migration needs. Update
walker.ts selfTest if you change the snapshot shape.
3. **Race against menu render.** Old code polled
`document.querySelectorAll('[role=menuitem]')` every 50ms. AX
tree updates lag DOM by hundreds of ms; bake a
`waitForAxTreeStable({ minNodes: 1 })` between click and
menuitem fetch instead of a short DOM poll.
4. **Tailwind-class diagnostic loss.** `findCompactPills` returns
`maxW` which callers use only in error messages. If the
AX-only return shape drops `maxW`, error messages get less
informative — accept it, don't reintroduce DOM walks just for
diagnostics. Keep the `maxW` field optional/null in the type.
##### What "fix" means
A fix is one of:
- A code change in `claudeai.ts`, `walker.ts`, or `inspector.ts`.
- A targeted extension of `RawElement` / `axTreeToSnapshot` to
expose an AX property the migration needs.
Not a fix:
- `// eslint-disable-next-line` / `// @ts-ignore` / `as unknown as ...`.
- Keeping the old `document.querySelector` walk as a fallback.
- Adding an AX walk that wraps a CSS walk that wraps an AX walk.
### Self-correction loop (general protocol)
After each phase's specific loop:
1. If `npm run typecheck` reports errors, fix root causes — no
`// @ts-ignore`, no `any`, no `as unknown as ...`.
2. If `npx tsx explore/walker.ts` (selfTest) fails, the change broke
an algorithmic invariant. Don't relax the test; fix the change.
3. **Cap fix attempts per problem class at 3.** After 3 attempts
on the same class without progress, stop and report.
4. Mark Phase complete only when every step in that Phase passes
cleanly.
### Termination conditions
Stop and write a final report when one of:
1. **Migration is clean.** All `claudeai.ts` methods on AX
substrate, all consuming specs pass at the pre-migration
baseline. Report final pass tallies + diff stat.
2. **Hit the 5-sweep cap.** Report what's done, what's blocked,
and what each remaining failure looks like.
3. **Hit the 3-attempt cap on a non-trivial issue.** Report
attempts, why each failed, what's blocked.
4. **AX exposure gap.** A claude.ai surface uses a property the AX
tree doesn't expose (e.g., custom `data-state` attributes
without a corresponding ARIA reflection). Stop, document the
gap, ask the user before adding a hybrid AX+DOM walk.
### What you should NOT do
- Don't commit. The user reviews everything.
- Don't keep both substrates. The migration is atomic per method:
CSS walk out, AX walk in. No fallback chains.
- Don't add new abstractions in `claudeai.ts` that aren't required
by the migration. The file's shape (one function per UI verb) is
load-bearing for callers — don't introduce a `PageObject` base
class or a generic AX builder.
- Don't run the host Claude Desktop. The user runs it. The H/S
specs use `launchClaude` with `seedFromHost` or `null` isolation
per spec — confirm with the user before any sweep.
- Don't widen `RawElement` speculatively. Only add fields the
migration consumes. Each new field bloats every snapshot.
- Don't drill into a single-method workaround that other methods
would have to duplicate. If a fix wants to live in a helper,
put it next to `queryAccessibleTree` in `walker.ts`.
### Final report format
```markdown
## Migration summary
- Functions migrated: N / N
- Walker.ts changes: <one-line summary>
- Inspector.ts changes: <one-line summary or none>
- H/S/T specs run: N
- H/S/T specs passed: N
- New flakes introduced: N (description)
## Iteration log
### Spike — openPill
- Result: ...
- AX shape used: ...
- Issues hit: ...
### Phase B — remaining methods
- One block per method ...
### Phase C — full sweep
- Per-spec pass/fail tally
- Diff against pre-migration baseline
## Open issues
- ...
## Files touched
git status output
## Diff for review
git diff --stat output
```
### Operational notes
- Background runs: use `Bash run_in_background: true` for any
multi-spec sweep, and `Monitor` with a tight grep filter
(`✓|✘|Error|FAIL|EXIT=`) to stream events. Stop the monitor when
the run completes.
- Check for leftover Electron processes between runs
(`pgrep -af '/usr/lib/claude-desktop/node_modules/electron'`)
and stale tmpdirs (`ls /tmp/claude-test-*`) — clean both up if
the prior run errored before teardown.
- The U01 wire-up landed two `walker.ts` fixes that are part of
  the substrate you're inheriting:
  1. `findByFingerprint`: strictness gate also defers to
     `fingerprint.classification === 'instance'` for degenerate
     fingerprints.
  2. `redrivePath`: navigates to startUrl when current URL drifted;
     reloads only when already at startUrl.

  Both are live in the working tree (or just-merged main,
  depending on when this prompt fires).
Begin with Phase A. Read `claudeai.ts` end-to-end first — in
particular the file-header discovery comment (lines 1-31) and the
`openPill` body (lines 162-202) — so you understand what the
existing CSS-shape walks are anchoring on before you replace them.


@@ -0,0 +1,218 @@
# claude.ai UI Map
*Last updated: 2026-05-02*
This file is the index from "UI surface" → "test-harness abstraction." It
answers: *which renderer surface does each Layer-2 helper cover, and where
are the gaps?* For human-readable behavior and visual specs of each surface
(what each button looks like, what each menu does), see [`ui/`](./ui/).
For the architectural rationale and growth strategy of the wrapper, see
[`claudeai-ui-mapping-plan.md`](./claudeai-ui-mapping-plan.md).
A `✓` marker means the helper exists today, with a `file:line` reference
into [`tools/test-harness/src/lib/claudeai.ts`](../../tools/test-harness/src/lib/claudeai.ts).
A `TODO` marker is a planned helper — when a third test needs the same
shape, promote it from inline `evalInRenderer` to a top-level helper or
page-object method (see plan Phase 3).
## Top-level routes
- `/new` — chat composer page (default landing for signed-in users)
- `/chat/<uuid>` — open chat session
- `/epitaxy` — Code tab landing
- `/projects/<id>` — project view
- `/login`, `/auth/*` — pre-login routes (test harness skips here)
The Code df-pill click does **not** change the URL — the router rerenders
the tab body inline. Helpers must poll for body-mount signals (e.g. a
compact pill rendering) rather than waiting on navigation.
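Because no navigation event fires, a helper has to poll for the body-mount signal. The sketch below shows one way to do that; `pollUntil` and its parameters are hypothetical names for illustration, not exports of `lib/claudeai.ts`, and the probe is assumed to return `null` until the signal appears.

```typescript
// Hypothetical helper (not part of lib/claudeai.ts): call `probe`
// until it yields a non-null value or the deadline passes.
async function pollUntil<T>(
	probe: () => Promise<T | null> | T | null,
	timeoutMs = 5000,
	intervalMs = 100,
): Promise<T> {
	const deadline = Date.now() + timeoutMs;
	for (;;) {
		const value = await probe();
		if (value !== null) return value;
		if (Date.now() >= deadline) {
			throw new Error(`pollUntil: no body-mount signal within ${timeoutMs}ms`);
		}
		// Sleep before the next attempt so the renderer is not hammered.
		await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
	}
}
```

A caller would pass a probe that checks for the mounted element (for example, whether a compact pill is present) and returns `null` while it is absent.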
## Surfaces by tab
### Chat (df-pill "Chat", route /new)
UI reference: [`ui/prompt-area.md`](./ui/prompt-area.md),
[`ui/window-chrome-and-tabs.md`](./ui/window-chrome-and-tabs.md).
- df-pill activation — `lib/claudeai.ts:activateTab` (:44) ✓
- Composer textarea — TODO `ChatTab.composer()`
- "+" submenu (Add files / Add to project / Skills / Connectors / ...)
— TODO `ChatTab.openAttachMenu()`
- Slash menu (triggered by typing `/`) — TODO `ChatTab.openSlashMenu()`
- Model picker — TODO `ChatTab.openModelPicker()`
- Permission mode picker — TODO `ChatTab.openPermissionPicker()`
- Effort picker — TODO
- Send button — TODO `ChatTab.send()`
- Stop button (replaces Send while responding) — TODO `ChatTab.stop()`
- Attachment chip / drag-drop overlay — TODO
- Usage ring — TODO
### Cowork (df-pill "Cowork")
UI reference: see ghost-icon row in
[`ui/window-chrome-and-tabs.md`](./ui/window-chrome-and-tabs.md). No
dedicated surface doc yet — the ghost icon is the canonical "topbar shim
alive" indicator and the tab body itself is largely undocumented at the
time of writing.
- df-pill activation — `lib/claudeai.ts:activateTab` (:44) ✓
- Workspace list — TODO `CoworkTab.listWorkspaces()`
- Environment switcher — TODO `CoworkTab.switchEnvironment()`
- Dispatch state indicator — TODO
### Code (df-pill "Code", route /epitaxy)
UI reference: [`ui/code-tab-panes.md`](./ui/code-tab-panes.md),
[`ui/sidebar.md`](./ui/sidebar.md),
[`ui/prompt-area.md`](./ui/prompt-area.md).
- df-pill activation — `lib/claudeai.ts:activateTab` (:44) ✓
- Tab activation + body-mount wait — `lib/claudeai.ts:CodeTab.activate` (:285) ✓
- Env pill (Local / Cloud / SSH) — `lib/claudeai.ts:CodeTab.openEnvPill` (:317) ✓
- Local env selection — `lib/claudeai.ts:CodeTab.selectLocal` (:350) ✓
- Select-folder pill (rendered after Local) — used internally by
`lib/claudeai.ts:CodeTab.openFolderPicker` (:368) ✓
- Folder picker dialog (full chain) — `lib/claudeai.ts:CodeTab.openFolderPicker` (:368) ✓
- Folder picker dialog mock + assertion — `lib/claudeai.ts:installOpenDialogMock`
(:70) ✓ + `lib/claudeai.ts:getOpenDialogCalls` (:113) ✓
- File tree (left panel) — TODO `CodeTab.fileTree()`
- Editor pane — TODO `CodeTab.editor()`
- Diff pane — TODO `CodeTab.openDiff()`
- Preview pane — TODO `CodeTab.openPreview()`
- Integrated terminal — TODO `CodeTab.openTerminal()`
- Tasks / subagent / plan panes — TODO
- Side-chat — TODO `CodeTab.openSideChat()`
- Recent-folder selection (radio in Select-folder menu) — TODO
## Surfaces independent of tab
### Sidebar
UI reference: [`ui/sidebar.md`](./ui/sidebar.md).
- Search overlay (topbar Search icon) — TODO `SidebarNav.search()`
- Recent conversations — TODO `SidebarNav.openRecent(idx | uuid)`
- "More options" per row — TODO `SidebarNav.rowContextMenu(uuid)`
- "+ New session" button — TODO `SidebarNav.newSession()`
- Routines link — TODO `SidebarNav.openRoutines()`
- Customize link — TODO `SidebarNav.openCustomize()`
- Status / project / environment filters — TODO
- Group-by control — TODO
- Collapse toggle — TODO
### Window chrome / topbar (in-app hybrid)
UI reference: [`ui/window-chrome-and-tabs.md`](./ui/window-chrome-and-tabs.md).
- Hamburger menu — TODO `Topbar.openHamburger()`
- Sidebar toggle — TODO `Topbar.toggleSidebar()`
- Back / forward arrows — TODO
- Cowork ghost icon (topbar-alive sentinel) — TODO `Topbar.coworkGhostPresent()`
### Native dialogs
- File / folder picker mock — `lib/claudeai.ts:installOpenDialogMock` (:70) ✓
- File / folder picker call inspection — `lib/claudeai.ts:getOpenDialogCalls` (:113) ✓
- Message box / confirm — TODO `installShowMessageBoxMock`
- Save dialog — TODO `installShowSaveDialogMock`
### Menus / popovers
- Compact-pill discovery — `lib/claudeai.ts:findCompactPills` (:130) ✓
- Compact-pill open + menu read — `lib/claudeai.ts:openPill` (:162) ✓
- Click any menuitem by text regex — `lib/claudeai.ts:clickMenuItem` (:210) ✓
- Dismiss popover via Escape — `lib/claudeai.ts:pressEscape` (:256) ✓
- Modal dismiss / confirm — TODO `Modal.dismiss()` / `Modal.confirm()`
- Toast / status — TODO `waitForToast(regex)`
- Right-click context menus (sidebar row, etc.) — TODO `openContextMenu(target)`
### Settings
UI reference: [`ui/settings.md`](./ui/settings.md).
- Open Settings — TODO `Settings.open()`
- Hotkey rebind — TODO `Settings.rebindHotkey(action, chord)`
- Theme toggle — TODO `Settings.setTheme('dark' | 'light' | 'auto')`
- Account / sign-out — TODO `Settings.signOut()`
- Computer-use toggle (absent on Linux per S22) — TODO
- Keep-computer-awake toggle (per S20) — TODO
### Routines page
UI reference: [`ui/routines-page.md`](./ui/routines-page.md).
- Routines list — TODO `RoutinesPage.list()`
- New-routine form — TODO `RoutinesPage.create(spec)`
- Routine detail page — TODO `RoutinesPage.open(id)`
### Connectors and plugins
UI reference: [`ui/connectors-and-plugins.md`](./ui/connectors-and-plugins.md).
- Connector picker — TODO `ConnectorPicker.open()`
- Connector list / status — TODO
- Plugin browser — TODO `PluginBrowser.open()`
- Plugin install (Anthropic & Partners flow) — TODO `PluginBrowser.install(slug)`
- Plugin manager (installed list) — TODO
### Quick Entry popup
UI reference: [`ui/quick-entry.md`](./ui/quick-entry.md). Note: the
Quick Entry harness lives in [`quickentry.ts`](../../tools/test-harness/src/lib/quickentry.ts),
not `claudeai.ts`. The `installOpenDialogMock` shape here intentionally
mirrors `QuickEntry.installInterceptor` (quickentry.ts:86) — keep them
aligned when extending either.
- Open Quick Entry (global shortcut) — covered by `lib/quickentry.ts`
- Compose + send — covered by `lib/quickentry.ts`
- Closeout cases (S29-S37) — covered by `lib/quickentry.ts`
### Notifications
UI reference: [`ui/notifications.md`](./ui/notifications.md). libnotify
rendering is environmental — likely stays a manual checklist rather than
a renderer-side helper. No `claudeai.ts` coverage planned.
### Tray
UI reference: [`ui/tray.md`](./ui/tray.md). Tray is owned by the main
process / native bindings, not the renderer DOM — outside the scope of
`claudeai.ts`. Covered by separate tests (T03, S08).
## Atoms inventory
Stable structural patterns the lib already anchors on. See the
discovery comment at the top of
[`tools/test-harness/src/lib/claudeai.ts`](../../tools/test-harness/src/lib/claudeai.ts)
for why each is shape-matched rather than class-matched.
| Atom | Fingerprint | Helper |
|---|---|---|
| df-pill | `button[aria-label][class*="df-pill"]` | `activateTab(name)` (:44) |
| compact-pill | `button[aria-haspopup=menu] > span.truncate.max-w-[*]` | `findCompactPills` (:130), `openPill` (:162) |
| menu / menuitem | `[role=menu] [role=menuitem*]` | `clickMenuItem(regex)` (:210) |
| Escape dismiss | `document.dispatchEvent(KeyboardEvent('keydown', Escape))` | `pressEscape` (:256) |
| Electron `dialog.showOpenDialog` | main-process IPC | `installOpenDialogMock` (:70), `getOpenDialogCalls` (:113) |
Atoms not yet abstracted (when a third test needs the same shape,
promote to a top-level helper):
| Atom | Probable fingerprint | Status |
|---|---|---|
| modal | `[role=dialog]` | not seen yet |
| toast | `[role=status][aria-live]` | not seen yet |
| sidebar nav row | `[class*="df-row"] [aria-label]` | seen, not abstracted |
| chat composer | textarea / contenteditable in composer container | not abstracted |
| right-click context menu | `[role=menu]` triggered by `contextmenu` event | not abstracted |
| Electron `dialog.showMessageBox` | main-process IPC | not abstracted |
| Electron `dialog.showSaveDialog` | main-process IPC | not abstracted |
| settings panel section | route-anchored container in Settings tab | not abstracted |
## See also
- [`claudeai-ui-mapping-plan.md`](./claudeai-ui-mapping-plan.md) —
governing plan and phase rollout
- [`automation.md`](./automation.md) — harness architecture and the
SIGUSR1 / runtime-attach pattern
- [`ui/`](./ui/) — per-surface visual / behavior specs
- [`cases/`](./cases/) — functional test specs (T## / S##)


@@ -0,0 +1,415 @@
# claude.ai UI Mapping Plan
This is an executable plan for systematically mapping claude.ai's
renderer UI into reusable test-harness abstractions. It can be picked
up by a fresh session — start at "Phase 1" and walk down.
## Where we are
The harness already has one worked example: `tools/test-harness/src/lib/claudeai.ts`
exports a `CodeTab` class plus atom helpers (`activateTab`,
`installOpenDialogMock`, `findCompactPills`, `openPill`, `clickMenuItem`,
`pressEscape`). `T17_folder_picker.spec.ts` is its only consumer
today — drives the chain `Code df-pill → env pill → Local → Select
folder → Open folder` and asserts `dialog.showOpenDialog` fires.
Discovery evidence captured by `tools/test-harness/probe.ts` (run
against a live debugger on port 9229):
- df-pill is a stable atom — exactly 3 instances on Code-tab page
(`Chat`, `Cowork`, `Code`), all with `class*="df-pill"` and
matching `aria-label`.
- compact-pill is a stable atom — `button[aria-haspopup=menu]` with
a `span.truncate.max-w-[Npx]` child. Env pill uses 200px,
Select-folder pill uses 160px. Same Tailwind class signature; we
anchor on structure, not classes.
- 80 `button[aria-haspopup=menu]` total on a Code-tab page; only the
2 with the truncate fingerprint are pills, the other 78 are sidebar
"More options" buttons.
Pattern proven: discovery-by-shape in the lib layer, page-object
classes per major UI surface, specs use the lib. This doc covers
how to extend that pattern across the rest of claude.ai.
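As an illustration of discovery-by-shape, the compact-pill fingerprint from the probe evidence above can be expressed as a pure predicate. This is a minimal sketch: `ButtonShape` is a hypothetical flattened record of what a probe might report for each button, not the structure `findCompactPills` actually uses.

```typescript
// Hypothetical flattened view of a button, as a probe might report it.
interface ButtonShape {
	ariaHasPopup: string | null;
	// Class lists of the button's direct <span> children.
	childSpanClasses: string[][];
}

// Matches Tailwind arbitrary-width classes such as "max-w-[200px]".
const MAX_W = /^max-w-\[\d+px\]$/;

// True when the button opens a menu AND has a truncating,
// width-capped span child. The pixel value itself is not checked.
function isCompactPill(b: ButtonShape): boolean {
	if (b.ariaHasPopup !== "menu") return false;
	return b.childSpanClasses.some(
		(classes) =>
			classes.includes("truncate") && classes.some((c) => MAX_W.test(c)),
	);
}
```

Under this predicate the env pill (200px) and the Select-folder pill (160px) both match, while a sidebar "More options" button with no truncating span does not.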
## Strategy: three layers
**Layer 1 — atoms.** Generic helpers around stable structural
patterns. Live in `lib/claudeai.ts`. Built once, reused everywhere.
Examples already there: compact-pill, df-pill, menu, dialog mock.
**Layer 2 — page objects.** Domain classes per major UI surface
(CodeTab, ChatTab, Settings, etc.). Compose atoms. Built per test
demand — premature otherwise. CodeTab is the template.
**Layer 3 — discovery tooling.** Standalone scripts that connect to
a running debugger and let humans + agents explore the renderer.
`probe.ts` is the seed; this doc grows it into a small CLI.
The thing to avoid: comprehensively mapping the UI upfront. Even
with a recording tool, that burns time on surfaces no test will
exercise for months. Lazy + bookmark-the-shape wins.
## Phase 1 — Tooling foundation
**Goal:** turn `probe.ts` into a proper exploration CLI under
`tools/test-harness/explore/`, with snapshot + diff capability that
catches UI drift before tests do.
**Deliverables:**
- `tools/test-harness/explore/explore.ts` — entry point with
subcommands.
- `tools/test-harness/explore/snapshot.ts` — capture renderer state.
- `tools/test-harness/explore/diff.ts` — compare two snapshots.
- `tools/test-harness/explore/find.ts` — search for elements.
- `docs/testing/ui-snapshots/` — directory for captured snapshots
(gitignore the file contents but commit the directory + a README).
- `tools/test-harness/package.json` — add scripts:
`npm run explore`, `npm run explore:snapshot <name>`, etc.
**Subcommand spec:**
```
npx tsx explore/explore.ts # full snapshot to stdout
npx tsx explore/explore.ts pills # df-pills + compact-pills + state
npx tsx explore/explore.ts menu # currently-open menu structure
npx tsx explore/explore.ts snapshot <name> # write to docs/testing/ui-snapshots/<name>.json
npx tsx explore/explore.ts diff <a> <b> # diff two snapshots — flags renamed/removed
npx tsx explore/explore.ts find <regex> # search renderer for matching text/aria-label
```
Snapshot shape (per file):
```json
{
  "capturedAt": "2026-05-02T17:30:00Z",
  "claudeAiUrl": "https://claude.ai/epitaxy",
  "appVersion": "1.1.7714",
  "dfPills": [...],
  "compactPills": [...],
  "ariaLabeledButtons": [...],
  "openMenu": null,
  "modals": [...]
}
```
`diff` should flag: removed elements (selector → no match), changed
text/aria-label, new elements (informational, not a failure). Output
human-readable + a `--json` flag for machine consumption.
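The three diff categories can be sketched as a pure function over two element lists. The `SnapElement` record is an assumption made for illustration (the snapshot arrays above are elided, so their real element shape is not known here), and `diffElements` / `hasDrift` are hypothetical names.

```typescript
// Hypothetical element record; the real snapshot entries may differ.
interface SnapElement {
	selector: string;
	text: string | null;
	ariaLabel: string | null;
}

interface SnapshotDiff {
	removed: string[]; // selector in `before`, no match in `after`
	changed: string[]; // selector in both, text or aria-label differs
	added: string[]; // informational only, not a failure
}

function diffElements(before: SnapElement[], after: SnapElement[]): SnapshotDiff {
	const afterBySelector = new Map(after.map((e) => [e.selector, e] as const));
	const beforeSelectors = new Set(before.map((e) => e.selector));
	const diff: SnapshotDiff = { removed: [], changed: [], added: [] };
	for (const b of before) {
		const a = afterBySelector.get(b.selector);
		if (!a) diff.removed.push(b.selector);
		else if (a.text !== b.text || a.ariaLabel !== b.ariaLabel) {
			diff.changed.push(b.selector);
		}
	}
	for (const a of after) {
		if (!beforeSelectors.has(a.selector)) diff.added.push(a.selector);
	}
	return diff;
}

// Only removals and changes count as drift; additions are reported but pass.
function hasDrift(d: SnapshotDiff): boolean {
	return d.removed.length > 0 || d.changed.length > 0;
}
```

A `--json` mode could serialize the `SnapshotDiff` object directly, with the human-readable output rendered from the same structure.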
**How to dispatch this work:**
Single agent, `general-purpose`. Brief:
> Build the explore CLI under `tools/test-harness/explore/`. Read
> `tools/test-harness/probe.ts` as the seed implementation. Match the
> existing project style (tabs, multi-line `//` why-blocks, terse).
> Reuse `src/lib/inspector.ts` (`InspectorClient.connect(9229)`) for
> the debugger connection. Subcommands as specified in
> `docs/testing/claudeai-ui-mapping-plan.md` Phase 1. Do not delete
> probe.ts — leave it as a one-off; it can be removed in a follow-up.
> Typecheck with `npx tsc --noEmit` (no test runs). Add npm scripts
> to `package.json`. Add a thin README in
> `docs/testing/ui-snapshots/README.md` explaining how to capture +
> compare snapshots.
**Exit criteria:**
- `npx tsx explore/explore.ts pills` against a running debugger lists
the 3 df-pills and 2 compact-pills (or whatever's on screen).
- `explore/explore.ts snapshot baseline-code-tab` writes a JSON file.
- `explore/explore.ts diff baseline-code-tab baseline-code-tab`
reports zero diffs.
- Typecheck green.
## Phase 2 — UI map document
**Goal:** maintain a living markdown index of every reachable UI
surface, the navigation path to reach it, and which Layer-2 class
covers it (or `TODO` if none yet).
**Deliverable:** `docs/testing/claudeai-ui-map.md`.
**Initial content** (populate from what's known today, leave gaps
marked TODO):
```markdown
# claude.ai UI Map
Source of truth for "where does each UI surface live, and which
test-harness abstraction covers it." Update as new abstractions are
added.
## Top-level routes
- `/new` — chat composer page (default landing for signed-in users)
- `/chat/<uuid>` — open chat session
- `/epitaxy` — Code tab landing
- `/projects/<id>` — project view
- `/login`, `/auth/*` — pre-login routes (test harness skips here)
## Surfaces by tab
### Chat (df-pill "Chat", route /new)
- Composer textarea — TODO `ChatTab.composer()`
- "+" submenu (Add files / Add to project / Skills / Connectors / ...)
— TODO `ChatTab.openAttachMenu()`
- Model selector — TODO
- Stop / regenerate — TODO
### Cowork (df-pill "Cowork")
- Workspace list — TODO
- Environment switcher — TODO
### Code (df-pill "Code", route /epitaxy)
- Env pill (Local / Cloud / SSH) — `lib/claudeai.ts:CodeTab.openEnvPill()`
- Select folder pill — `lib/claudeai.ts:CodeTab` (used internally by
`openFolderPicker`) ✓
- Folder picker dialog — `lib/claudeai.ts:installOpenDialogMock`
- File tree (left panel) — TODO
- Editor pane — TODO
## Surfaces independent of tab
### Sidebar
- Search — TODO `SidebarNav.search()`
- Recent conversations — TODO `SidebarNav.openRecent(idx | uuid)`
- "More options" per row — TODO
- New session button — TODO
### Native dialogs
- File / folder picker — `lib/claudeai.ts:installOpenDialogMock`
- Message box / confirm — TODO `installShowMessageBoxMock`
- Save dialog — TODO `installShowSaveDialogMock`
### Menus / popovers
- Generic menu open + click — `lib/claudeai.ts:openPill` /
`clickMenuItem`
- Modal — TODO `Modal.dismiss() / Modal.confirm()`
- Toast / status — TODO `waitForToast(regex)`
### Settings
- Hotkey rebind — TODO
- Theme toggle — TODO
- Account / sign-out — TODO
## Atoms inventory
Stable structural patterns the lib already anchors on:
| Atom | Fingerprint | Helper |
|---|---|---|
| df-pill | `button[aria-label][class*="df-pill"]` | `activateTab(name)` |
| compact-pill | `button[aria-haspopup=menu] > span.truncate.max-w-[*]` | `findCompactPills`, `openPill` |
| menu / menuitem | `[role=menu] [role=menuitem*]` | `clickMenuItem(regex)` |
Atoms not yet abstracted (when a third test needs the same shape,
promote to a top-level helper):
| Atom | Probable fingerprint | Status |
|---|---|---|
| modal | `[role=dialog]` | not seen yet |
| toast | `[role=status][aria-live]` | not seen yet |
| sidebar nav row | `[class*="df-row"] [aria-label]` | seen, not abstracted |
| chat composer | textarea/contenteditable in composer container | not abstracted |
```
**How to dispatch this work:**
A claude-code-guide or general-purpose agent can write the initial
file. Single message:
> Create `docs/testing/claudeai-ui-map.md` matching the structure in
> `docs/testing/claudeai-ui-mapping-plan.md` Phase 2. Pull TODO
> entries from the planned ChatTab/Settings/etc. surfaces. Mark
> existing helpers from `tools/test-harness/src/lib/claudeai.ts`
> with ✓ and the file:line. Don't run any tests.
**Exit criteria:**
- File exists with all top-level routes documented.
- Every existing `lib/claudeai.ts` export is referenced ✓.
- Every planned surface from this plan has a TODO entry.
## Phase 3 — Page objects per test demand
**Goal:** add new Layer-2 classes (ChatTab, Settings, etc.) when the
first test needs them. Don't speculate.
**Template:** `tools/test-harness/src/lib/claudeai.ts:CodeTab`. Match
its shape:
- Instance class taking `inspector: InspectorClient` in constructor.
- Public methods are either single-step (`openEnvPill`,
`selectLocal`) or multi-step convenience (`openFolderPicker`).
- Discovery by shape, not Tailwind classes.
- Multi-line `//` why-block at top of class explaining what UI
surface it covers and the discovery strategy.
- Failures throw with enough context for the spec to attach to
`testInfo.attach()`.
**Workflow per new page object:**
1. Identify which test motivates the new class. Don't build
speculatively.
2. Run `explore.ts snapshot <name>` against a live debugger on the
target UI surface. Commit the snapshot under
`docs/testing/ui-snapshots/`.
3. Inspect the snapshot — pick stable structural fingerprints, not
Tailwind classes.
4. Write the class in `lib/claudeai.ts`. If the file gets large
(>1500 lines), split per-tab into separate files
(`lib/claudeai/code-tab.ts`, `lib/claudeai/chat-tab.ts`, with
`lib/claudeai.ts` as the barrel).
5. Update `docs/testing/claudeai-ui-map.md` — replace the TODO with
the class name + ✓.
6. Add the spec that uses it.
7. Run typecheck. Don't run tests until everything's wired.
**Don't pull out yet:**
- Single-consumer methods. If only one spec calls
`Settings.toggleDarkMode()`, the inline implementation is fine.
Promote to its own method when a second consumer arrives.
- Generic primitives that haven't repeated three times. Three is
the threshold for "this is an atom" — two could still be
coincidence.
## Phase 4 — Atom promotion
**Goal:** keep the atom layer (Layer 1) growing in step with the
page-object layer (Layer 2).
**Rule:** when a discovery pattern (CSS selector + JS predicate)
appears in 3 different page objects, promote it to a top-level
helper in `lib/claudeai.ts`.
**Examples of likely promotions in the next 6 months:**
- `findModal()` / `dismissModal()` — every page object that opens a
confirmation modal will need this.
- `waitForToast(regex, timeout)` — error and success toasts are
pervasive.
- `installShowMessageBoxMock(inspector, response)` — for native
confirm dialogs.
- `clickNavRow(label)` — sidebar interactions.
**Process:**
1. Notice the third occurrence of the same pattern.
2. Move the inline implementation up to a top-level export.
3. Replace the three call sites with calls to the new export.
4. Add an entry to the atoms inventory in `claudeai-ui-map.md`.
## Phase 5 — Drift detection
**Goal:** catch UI changes that break selectors *before* a sweep
fails — fast, automatic, runs on every harness invocation.
**Deliverable:** `tools/test-harness/src/runners/H05_ui_drift_check.spec.ts`.
**Design:**
- Loads each `*.json` file from `docs/testing/ui-snapshots/`.
- Connects to a running app via the existing `launchClaude` +
`attachInspector` flow (NOT against an externally-running app —
the harness must be self-contained).
- For each snapshot, navigates to the captured URL (if not already
there), then asserts each captured selector still resolves to an
element with the same text/aria-label.
- Failures are *attachments*, not full failures — the spec passes
if ≥80% of snapshots match, surfaces the diffs as warnings. Hard
threshold can be tightened later. Goal is "tell me what drifted,"
not "block CI on every minor renderer change."
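The soft threshold can be sketched as a small verdict function. `SnapshotResult` and `driftVerdict` are hypothetical names; treating an empty result set as a pass is an assumption made here, since the plan handles the no-snapshot case via a skip rather than a verdict.

```typescript
// Hypothetical per-snapshot outcome produced by the drift check.
interface SnapshotResult {
	name: string;
	matched: boolean;
}

interface DriftVerdict {
	pass: boolean;
	ratio: number; // fraction of snapshots that matched
	drifted: string[]; // names to surface as warnings / attachments
}

function driftVerdict(results: SnapshotResult[], threshold = 0.8): DriftVerdict {
	const matched = results.filter((r) => r.matched).length;
	// Assumption: with nothing to compare there is nothing that drifted.
	const ratio = results.length === 0 ? 1 : matched / results.length;
	return {
		pass: ratio >= threshold,
		ratio,
		drifted: results.filter((r) => !r.matched).map((r) => r.name),
	};
}
```

The spec would attach `drifted` through `testInfo.attach` whether or not `pass` is true, so a passing run still reports what moved.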
**How to dispatch:**
Single agent, after Phases 1-2 are done. Brief:
> Create `tools/test-harness/src/runners/H05_ui_drift_check.spec.ts`
> per the design in `docs/testing/claudeai-ui-mapping-plan.md`
> Phase 5. Read each `*.json` under `docs/testing/ui-snapshots/`,
> drive the renderer to the captured URL, assert each captured
> element selector still matches. Surface diffs via
> `testInfo.attach`. Pass if ≥80% match. Severity Should, surface
> "claude.ai UI drift detection". Typecheck only.
**Exit criteria:**
- Runs cleanly against current renderer state (all snapshots match).
- Completes in ≤200ms per snapshot.
- Skip with a clear message when no signed-in host config available
(most snapshots will be of post-login surfaces).
## Recommended order
1. **Phase 1 (tooling)** — ~2 hours, single agent. Foundation for
everything else.
2. **Phase 2 (UI map doc)** — ~30 min, single agent. Cheap,
self-documenting.
3. **Phase 3 (page objects)** — incremental, per test need.
4. **Phase 4 (atom promotion)** — opportunistic, no scheduled work.
5. **Phase 5 (drift detection)** — once Phase 1 is done and a few
snapshots exist.
Phases 1 and 2 are independent and can run in parallel.
## Today's starting state (reference)
What's already in place as of session-end:
```
tools/test-harness/
├── probe.ts                                        # one-off probe (Phase 1 seed)
├── src/
│   ├── lib/
│   │   ├── claudeai.ts                             # CodeTab + atoms (NEW today)
│   │   ├── electron.ts                             # SIGINT cleanup, lastExitInfo
│   │   ├── inspector.ts                            # idempotent close()
│   │   ├── quickentry.ts                           # disk-read getStoredPosition
│   │   └── ... (unchanged)
│   └── runners/
│       ├── H01_cdp_gate_canary.spec.ts             # NEW
│       ├── H02_frame_fix_wrapper_present.spec.ts   # NEW
│       ├── H03_patch_fingerprints.spec.ts          # NEW
│       ├── H04_cowork_daemon_lifecycle.spec.ts     # NEW
│       ├── T17_folder_picker.spec.ts               # refactored to lib/claudeai.ts
│       ├── _investigate_t17_urls.spec.ts           # one-off, can be deleted
│       └── ... (T01/T03/T04, S09/S12, S29-S37)
├── orchestrator/sweep.sh                           # multi-suite JUnit parser
└── playwright.config.ts                            # CI-gated retries + forbidOnly
```
**Pending cleanup** (covered in a final commit, not part of this plan):
- Delete `_investigate_t17_urls.spec.ts` — investigation served.
- Delete `probe.ts` once `explore/` lands and supersedes it.
- Update `tools/test-harness/README.md` Status table — T17 from
"selector-tuning pending" to passing on KDE-W.
**Useful commands for a fresh session:**
```sh
cd /home/aaddrick/source/claude-desktop-debian/tools/test-harness
# Typecheck (must pass after every edit)
npx tsc --noEmit
# Run a single spec
ROW=KDE-W CLAUDE_TEST_USE_HOST_CONFIG=1 npx playwright test \
src/runners/T17_folder_picker.spec.ts --reporter=list
# Full sweep
ROW=KDE-W CLAUDE_TEST_USE_HOST_CONFIG=1 ./orchestrator/sweep.sh
# Probe a running app (requires main process debugger enabled)
npx tsx probe.ts
# Kill stale instances before launch
pkill -9 -f claude-desktop; pkill -9 -f mount_claude
```
**Before starting Phase 1:** open Claude Desktop, enable
`Developer → Enable Main Process Debugger` from the menu, navigate
to a known UI state. Then run `npx tsx probe.ts` to confirm the
inspector is reachable on port 9229.


@@ -0,0 +1,490 @@
# Fingerprint v7 Plan — Contextual, Account-Portable Identification
This is an executable plan for the v6 → v7 migration of the inventory
fingerprint shape used by `tools/test-harness/explore/walker.ts` and
`tools/test-harness/src/runners/U01_ui_visibility.spec.ts`. It can be
picked up by a fresh session — start at "Phase 1" and walk down.
## Where we are
`docs/testing/ui-inventory.json` v6 (captured 2026-05-03 against app
1.5354.0, 383 entries) records each interactive element with a
fingerprint of this shape:
```ts
fingerprint: {
	selector: 'button[aria-label="Search"]',
	ariaLabel: 'Search',
	role: null,
	tagName: 'BUTTON',
	textContent: null,
}
```
`U01` resolves entries by handing the `selector` field to Playwright.
The current scheme has three load-bearing failure modes:
1. **Account-specific names baked into selectors and IDs.** Entries
like `root.button.awaaddrick-max` (the user's plan badge,
`button:has-text("AWAaddrick·Max")`) hardcode the walker-author's
username + plan tier. Any contributor running U01 against their
own auth fails this entry on selector match — the element is
structurally present, just labeled differently.
2. **Instance text in selectors of "stable" entries.** Search-result
options, recent-conversations buttons, and pinned conversations
carry titles like "Fine-tuning diffusion models with reinforcement
learning" in their selectors. These are inherently per-account; the
`kind: instance` taxonomy already exists to handle them, but the
selector still encodes the literal title, so the v6 capture
couldn't actually leverage `instance` semantics.
3. **Selector brittleness under cosmetic redesigns.** `button:has-text(...)`
selectors break under any label change. `button[aria-label="..."]`
selectors break under any aria-label rewrite (which the upstream
team does for accessibility audits without warning). Neither
strategy carries enough redundancy to recover when one signal drifts.
The reconciliation doc (`ui-inventory-reconciliation.md`) flags these
as "Walker coverage gap" and "Account-state-dependent" categories,
and the U01 brief lists per-user inventory regeneration as "a
separate workstream." This is that workstream.
## Design goals
In priority order:
1. **Account-portable.** A v7 inventory walked against User A's
account matches against User B's renderer for any entry whose
target element is structurally present in both accounts. Entries
that genuinely don't exist in B's account fall back to the existing
"skip if absent" semantics (`kind: instance` + ancestor-presence
check).
2. **Resilient to cosmetic drift.** Label changes, aria-label
rewrites, minified-class churn, and CSS rewrites must not
invalidate the fingerprint when the element's semantic role and
structural position survive.
3. **Surface drift before failure.** Soft drift (primary aria-path
missed, relaxed-scope match recovered) attaches a warning to the
test rather than passing silently. Hard drift (no strategy
resolves) fails as today. The sweep gains a third state:
`passed-with-drift`.
4. **Atomic cutover, not gradual migration.** v7 walker, v7 inventory
schema, and v7 resolver land together. The committed v6 inventory
gets invalidated the moment v7 walker ships; no parallel-emit
compatibility window, no `legacy` selector fallback in the
resolver. Two systems are worse than one.
Non-goals:
- Pixel-level visual diff. Separate concern; H05 is the right shape.
- AI / embedding-based matching. Out of scope for a Linux repackager.
- Behavioral fingerprints (click-and-verify-effect). Too expensive at
383 entries.
## v7 schema
```ts
interface FingerprintV7 {
	// Primary: accessibility-tree path from nearest landmark down to
	// the leaf. Each step carries (role, optional name).
	ariaPath: AriaStep[];
	// The element itself. Drops `name` entirely when role + ariaPath
	// suffice for uniqueness on the captured surface.
	leaf: {
		role: string; // "button", "link", "menuitem", ...
		name: NameMatcher | null;
		siblingIndex: SiblingIndex | null;
	};
	// Stability classification — drives how strictly the resolver
	// matches. See "Kind-strictness matrix" below. Distinct from the
	// existing `kind` field (persistent / structural / menu / instance)
	// which captures *lifecycle*, not *match strictness*.
	classification: 'stable' | 'positional' | 'instance';
}

interface AriaStep {
	role: string; // landmark / region / grouping role
	name: NameMatcher | null; // optional — only included when needed
}

type NameMatcher =
	| { kind: 'literal'; value: string } // "Search", "Cowork"
	| { kind: 'pattern'; regex: string }; // "\\w+·(Free|Pro|Max|...)"

interface SiblingIndex {
	role: string; // role of siblings being indexed
	position: number; // 0-based
	total: number; // total siblings of that role at capture
}
```
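On the resolver side, a `NameMatcher` has to be evaluated against the accessible name found in the live tree. The sketch below is one possible reading: `nameMatches` is a hypothetical function, and the choices that a `null` matcher accepts any name and that patterns are applied with `RegExp.test` (unanchored unless the pattern anchors itself) are assumptions, not something the schema specifies.

```typescript
// Redeclared from the schema so the sketch is self-contained.
type NameMatcher =
	| { kind: 'literal'; value: string }
	| { kind: 'pattern'; regex: string };

// Hypothetical resolver-side check for a single accessible name.
function nameMatches(matcher: NameMatcher | null, name: string): boolean {
	// Assumption: a null matcher means "name not needed", so anything passes.
	if (matcher === null) return true;
	if (matcher.kind === 'literal') return matcher.value === name;
	return new RegExp(matcher.regex).test(name);
}
```

With a pattern such as `^\w+·(Free|Pro|Max)$` (an illustrative pattern, written out in full here), the same fingerprint would accept a plan badge regardless of which username precedes the separator.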
## Capture algorithm
Run during walker.ts's element emission, after the surface has settled.
```text
captureFingerprint(element, surface):
  ariaPath = walkLandmarkAncestors(element)
    // Stop at <body>; emit a step for each role in
    //   {banner, main, navigation, region, complementary,
    //    contentinfo, search, form, toolbar, menu, menubar,
    //    listbox, list, dialog, tablist, tabpanel, group}
    // with grouping role plus optional accessible name.
  role = element.role
  name = element.accessibleName

  // Step 1: try uniqueness without the name.
  matches = surface.queryAccessibleTree({
    ariaPath,
    leaf: { role }
  })
  if matches.length == 1:
    return { ariaPath, leaf: { role, name: null, siblingIndex: null },
             classification: 'stable' }

  // Step 2: still too broad — try the name as a discriminator,
  // shaping it if it looks instance-specific.
  classification = classifyName(name, surface)
  if classification != 'instance':
    nameMatcher = (classification == 'positional')
      ? null
      : (looksInstanceShaped(name)
          ? { kind: 'pattern', regex: shapeOfName(name) }
          : { kind: 'literal', value: name })
    matches = surface.queryAccessibleTree({
      ariaPath, leaf: { role, name: nameMatcher }
    })
    if matches.length == 1:
      return { ariaPath, leaf: { role, name: nameMatcher,
                                 siblingIndex: null },
               classification }

  // Step 3: still ambiguous — fall through to sibling position.
  siblings = element.parent.childrenWithRole(role)
  if siblings.length > 1:
    siblingIndex = {
      role,
      position: siblings.indexOf(element),
      total: siblings.length
    }
    return { ariaPath, leaf: { role, name: null, siblingIndex },
             classification: 'positional' }

  // Step 4: instance — assert ≥1 match within ariaPath.
  return { ariaPath, leaf: { role, name: null, siblingIndex: null },
           classification: 'instance' }
```
`queryAccessibleTree` should hit `Accessibility.getFullAXTree` over
CDP, not the DOM. The accessibility tree is what screen readers see
and what the platform APIs query — it's the substrate that aria
roles and accessible names actually live in.
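As a rough sketch of the landmark walk over AX data, the function below derives an aria path from the flat node list that `Accessibility.getFullAXTree` returns. Only a minimal subset of the CDP `AXNode` fields is modeled, `ariaPathOf` and `PathStep` are hypothetical names, and names are kept as plain strings rather than `NameMatcher` values to keep the example short.

```typescript
// Minimal subset of the CDP AXNode shape (Accessibility domain).
interface AXValue { value?: unknown }
interface AXNode {
	nodeId: string;
	parentId?: string;
	role?: AXValue;
	name?: AXValue;
}

// Roles that contribute a step, per the capture algorithm above.
const LANDMARK_ROLES = new Set([
	'banner', 'main', 'navigation', 'region', 'complementary',
	'contentinfo', 'search', 'form', 'toolbar', 'menu', 'menubar',
	'listbox', 'list', 'dialog', 'tablist', 'tabpanel', 'group',
]);

interface PathStep { role: string; name: string | null }

// Walk parent links from the leaf to the root, keeping landmark
// ancestors only. The result is ordered outermost-first.
function ariaPathOf(nodes: AXNode[], leafId: string): PathStep[] {
	const byId = new Map(nodes.map((n) => [n.nodeId, n] as const));
	const path: PathStep[] = [];
	let cur = byId.get(leafId)?.parentId;
	while (cur !== undefined) {
		const node = byId.get(cur);
		if (!node) break;
		const role = String(node.role?.value ?? '');
		if (LANDMARK_ROLES.has(role)) {
			const name = String(node.name?.value ?? '');
			path.unshift({ role, name: name === '' ? null : name });
		}
		cur = node.parentId;
	}
	return path;
}
```

In the harness, `nodes` would come from the `nodes` array of a `getFullAXTree` response; here it is just an in-memory list so the walk can be exercised without a debugger.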
## Name classifier
`classifyName(name, surface)` decides whether a name is `stable`,
`instance`, or `positional` (no usable name). Heuristics in priority
order:
```text
1. Empty / whitespace name        → 'positional'
2. Element is a list-row child    → 'instance' (handled by ancestor
                                    role: option/listitem inside listbox/list)
3. Name matches a known
   instance-shape regex           → 'instance' (record as pattern)
4. Name is in the corpus of
   "stable UI vocabulary"         → 'stable'
5. Default                        → 'stable' but flag for review
```
### Known instance-shape regexes
| Regex | Example match | Shape recorded |
|---|---|---|
| `/^.+·(Free\|Pro\|Max\|Team\|Enterprise)$/` | `AWAaddrick·Max` | `\\w+·<PLAN>` |
| `/^Opus \d/` `/^Sonnet \d/` `/^Haiku \d/` | `Opus 4.7Adaptive` | model-name passthrough (stable across users, just versioned) |
| `/\d{1,3}%$/` | `Usage: plan 11%` | `Usage: plan \d+%` |
| `/Today\|Yesterday\|\d+ (day\|hour\|minute)s? ago/` | `Today+12` | `<RELATIVE-DATE>(\\+\d+)?` |
| `/^\d+\.\d+ \w+/` | `1.5 GB` | `\d+\.\d+ \w+` |
| `/@\w+/` | `@aaddrick` | `@\w+` (treat as user-handle) |
| `/[A-Z][a-z]+ [A-Z][a-z]+ [a-z]/` (3+ word title-case) | `Fine-tuning diffusion models...` | treat as `'instance'`, no pattern |
These regexes live in a registry that's part of the v7 capture
config. Adding a new shape is a one-file change; the registry should
be ordered (first match wins) so specific patterns take precedence
over general ones.
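
A minimal sketch of an ordered, first-match-wins registry follows. The regexes come from the table above; the `id` values (`plan-badge`, `percentage`, `user-handle`), the `InstanceShape` interface, and `matchInstanceShape` are hypothetical names for illustration, not the actual v7 config.

```typescript
interface InstanceShape {
  id: string;             // hypothetical registry key
  test: RegExp;           // detects the shape
  pattern: string | null; // shape recorded in the fingerprint; null = no pattern
}

// Order matters: the first matching entry wins, so specific shapes go first.
const INSTANCE_SHAPES: InstanceShape[] = [
  { id: 'plan-badge', test: /^.+·(Free|Pro|Max|Team|Enterprise)$/, pattern: '\\w+·<PLAN>' },
  { id: 'percentage', test: /\d{1,3}%$/, pattern: 'Usage: plan \\d+%' },
  { id: 'user-handle', test: /@\w+/, pattern: '@\\w+' },
];

// Return the first registry entry whose regex matches, or null if none does.
function matchInstanceShape(name: string): InstanceShape | null {
  for (const shape of INSTANCE_SHAPES) {
    if (shape.test.test(name)) return shape;
  }
  return null;
}
```

A name such as `@bob·Pro` matches both the plan-badge and the user-handle regex; because the plan-badge entry is listed first, it is the one recorded.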
### Building the stable UI vocabulary
After the walker finishes the BFS, run a second pass:
1. Collect every `accessibleName` from every captured element.
2. Bucket by `kind` (existing taxonomy).
3. Names appearing in 3+ entries with `kind: persistent` or
`kind: structural`, across 2+ surfaces, are **stable**.
4. Names appearing in only 1 entry with `kind: persistent`/`structural`
are **suspect** — flag for human triage during reconciliation.
5. Names in `kind: instance` entries are excluded from the corpus
entirely.
Commit the resulting vocabulary list to
`docs/testing/ui-vocabulary.json` so future walks can use it without
re-deriving. Refresh the vocabulary on each major upstream release.
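
The second pass could look roughly like the sketch below. `CapturedEntry`, `Vocabulary`, and `buildVocabulary` are hypothetical names. Two points the steps above leave open are handled by assumption here: `menu` entries are ignored, and names that are neither stable nor single-occurrence (for example, two occurrences) land in neither list.

```typescript
type Kind = 'persistent' | 'structural' | 'menu' | 'instance';

interface CapturedEntry {
  name: string;
  kind: Kind;
  surface: string;
}

interface Vocabulary {
  stable: string[];
  suspect: string[];
}

function buildVocabulary(entries: CapturedEntry[]): Vocabulary {
  const stats = new Map<string, { count: number; surfaces: Set<string> }>();
  for (const entry of entries) {
    // Only persistent/structural entries feed the corpus; instance entries
    // are excluded entirely (menu entries are skipped by assumption).
    if (entry.kind !== 'persistent' && entry.kind !== 'structural') continue;
    const name = entry.name.trim();
    if (name === '') continue;
    let stat = stats.get(name);
    if (stat === undefined) {
      stat = { count: 0, surfaces: new Set<string>() };
      stats.set(name, stat);
    }
    stat.count += 1;
    stat.surfaces.add(entry.surface);
  }
  const stable: string[] = [];
  const suspect: string[] = [];
  stats.forEach((stat, name) => {
    if (stat.count >= 3 && stat.surfaces.size >= 2) stable.push(name);
    else if (stat.count === 1) suspect.push(name);
  });
  return { stable: stable.sort(), suspect: suspect.sort() };
}
```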
## Kind-strictness matrix
The existing `kind` field (`persistent` / `structural` / `menu` /
`instance`) tunes how strictly the resolver matches at runtime,
independently from the capture-time `classification`:
| kind | aria-path required | name required | siblingIndex strict | assertion |
|---|---|---|---|---|
| `persistent` | yes (deepest scope) | matcher must hit if present | yes | exactly 1 match |
| `structural` | yes (or 1 step shallower) | matcher OR position | flexible (±1 ok) | exactly 1 match |
| `menu` | yes, scoped to transient menu surface | literal text fallback ok | n/a | ≥1 match |
| `instance` | yes (closest list/listbox ancestor) | ignored | ignored | ≥1 match within scope |
Examples:
- `root.button.search`: `kind: persistent`, `classification: stable`,
`name: null` (unique by ariaPath alone). Strict 1-match assertion.
- `root.button.awaaddrick-max`: `kind: persistent`, `classification: stable`,
`name: { kind: 'pattern', regex: '\\w+·(Free|Pro|Max|...)' }`.
Plan-shape pattern; user-portable.
- `root.button.search.option.untitled-conversationtoday+12`:
`kind: instance`, `classification: instance`, no name, scoped to
search-results listbox. Assert ≥1 option in listbox.
- `root.button.fine-tuning-diffusion-models-with-reinforcement-learning`:
`kind: instance`, scoped to pinned-conversations list. Assert ≥1
button in pinned list.
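
The assertion column of the matrix reduces to a small decision on the match count. The sketch below is one way to express it; `assertionHolds` is a hypothetical helper, and it covers only the assertion column, not the path, name, or sibling-index rules.

```typescript
type Kind = 'persistent' | 'structural' | 'menu' | 'instance';

// Decide whether a match count satisfies the per-kind assertion:
// persistent/structural need exactly one match, menu/instance need at least one.
function assertionHolds(kind: Kind, matchCount: number): boolean {
  switch (kind) {
    case 'persistent':
    case 'structural':
      return matchCount === 1;
    case 'menu':
    case 'instance':
      return matchCount >= 1;
  }
}
```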
## Resolver / fallback chain
In `findByFingerprint`:
```text
resolve(fp):
  // Strategy 1 — primary: full aria-tree path
  result = tryAriaTreeMatch(fp.ariaPath, fp.leaf, fp.kind)
  if result.matched: return { found: true, strategy: 'aria-tree' }
  // Strategy 2 — relaxed aria scope (drop deepest landmark step
  // in the path; keep the rest). Catches the common case where the
  // upstream team adds or removes one container layer.
  if fp.ariaPath.length > 1:
    result = tryAriaTreeMatch(fp.ariaPath.slice(0, -1), fp.leaf, fp.kind)
    if result.matched: return {
      found: true, strategy: 'aria-tree-relaxed', drift: 'scope-shifted'
    }
  return { found: false, strategy: null }
```
When `drift` is set, attach a soft warning to the Playwright test
without failing it:
```ts
// testInfo.attach returns a promise; await it so the attachment is
// recorded before the test body finishes.
await testInfo.attach('drift-warning', {
  body: JSON.stringify({
    entryId: entry.id,
    expected: fp.ariaPath,
    matchedVia: result.strategy,
    drift: result.drift,
    note: 'primary aria-tree match failed; recovered via fallback. ' +
      'Re-walk inventory before drift compounds.',
  }, null, 2),
  contentType: 'application/json',
});
```
CI exposes `drift-warning` as a separate counter alongside pass /
fail. Sweep summary becomes `383 passed, 12 with drift, 0 failed`.
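
For reference, the two-strategy `resolve` chain above can be written as typed TypeScript with the tree matcher injected. This is a sketch under assumptions: `Matcher` stands in for `tryAriaTreeMatch` and returns a plain boolean, and the `Fingerprint` and `Resolution` types are hypothetical shapes inferred from the pseudocode.

```typescript
interface PathStep { role: string; name: string | null; }
interface Leaf { role: string; }
interface Fingerprint { ariaPath: PathStep[]; leaf: Leaf; kind: string; }

type Resolution =
  | { found: true; strategy: 'aria-tree' }
  | { found: true; strategy: 'aria-tree-relaxed'; drift: 'scope-shifted' }
  | { found: false; strategy: null };

// Stand-in for tryAriaTreeMatch; injected so the chain can be exercised
// without a live accessibility tree.
type Matcher = (path: PathStep[], leaf: Leaf, kind: string) => boolean;

function resolve(fp: Fingerprint, tryMatch: Matcher): Resolution {
  // Strategy 1: full aria-tree path.
  if (tryMatch(fp.ariaPath, fp.leaf, fp.kind)) {
    return { found: true, strategy: 'aria-tree' };
  }
  // Strategy 2: drop the deepest landmark step and retry.
  if (fp.ariaPath.length > 1 && tryMatch(fp.ariaPath.slice(0, -1), fp.leaf, fp.kind)) {
    return { found: true, strategy: 'aria-tree-relaxed', drift: 'scope-shifted' };
  }
  return { found: false, strategy: null };
}
```

Encoding `drift` only on the relaxed variant of the union means a caller cannot read a drift value from a primary match by accident.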
## Migration plan
The cutover is atomic — no parallel-emit window. Walker, schema, and
resolver all flip from v6 to v7 in the same merge. The committed v6
inventory becomes invalid; first action after merge is a re-walk.
### Phase 1 — vocabulary scaffold (pre-walker)
The name classifier needs a stable-UI vocabulary corpus to
disambiguate suspect names from known-stable copy. Build it from the
existing v6 inventory before the walker rewrite:
1. Iterate `docs/testing/ui-inventory.json` v6.
2. Names appearing in 3+ entries with `kind: persistent` or
`kind: structural`, across 2+ surfaces, are **stable**.
3. Names matching any registry regex (plan badge, model version,
percentage, relative date, user handle) are **instance-shaped**.
4. Names appearing in only 1 entry, not matching a regex, not in
`kind: instance` — flag for human triage.
5. Commit the resulting corpus to `docs/testing/ui-vocabulary.json`.
The corpus survives the walker rewrite — it's keyed on names, not on
v6 schema specifics.
### Phase 2 — walker rewrite
1. Add `Accessibility.getFullAXTree` query to walker's surface-settle
step (or AX subtree at target node if full-tree latency is
unacceptable; see open questions).
2. Implement `walkLandmarkAncestors`, `queryAccessibleTree`,
`captureFingerprint` per the algorithm above.
3. Implement the name classifier consuming `ui-vocabulary.json` and
the instance-shape registry.
4. Replace v6 fingerprint emit with v7. Inventory schema header bumps
to `walkerVersion: 7`; v6 readers will fail loudly rather than
silently mis-resolve.
5. Walker passes that fail to compute a v7 fingerprint (AX query
error, accessible-name-computation failure) emit the entry with
`classification: 'positional'` and `name: null`, scoped to its
ariaPath. Uncaptured fingerprints are not silently dropped — they
become positional entries with explicit looseness.
Acceptance: a walk against the v6-author's account produces v7
fingerprints for ≥98% of the surfaces v6 captured. ≥80% have
`classification: 'stable'`; the rest split between `'positional'` and
`'instance'`.
#### Live-walk shakedown (post-Phase 2)
The first end-to-end walks against the running renderer surfaced five
real bugs the synthetic selfTest couldn't see. All landed in
`walker.ts` / `name-classifier.ts` / `inspector.ts`:
1. **AX-tree settle gate.** `Accessibility.enable` populates the tree
asynchronously; the existing `waitForStable` (1.5s ceiling on
DOM-mutation quiescence) returned long before claude.ai's React
tree mounted. Seed snapshots came back with 4 AX nodes (just the
`RootWebArea` + a generic shell) and the walker emitted zero
entries. Fix: `waitForAxTreeStable(inspector, { minNodes: 20 })`
polls `getFullAXTree` until two consecutive reads return the same
node count. Called once before the seed snapshot and once after
each `navigateTo` in `redrivePath`. Baked into every
`snapshotSurface` call too (with `minNodes: 1`) so post-click
reads don't race the React update.
2. **`reloadPage` in `redrivePath`.** `navigateTo(url)` short-circuits
when `currentUrl === url`, but every BFS pop re-navigates to
`startUrl`, so any state a prior drill left behind (open dialog,
expanded sidebar, scrolled focus) carried into the next redrive
and contaminated `clickById`'s snapshot. Replaced the redrive's
initial `navigateTo` with `location.reload()` to discard the
React tree.
3. **List-row sibling-count heuristic.** The plan's `isListRowChild`
check requires `option/listitem` inside `listbox/list`. claude.ai
exposes the marketplace dialog as `dialog > button[]` with no
list role at all (~80 cards) and the cowork sidebar as
`complementary > button[]` (72 sessions). Without a heuristic,
each row literal-matches by name and emits as a separate stable
entry. Extension: `LIST_ROW_ROLES` includes `button`,
`LIST_ANCESTOR_ROLES` includes `group`, AND `siblingTotal >= 15`
on its own qualifies regardless of ancestor role. Step 3
(positional fallback) also gates on `!isListRowChild` so list
rows fall through to step 4's `instance` collapse instead of
fragmenting into per-index positionals.
4. **Two new instance shapes** in `name-classifier.ts`:
`cowork-session` matches status-prefixed session titles
(`^(Idle|Ready|Working|Awaiting input|Pull request merged|Done|Failed|Cancelled)\s`)
and `row-more-options` matches per-row triggers
(`^More options for `). Both ordered before `long-title` so the
pattern wins over the no-pattern instance fallback.
5. **Lookup-failure threshold bump** 25 → 75. Sidebar virtualization
means the AX tree exposes a slightly different subset of cowork
sessions on each fresh load; redrives accumulate
"no element matches" misses in a row that aren't a real wedge.
The timeout counter (5 strikes) still gates against actual
renderer hangs.
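
The settle gate from item 1 can be sketched as a polling loop that returns once two consecutive reads agree and the count has reached `minNodes`. This is not the `waitForAxTreeStable` implementation: the function name, the injected `readNodeCount` callback (standing in for a `getFullAXTree` call), and the `intervalMs` / `timeoutMs` options are assumptions for illustration.

```typescript
interface StableOptions {
  minNodes: number;
  intervalMs?: number; // hypothetical: delay between reads
  timeoutMs?: number;  // hypothetical: overall ceiling
}

// Poll until two consecutive reads return the same node count and that
// count is at least minNodes; throw if the ceiling is reached first.
async function waitForStableNodeCount(
  readNodeCount: () => Promise<number>,
  opts: StableOptions,
): Promise<number> {
  const intervalMs = opts.intervalMs ?? 100;
  const timeoutMs = opts.timeoutMs ?? 10_000;
  const deadline = Date.now() + timeoutMs;
  let previous = -1;
  while (Date.now() <= deadline) {
    const count = await readNodeCount();
    if (count >= opts.minNodes && count === previous) return count;
    previous = count;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`AX tree did not stabilise within ${timeoutMs}ms`);
}
```

The `minNodes` floor is what keeps a repeated 4-node shell read from counting as settled.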
Result on the AX migration's first clean walk
(`startUrl: claude.ai/epitaxy`, account: aaddrick, app 1.5354.0):
**90 entries** (37 persistent / 37 structural / 8 dialog / 8
instance), 6 denylisted, 23 non-fatal lookup misses. The marketplace
dialog folded to a single `button-instance+704`; the cowork sidebar
to `button-instance+72`; search history to `option-instance+25`.
Acceptance criteria from §Phase 2 met (≥98% structural overlap is
trivially true on a re-walk; ≥80% stable hit at 75/90 ≈ 83%).
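
The list-row fold described in item 3 of the shakedown list can be sketched as below. The constant names match the text, but their full contents and the exact combination are assumptions: here a row role is always required, and either a list-like ancestor or a sibling total of at least 15 qualifies the element.

```typescript
// Assumed contents: the original option/listitem pair plus the `button` extension.
const LIST_ROW_ROLES = new Set<string>(['option', 'listitem', 'button']);
// Assumed contents: the original listbox/list pair plus the `group` extension.
const LIST_ANCESTOR_ROLES = new Set<string>(['listbox', 'list', 'group']);
const SIBLING_TOTAL_THRESHOLD = 15;

// True when the element should collapse into a single `instance` entry
// instead of being emitted per row.
function isListRowChild(
  role: string,
  ancestorRoles: string[],
  siblingTotal: number,
): boolean {
  if (!LIST_ROW_ROLES.has(role)) return false;
  const hasListAncestor = ancestorRoles.some((r) => LIST_ANCESTOR_ROLES.has(r));
  return hasListAncestor || siblingTotal >= SIBLING_TOTAL_THRESHOLD;
}
```

Under these assumptions, the 72 session buttons under `complementary` fold on sibling count alone, while a handful of toolbar buttons stay as individual entries.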
### Phase 3 — resolver rewrite (U01 + walker.ts findByFingerprint)
1. Replace `findByFingerprint` body with the two-strategy chain
(primary aria-tree, relaxed-scope fallback). Drop the v6
selector code path entirely.
2. `gen-render-specs.ts` regenerates U01 from the v7 inventory;
per-entry test bodies consume `entry.fingerprint` (now v7-shaped)
directly.
3. Add the `drift-warning` attachment shape to U01's test runner.
4. Run U01 against the v7 inventory captured in Phase 2; baseline
drift counts.
Acceptance: U01 against a fresh walker pass produces 0 drift
warnings on the same account, fails 0 entries. Drift warnings only
appear when actually-drifted elements are encountered.
### Phase 4 — account-portability validation
1. A second contributor walks their own v7 inventory.
2. Diff against the v6-author's v7 inventory: structural overlap
should be ≥80% on `kind: persistent` and `kind: structural`
entries (the cross-user-stable subset).
3. Run the v6-author's inventory's U01 against the second
contributor's renderer (with `seedFromHost` lifting their auth).
4. Expect ≥80% pass on the cross-user-stable subset; `kind: instance`
entries pass via the ancestor-presence check.
This is the actual goal. If account-portability hits, the inventory
is no longer a "my-account snapshot" but a true render contract.
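
One way to compute the structural-overlap figure from step 2 is sketched below. The plan does not define the comparison key or the denominator, so both are assumptions: entries are keyed on landmark roles plus leaf role and literal name (`portableKey`, hypothetical), and overlap is the share of the reference inventory's persistent/structural keys that also appear in the other inventory.

```typescript
type Kind = 'persistent' | 'structural' | 'menu' | 'instance';

interface Entry {
  kind: Kind;
  ariaPath: { role: string }[];
  leaf: { role: string; name: string | null };
}

// Hypothetical comparison key: landmark roles, then leaf role and name.
function portableKey(entry: Entry): string {
  return entry.ariaPath
    .map((step) => step.role)
    .concat([entry.leaf.role, entry.leaf.name ?? ''])
    .join('>');
}

// Share of the reference inventory's cross-user-stable entries that the
// other inventory also contains. Returns 0 for an empty reference set.
function structuralOverlap(reference: Entry[], other: Entry[]): number {
  const isPortable = (e: Entry): boolean =>
    e.kind === 'persistent' || e.kind === 'structural';
  const refKeys = new Set<string>(reference.filter(isPortable).map((e) => portableKey(e)));
  const otherKeys = new Set<string>(other.filter(isPortable).map((e) => portableKey(e)));
  if (refKeys.size === 0) return 0;
  let shared = 0;
  refKeys.forEach((key) => {
    if (otherKeys.has(key)) shared += 1;
  });
  return shared / refKeys.size;
}
```

A result of `0.8` or higher would meet the ≥80% threshold stated in step 2.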
## Open questions
### Resolved
- **CDP `Accessibility.getFullAXTree` cost.** Not a bottleneck. The
signed-in `claude.ai/epitaxy` surface returns an 817-node tree;
`waitForAxTreeStable` settles in <1s once Chromium has populated
it. The cold-load gate dominates total latency, not per-call
overhead. Plan B (subtree queries at the target node) is unused.
- **Role overrides.** Confirmed working. `Skip to content` on
claude.ai is captured as `link` (its AX-computed role) regardless
of the underlying tag — a class of mismatch the v6 DOM walker
silently got wrong.
- **`account-bound` kind.** Not needed. The combination of
shape-patterned name matchers (plan badge, cowork session) +
the sibling-count list heuristic + persistent collapse handles
every account-shaped element observed in the first clean walk.
Re-evaluate if a future surface exposes account state without
one of those signals.
### Open
- **Accessible-name computation parity.** Chrome's AX-tree-computed
name should match what Playwright's `getByRole({ name })` matches
at resolution time, but they're independent implementations of
the ARIA name-computation spec. Validate at Phase 3 acceptance
with a sample of 50 entries — capture vs resolve should agree.
- **Stale vocabulary across releases.** When upstream renames
"Cowork" to "Workspaces" (hypothetical), the corpus needs to
update. Should vocabulary be re-derived automatically on each walk
(cheap, drift-following) or pinned to a committed version (stable,
manual updates)? Provisionally: re-derive on walk, commit the
derived corpus alongside the inventory so reconciliation can diff
vocabulary changes.
## Cross-references
- `tools/test-harness/explore/walker.ts` — capture site
- `tools/test-harness/explore/walk-isolated.ts` — driver that runs
the walk inside the test-harness `launchClaude` + `seedFromHost`
isolation path (use this rather than `explore walk` to avoid
mutating the host profile)
- `tools/test-harness/explore/gen-render-specs.ts` — emits U01 from
inventory; needs to consume v7 fingerprints
- `tools/test-harness/src/runners/U01_ui_visibility.spec.ts`:
resolver consumer
- `tools/test-harness/src/lib/inspector.ts`: `getAccessibleTree`
+ `clickByBackendNodeId` for the AX-driven capture/click pair
- `docs/testing/ui-inventory-reconciliation.md` — current v6 reconciliation
- `docs/testing/claudeai-ui-mapping-plan.md` — broader UI mapping
strategy this fits inside

docs/testing/matrix.md Normal file

@@ -0,0 +1,187 @@
# Test Status Matrix
*Last updated: 2026-04-30 · Tested against: claude-desktop 1.4758.0 (project varies per row)*
This is the live dashboard. Update this file (and only this file) when status changes. For the test specs themselves, see [`cases/`](./cases/). For orientation, see [`README.md`](./README.md).
Status legend: `✓` pass · `✗` fail · `🔧` mitigated · `?` untested · `-` N/A. Cells include linked issue/PR numbers when relevant.
## Cross-environment matrix (T-series)
| Test | KDE-W | KDE-X | GNOME | Ubu | Sway | i3 | Niri | Hypr-O | Hypr-N |
|------|-------|-------|-------|-----|------|----|------|--------|--------|
| [T01](./cases/launch.md#t01--app-launch) | ✓ | ? | ? | ? | ? | ? | ? | ? | ✓ |
| [T02](./cases/launch.md#t02--doctor-health-check) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T03](./cases/tray-and-window-chrome.md#t03--tray-icon-present) | ✓ | ? | ? | ? | ? | ? | ? | ? | ? |
| [T04](./cases/tray-and-window-chrome.md#t04--window-decorations-draw) | ✓ | ? | ? | ? | ? | ? | ? | ? | ✓ |
| [T05](./cases/shortcuts-and-input.md#t05--url-handler-opens-claudeai-links-in-app) | ? | ? | ? | ? | ✗ | ? | ? | ? | ? |
| [T06](./cases/shortcuts-and-input.md#t06--quick-entry-global-shortcut-unfocused) | ✓ | ✓ | ✗ [#404](https://github.com/aaddrick/claude-desktop-debian/issues/404) | 🔧 [#406](https://github.com/aaddrick/claude-desktop-debian/pull/406) | ? | ? | ✗ | ? | ? |
| [T07](./cases/tray-and-window-chrome.md#t07--in-app-topbar-renders--clickable) | ? | ? | ? | ? | ? | ? | ? | ✗ [#538](https://github.com/aaddrick/claude-desktop-debian/pull/538) | ✓ |
| [T08](./cases/tray-and-window-chrome.md#t08--hide-to-tray-on-close) | ✓ | ? | ? | ? | ? | ? | ? | ? | ? |
| [T09](./cases/platform-integration.md#t09--autostart-via-xdg) | ✓ | ? | ? | ? | ? | ? | ? | ? | ? |
| [T10](./cases/platform-integration.md#t10--cowork-integration) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T11](./cases/extensibility.md#t11--plugin-install-anthropic--partners) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T12](./cases/platform-integration.md#t12--webgl-warn-only) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T13](./cases/launch.md#t13--doctor-reports-correct-package-format) | ✗ | ✗ | ✗ | ? | ✗ | ✗ | ✗ | ? | ? |
| [T14](./cases/launch.md#t14--multi-instance-behavior) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T15](./cases/code-tab-foundations.md#t15--sign-in-completes-via-browser-handoff) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T16](./cases/code-tab-foundations.md#t16--code-tab-loads) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T17](./cases/code-tab-foundations.md#t17--folder-picker-opens) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T18](./cases/code-tab-foundations.md#t18--drag-and-drop-files-into-prompt) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T19](./cases/code-tab-foundations.md#t19--integrated-terminal) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T20](./cases/code-tab-foundations.md#t20--file-pane-opens-and-saves) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T21](./cases/code-tab-workflow.md#t21--dev-server-preview-pane) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T22](./cases/code-tab-workflow.md#t22--pr-monitoring-via-gh) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T23](./cases/code-tab-handoff.md#t23--desktop-notifications-fire) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T24](./cases/code-tab-handoff.md#t24--open-in-external-editor) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T25](./cases/code-tab-handoff.md#t25--show-in-files-file-manager) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T26](./cases/routines.md#t26--routines-page-renders) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T27](./cases/routines.md#t27--scheduled-task-fires-and-notifies) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T28](./cases/routines.md#t28--scheduled-task-catch-up-after-suspend) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T29](./cases/code-tab-workflow.md#t29--worktree-isolation) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T30](./cases/code-tab-workflow.md#t30--auto-archive-on-pr-merge) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T31](./cases/code-tab-workflow.md#t31--side-chat-opens) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T32](./cases/code-tab-workflow.md#t32--slash-command-menu) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T33](./cases/extensibility.md#t33--plugin-browser) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T34](./cases/code-tab-handoff.md#t34--connector-oauth-round-trip) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T35](./cases/extensibility.md#t35--mcp-server-config-picked-up) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T36](./cases/extensibility.md#t36--hooks-fire) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T37](./cases/extensibility.md#t37--claudemd-memory-loads) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T38](./cases/code-tab-handoff.md#t38--continue-in-ide) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| [T39](./cases/code-tab-handoff.md#t39--desktop-cli-handoff-graceful-na) | ? | ? | ? | ? | ? | ? | ? | ? | ? |
## UI visibility (U-series)
Auto-generated render attestation: each entry in [`ui-inventory.json`](./ui-inventory.json) is asserted to mount with its recorded fingerprint on each platform. The single matrix cell aggregates every inventory entry — pass means every entry rendered, fail means at least one didn't (per-entry diagnostics in the JUnit attachments). Regenerate the spec with `npm run gen:render-specs` after re-walking. See [`claudeai-ui-mapping-plan.md`](./claudeai-ui-mapping-plan.md) for the discovery + walker design.
| Test | KDE-W | KDE-X | GNOME | Ubu | Sway | i3 | Niri | Hypr-O | Hypr-N |
|------|-------|-------|-------|-----|------|----|------|--------|--------|
| [U01](../tools/test-harness/src/runners/U01_ui_visibility.spec.ts) — UI visibility | ? | ? | ? | ? | ? | ? | ? | ? | ? |
## Environment-specific status
### Ubuntu / DEB
| ID | Test | Status | Notes |
|----|------|--------|-------|
| [S01](./cases/distribution.md#s01--appimage-launches-without-manual-libfuse2t64-install) | AppImage launches without manual `libfuse2t64` install | ✗ | Workaround documented; not yet filed |
| [S02](./cases/distribution.md#s02--xdg_current_desktopubuntu-gnome-doesnt-break-de-detection) | `XDG_CURRENT_DESKTOP=ubuntu:GNOME` doesn't break DE detection | ? | — |
| [S03](./cases/distribution.md#s03--deb-install-via-apt-pulls-all-required-runtime-deps) | DEB install via APT pulls all required runtime deps | ? | — |
### Fedora / RPM
| ID | Test | Status | Notes |
|----|------|--------|-------|
| [S04](./cases/distribution.md#s04--rpm-install-via-dnf-pulls-all-required-runtime-deps) | RPM install via DNF pulls all required runtime deps | ? | — |
| [S05](./cases/distribution.md#s05--doctor-recognises-dnf-installed-package-doesnt-false-flag-as-appimage) | Doctor recognises dnf-installed package (no AppImage false-flag) | ✗ | Affects KDE-W, KDE-X, GNOME, Sway, i3, Niri (T13) |
### Wayland-native (wlroots)
Applies to: Sway, Niri, Hypr-O, Hypr-N (any session running native Wayland rather than XWayland).
| ID | Test | Status | Notes |
|----|------|--------|-------|
| [S06](./cases/shortcuts-and-input.md#s06--url-handler-doesnt-segfault-on-native-wayland) | URL handler doesn't segfault on native Wayland | ✗ on Sway | Captured; not yet filed |
| [S07](./cases/shortcuts-and-input.md#s07--claude_use_wayland1-opt-in-path-works-without-crashing) | `CLAUDE_USE_WAYLAND=1` opt-in path works | ? | [#228](https://github.com/aaddrick/claude-desktop-debian/pull/228), [#232](https://github.com/aaddrick/claude-desktop-debian/pull/232) |
### KDE
Applies to: KDE-W, KDE-X.
| ID | Test | Status | Notes |
|----|------|--------|-------|
| [S08](./cases/tray-and-window-chrome.md#s08--tray-icon-doesnt-duplicate-after-nativetheme-update) | Tray icon doesn't duplicate after `nativeTheme` update | 🔧 | [`tray-rebuild-race.md`](../learnings/tray-rebuild-race.md) |
| [S09](./cases/shortcuts-and-input.md#s09--quick-window-patch-runs-only-on-kde-post-406-gate) | Quick window patch runs only on KDE | ✓ | [PR #406](https://github.com/aaddrick/claude-desktop-debian/pull/406) |
| [S10](./cases/shortcuts-and-input.md#s10--quick-entry-popup-is-transparent-no-opaque-square-frame) | Quick Entry popup is transparent | ? | [#370](https://github.com/aaddrick/claude-desktop-debian/issues/370), [#223](https://github.com/aaddrick/claude-desktop-debian/issues/223) |
### GNOME
Applies to: GNOME, Ubu (Ubuntu's GNOME), and any other mutter session.
| ID | Test | Status | Notes |
|----|------|--------|-------|
| [S11](./cases/shortcuts-and-input.md#s11--quick-entry-shortcut-fires-from-any-focus-on-wayland-mutter-xwayland-key-grab) | Quick Entry shortcut fires from any focus | ✗ on GNOME, 🔧 on Ubu | [#404](https://github.com/aaddrick/claude-desktop-debian/issues/404), [PR #406](https://github.com/aaddrick/claude-desktop-debian/pull/406) |
| [S12](./cases/shortcuts-and-input.md#s12----enable-featuresglobalshortcutsportal-launcher-flag-wired-up-for-gnome-wayland) | `--enable-features=GlobalShortcutsPortal` wired up | ? | [#404](https://github.com/aaddrick/claude-desktop-debian/issues/404) |
### Omarchy
| ID | Test | Status | Notes |
|----|------|--------|-------|
| [S13](./cases/tray-and-window-chrome.md#s13--hybrid-topbar-shim-survives-omarchys-ozone-wayland-env-exports) | Hybrid topbar shim survives Omarchy's Ozone-Wayland env exports | ✗ | [PR #538](https://github.com/aaddrick/claude-desktop-debian/pull/538) |
### Niri
| ID | Test | Status | Notes |
|----|------|--------|-------|
| [S14](./cases/shortcuts-and-input.md#s14--global-shortcuts-via-xdg-portal-work-on-niri) | Global shortcuts via XDG portal work on Niri | ✗ | Captured; not yet filed |
### AppImage
| ID | Test | Status | Notes |
|----|------|--------|-------|
| [S15](./cases/distribution.md#s15--appimage-extraction---appimage-extract-works-as-documented-fallback) | AppImage extraction (`--appimage-extract`) works as fallback | ? | — |
| [S16](./cases/distribution.md#s16--appimage-mount-cleans-up-on-app-exit) | AppImage mount cleans up on app exit | ? | — |
### Linux launcher / `.desktop` env handling
| ID | Test | Status | Notes |
|----|------|--------|-------|
| [S17](./cases/platform-integration.md#s17--app-launched-from-desktop-inherits-shell-path) | App launched from `.desktop` inherits shell `PATH` | ? | — |
| [S18](./cases/platform-integration.md#s18--local-environment-editor-persists-across-reboot) | Local environment editor persists across reboot | ? | — |
| [S19](./cases/routines.md#s19--claude_config_dir-redirects-scheduled-task-storage) | `CLAUDE_CONFIG_DIR` redirects scheduled-task storage | ? | — |
### Idle-sleep / suspend
| ID | Test | Status | Notes |
|----|------|--------|-------|
| [S20](./cases/routines.md#s20--keep-computer-awake-inhibits-idle-suspend) | "Keep computer awake" inhibits idle suspend | ? | — |
| [S21](./cases/routines.md#s21--lid-close-still-suspends-per-os-policy) | Lid-close still suspends per OS policy | ? | — |
### Computer Use (Linux: out-of-scope per upstream)
| ID | Test | Status | Notes |
|----|------|--------|-------|
| [S22](./cases/platform-integration.md#s22--computer-use-toggle-is-absent-or-visibly-disabled-on-linux) | Computer-use toggle is absent or visibly disabled | ? | — |
| [S23](./cases/platform-integration.md#s23--dispatch-spawned-sessions-dont-soft-lock-on-a-never-approvable-computer-use-prompt) | Dispatch sessions don't soft-lock on never-approvable prompt | ? | — |
### Dispatch
| ID | Test | Status | Notes |
|----|------|--------|-------|
| [S24](./cases/platform-integration.md#s24--dispatch-spawned-code-session-appears-with-badge-and-notification) | Dispatch-spawned Code session appears with badge + notification | ? | — |
| [S25](./cases/platform-integration.md#s25--mobile-pairing-survives-linux-session-restart) | Mobile pairing survives Linux session restart | ? | — |
### Auto-update vs. system package manager
| ID | Test | Status | Notes |
|----|------|--------|-------|
| [S26](./cases/distribution.md#s26--auto-update-is-disabled-when-installed-via-apt--dnf) | Auto-update is disabled when installed via `apt` / `dnf` | ? | — |
### Plugin / worktree storage
| ID | Test | Status | Notes |
|----|------|--------|-------|
| [S27](./cases/extensibility.md#s27--plugins-install-per-user-not-into-system-paths) | Plugins install per-user, not into system paths | ? | — |
| [S28](./cases/extensibility.md#s28--worktree-creation-surfaces-clear-error-on-read-only-mounts) | Worktree creation surfaces clear error on read-only mounts | ? | — |
## Known failures rollup
Tests currently `✗` somewhere — investigation priority order:
| Test | Failing on | Root cause |
|------|------------|------------|
| [T05 / S06](./cases/shortcuts-and-input.md#s06--url-handler-doesnt-segfault-on-native-wayland) | Sway | URL handler subprocess SIGSEGV on native Wayland — `Failed to connect to Wayland display` |
| [T06 / S11](./cases/shortcuts-and-input.md#s11--quick-entry-shortcut-fires-from-any-focus-on-wayland-mutter-xwayland-key-grab) | GNOME | mutter doesn't honour XWayland-side key grab |
| [T06 / S14](./cases/shortcuts-and-input.md#s14--global-shortcuts-via-xdg-portal-work-on-niri) | Niri | `BindShortcuts` returns error code 5 |
| [T07 / S13](./cases/tray-and-window-chrome.md#s13--hybrid-topbar-shim-survives-omarchys-ozone-wayland-env-exports) | Hypr-O | Hybrid topbar shim partial render under Omarchy's Ozone-Wayland env exports |
| [T13 / S05](./cases/launch.md#t13--doctor-reports-correct-package-format) | every Fedora row | Doctor only checks dpkg, false-flags every dnf install as AppImage |
| [S01](./cases/distribution.md#s01--appimage-launches-without-manual-libfuse2t64-install) | Ubuntu 24.04 | AppImage requires `libfuse2t64`; not auto-pulled |
## Notes on the current state
- Most cells are `?` because every captured VM in the recent test session ran the **released** build (`dnf install` / `apt install` / current AppImage), which predates [PR #538](https://github.com/aaddrick/claude-desktop-debian/pull/538). Topbar verification (T07) on the VM rows specifically requires a branch build deployed before any cell can flip from `?`.
- KDE-W status reflects @aaddrick's daily-driver host (Nobara KDE Plasma Wayland) where multiple features have been in continuous use.
- Hypr-N status reflects @typedrat's report on [PR #538](https://github.com/aaddrick/claude-desktop-debian/pull/538) ("Working great on NixOS with Hyprland").
- Hypr-O status reflects @lukedev45's broken-case report on [PR #538](https://github.com/aaddrick/claude-desktop-debian/pull/538) (partial render, root cause unconfirmed but Omarchy-env-specific — see [S13](./cases/tray-and-window-chrome.md#s13--hybrid-topbar-shim-survives-omarchys-ozone-wayland-env-exports)).
- T13 is `✗` on every Fedora row because the dpkg false-flag is a deterministic property of the doctor script, not a per-environment failure mode. It will flip to `✓` everywhere once the doctor learns to detect rpm/dnf installs.
- T15–T39 are derived from upstream Claude Code Desktop docs (`code.claude.com/docs/en/desktop*`). They cover features whose Linux behavior is officially undocumented (the docs explicitly state "Linux is not supported" for the Code tab). All cells start as `?` because the upstream Code-tab feature surface has not been systematically exercised on the patched Linux build.


@@ -0,0 +1,225 @@
# Quick Entry Closeout — Test Plan
Focused sweep plan for closing the three open Quick Entry issues:
- [#393](https://github.com/aaddrick/claude-desktop-debian/issues/393) — Submit doesn't open the main window (Ubuntu 24.04 GNOME and friends). Mitigated by [PR #406](https://github.com/aaddrick/claude-desktop-debian/pull/406)'s KDE-only gate; root cause is `BrowserWindow.isFocused()` returning stale-true on Linux Electron.
- [#404](https://github.com/aaddrick/claude-desktop-debian/issues/404) — Shortcut doesn't fire from unfocused state on Fedora 43 GNOME. mutter no longer honours XWayland-side key grabs. Fix path: wire `--enable-features=GlobalShortcutsPortal` into the launcher on GNOME Wayland.
- [#370](https://github.com/aaddrick/claude-desktop-debian/issues/370) — Opaque square frame behind the transparent Quick Entry popup on KDE Wayland. Bisected to Electron 41.0.4 (electron/electron#50213); upstream regression. Workarounds in `frame-fix-wrapper.js` not yet attempted.
This doc is a **sweep plan**, not a test catalog. Test bodies and diagnostics live in [`cases/`](./cases/); the live status dashboard lives in [`matrix.md`](./matrix.md). The 21 `QE-*` items below map to existing `T*` / `S*` IDs where possible, and call out gaps to add as new `S*` cases.
## Goal
Pass all `QE-*` items in [§ Test list](#test-list) on every row in [§ Mandatory matrix](#mandatory-matrix). When that holds, all three issues are closeable (or, for #370, demonstrably blocked on upstream Electron with reproducible evidence).
## Upstream design intent
Read this before reading the test list. Several `QE-*` rows test things upstream does not actually promise — those tests are still valuable as black-box behavior checks, but the calibration of "expected" matters.
Source for everything below: `build-reference/app-extracted/.vite/build/index.js`. Symbol names (`h1`, `ut`, `Ko`, `ynt`, `nde`, `g3A`, `u7A`) drift between releases — anchor on shape, not name.
### What upstream promises
- **Global shortcut** registered via Electron `globalShortcut.register()` (`:499416`). No app-focus gate — fires regardless of which app is focused.
- **Popup is lazily created** on first shortcut press (`if (!Ko || ...) Ko = new BrowserWindow(...)` near `:515375`). The popup `BrowserWindow` is constructed on demand, not at app startup. This is what makes QE-4 (closed-to-tray) work.
- **Position memory:** popup position persists across invocations via `an.get("quickWindowPosition")` (`:515491-515526`), keyed on monitor label + resolution. If the original monitor is gone, falls back to primary display.
- **Submit always creates a NEW chat session** when no `chatId` is provided (`ynt(e)` at `:515546`). Quick Entry never appends to an existing conversation.
- **Click-outside dismiss** is wired in the main process via the popup `blur` handler (`Ko.on("blur", () => g3A(null))` at `:515465`).
- **Popup survives main-window close.** If the user closes the main window via the X button (not full quit), `!ut || ut.isDestroyed()` guards at `:515595` skip the `show()/focus()` calls; the popup itself remains functional.
- **Window construction** sets `transparent: true`, `backgroundColor: "#00000000"`, `frame: false`, `alwaysOnTop: true` (level `"pop-up-menu"`), `skipTaskbar: true`, `resizable: false`, `show: false` (`:515375-515397`). `hasShadow: Zr` and `type: Zr ? "panel" : void 0` are macOS-only (`Zr` is `process.platform === "darwin"`).
### What upstream does NOT promise
- **Workspace migration.** No `setVisibleOnAllWorkspaces()`, no `moveTop()`, no `setWorkspace()` is called anywhere in the Quick Entry submit path. Whether the main window comes to the user's current workspace or stays on its own is purely a compositor decision driven by `mainWin.show()` + `mainWin.focus()`. **Linux/Wayland behavior here is not part of the upstream feature spec.**
- **Restore from minimized.** No `restore()` call in the submit path. `show()` un-minimizes on most WMs; whether it does on a given Wayland compositor is up to that compositor.
- **Multi-monitor placement on cursor / focused display.** Upstream uses last-saved position or primary display, never "where the user is right now."
- **Multi-window targeting.** All `show`/`focus` calls go through `ut` (the main window). If the user has multiple windows, behavior is undefined.
- **Popup re-creation if its `BrowserWindow` is destroyed.** Upstream does not re-construct `Ko` after destroy — it's only created on first shortcut press.
- **Compositor-aware behavior.** Upstream has no concept of "GNOME vs KDE vs wlroots." Anywhere our patches branch on `XDG_CURRENT_DESKTOP`, that's our project compensating for compositor-specific Electron breakage, not implementing an upstream-defined contract.
### Edge case: fullscreen main window
`:525287-525290` reads (paraphrased): *"if `ut` exists and `ut.isFullScreen()` is true, focus `ut` and call `ide()`; else show the Quick Entry popup."* So if the main window is fullscreen when the shortcut fires, **the popup does not appear** — the shortcut focuses the main window instead. QE-1 needs this caveat.
### Edge case: `h1()` is a *don't-show-if-already-focused* optimization
The visibility-check function (`h1()` at `:105164-105171`) is upstream's mechanism for "don't redundantly call `show()` if the main window is already focused." Sound design. The reason it's broken on Linux is Electron's `BrowserWindow.isFocused()` returning stale-true after `hide()` on Linux backends — i.e., **the patch we apply is fixing a Linux-Electron bug, not diverging from upstream intent.** Once `isFocused()` returns honest values on Linux, the patch could be retired.
## Test list
Each item is a single check. Severity tier matches the existing scaffolding (Critical / Should / Smoke). Existing test ID in parentheses — `(new)` means this item should be added to [`cases/shortcuts-and-input.md`](./cases/shortcuts-and-input.md) before this sweep is reproducible by anyone else.
### Shortcut activation — covers #404
| ID | Severity | Step | Expected | Existing |
|----|----------|------|----------|----------|
| QE-1 | Smoke | App focused (not fullscreen), press shortcut | Popup appears. **Edge case from upstream design:** if main window is fullscreen, the shortcut focuses main and runs `ide()` instead of showing the popup (`:525287-525290`). Test this fullscreen variant separately as QE-1b — popup should *not* appear. | [S34](./cases/shortcuts-and-input.md#s34--quick-entry-shortcut-focuses-fullscreen-main-window-instead-of-showing-popup) (QE-1b only) |
| QE-2 | Critical | Other app focused, press shortcut | Popup appears | [T06](./cases/shortcuts-and-input.md#t06--quick-entry-global-shortcut-unfocused), [S11](./cases/shortcuts-and-input.md#s11--quick-entry-shortcut-fires-from-any-focus-on-wayland-mutter-xwayland-key-grab) |
| QE-3 | Critical | App on a different workspace, press shortcut | Popup appears on current workspace | [T06](./cases/shortcuts-and-input.md#t06--quick-entry-global-shortcut-unfocused) |
| QE-4 | Critical | App closed-to-tray (no window mapped), press shortcut | Popup appears | [S29](./cases/shortcuts-and-input.md#s29--quick-entry-popup-is-created-lazily-on-first-shortcut-press-closed-to-tray-sanity) |
| QE-5 | Should | App quit entirely, press shortcut | No popup, no error, no zombie process | [S30](./cases/shortcuts-and-input.md#s30--quick-entry-shortcut-becomes-a-no-op-after-full-app-exit) |
| QE-6 | Should | Inspect Electron argv via `cat /proc/$(pgrep -f 'app\.asar')/cmdline \| tr '\0' ' '` (the launcher script also matches `claude-desktop`, so anchor on `app.asar` to hit the Electron process). Cross-check launcher log line `Using X11 backend via XWayland (for global hotkey support)` vs `Using native Wayland backend (global hotkeys may not work)` (verbatim from `scripts/launcher-common.sh:98, 102`). | **Pre-S12 fix:** flag absent; shortcut fails on GNOME Wayland (this is the #404 repro). **Post-S12 fix:** `--enable-features=GlobalShortcutsPortal` present in argv on GNOME Wayland; QE-2 / QE-3 begin to pass. | [S12](./cases/shortcuts-and-input.md#s12----enable-featuresglobalshortcutsportal-launcher-flag-wired-up-for-gnome-wayland) |
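
The QE-6 argv inspection could be scripted roughly as below. This is a minimal sketch, not part of the launcher: `electron_argv` and `has_portal_flag` are hypothetical helper names, and the sketch assumes a single Electron process matches `app\.asar` (it picks the oldest match). The `pgrep` pattern and the flag string are the ones quoted in the QE-6 row.

```sh
# Print the Electron process argv as one space-separated line.
# Assumption: the oldest process whose command line matches 'app\.asar'
# is the Electron main process.
electron_argv() {
    pid=$(pgrep -of 'app\.asar') || return 1
    tr '\0' ' ' < "/proc/$pid/cmdline"
}

# Succeed if the given argv string carries the GlobalShortcutsPortal feature.
# The match is deliberately loose because features may be comma-joined.
has_portal_flag() {
    case "$1" in
        *--enable-features=*GlobalShortcutsPortal*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage on a running guest: `has_portal_flag "$(electron_argv)" && echo "portal flag present"`. Taking the argv as a string argument keeps the check testable without a live process.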
### Submit → main window — covers #393
| ID | Severity | Step | Expected | Existing |
|----|----------|------|----------|----------|
| QE-7 | Smoke | Main window visible, submit prompt from QE | Popup closes; main window navigates to a **new** chat session (not appended to current chat — `ynt(e)` at `:515546` always creates new). | [S31](./cases/shortcuts-and-input.md#s31--quick-entry-submit-makes-the-new-chat-reachable-from-any-main-window-state) |
| QE-8 | Critical | Main window minimized, submit | **Upstream calls `show() + focus()` only — no `restore()`.** Whether the WM un-minimizes is compositor-dependent. Test as black-box: record whether the new chat is reachable to the user (window comes back to view, OR user has to click tray/dock to see it). Both outcomes are upstream-acceptable; only "new chat created but unreachable" is a regression. | [S31](./cases/shortcuts-and-input.md#s31--quick-entry-submit-makes-the-new-chat-reachable-from-any-main-window-state) |
| QE-9 | Critical | Main window hidden-to-tray (after [T08](./cases/tray-and-window-chrome.md#t08--hide-to-tray-on-close)), submit | Same as QE-8 — `show()` should re-map a hidden window on most compositors, but upstream doesn't guarantee it. The new chat must be reachable; the path to reach it (auto vs tray-click) is compositor-dependent. | [S31](./cases/shortcuts-and-input.md#s31--quick-entry-submit-makes-the-new-chat-reachable-from-any-main-window-state) |
| QE-10 | Should | Main window on different workspace, submit | **Upstream has no workspace logic** (no `setVisibleOnAllWorkspaces`, no `moveTop`). Outcome is whatever the compositor decides on `show()` + `focus()`. Record observed behavior per row; do not treat any single outcome as the "right" one. | [S31](./cases/shortcuts-and-input.md#s31--quick-entry-submit-makes-the-new-chat-reachable-from-any-main-window-state) |
| QE-11 | Critical | **GNOME-specific (Andrej730 repro):** App in tray, *not* present in Dash/dock, submit | Main window opens. The codebase doesn't reason about Dash presence — this is purely a compositor-observed state. The underlying failure is `BrowserWindow.isFocused()` returning stale-true on GNOME mutter, which causes the patched (KDE) code path's `h1() || ut.show()` chain to short-circuit before `show()`. Test as a black-box repro. | [S32](./cases/shortcuts-and-input.md#s32--quick-entry-submit-on-gnome-mutter-doesnt-trip-electron-stale-isfocused) |
| QE-12 | Should | App in tray, *also* present in Dash/dock, submit | Main window opens (this state should not trip the stale-focus bug, but verify) | [S32](./cases/shortcuts-and-input.md#s32--quick-entry-submit-on-gnome-mutter-doesnt-trip-electron-stale-isfocused) |
| QE-13 | Smoke | Submit prompt with 1-2 chars (`hi`) | Upstream silently drops. The actual gate is `> 2` chars at `index.js:515530, 515533` — anything 3+ submits. So `hi` (2) drops, `hel` (3) submits. Document, do not fix. | — |
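
When picking test prompts for QE-13, the `> 2` length gate can be mirrored in a one-line helper. This is only a sketch of the rule as described in the row above: `would_submit` is a hypothetical name, and it ignores any trimming or normalization upstream may apply before counting.

```sh
# Predict whether Quick Entry would submit a prompt under the '> 2 chars' gate.
# ${#prompt} counts characters in the current locale.
would_submit() {
    prompt=$1
    [ "${#prompt}" -gt 2 ]
}
```

For example, `would_submit hi` fails (2 characters) while `would_submit hel` succeeds (3 characters).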
### Visual / window appearance — covers #370
| ID | Severity | Step | Expected | Existing |
|----|----------|------|----------|----------|
| QE-14 | Should | Inspect popup background | Transparent; no opaque square frame visible behind the rounded UI. **Note:** upstream already sets `transparent: true` and `backgroundColor: "#00000000"` (`:515380, :515383`), so the #370 triage-bot suggestion to "try setting backgroundColor to transparent" is moot — those are already in place. The Electron 41.0.4 regression is at the CSD/shadow rendering layer below those flags, not at the option-passing layer. | [S10](./cases/shortcuts-and-input.md#s10--quick-entry-popup-is-transparent-no-opaque-square-frame) |
| QE-15 | Smoke | Inspect popup chrome | No titlebar, no close/min/max buttons (frameless) | [`ui/quick-entry.md`](./ui/quick-entry.md) |
| QE-16 | Smoke | Inspect popup edges | Drop shadow + rounded corners render (compositor-dependent — note where missing) | [`ui/quick-entry.md`](./ui/quick-entry.md) |
| QE-17 | Smoke | Open popup, then click on another window | Popup stays above (always-on-top) | [`ui/quick-entry.md`](./ui/quick-entry.md) |
| QE-18 | Should | `electron --version` against the running app's bundled binary; record version in matrix | When > 41.0.4 ships and #370 still reproduces, the upstream-regression hypothesis is wrong | [S33](./cases/shortcuts-and-input.md#s33--quick-entry-transparent-rendering-tracked-against-bundled-electron-version) |
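
The "newer than 41.0.4" comparison in QE-18 can be done mechanically rather than by eye. A minimal sketch, assuming `sort -V` is available (GNU coreutils and busybox provide it; it is not POSIX). `version_gt` is a hypothetical helper name.

```sh
# Succeed if version $1 is strictly newer than version $2.
version_gt() {
    a=${1#v}; b=${2#v}          # tolerate a leading "v", e.g. "v41.0.4"
    [ "$a" != "$b" ] &&
        [ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | tail -n 1)" = "$a" ]
}
```

Usage: `version_gt "$bundled_version" 41.0.4 && echo "newer than the suspected regression"`, where `bundled_version` holds whatever the bundled binary reported in step 3 of the sweep.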
### Patch-application sanity — regression prevention
| ID | Severity | Step | Expected | Existing |
|----|----------|------|----------|----------|
| QE-19 | Critical | **All rows.** Extract the installed `app.asar` (`npx asar extract /usr/lib/claude-desktop/app.asar /tmp/inspect-installed`) and grep the bundled JS for the KDE gate string injected by the patch: `grep -c 'XDG_CURRENT_DESKTOP' /tmp/inspect-installed/.vite/build/index.js`. The patch (`scripts/patches/quick-window.sh:34-35, 117-118`) injects `(process.env.XDG_CURRENT_DESKTOP\|\|"").toLowerCase().includes("kde")` — that string is the runtime fingerprint. Note: the `Patched quick window` / `WARNING: No quick entry show() calls patched` lines from the patch are **build-time stdout** (not in `launcher.log`); check the build log if you built locally. | Bundled JS contains the KDE gate string (patch ran at build time). The patch ships in every build; the KDE-vs-non-KDE branch is decided at runtime by the env-var check. **Runtime gate effectiveness is verified implicitly by QE-7 through QE-12 passing on KDE and the unpatched-equivalent path running on non-KDE.** | [S09](./cases/shortcuts-and-input.md#s09--quick-window-patch-runs-only-on-kde-post-406-gate) |
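
The QE-19 fingerprint check can be wrapped so it yields a pass/fail status instead of a count. A sketch under the assumption that the extracted bundle lives at the path used in the row above; `kde_gate_present` is a hypothetical helper name.

```sh
# Succeed if the given bundled JS file contains the KDE gate fingerprint.
# -F treats the pattern as a fixed string; -q suppresses output.
kde_gate_present() {
    grep -qF 'XDG_CURRENT_DESKTOP' "$1"
}
```

Usage: `kde_gate_present /tmp/inspect-installed/.vite/build/index.js || echo "patch missing"`. Note that this matches any occurrence of the variable name, so it confirms the fingerprint exists but not how many call sites were patched.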
### Input behavior smoke — catches collateral breakage
| ID | Severity | Step | Expected | Existing |
|----|----------|------|----------|----------|
| QE-21 | Smoke | In popup: `Esc` dismisses; click-outside dismisses; `Shift+Enter` inserts newline; `Enter` submits | All four behave as labelled. **Implementation notes for diagnostics:** click-outside is wired in the **main process** via the popup's `blur` handler (`:515465`). `Esc` / `Enter` / `Shift+Enter` are **renderer-side** (not visible in `index.js`); they go through IPC to `requestDismiss()` (`:515409`) and `requestDismissWithPayload()`. If a dismiss key fails, isolate which side is broken before reporting. | [`ui/quick-entry.md`](./ui/quick-entry.md) |
### Popup placement & lifecycle — upstream contract sanity
These verify upstream-promised behaviors that aren't directly broken by #393/#404/#370 but live in the same surface area. Failures here would indicate a separate regression — file a new issue rather than folding it into the close-out trio.
| ID | Severity | Step | Expected | Existing |
|----|----------|------|----------|----------|
| QE-22 | Should | Invoke Quick Entry. Note popup position. Dismiss (Esc). Quit Claude Desktop entirely (`pkill -f app.asar` after closing the main window, or via tray → Quit). Re-launch. Invoke Quick Entry. | Popup reappears at the same monitor + position as before the restart. Upstream persists position via `an.get("quickWindowPosition")` (`:515491-515526`), keyed on monitor label + resolution. Position must survive a full app restart, not just dismiss/re-invoke. | [S35](./cases/shortcuts-and-input.md#s35--quick-entry-popup-position-is-persisted-across-invocations-and-across-app-restarts) |
| QE-23 | Smoke | **Multi-monitor required.** With an external monitor connected, invoke Quick Entry on the external monitor — let the position be saved (trigger QE-22's persistence path). Disconnect the external monitor (libvirt: `virsh detach-device` for the second display, or unplug the host monitor passing through). Invoke Quick Entry. | Popup falls back to the primary display via `cHn()` (`:515502`). Does **not** appear at off-screen coordinates. Skip this row in single-monitor VMs. | [S36](./cases/shortcuts-and-input.md#s36--quick-entry-popup-falls-back-to-primary-display-when-saved-monitor-is-gone) |
| QE-24 | Should | Launch app, focus main window, then **destroy** the main window without quitting the app. On this project the X button hide-to-tray override means the standard close path won't destroy `ut`; force the destroy via a) DevTools console (`Cmd+Opt+I` / `Ctrl+Shift+I`, then `require('electron').remote.getCurrentWindow().destroy()` if exposed), or b) accept that this case is unreachable on Linux without a code change and skip. After destroy, invoke Quick Entry, type, submit. | Popup remains functional (lazy-recreation on shortcut press; the `!ut \|\| ut.isDestroyed()` guard at `:515595` skips the show/focus block but does not crash). New chat creation may not have a window to surface in; if the app remains running with no main window, this is the "popup outlives main" path upstream guarantees. **If unreachable on Linux, mark this row N/A and document why.** | [S37](./cases/shortcuts-and-input.md#s37--quick-entry-popup-remains-functional-after-main-window-destroy) |
## Mandatory matrix
The five rows below are the must-pass set to close all three issues. Display server is the **session selected at login** — KDE and GNOME both let you choose Wayland vs Xorg from the greeter.
| Row | Distro | DE | Display server | Closes / verifies | Reporter |
|-----|--------|----|--------------:|-------------------|----------|
| **GNOME-W** | Fedora 43 Workstation | GNOME 49.x | Wayland | #404 (S11/S12), #393 (QE-11/QE-12) | @gianluca-peri (#404), @Andrej730 (#393 root cause) |
| **Ubu-W** | Ubuntu 24.04 LTS | GNOME (Ubuntu) | Wayland | #393 close-out (post-#406 gate). Also catches the `XDG_CURRENT_DESKTOP=ubuntu:GNOME` quirk (S02) | @Andrej730 |
| **KDE-W** | Fedora 43 KDE *or* Nobara 43 KDE | Plasma 6 | Wayland | #370 (S10), QE-19 patch sanity, daily-driver regression baseline | @noctuum (#370), aaddrick |
| **GNOME-X** | Ubuntu 24.04 (GNOME on Xorg session at greeter) | GNOME | Xorg | Differentiates whether #404 is mutter-as-compositor or mutter-XWayland-grabs specifically. **Note:** Fedora 43 GNOME may not ship an X11 session anymore (GNOME 49 deprecation); use Ubuntu's GNOME-on-Xorg session instead. | — |
| **KDE-X** | Fedora 43 KDE (Plasma X11 session at greeter) | Plasma 6 | Xorg | Catches kwin-X11 specifics; regression baseline for the historic working path | — |
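
Which branch of the gated patch a given row will take can be predicted from `XDG_CURRENT_DESKTOP` before running anything. The sketch below mirrors the injected check quoted in QE-19 (lowercase the value, look for `kde`); `is_kde_desktop` is a hypothetical helper, not something the project ships.

```sh
# Shell mirror of the injected gate: lowercase the value, test for "kde".
# An unset or empty value is treated as non-KDE, like the `|| ""` fallback.
is_kde_desktop() {
    desktop=$(printf '%s' "${1-}" | tr '[:upper:]' '[:lower:]')
    case "$desktop" in
        *kde*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage: `is_kde_desktop "$XDG_CURRENT_DESKTOP" && echo "patched path" || echo "unpatched-equivalent path"`. A value such as `ubuntu:GNOME` falls through to the non-KDE branch.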
## Strongly recommended
Catches generalization gaps but not blocking close-out.
| Row | Distro | DE | Display server | Why |
|-----|--------|----|--------------:|------|
| **COSMIC** | Pop!_OS 24.04 (COSMIC alpha) | COSMIC | Wayland | @davidsmorais reported #393 there; not covered by KDE or GNOME branches |
| **Ubu-X** | Ubuntu 24.04 (GNOME on Xorg) | GNOME | Xorg | Already counted under GNOME-X above. Listed here too because the Ubuntu install base is large — counts as its own row in the dashboard |
## Optional
Tracked under different bugs ([S06](./cases/shortcuts-and-input.md#s06--url-handler-doesnt-segfault-on-native-wayland), [S14](./cases/shortcuts-and-input.md#s14--global-shortcuts-via-xdg-portal-work-on-niri)) — skip unless closing those in the same sweep.
| Row | DE | Tracked under |
|-----|----|--------------:|
| Sway | wlroots | S06 |
| Niri | wlroots | S14 |
| Hypr-N (Omarchy) | wlroots | per @typedrat |
| Hypr-O | Hyprland Xorg | per @typedrat |
| i3 | Xorg | matrix |
## VM inventory
Existing host: `~/vms/` (libvirt, qcow2 images on a separate root-owned dir). Per-VM creation scripts in `~/vms/scripts/`. Per-VM test protocol in [`~/vms/README.md`](file:///home/aaddrick/vms/README.md).
### Have
| Row | VM image | Status |
|-----|----------|--------|
| GNOME-W | `claude-fedora43-gnome.qcow2` | Ready |
| Ubu-W | `claude-ubuntu-2404.qcow2` | Ready |
| KDE-W | `claude-fedora43-kde.qcow2` | Ready (Nobara KDE on the bare-metal host is the alternative) |
| GNOME-X | `claude-ubuntu-2404.qcow2` | Ready (use the GNOME-on-Xorg session at the greeter — same VM as Ubu-W) |
| KDE-X | `claude-fedora43-kde.qcow2` | Ready (use the Plasma X11 session at the greeter — same VM as KDE-W) |
### Need to add for full mandatory + recommended coverage
| Row | What | Why |
|-----|------|-----|
| **COSMIC** | Pop!_OS 24.04 (COSMIC alpha) ISO + `~/vms/scripts/create-popos-cosmic.sh` | @davidsmorais's #393 environment; otherwise unrepresented |
### Need to add only if closing optional rows in the same sweep
| Row | What | Use existing | Why |
|-----|------|--------------|-----|
| Niri | Fedora-Niri-Live ISO + `~/vms/scripts/create-fedora-niri.sh` | — | S14 (`BindShortcuts` error 5) |
| Hypr-N | Possibly already covered by `claude-omarchy` | `claude-omarchy.qcow2` | Omarchy is a Hypr-N variant; may not exercise stock Hyprland |
| Sway | `claude-fedora43-sway.qcow2` | Existing | S06 URL handler segfault |
| i3 | `claude-fedora43-i3.qcow2` | Existing | Coverage only |
## Minimum viable kill-set
If the goal is the smallest pass that justifies closing all three issues:
- **GNOME-W** — must pass QE-2/3/4/6/7/8/9/11 → closes #404, half of #393.
- **Ubu-W** — must pass QE-7/8/9/11 → closes other half of #393.
- **KDE-W** — must pass QE-7/8/9 + QE-14 + QE-19 → closes #370 (or punts upstream with QE-18 evidence) and confirms the gated patch path still works.
(QE-20 has been folded into QE-19 — the patch ships in every build, so a single bundled-JS check covers both KDE and non-KDE rows.)
Three VMs, ~21 items per row, one full sweep ≈ 90 minutes if the visual checks are batched.
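
For scripting the sweep, the kill-set above can be encoded as a small lookup. This is a sketch only: `required_items` is a hypothetical helper, and the item lists are copied from the three bullets above.

```sh
# Required QE items per row for the minimum viable kill-set.
required_items() {
    case "$1" in
        GNOME-W) echo "QE-2 QE-3 QE-4 QE-6 QE-7 QE-8 QE-9 QE-11" ;;
        Ubu-W)   echo "QE-7 QE-8 QE-9 QE-11" ;;
        KDE-W)   echo "QE-7 QE-8 QE-9 QE-14 QE-19" ;;
        *)       echo "unknown row: $1" >&2; return 1 ;;
    esac
}
```

Usage: `for item in $(required_items KDE-W); do echo "run $item"; done`.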
## Per-row pass criteria
| Issue | Closeable when |
|-------|----------------|
| #393 | QE-7 through QE-12 pass on **GNOME-W**, **Ubu-W**, and **KDE-W**. QE-19 confirms the patch was applied at build (KDE gate string present). If QE-11 fails on GNOME-W, the KDE-only gate is preserved as a permanent fix; otherwise the patch can be widened. |
| #404 | QE-2 and QE-3 pass on **GNOME-W**. QE-6 confirms the launcher actually appended `--enable-features=GlobalShortcutsPortal` on GNOME Wayland (S12). |
| #370 | QE-14 passes on **KDE-W**. **OR** QE-18 records an Electron version > 41.0.4 in the bundled binary and QE-14 still fails — at that point the upstream-regression hypothesis is wrong and we re-investigate. |
## Scaffold integration
This sweep is fully wired into the existing test scaffold. The `QE-*` items in [§ Test list](#test-list) map onto formal `S##` test cases in [`cases/shortcuts-and-input.md`](./cases/shortcuts-and-input.md):
| Case | Title | Backs |
|------|-------|-------|
| [S29](./cases/shortcuts-and-input.md#s29--quick-entry-popup-is-created-lazily-on-first-shortcut-press-closed-to-tray-sanity) | Popup created lazily on first shortcut press (closed-to-tray sanity) | QE-4 |
| [S30](./cases/shortcuts-and-input.md#s30--quick-entry-shortcut-becomes-a-no-op-after-full-app-exit) | Shortcut becomes no-op after full app exit | QE-5 |
| [S31](./cases/shortcuts-and-input.md#s31--quick-entry-submit-makes-the-new-chat-reachable-from-any-main-window-state) | Submit makes the new chat reachable from any main-window state | QE-7 through QE-10 |
| [S32](./cases/shortcuts-and-input.md#s32--quick-entry-submit-on-gnome-mutter-doesnt-trip-electron-stale-isfocused) | Submit on GNOME mutter doesn't trip Electron stale-`isFocused()` | QE-11, QE-12 |
| [S33](./cases/shortcuts-and-input.md#s33--quick-entry-transparent-rendering-tracked-against-bundled-electron-version) | Transparent rendering tracked against bundled Electron version | QE-18 |
| [S34](./cases/shortcuts-and-input.md#s34--quick-entry-shortcut-focuses-fullscreen-main-window-instead-of-showing-popup) | Shortcut focuses fullscreen main instead of showing popup | QE-1b |
| [S35](./cases/shortcuts-and-input.md#s35--quick-entry-popup-position-is-persisted-across-invocations-and-across-app-restarts) | Popup position persisted across invocations and across app restarts | QE-22 |
| [S36](./cases/shortcuts-and-input.md#s36--quick-entry-popup-falls-back-to-primary-display-when-saved-monitor-is-gone) | Popup falls back to primary display when saved monitor is gone | QE-23 |
| [S37](./cases/shortcuts-and-input.md#s37--quick-entry-popup-remains-functional-after-main-window-destroy) | Popup remains functional after main window destroy | QE-24 |
UI-element-level checks for QE-14 through QE-17 and QE-21 live in [`ui/quick-entry.md`](./ui/quick-entry.md), which has been refined against the upstream evidence captured in [§ Upstream design intent](#upstream-design-intent).
(QE-13, QE-21 don't need their own S-IDs — they're documentation items / already covered by `ui/quick-entry.md`.)
## Sweep mechanics
Per-row procedure (one full pass):
1. Boot VM. Confirm session at greeter matches the row (Wayland vs Xorg, correct DE).
2. Install the latest build:
   - DEB: `sudo apt install ./claude-desktop_*.deb`
   - RPM: `sudo dnf install ./claude-desktop-*.rpm`
3. Capture environment baseline: `XDG_SESSION_TYPE`, `XDG_CURRENT_DESKTOP`, `gnome-shell --version` or `kwin --version`, `electron --version` (for QE-18).
4. Launch app. Wait for main window. Run QE-21 input smoke first to catch obvious breakage early.
5. Run shortcut tests (QE-1 → QE-6) in order. Each run, scrape `~/.cache/claude-desktop-debian/launcher.log` and `pgrep -af claude-desktop` argv.
6. Run submit tests (QE-7 → QE-13). For each window-state precondition, set the state, then trigger Quick Entry, then submit.
7. Run visual checks (QE-14 → QE-18). Screenshot QE-14 to attach to #370 if still failing.
8. Run patch sanity (QE-19; QE-20 has been folded into it).
9. Update [`matrix.md`](./matrix.md) status cells. Save logs under a row-tagged subdirectory: `~/vms/collected/<row>-<date>/`.
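
Steps 3 and 9 above can be sketched as two small helpers. Assumptions, labelled loudly: `collect_dir` and `capture_baseline` are hypothetical names, the `<date>` part is rendered as `YYYY-MM-DD`, and the output file name `baseline.txt` is made up for illustration. Version commands that are not installed on a given row are skipped.

```sh
# Row-tagged log directory, following ~/vms/collected/<row>-<date>/.
collect_dir() {
    printf '%s/vms/collected/%s-%s\n' "$HOME" "$1" "$2"
}

# Write the step-3 environment baseline for row $1 into that directory.
capture_baseline() {
    out=$(collect_dir "$1" "$(date +%F)")
    mkdir -p "$out" || return 1
    {
        echo "XDG_SESSION_TYPE=${XDG_SESSION_TYPE-}"
        echo "XDG_CURRENT_DESKTOP=${XDG_CURRENT_DESKTOP-}"
        for cmd in gnome-shell kwin electron; do
            # Only query tools that exist on this row.
            if command -v "$cmd" >/dev/null 2>&1; then
                "$cmd" --version
            fi
        done
    } > "$out/baseline.txt"
}
```

Usage inside a guest: `capture_baseline GNOME-W`, then copy `launcher.log` into the same directory at the end of the pass.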
For the deeper #393 bisect (isolating which half of PR #390 regresses GNOME), see the two-variant build instructions in [`~/vms/README.md`](file:///home/aaddrick/vms/README.md) — build a blur-only and a vis-only variant, run QE-7 through QE-11 on each on **Ubu-W** and **GNOME-W**, gate the offending half rather than the whole patch.

343
docs/testing/runbook.md Normal file

@@ -0,0 +1,343 @@
# Testing Runbook
*Last updated: 2026-05-03*
How to run a test sweep, capture diagnostics, file failures, and update [`matrix.md`](./matrix.md). For the test specs themselves, see [`cases/`](./cases/) and [`ui/`](./ui/). For the automation harness, see [`automation.md`](./automation.md) and [`tools/test-harness/`](../../tools/test-harness/). For the grounding sweep workflow (verify case docs against the live build), see [Grounding sweep](#grounding-sweep) below.
## When to sweep
| Trigger | Scope | Rows |
|---------|-------|------|
| Release tag (`vX.Y.Z+claude...`) | Smoke set | KDE-W + Hypr-N (or Sway) |
| Release tag, monthly | Smoke + Critical | All active rows |
| Upstream Claude Desktop bump | Smoke set + [grounding sweep](#grounding-sweep) | KDE-W + one wlroots row |
| PR touching `scripts/patches/*.sh` | Tests in the affected surface (use surface tags in cases files) | KDE-W minimum |
| Bug report citing an env | The relevant test on the reporter's row | Just that row |
## Setup: VM matrix
Each non-host row in [`matrix.md`](./matrix.md) is a QEMU/KVM guest. Standard config:
- 4 GB RAM, 2 vCPU minimum
- virtio-gpu **with** `gl=on` (3D acceleration). On hybrid GPU hosts, pin `rendernode=/dev/dri/renderD129` (AMD); avoid renderD128 (NVIDIA, EGL init fails on aaddrick's laptop)
- 32 GB qcow2 disk
- Bridged networking
- Virgil 3D enabled where possible (helps WebGL detection in T12)
ISOs / images per row:
| Row | Source |
|-----|--------|
| Fedora 43 (KDE-W, KDE-X, GNOME, Sway, i3, Niri) | https://fedoraproject.org/spins/ for KDE/GNOME, https://fedoraproject.org/sericea/ for Sway, manual install for i3/Niri |
| Ubuntu 24.04 (Ubu) | https://ubuntu.com/download/desktop |
| OmarchyOS (Hypr-O) | https://omarchy.org |
| NixOS (Hypr-N) | https://nixos.org/download with Hyprland module |
For the host (KDE-W), test against Nobara directly — no VM needed.
## Setup: building the install candidate
```bash
# Build from the branch under test
./build.sh --build appimage --clean no
./build.sh --build deb --clean no
./build.sh --build rpm --clean no
# Or pull from CI artifacts for a tagged release
gh run download <RUN_ID> -n claude-desktop-deb-amd64
gh run download <RUN_ID> -n claude-desktop-rpm-amd64
gh run download <RUN_ID> -n claude-desktop-appimage-amd64
```
Drop the resulting `.deb` / `.rpm` / `.AppImage` into a shared folder mounted into each guest, or `scp` per-guest.
## Running a sweep: the standard loop
For each test in scope:
1. **Read the test spec** in `cases/<surface>.md` (or `ui/<surface>.md` for UI checklists). Note the `Severity`, `Steps`, and `Expected` sections.
2. **Execute the steps** as described.
3. **Compare against Expected.** Mark internally as `✓`, `✗`, `🔧`, `?` (untested, for example because you couldn't run it for env reasons), or `-` (N/A).
4. **On `✗`**: capture the diagnostics from the test's `Diagnostics on failure` block (see [diagnostic capture](#diagnostic-capture) below). File an issue if one isn't already linked.
5. **Update [`matrix.md`](./matrix.md)** in a single PR per row per sweep, titled `test: <ROW> sweep YYYY-MM-DD`.
## Diagnostic capture
Standard captures referenced from test `Diagnostics on failure` blocks:
### `--doctor` output
```bash
claude-desktop --doctor 2>&1 | tee /tmp/doctor.txt
```
Or for AppImage:
```bash
./claude-desktop-*.AppImage --doctor 2>&1 | tee /tmp/doctor.txt
```
### Launcher log
```bash
cat ~/.cache/claude-desktop-debian/launcher.log
```
Truncate and re-run if the file is stale:
```bash
: > ~/.cache/claude-desktop-debian/launcher.log
claude-desktop 2>&1 | tee -a ~/.cache/claude-desktop-debian/launcher.log
```
### Session env
```bash
echo "XDG_SESSION_TYPE=$XDG_SESSION_TYPE"
echo "XDG_CURRENT_DESKTOP=$XDG_CURRENT_DESKTOP"
echo "WAYLAND_DISPLAY=$WAYLAND_DISPLAY"
echo "DISPLAY=$DISPLAY"
echo "GDK_BACKEND=$GDK_BACKEND"
echo "QT_QPA_PLATFORM=$QT_QPA_PLATFORM"
echo "OZONE_PLATFORM=$OZONE_PLATFORM"
echo "ELECTRON_OZONE_PLATFORM_HINT=$ELECTRON_OZONE_PLATFORM_HINT"
```
### Tray / DBus state (KDE)
```bash
# List registered tray icons
gdbus call --session --dest=org.kde.StatusNotifierWatcher \
--object-path=/StatusNotifierWatcher \
--method=org.freedesktop.DBus.Properties.Get \
org.kde.StatusNotifierWatcher RegisteredStatusNotifierItems
# Find which process owns a connection
gdbus call --session --dest=org.freedesktop.DBus \
--object-path=/org/freedesktop/DBus \
--method=org.freedesktop.DBus.GetConnectionUnixProcessID ":1.XXXX"
```
### Portal availability (Wayland)
```bash
systemctl --user status xdg-desktop-portal
busctl --user tree org.freedesktop.portal.Desktop
```
### Suspend inhibitors
```bash
systemd-inhibit --list
```
### App version
```bash
claude-desktop --version
gh variable get CLAUDE_DESKTOP_VERSION
gh variable get REPO_VERSION
```
Always include the upstream version + project version in the issue body and the matrix-update commit message.
## Filing failures
Issue title format: `[<row>] <T## or S##>: <one-line symptom>`
Issue body template:
```markdown
**Test:** [T17 — Folder picker opens](./docs/testing/cases/code-tab-foundations.md#t17--folder-picker-opens)
**Environment:** GNOME (Fedora 43, Wayland)
**Project version:** v1.3.23+claude1.4758.0
**Upstream version:** 1.4758.0
## Steps
<paste from test spec>
## Expected
<paste from test spec>
## Actual
<observed behavior>
## Diagnostics
<--doctor output, launcher log, session env, anything else from the test's Diagnostics block>
## Notes
<any hypotheses, related PRs, recent regressions>
```
Link the issue back into [`matrix.md`](./matrix.md) on the affected cell using the standard format: `✗ #NNN`.
## Updating the matrix
One PR per sweep per row. Bundle every status change for that row into a single commit so the matrix history reads as a sequence of sweep events, not individual cell flips.
Commit message template:
```
test(<row>): sweep <YYYY-MM-DD> — <project_version>+claude<upstream_version>
- T01 ? → ✓
- T03 ? → ✓
- T05 ? → ✗ (filed #NNN)
- T17 ? → ✓
- ...
```
If the same sweep also turned up new tests worth adding, those go in a separate commit before the status update so the diff stays focused.
## Severity guidance for new tests
When adding a test to `cases/` or `ui/`, pick severity using these heuristics:
| Tier | Pick when | Example |
|------|-----------|---------|
| Smoke | First-launch experience; if this fails the app is unusable for normal users | T01 (app launch), T03 (tray), T16 (Code tab loads) |
| Critical | Feature is documented in upstream docs **and** breaks core workflows when broken | T22 (PR monitoring), T34 (connector OAuth), T17 (folder picker) |
| Should | Quality-of-life or documented edge case; users hit it but have a workaround | T28 (catch-up after suspend), S26 (auto-update vs apt) |
| Could | Niche, env-specific, or graceful-degradation checks | T39 (`/desktop` CLI N/A), S22 (computer-use toggle absent on Linux) |
When in doubt, file as **Should**. Smoke and Critical mean release gates — be conservative about adding gates.
## Adding a new test
1. Pick the right surface file in `cases/` (or create one with prior buy-in if no existing surface fits — don't sprinkle new files lightly).
2. Use the next free ID: highest `T##` + 1 for cross-env, highest `S##` + 1 for env-specific. Don't reuse retired IDs.
3. Follow the standard structure: `**Severity:**`, `**Surface:**`, `**Applies to:**`, `**Steps:**`, `**Expected:**`, `**Diagnostics on failure:**`, `**References:**`.
4. Add the row to [`matrix.md`](./matrix.md) with all-`?` initial state.
5. Mention the new test in the PR description so reviewers know to read the spec.
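
The "next free ID" rule in step 2 can be computed rather than eyeballed. A minimal sketch, assuming IDs appear in the cases files as a prefix letter followed by digits (`T17`, `S09`) and that retired IDs are still mentioned somewhere in the scanned text; `next_id` is a hypothetical helper, and `grep -o` is a GNU/BSD extension.

```sh
# Read text on stdin and print the next free ID for the given prefix (T or S).
next_id() {
    prefix=$1
    max=$(grep -oE "${prefix}[0-9]+" |
          sed "s/^${prefix}0*//" |   # strip prefix and leading zeros
          sort -n | tail -n 1)
    printf '%s%02d\n' "$prefix" "$(( ${max:-0} + 1 ))"
}
```

Usage: `cat cases/*.md | next_id T`. If a retired ID has been deleted from every file, this sketch would hand it out again, so check the history before trusting the result.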
For UI checklist additions, append rows to the relevant `ui/<surface>.md` table. UI rows don't need `T##` / `S##` IDs — the surface file + element name is the identity.
## Automated runs
The harness at [`tools/test-harness/`](../../tools/test-harness/) drives any
test with a `runner:` field. As of 2026-04-30, that's T01, T03, T04, T17.
### Invoking a sweep
```sh
cd tools/test-harness
npm install # first time only
ROW=KDE-W ./orchestrator/sweep.sh
```
Output:
- `results/results-${ROW}-${DATE}/junit.xml` — the JUnit summary (one
testsuite per `.spec.ts` file, with the test's annotations preserved as
metadata).
- `results/results-${ROW}-${DATE}/test-output/<test>/` — per-test
attachments (screenshots, launcher log, session env, frame extents,
click-attempt diagnostics, etc.). Captured on every run, not just on
failure (Decision 7).
- `results/results-${ROW}-${DATE}/html/` — Playwright's HTML report.
- `results/results-${ROW}-${DATE}.tar.zst` — bundled artifact for
off-machine inspection (when `zstd` is available).
`sweep.sh` prints a summary line at the end:
```
summary: tests=4 failures=0 errors=0 skipped=1
```
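
If a wrapper script needs to act on that summary line, the fields can be pulled out with standard tools. This is a sketch assuming the `key=value` layout shown above stays stable; `summary_field` and `sweep_clean` are hypothetical helpers, not part of `sweep.sh`.

```sh
# Extract one numeric field from a sweep summary line.
# $1 = the summary line, $2 = the key (tests, failures, errors, skipped)
summary_field() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n "s/^$2=//p"
}

# Succeed only when the sweep reported zero failures and zero errors.
sweep_clean() {
    [ "$(summary_field "$1" failures)" = "0" ] &&
        [ "$(summary_field "$1" errors)" = "0" ]
}
```

Usage: `sweep_clean "$line" || echo "sweep needs triage"`, where `line` holds the captured summary line.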
### Translating results to the matrix
JUnit `<failure>` → `✗`, `<error>` (harness broke) → `?`, `<skipped>` →
`-` (when intentionally not applicable) or stays `?` (when the test
couldn't reach an assertion — common case for renderer tests that need
sign-in or selectors that haven't been tuned). For now this mapping is
manual: open `junit.xml`, update `matrix.md` cells, commit. A
`render-matrix.sh` to do this automatically is on the to-do list.
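Until that script exists, the mapping above can be written down as a function. This is a sketch only: the `passed` → `✓` case is an assumption (the text above lists only the non-passing outcomes), and whether a skip was intentional still has to be decided by a human and passed in.

```ts
type JUnitOutcome = "passed" | "failure" | "error" | "skipped";
type MatrixCell = "✓" | "✗" | "?" | "-";

// `intentionallyNotApplicable` only matters for skipped tests.
function toMatrixCell(
	outcome: JUnitOutcome,
	intentionallyNotApplicable = false,
): MatrixCell {
	switch (outcome) {
		case "passed":
			return "✓"; // assumed; not stated in the mapping above
		case "failure":
			return "✗";
		case "error":
			return "?"; // the harness broke, so the cell stays unknown
		case "skipped":
			return intentionallyNotApplicable ? "-" : "?";
	}
}
```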
### Coexistence with manual tests
Tests without a `runner:` continue to flow through the manual loop above.
The matrix doesn't distinguish automated from manual cells — a `✓` is a
`✓` regardless of how it was produced. The `runner:` field on each case
makes the source-of-truth explicit per-test.
### Path through the CDP auth gate (why this works)
The shipped Electron exits if `--remote-debugging-port` is on argv
without a valid `CLAUDE_CDP_AUTH` token. Both `_electron.launch()` and
`chromium.connectOverCDP()` inject that flag. The harness sidesteps the
gate by spawning Electron clean and attaching the Node inspector via
`SIGUSR1` at runtime — same code path as `Developer → Enable Main
Process Debugger`. From there, main-process JS evaluation reaches the
renderer through `webContents.executeJavaScript()`. Full writeup:
[`automation.md`](./automation.md#the-cdp-auth-gate-and-the-runtime-attach-workaround-that-beats-it).
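The gate condition described above can be modelled as a predicate. This is a model for reasoning about launch paths, not the shipped check: the function name is hypothetical, and "valid token" is reduced to "non-empty `CLAUDE_CDP_AUTH`" because the real validation is not described here.

```ts
// Model of the gate: true means the app would exit at startup.
function tripsCdpAuthGate(
	argv: readonly string[],
	env: Readonly<Record<string, string | undefined>>,
): boolean {
	const hasDebugPort = argv.some(
		(arg) =>
			arg === "--remote-debugging-port" ||
			arg.startsWith("--remote-debugging-port="),
	);
	// Assumption: any non-empty token counts as valid in this model.
	const hasToken = (env["CLAUDE_CDP_AUTH"] ?? "").length > 0;
	return hasDebugPort && !hasToken;
}
```

Under this model a clean spawn (no debugging flag on argv) never trips the gate, which is why attaching the inspector at runtime via `SIGUSR1` works.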
### Wayland-mode sweep
Default backend is X11-via-XWayland (matches `launcher-common.sh`'s
default). To sweep the suite under native Wayland, set
`CLAUDE_HARNESS_USE_WAYLAND=1`:
```sh
CLAUDE_HARNESS_USE_WAYLAND=1 ROW=KDE-W ./orchestrator/sweep.sh
```
Every `launchClaude()` swaps to the Wayland flag set
(`--ozone-platform=wayland` + WaylandWindowDecorations / IME /
text-input-version=3, mirroring `scripts/launcher-common.sh:132-139`) and
exports `CLAUDE_USE_WAYLAND=1` + `GDK_BACKEND=wayland` into the spawn
env. Per-launch overrides via `launchClaude({ extraEnv })` still win,
so a single test can opt back to X11 inside a Wayland-mode sweep.
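The precedence described above (Wayland defaults first, per-launch `extraEnv` last) can be sketched as an object merge. `buildSpawnEnv` is a hypothetical name and only covers the environment; the command-line flag set is left out because its exact contents live in `launcher-common.sh`.

```ts
type Env = Record<string, string>;

// Later spreads win, so extraEnv overrides the Wayland defaults.
function buildSpawnEnv(
	base: Env,
	extraEnv: Env = {},
	useWayland: boolean = base["CLAUDE_HARNESS_USE_WAYLAND"] === "1",
): Env {
	const waylandEnv: Env = useWayland
		? { CLAUDE_USE_WAYLAND: "1", GDK_BACKEND: "wayland" }
		: {};
	return { ...base, ...waylandEnv, ...extraEnv };
}
```

A test that needs X11 inside a Wayland-mode sweep would pass something like `{ GDK_BACKEND: "x11" }` as `extraEnv`; which variables a real opt-out needs is an assumption to check against the harness.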
Caveat: T04 (`_NET_FRAME_EXTENTS` xprop check) only works under
XWayland — native-Wayland sessions have no X11 client list, so T04
will skip with a "no X11 client list" diagnostic.
## Grounding sweep
Separate from the test sweep. Where the test sweep verifies *upstream
Linux compat behavior* against case specs, the grounding sweep
verifies *the specs themselves* against upstream behavior — making
sure the Steps and Expected fields haven't bit-rotted past what the
shipped build actually does. Run on every upstream `CLAUDE_DESKTOP_VERSION`
bump.
### Static pass
For each file under [`cases/`](./cases/), confirm every test's
`**Code anchors:**` field still resolves and the Steps/Expected match
behavior. The convention is documented in
[`cases/README.md`](./cases/README.md#anchor-scope) — anchors are
either upstream code (`build-reference/app-extracted/.vite/build/`),
wrapper scripts (`scripts/`), v7 walker inventory, or out-of-scope
(CLI binary, server-rendered SPA).
When a test drifts, edit Steps/Expected in place. When a feature is
gone from the build, prepend
`> **⚠ Missing in build X.Y.Z** — <note>. Re-verify after next
upstream bump.` under the test heading.
[`cases-grounding-prompt.md`](./cases-grounding-prompt.md) is the
fan-out prompt the last sweep used — paste verbatim into a fresh
session to repeat the workflow.
### Runtime pass
Run [`tools/test-harness/grounding-probe.ts`](../../tools/test-harness/grounding-probe.ts)
against the live build:
```sh
cd tools/test-harness
npm run grounding-probe -- --launch --include-synthetic \
--out ../../docs/testing/cases-grounding-runtime.json
```
Captures runtime state for tests where static greps can't disambiguate
(IPC handler registry, `globalShortcut.isRegistered()` for known
accelerators, `app.getLoginItemSettings()`, `safeStorage`,
`autoUpdater.getFeedURL()`, SNI tray registration, AX-tree fingerprint
of whatever's on screen). Output is keyed by test ID — diff against
the previous version's capture to spot drift the static pass missed.
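Because the output is keyed by test ID, drift between two captures reduces to a per-key comparison. A minimal sketch, assuming both captures are plain JSON objects; `changedTestIds` is a hypothetical helper, and comparing via `JSON.stringify` is sensitive to property order, so it is only reliable if the probe emits keys in a stable order.

```ts
type Capture = Record<string, unknown>;

// Returns the sorted test IDs whose captured state differs,
// including IDs present in only one of the two captures.
function changedTestIds(previous: Capture, current: Capture): string[] {
	const ids = Array.from(
		new Set([...Object.keys(previous), ...Object.keys(current)]),
	);
	return ids
		.filter(
			(id) =>
				JSON.stringify(previous[id]) !== JSON.stringify(current[id]),
		)
		.sort();
}
```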
Surfaces inside modals or popups (T22 PR toolbar, T26 preset list,
T31 side chat, T32 slash menu) need the surface open at probe time.
Open the relevant view in the running app before re-running with
`--port 9229` (attach mode).


@@ -0,0 +1,238 @@
# test-harness runner implementation — session 17 prompt
This file is meant to be **copied verbatim into a fresh Claude Code
session** as the initial user message. Don't paraphrase it; the
orchestration depends on the exact directives below.
> **ORCHESTRATION STOPPED AFTER SESSION 16.** This prompt is rotated
> for completeness only. **Session 17 will NOT run automatically** —
> the autonomous orchestration was halted at the end of session 16
> after coverage stalled at 74/76 (97%) for four consecutive sessions
> (13, 14, 15, 16). To resume, the user must manually trigger another
> orchestration run AND meet at least one of these preconditions:
>
> 1. **Real signed-in Claude Desktop running with `--inspect=9229`**
> on the dev box (debugger-attached, signed in, NOT a leaked test
> isolation). This unblocks Categories A (operon-mode probe) and
> B (Tier 3 read-only reframes that need auth-bearing renderer
> state).
> 2. **A real claude.ai account fixture for write-side state.** The
> remaining 2 specs (matrix coverage 74/76 → 76/76) need real
> write-side state (e.g. an installed plugin to exercise
> `LocalPlugins.listSkillFiles`, or a deep-linked deferred install
> intent for T11). The Tier 3 destructive constraint
> (`Don't run destructive Tier 3 write-side tests`) explicitly
> forbids the harness constructing this state itself.
> 3. **Renderer-drift event** that requires re-anchoring page-objects
> (e.g. claude.ai redesign breaks `findCompactPills`,
> `clickMenuItem`, etc.). Triggers a defensive-migration session.
> 4. **New IPC surface** added by upstream that the harness should
> cover (e.g. a new `claude.web` interface, a new eipc method
> that's case-doc-anchored).
>
> If none of those preconditions hold, the orchestration should NOT
> resume — further sessions will produce documentation-only or
> marginal output. The structural ceiling of the harness without
> real-account fixtures is 74/76 (97%); we're already there.
You're picking up after session 16 of the test-harness runner
implementation work. Session 16 was the final session of the
sessions-13-to-16 orchestration run and produced: T17 verification
(session-15 structural fix VERIFIED — bare 60s timeout gone, new
failure mode at `openFolderPicker` post-`selectLocal` classified as
renderer-state-dependent and deferred), schema-rev for
`listRemotePluginsPage` / `listSkillFiles` (both schemas resolved by
bundle inspection — neither shipped as a Tier 2 invocation because
`listRemotePluginsPage` is not anchored in any case doc, and
`listSkillFiles` needs Tier 3 destructive setup). NO coverage gain.
Plan-doc updated. Followup-prompt rotated with the STOP flag (this
document).
The plan doc at
[`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
captures the tier classification and execution-time reclassifications.
Its "Status (post-execution)" section is the source of truth for
what's done and what's deferred — read **session 16** first, then
**session 15**, **session 14**, **session 13**, **session 12**,
**session 11**, **session 10**, **session 9**, **session 8**,
**session 7**, **session 6**, **session 5**, **session 4**, **session
3**, **session 2**, then **session 1** sub-sections.
This session is a continuation, not a restart. Start by reading the
plan doc's status sections AND verifying at least one of the
preconditions above holds. If none hold, STOP and report; don't try
to fan out.
### Session 16 final findings (key context for any session-17 attempt)
1. **T17's session-15 structural fix VERIFIED.** Bare 60s timeout is
gone. `seedFromHost` clones the host's signed-in config,
`waitForReady('userLoaded')` resolves to a post-login URL
(`https://claude.ai/epitaxy` on the dev box), the dialog mock
installs, and `CodeTab.activate({ timeout: 15_000 })` (session 14
migration) succeeds first try.
2. **T17's NEW failure mode is renderer-state-dependent, not AX.**
After `selectLocal()` clicks the Local menuitem, the Select-folder
pill never appears within 4s. The URL during the run was
`/epitaxy` — the user's workspace route. The folder-picker UI
may only render on `/new` (or a fresh project), not on a workspace
already containing files. To unblock: navigate to `/new`
post-userLoaded BEFORE `openFolderPicker()`. NOT shipped in session
16 — needs a careful navigation primitive that doesn't break
existing seedFromHost specs.
3. **`openPill` / `clickMenuItem` migration STILL parked.** Session
16's T17 trace confirmed the env-pill open + Local click both
succeeded, ruling out the AX-polling-loop hypothesis once and for
all. Don't migrate those speculatively.
4. **Schema-rev resolved both deferred validators.**
`CustomPlugins.listRemotePluginsPage(limit: number, offset:
number)`. `LocalPlugins.listSkillFiles(pluginId: string,
skillName: string, pluginContext?: opaque)`. Neither shipped as a
Tier 2 invocation: `listRemotePluginsPage` is not anchored in any
case doc; `listSkillFiles` needs Tier 3 destructive setup.
5. **Coverage stalled at 74/76 (97%) for 4 consecutive sessions.**
Sessions 13-16 net deliverables: 1 primitive, 1 AX migration, 1
structural fix, 1 verification + 1 schema-rev investigation.
Without real-account fixtures, the harness's structural ceiling
is 74/76. The remaining 2 specs need real-account write-side
state.
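The two signatures from finding 4 can be written as TypeScript declarations with a pre-flight argument check. This is a sketch: the return types are not stated above, so they are `unknown`; the interface and function names are hypothetical; and the checks mirror only the argument types, not whatever extra rules the bundled validators enforce.

```ts
// Signatures as resolved in session 16; return types are assumptions.
interface CustomPluginsApi {
	listRemotePluginsPage(limit: number, offset: number): Promise<unknown>;
}
interface LocalPluginsApi {
	listSkillFiles(
		pluginId: string,
		skillName: string,
		pluginContext?: unknown, // opaque in the resolved schema
	): Promise<unknown>;
}

function isListRemotePluginsPageArgs(args: readonly unknown[]): boolean {
	return (
		args.length === 2 &&
		args.every((a) => typeof a === "number" && Number.isFinite(a))
	);
}

function isListSkillFilesArgs(args: readonly unknown[]): boolean {
	return (
		(args.length === 2 || args.length === 3) &&
		typeof args[0] === "string" &&
		typeof args[1] === "string"
	);
}
```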
### What a future session 17 might attempt (only if preconditions hold)
If precondition 1 (real signed-in debugger-attached Claude) holds:
- **Operon-mode probe** (Category A from sessions 13-16). Run
`eipc-registry-probe.ts` against the user's Claude with operon mode
toggled on/off, capture the diff in registered channels. May
surface a new case-doc-coverable handler.
- **Schema-rev smoke-test** for the session-16-resolved schemas
against the live debugger. `listRemotePluginsPage(limit: 10,
offset: 0)` should return an array shape;
`listSkillFiles('some-installed-plugin', 'some-skill')` would test
the LocalPlugins handler's auth path.
If precondition 2 (real-account write-side fixture) holds:
- **T11 runtime invocation.** With an installed plugin in
`~/.claude/plugins/`, the post-install state can be probed via
`listSkillFiles`, and the slash menu can be checked against the
case-doc claim "skills appear in the slash menu" (T11 step 3).
- **T17 navigation fix.** Add a `/new` navigation primitive to
`claudeai.ts`'s `CodeTab` so `openFolderPicker` works on a fresh
project route. Verify T17 reaches the "dialog mock fired" assertion.
If precondition 3 or 4 holds:
- **Defensive page-object refactor.** Re-snapshot the AX tree at the
Customize panel and Plugin browser modal, refresh case-doc
inventory anchors, migrate any decayed selectors.
### Termination signal interpretation
If session 17 is triggered without any precondition met, the right
move is the same as session 16's STOP recommendation: write a
one-paragraph "preconditions not met, no work shipped" plan-doc update
and terminate. Don't burn a session on documentation-only output.
### Constraints to respect (unchanged from sessions 1-16)
- Use `seedFromHost: true` for any auth-required spec — never
`CLAUDE_TEST_USE_HOST_CONFIG=1` / `isolation: null` (legacy shape
removed in session 15).
- eipc handlers register on `webContents.ipc._invokeHandlers`, NOT
global `ipcMain._invokeHandlers`. Use `lib/eipc.ts`.
- For arg validator schema-rev: smoke-test first, fall back to
bundle-grep on the rejection literal.
- For AX-tree consumers: use `lib/ax.ts` (`snapshotAx` /
`waitForAxNode` / `waitForAxNodes`).
- For call-site migrations to `waitForAxNode`: keep per-spec retry
budgets matching existing tuning.
- `lib/input.ts` is X11-only. `lib/input-niri.ts` is Niri-only. CDP
auth gate is alive (runtime SIGUSR1 attach, never Playwright
`_electron.launch()`). BrowserWindow Proxy gotcha — use
`webContents.getAllWebContents()`. `skipUnlessRow()` always first.
- No fixed sleeps. `retryUntil` from `lib/retry.ts`, Playwright
auto-wait, or `waitForAxNode` from `lib/ax.ts`.
- Diagnostics on every run via `testInfo.attach()`. Tag with
`severity:` and `surface:` annotations.
- Tabs in TS, ~80-char wrap.
- Don't break existing runners. H01-H05 are the canaries.
- `npm run typecheck` must stay clean.
- Don't run destructive Tier 3 write-side tests.
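To illustrate the "no fixed sleeps" constraint, here is what a polling helper in the spirit of `retryUntil` could look like. This is NOT the implementation in `lib/retry.ts`; the signature, option names, and null-on-timeout behaviour are all assumptions made for the sketch.

```ts
interface RetryOptions {
	timeoutMs: number;
	intervalMs: number;
}

// Polls `probe` until it yields a non-null value or the budget runs out.
// Resolves to null on timeout instead of throwing (an assumed convention).
async function retryUntil<T>(
	probe: () => Promise<T | null | undefined> | T | null | undefined,
	{ timeoutMs, intervalMs }: RetryOptions,
): Promise<T | null> {
	const deadline = Date.now() + timeoutMs;
	for (;;) {
		const value = await probe();
		if (value !== null && value !== undefined) return value;
		// Stop if another wait would overshoot the deadline.
		if (Date.now() + intervalMs > deadline) return null;
		await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
	}
}
```

The difference from a fixed sleep is that the call returns as soon as the condition holds, so the budget is an upper bound rather than a guaranteed delay.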
### Authoritative reference
Read these in order before fanning out:
- [`docs/testing/runner-implementation-plan.md`](runner-implementation-plan.md)
— tier classification + status sections.
- [`tools/test-harness/README.md`](../../tools/test-harness/README.md)
— runner conventions, the 74-spec inventory, primitives in
`lib/`, isolation defaults.
- [`docs/testing/cases/README.md`](cases/README.md) — case-doc
structure and the four anchor scopes.
- [`tools/test-harness/src/lib/`](../../tools/test-harness/src/lib/)
— the existing primitives.
- [`tools/test-harness/src/runners/`](../../tools/test-harness/src/runners/)
— every existing spec is a template.
### Phase 0 — calibration (mandatory before fanning out)
1. `cd tools/test-harness && npm run typecheck` — should pass.
2. Check debugger ATTACHMENT QUALITY (not just the port): `ss -tln |
grep ':9229'`. If the port is open, probe webContents via `evalInMain`:
```ts
import { InspectorClient } from './src/lib/inspector.js';

// Attach to the already-running main process over the Node inspector.
const client = await InspectorClient.connect(9229);
// Runs in the main process: list every webContents with URL and title.
const wcs = await client.evalInMain<unknown>(`
	const { webContents } = process.mainModule.require('electron');
	return webContents.getAllWebContents().map((w) => ({
		id: w.id, url: w.getURL(), title: w.getTitle(),
	}));
`);
console.log(wcs);
client.close();
```
If every URL is `/login` / `find_in_page` / `main_window`, treat
as soft-blocked for auth-required investigations.
3. Disambiguate running Claude processes. `pgrep -af
"ozone-platform=x11.*app.asar"`; for each, inspect cmdline for
`user-data-dir`. Real Claude has
`~/.config/Claude` (or no user-data-dir flag); leaked test
isolations have `/tmp/claude-test-*`.
4. **Verify at least one precondition for resuming the orchestration
holds.** If none hold, write a "no preconditions met" plan-doc
update and STOP. Don't fan out.
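The decision rules in steps 2 and 3 can be sketched as two pure functions. Both names are hypothetical. The URL markers are matched as substrings, and the user-data-dir is assumed to appear as a `--user-data-dir` flag on the command line, which is the usual Chromium spelling but is not confirmed above.

```ts
const UNAUTHENTICATED_MARKERS = ["/login", "find_in_page", "main_window"];

// Soft-blocked when no webContents shows a signed-in route.
// An empty list also counts as soft-blocked.
function isSoftBlocked(urls: readonly string[]): boolean {
	return urls.every((url) =>
		UNAUTHENTICATED_MARKERS.some((marker) => url.includes(marker)),
	);
}

type ClaudeProcessKind = "real" | "leaked-test-isolation" | "unknown";

function classifyClaudeProcess(cmdline: string): ClaudeProcessKind {
	const match = /--user-data-dir[= ](\S+)/.exec(cmdline);
	if (!match) return "real"; // no user-data-dir flag at all
	const dir = match[1] ?? "";
	if (dir.startsWith("/tmp/claude-test-")) return "leaked-test-isolation";
	if (/\/\.config\/Claude\/?$/.test(dir)) return "real";
	return "unknown";
}
```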
### Operational notes
- For the bundle-grep schema-rev pattern (sessions 9, 11, 12, 16
precedents):
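```bash
# Print the bundle text surrounding a validator's rejection literal.
cd tools/test-harness && node -e "
	const { extractFile } = require('@electron/asar');
	const buf = extractFile(
		'/usr/lib/claude-desktop/node_modules/electron/dist/resources/app.asar',
		'.vite/build/index.js'
	);
	const s = buf.toString('utf8');
	const idx = s.indexOf('<rejection-literal>');
	// Without this guard a miss (-1) would print the start of the bundle.
	if (idx === -1) { console.error('literal not found'); process.exit(1); }
	console.log(s.slice(Math.max(0, idx - 1500), idx + 500));
"
```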
- For seedFromHost specs: host MUST have a signed-in Claude.
`seedFromHost`'s host-claude-kill semantics will tear down any
running Claude process — flag clearly in the report before
invoking when the user's real Claude is running.
- For AX-tree polling: `lib/ax.ts`'s `waitForAxNode` /
`waitForAxNodes` for predicate-based polling.
- The eipc-registry probe (`tools/test-harness/eipc-registry-probe.ts`)
is the dedicated tool for inspecting per-wc IPC handler state.
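For orientation, "per-wc IPC handler state" means one handler map per `webContents` rather than one global map. The sketch below shows that shape with a structural stand-in type so it runs without Electron. `_invokeHandlers` is a private Electron internal, so treating it as a `Map` keyed by channel name is an assumption, and this is not the probe's actual code.

```ts
// Structural stand-in for the parts of Electron's WebContents used here.
interface WebContentsLike {
	id: number;
	ipc: { _invokeHandlers?: Map<string, unknown> };
}

// Lists registered invoke channels for each webContents id.
function invokeChannelsByWebContents(
	all: readonly WebContentsLike[],
): Record<number, string[]> {
	const out: Record<number, string[]> = {};
	for (const wc of all) {
		const handlers = wc.ipc._invokeHandlers;
		out[wc.id] = handlers ? Array.from(handlers.keys()).sort() : [];
	}
	return out;
}
```

In a real session the input would come from `webContents.getAllWebContents()` evaluated in the main process.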
Begin with Phase 0. Don't fan out until at least one of the
preconditions for resuming the orchestration is verified to hold.



@@ -0,0 +1,597 @@
# claude.ai UI Inventory Reconciliation
*Generated against [`ui-inventory.json`](./ui-inventory.json) v6 (captured 2026-05-03, app version 1.5354.0, 383 entries).*
*Reconciled 2026-05-02.*
This file diffs the human-written claims in [`ui/`](./ui/) against the
machine-captured ground-truth in [`ui-inventory.json`](./ui-inventory.json).
It is one-shot output meant to drive human cleanup of `ui/*.md` — re-run
the reconciliation script (TODO: not yet built) after major walker passes.
## Reading this document
Three categories of finding per surface:
- **In docs but not in renderer** — the doc names an element that has no
corresponding inventory entry. Possible causes (don't read this as "doc
is wrong"; the walker covers a subset of reality):
- **OS / window-manager element** — title bar, close/min/max buttons,
drop shadow, resize edges. These are drawn by the compositor, not by
claude.ai's renderer; the walker can't see them.
- **Out of renderer scope** — tray menu, libnotify notifications, IME
composition popups, Quick Entry popup window. These are main-process
or DE-level surfaces that don't exist in the claude.ai DOM.
- **Walker coverage gap** — Settings overlay, dialogs, deep Code-tab
panes (terminal, file pane, diff). The walker drilled some surfaces
but not others; absence here is "not yet observed" not "not present."
- **Account-state-dependent** — features that don't appear on this
user's plan (e.g. SSH connections panel, managed-settings rows,
specific Code-tab pane types).
- **Speculative** — doc was written from upstream behavior, not from a
Linux build. May not actually render.
- **In renderer but not in docs** — inventory captured an element that no
doc row mentions. Either the doc is incomplete for that surface, or the
element is tangential (search-results recency rows, instance-suffix
duplicates with `#2`/`+5` markers).
- **Fingerprint potentially drifted** — doc and inventory agree on the
element but the doc's selector hint disagrees with the inventory's
`fingerprint.selector`. Most `ui/*.md` rows use prose ("Top-left of
topbar") rather than CSS selectors, so this category is small.
Human triage is what closes any of these. Don't auto-edit `ui/*.md`.
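Since the reconciliation script is not yet built, the following is only a sketch of how the three categories could be computed. It assumes a human has already mapped each doc row to an inventory ID where one exists (the docs use prose, so no automatic match is attempted); all type and function names are hypothetical.

```ts
interface DocRow {
	element: string; // human-readable name from ui/*.md
	inventoryId?: string; // filled in by triage; absent if unmapped
	selector?: string; // most rows carry prose only
}
interface InventoryEntry {
	id: string;
	fingerprint: { selector: string };
}
interface Findings {
	inDocsNotRenderer: string[];
	inRendererNotDocs: string[];
	fingerprintDrift: string[];
}

function reconcile(
	rows: readonly DocRow[],
	inventory: readonly InventoryEntry[],
): Findings {
	const byId = new Map<string, InventoryEntry>();
	for (const entry of inventory) byId.set(entry.id, entry);
	const claimed = new Set<string>();
	const findings: Findings = {
		inDocsNotRenderer: [],
		inRendererNotDocs: [],
		fingerprintDrift: [],
	};
	for (const row of rows) {
		const entry = row.inventoryId ? byId.get(row.inventoryId) : undefined;
		if (!entry) {
			findings.inDocsNotRenderer.push(row.element);
			continue;
		}
		claimed.add(entry.id);
		// Drift only applies when the doc row carries an explicit selector.
		if (
			row.selector !== undefined &&
			row.selector !== entry.fingerprint.selector
		) {
			findings.fingerprintDrift.push(row.element);
		}
	}
	for (const entry of inventory) {
		if (!claimed.has(entry.id)) findings.inRendererNotDocs.push(entry.id);
	}
	return findings;
}
```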
## Summary
| Metric | Count |
|--------|-------|
| Inventory entries (total) | 383 |
| Inventory entries by kind | persistent 65 / structural 276 / menu 33 / instance 9 |
| Inventory entries marked `denylisted: true` | 9 (Send×4, Install×4, Remove×1) |
| `ui/*.md` files reconciled | 11 (10 surface files + README) |
| `ui/*.md` rows reconciled (rough — multi-element rows complicate the count) | ~210 element rows across all 10 surface files |
| Rows with confirmed inventory match | ~70 (~33%) |
| Rows flagged "in docs but not in renderer" | ~140 (~67%) — heavily skewed by OS-frame, tray, notifications, deep Code panes, Settings, Quick Entry being out-of-renderer or under-walked |
| Inventory entries with no `ui/*.md` mention | ~190 (~50%) — heavily skewed by per-conversation/per-skill/per-prompt-card structural rows that the docs treat as categories rather than enumerating |
| Doc rows with explicit selectors that drift from inventory | 0 verified — `ui/*.md` rows almost never carry CSS selectors |
Match counts are approximate. `ui/*.md` rows often describe categories
("Recent conversations," "Per-history-entry hover") that map to many
inventory entries; the inventory in turn enumerates structural elements
the docs intentionally don't list (every project skill button, every
search result option). The reconciliation is a triage signal, not a
metric.
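As a sanity check on the table, the per-kind counts and the rounded percentages can be recomputed from the figures quoted there. The numbers are copied from the summary; only the rounding helper is new.

```ts
// Counts copied from the "Inventory entries by kind" row.
const byKind = { persistent: 65, structural: 276, menu: 33, instance: 9 };
const inventoryTotal = Object.values(byKind).reduce((a, b) => a + b, 0);

// Rounds a ratio to a whole percentage.
const pct = (part: number, whole: number): number =>
	Math.round((part / whole) * 100);

console.log(inventoryTotal); // 383
console.log(pct(70, 210), pct(140, 210), pct(190, 383)); // 33 67 50
```

The totals agree with the table: the four kinds sum to 383, and the approximate row counts reproduce the quoted ~33%, ~67%, and ~50%.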
## Per-surface breakdown
### `ui/window-chrome-and-tabs.md`
**Inventory surfaces likely covered:** none directly — OS window frame is
drawn by the compositor; the in-app topbar elements live under `root` as
`root.button.menu`, `root.button.collapse-sidebar`, `root.button.search`,
`root.button.back`, `root.button.forward`. The "tab strip" maps to
`root.button.chat`, `root.button.cowork`, `root.button.code`.
**Doc rows reconciled:** ~22
#### In docs but not in renderer
| Doc element | Reason class |
|-------------|--------------|
| Title bar | OS / window-manager |
| Close button (X) | OS / window-manager |
| Minimize button | OS / window-manager |
| Maximize / restore button | OS / window-manager |
| Resize edges | OS / window-manager |
| Window menu (right-click titlebar) | OS / window-manager |
| Cowork ghost icon | Walker captures `root.button.cowork` (the tab) but not the ghost-icon visual within the topbar shim |
| Drag region (gaps between buttons) | Renders as empty space — not an actionable element |
| Active tab indicator | Visual styling, not an actionable element |
| Tab badges (unread / Dispatch) | None observed; user state at capture had no badges |
| About dialog | Walker did not surface a dialog; About is reachable only from app/tray menu, both out of renderer scope |
| App menu (macOS-style) | Doc itself notes this is N/A on Linux |
| Update prompt | Conditional, not present at capture |
| Crash report dialog | Conditional, not present at capture |
#### In renderer but not in docs
| Inventory entry | Notes |
|-----------------|-------|
| `root.button.menu` ("Menu", `aria-label="Menu"`) | This is the doc's "Hamburger menu" — renamed |
| `root.button.collapse-sidebar` ("Collapse sidebar") | Doc has "Sidebar toggle"; arguably the same |
| `root.button.search` ("Search") | Doc's "Search icon"; same |
| `root.button.back` / `root.button.forward` | Doc's back/forward arrows; same |
| `root.a.skip-to-content` ("Skip to content") | A11y skip link; not in doc |
| `root.button.new-chat-n` ("New chat⌘N") | Topbar new-chat button; not in doc |
| `root.button.pinned`, `root.button.recents`, `root.button.projects`, `root.button.artifacts`, `root.button.customize` | Sidebar nav buttons; doc covers some of these in `sidebar.md` not here |
| `root.button.awaaddrick-max` ("AWAaddrick·Max") | User/plan badge in topbar; not in doc |
| `root.button.get-apps-and-extensions` | Topbar shortcut to apps page; not in doc |
| `root.tab.write` / `root.tab.learn` / `root.tab.code` / `root.tab.from-calendar` / `root.tab.from-gmail` | Quick-prompt-template tabs in the prompt area; doc covers Write/Learn/Code as Chat/Cowork/Code tabs but the inventory's `root.tab.code` is distinct from `root.button.code` |
#### Fingerprint potentially drifted
None — doc rows for this surface use Location prose only.
#### Notable cross-cut
The doc's "Chat / Cowork / Code" tab strip maps cleanly to
`root.button.chat`, `root.button.cowork`, `root.button.code`. But the
inventory also has `root.tab.code` (a `[role="tab"]`, not a button) which
is a separate element — the prompt-area template strip — that the doc
conflates with the main Chat/Cowork/Code switcher. Worth a human note.
---
### `ui/tray.md`
**Inventory surfaces covered:** none — the tray is a main-process Electron
`Tray` object on the system SNI bus, not part of claude.ai's DOM.
**Doc rows reconciled:** ~17
#### In docs but not in renderer
Every row, by design. Categories:
- Tray icon (light / dark theme) — main-process `Tray.setImage()`
- Right-click menu items (Show/Hide, Quick Entry, Open at Login,
Settings, About, Quit) — main-process `Menu.buildFromTemplate()`
- Left-click / double-click / middle-click behaviors — main-process
event handlers
- Tooltip on hover, position, icon resolution, theme switch — SNI
daemon and DE behavior
This entire file is correctly out of renderer scope; the walker is doing
the right thing by not capturing any of it.
#### In renderer but not in docs
N/A — surface mismatch.
---
### `ui/sidebar.md`
**Inventory surfaces likely covered:** `root` (sidebar lives in the root
chrome on claude.ai). Note: the doc opens "Code Tab Sidebar" but the
sidebar in the captured renderer is the global claude.ai sidebar, not a
Code-tab-specific one. The Code-tab-specific session list is captured
separately under `root.button.code.button.new-session-n` (60 entries).
**Doc rows reconciled:** ~18
#### In docs but not in renderer
| Doc element | Reason class |
|-------------|--------------|
| Filter: status / project / environment | Walker did not drill the filter dropdown |
| Group-by control | Same — within Code-tab session list |
| Session status indicator (idle/running/...) | Visual decoration on row, not an actionable element |
| Project / branch label | Same |
| Diff stats badge `+12 -1` | Conditional — no session at capture had pending diffs |
| Dispatch badge | Conditional — no Dispatch-spawned session at capture |
| Scheduled badge | Conditional — same |
| Hover archive icon | Hover-revealed; walker captures static state |
| Right-click context menu (Rename / Archive / etc.) | Walker does not synthesise right-clicks |
| Sidebar resize handle | Visual / draggable, not an aria-labeled element |
| Sidebar collapse toggle | Inventory has `root.button.collapse-sidebar` but doc treats it as a Code-tab element rather than chrome |
| Scrollbar | OS / theme-rendered |
| `Ctrl+Tab` / `Ctrl+Shift+Tab` cycling | Keyboard shortcut, not a UI element |
#### In renderer but not in docs
| Inventory entry | Notes |
|-----------------|-------|
| `root.button.fine-tuning-diffusion-models-with-reinforcement-learning` | A pinned recent conversation — sidebar content |
| `root.button.more-options-for-fine-tuning-diffusion-models-with-reinforce` | Per-row menu trigger — doc mentions "right-click context menu" but inventory shows it's a discoverable button |
| `root.button.how-to-use-claude` + `root.button.more-options-for-how-to-use-claude` | Same pattern |
| `root.button.code.button.routines` | "Routines" link in Code-tab nav — doc's "Routines link" is here |
| `root.button.code.button.more-navigation-items` | Likely the doc's "Customize / Routines" expander — not enumerated |
| `root.button.code.button.filter` | The doc's "Filter: status" probably maps here |
| `root.button.code.button.appearance` | Not in doc |
| `root.button.code.button.show-5-more` | Pagination; not in doc |
| `root.button.code.button.open-session-*` (5 entries) | Each is a single session row in the Code-tab list — the doc's "Per-session row" category |
#### Fingerprint potentially drifted
None — doc rows for this surface use Location prose only.
---
### `ui/prompt-area.md`
**Inventory surfaces likely covered:** `root` (top-level prompt area
buttons), `root.button.add-files-connectors-and-more` (the `+` menu),
`root.button.model-opus-4-7-adaptive` (model picker), and several deep
sub-surfaces.
**Doc rows reconciled:** ~28
#### In docs but not in renderer
| Doc element | Reason class |
|-------------|--------------|
| Input field | The contenteditable / textarea itself isn't captured (no aria-label) |
| Placeholder text | Not an interactive element |
| Cursor caret / multi-line autosize / word wrap | Behavior, not element |
| Paste plain text / paste image | Behavior |
| `Enter` to send / `Shift+Enter` / `Esc` | Keyboard behavior |
| IME composition | Not a renderer element |
| Attachment button (left of input) | Not surfaced — possibly bundled into `root.button.add-files-connectors-and-more` |
| File-attached chip | Conditional — no attachment at capture |
| Multiple attachments / image preview / PDF preview | Conditional |
| Drag-drop overlay | Conditional, only renders during drag |
| `@filename` autocomplete | Conditional, only renders when typing `@` |
| `+` button | Likely IS the `root.button.add-files-connectors-and-more` button — see below |
| Slash menu (all rows: Built-in / Project skills / User skills / Plugin skills / filter / selection / `Esc`) | Walker did not type `/` to trigger the slash menu; no inventory entries |
| Effort picker (`Cmd+Shift+E`) | Possibly inside `root.button.code.button.opus-4-7-1m-extra-high` — uncertain |
| Stop button (replaces Send while responding) | Conditional — no in-flight response at capture |
| Usage ring | Possibly `root.button.code.button.usage-plan-11` ("Usage: plan 11%") |
#### In renderer but not in docs
| Inventory entry | Notes |
|-----------------|-------|
| `root.button.press-and-hold-to-record` ("Press and hold to record") | Voice / dictation button in prompt area — doc has no voice input row |
| `root.button.code.button.dictation-settings` | Dictation settings button |
| `root.button.code.button.transcript-view-mode` | Transcript view toggle in prompt area |
| `root.button.code.button.scroll-to-bottom` | Scroll-to-bottom affordance |
| `root.button.code.button.accept-edits` | Permission-mode-related quick action |
| `root.button.code.button.add` ("Add") | Likely the doc's `+` button, with a different label |
| `root.button.code.button.usage-plan-11` ("Usage: plan 11%") | Probably the doc's "Usage ring" |
| `root.button.code.button.opus-4-7-1m-extra-high` ("Opus 4.7 1M· Extra high") | Probably the doc's "Effort picker" |
| All `root.button.add-files-connectors-and-more.menuitem.*` entries (Add files or photos / Add to project / Skills / Connectors / Plugins / Research / Web search / Use style) | The `+` menu contents — doc has Slash commands / Skills / Connectors / Plugins / Add plugin; inventory surfaces additional items the doc misses (Add files or photos, Add to project, Web search, Use style) |
| `root.button.add-files-connectors-and-more.menuitem.use-style.*` (8 entries: Normal / Learning / Concise / Explanatory / Formal / Create & edit styles / Research mode) | Style picker is a whole sub-surface the doc doesn't mention |
| `root.button.model-opus-4-7-adaptive.menuitemradio.*` (Opus / Sonnet / Haiku / Adaptive thinking / More models) | Doc says "Sonnet, Opus, Haiku" — inventory adds Adaptive thinking + More models |
#### Fingerprint potentially drifted
| Doc claim | Inventory says |
|-----------|----------------|
| `+` button → opens menu of "Slash commands / Skills / Connectors / Plugins / Add plugin" | The corresponding inventory button is labeled "Add files, connectors, and more" with `aria-label="Add files, connectors, and more"`. Menu contents don't include "Slash commands" or "Add plugin" sub-entry — doc menu structure is partly speculative |
---
### `ui/code-tab-panes.md`
**Inventory surfaces likely covered:** `root.button.code` (23 entries),
`root.button.code.button.new-session-n` (60 entries) — but no per-pane
sub-surfaces (no diff pane, no terminal pane, no preview pane, no file
pane).
**Doc rows reconciled:** ~50
#### In docs but not in renderer
Almost every Code-tab pane row is missing from the inventory. The walker
landed in the Code-tab "New session" shell but did not open or drill any
of the panes. Categories:
| Pane | Doc rows missing | Reason |
|------|------------------|--------|
| Pane chrome (header, drag/resize handles, close button, Views menu) | 5 rows | Walker coverage gap — no pane was open |
| Diff pane | 9 rows (file list, diff content, line click, Cmd+Enter, Accept/Reject, Review code) | Walker coverage gap |
| Preview pane | 11 rows | Walker coverage gap |
| Terminal pane | 7 rows | Walker coverage gap (also: only renders for Local sessions) |
| File pane | 7 rows | Walker coverage gap |
| Tasks / subagent pane | 5 rows | Walker coverage gap |
| Side chat overlay | 3 rows (trigger / content / close) | `root.button.code.button.close-side-chat` IS captured — the close button — but content isn't drilled |
| CI status bar | 5 rows | Conditional — no PR open at capture |
| View modes (Normal/Verbose/Summary) | 3 rows | Possibly behind `root.button.code.button.transcript-view-mode` — single inventory entry vs. 3 doc rows |
#### In renderer but not in docs
| Inventory entry | Notes |
|-----------------|-------|
| `root.button.code.button.local` ("Local") | Environment switcher chip — not in doc |
| `root.button.code.button.select-folder` ("Select folder…") | Folder-picker entry — doc references this only via T17 cross-reference |
| `root.button.code.button.send` (and `#2`, both denylisted) | Send button — doc has it under prompt-area, not panes |
| `root.button.code.button.transcript-view-mode` | The doc's "Transcript view dropdown" — single inventory entry |
| `root.button.code.button.opus-4-7-1m-extra-high` | Model selector inside Code-tab session shell |
| `root.button.code.button.usage-plan-11` | Usage ring inside Code-tab session shell |
| `root.button.code.button.accept-edits` ("Accept edits") | Permission-mode quick action — not in doc |
| All 60 `root.button.code.button.new-session-n.button.open-session-*` and per-session entries | Doc covers the session list in `sidebar.md`, not here, so this isn't really a gap for `code-tab-panes.md` |
#### Fingerprint potentially drifted
None — doc is prose-only.
---
### `ui/settings.md`
**Inventory surfaces likely covered:** `root.button.settings` (only 1
entry — "Settings" button itself), `root.button.awaaddrick-max.menuitem.settingsctrl`
(the menu-item route to Settings, label "SettingsCtrl,").
**Doc rows reconciled:** ~28
#### In docs but not in renderer
The Settings page itself is essentially un-walked. Settings opens as an
overlay/modal, which the walker treated as a single button rather than
drilling into it. Every row in the doc beyond "Settings window opens" lacks
a matching inventory entry:
| Doc section | Rows missing | Reason |
|-------------|--------------|--------|
| Settings root (close button, sidebar nav) | 3 rows | Walker coverage gap |
| Desktop app → General (Computer use, Keep computer awake, Denied apps, Unhide apps, Theme picker) | 5 rows | Walker coverage gap; some rows account-state-dependent |
| Desktop app → Account (name/email, plan badge, Sign out) | 3 rows | Walker coverage gap |
| Claude Code (Worktree location, Branch prefix, Auto-archive toggle, Persist preview, Preview toggle, Bypass-permissions toggle, Auto mode availability) | 7 rows | Walker coverage gap |
| Connectors page (list, per-connector entry, Manage, Disconnect, Add connector) | 5 rows | Walker coverage gap; partially covered by the in-session connectors menu |
| SSH connections (list, Add SSH connection button, per-connection entry) | 3 rows | Walker coverage gap; account-state-dependent |
| Keyboard shortcuts (list, value, Reset, Quick Entry shortcut) | 4 rows | Walker coverage gap |
| Local environment editor (open, Add variable, Remove variable, Apply to dev servers) | 4 rows | Walker coverage gap; account-state-dependent |
#### In renderer but not in docs
| Inventory entry | Notes |
|-----------------|-------|
| `root.button.settings` ("Settings", `aria-label="Settings"`) | The button that opens Settings — confirmed in chrome |
| `root.button.awaaddrick-max.menuitem.settingsctrl` ("SettingsCtrl,") | Settings menu item under the user/plan menu — alternate path |
#### Fingerprint potentially drifted
None.
#### Walker coverage note
Settings is a known walker coverage gap (see preamble). This doc is
substantively un-reconciled until a Settings drill pass lands.
---
### `ui/routines-page.md`
**Inventory surfaces likely covered:** none directly. Routines are
reachable via `root.button.code.button.routines`, but the page itself
isn't drilled.
**Doc rows reconciled:** ~26
#### In docs but not in renderer
Every doc row except the "Routines page link" itself is unmatched — the
walker captured the entry point but did not open the Routines page.
| Doc section | Rows missing | Reason |
|-------------|--------------|--------|
| Routines list (header, New routine button, list, per-routine row, Run-now icon, Pause/resume, click row) | 7 rows | Walker coverage gap |
| New routine form Local (Name, Description, Instructions, permission-mode picker, model picker, Working folder, Worktree toggle, Schedule preset, Time picker, Day picker, Save, Cancel, Folder-trust prompt) | 13 rows | Walker coverage gap |
| New routine form Remote (Trigger type, Connectors picker, Network access controls) | 3 rows | Walker coverage gap; doc itself is partly speculative ("Per upstream docs") |
| Routine detail (Run now, Active/Paused toggle, Edit, Delete, Review history, hover tooltip, Show more, Always allowed, Revoke approval) | 9 rows | Walker coverage gap |
#### In renderer but not in docs
| Inventory entry | Notes |
|-----------------|-------|
| `root.button.code.button.routines` ("Routines") | The entry-point link — doc's "Routines page link" |
#### Fingerprint potentially drifted
None.
---
### `ui/connectors-and-plugins.md`
**Inventory surfaces likely covered:** `root.button.add-files-connectors-and-more.menuitem.connectors`
(the in-session connector picker, 5 entries), plus the deeper per-connector
sub-surfaces under `.connectors.menuitemcheckbox.gmail.*` (15 entries).
Plugin browser surfaces (`root.button.back.*`) cover Skills, Connectors
(two instances), Add plugin, Typescript lsp, Php lsp, Playwright, etc.
**Doc rows reconciled:** ~24
#### In docs but not in renderer
| Doc element | Reason class |
|-------------|--------------|
| Connectors menu — "Per-connector row" with status indicator | Inventory has Gmail and Google Calendar but not status decorations |
| Empty state | Conditional — user has connectors configured |
| Connector catalog (modal body, per-connector tile with logo/description) | Walker coverage gap — the Add-connector flow opens a modal that wasn't drilled |
| OAuth in-app overlay | Conditional, not present at capture |
| Permission consent screen | External (provider's UI) |
| Callback completion | Behavior, not an element |
| Custom connector entry point | Walker coverage gap |
| Plugin browser modal (browser modal, marketplace selector, per-plugin tile, scope selector, install progress, success state, error state) | Walker captured plugin surfaces under `root.button.back.*` (Add plugin, Typescript lsp, Php lsp, Playwright) but not the modal anatomy |
| Manage plugins (installed list, per-plugin row, Enable toggle, Plugin skills sub-list) | Walker coverage gap — no Manage-plugins surface drilled |
#### In renderer but not in docs
| Inventory entry | Notes |
|-----------------|-------|
| `root.button.add-files-connectors-and-more.menuitem.connectors` ("Connectors", in-session menu) | Doc covers this — the in-session Connectors menu |
| `root.button.add-files-connectors-and-more.menuitem.connectors.menuitemcheckbox.gmail` ("Gmail") | Per-connector row — doc "Per-connector row" category |
| `root.button.add-files-connectors-and-more.menuitem.connectors.menuitemcheckbox.google-calendar` ("Google Calendar") | Per-connector row — same |
| `root.button.add-files-connectors-and-more.menuitem.connectors.menuitem.manage-connectors` ("Manage connectors") | Doc's "Manage connectors entry" |
| `root.button.add-files-connectors-and-more.menuitem.connectors.menuitem.add-connector` ("Add connector") | Doc has "Add connector button" in Settings; inventory shows it also exists in the in-session menu |
| `root.button.add-files-connectors-and-more.menuitem.connectors.menuitem.tool-accessload-tools-when-needed` ("Tool accessLoad tools when needed") | Per-connector tool-access setting — not in doc |
| `root.button.back.a.skills` ("Skills") | Plugin browser — Skills tab |
| `root.button.back.a.connectors` / `root.button.back.a.connectors#2` (both "Connectors") | Plugin browser — Connectors tab (instance suffix `#2` indicates duplicate detection) |
| `root.button.back.button.add-plugin` ("Add plugin") | Plugin browser — Add plugin button |
| `root.button.back.a.typescript-lsp` / `root.button.back.a.php-lsp` / `root.button.back.a.playwright` | Installed plugins — doc treats this as "Manage plugins → Per-plugin row," walker captures the actual plugin names |
| `root.button.back.button.connect-your-appslet-claude-read-and-write-to-the-tools-you-` ("Connect your appsLet Claude read...") | Plugin browser landing pane CTA — not in doc |
| `root.button.back.a.create-new-skillsteach-claude-your-processes-team-norms-and-` ("Create new skillsTeach Claude your processes, team norms, and expertise.") | Skills-creation CTA — not in doc |
| `root.button.back.button.browse-pluginsadd-pre-built-knowledge-for-your-field` ("Browse pluginsAdd pre-built knowledge for your field.") | Browse-plugins CTA — not in doc |
| `root.button.add-files-connectors-and-more.menuitem.connectors.menuitemcheckbox.gmail.button.develop-storytelling-frameworks` and 9 similar `.option`/`.button` pairs | Connector-suggested prompt cards. Walker captured these as a side-effect of drilling Gmail — they aren't a doc-targeted UI element |
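
The inventory keys in the tables above appear to be derived from the visible labels. The following is a minimal TypeScript sketch of that derivation. The rules (lowercase, non-alphanumeric runs collapsed to `-`, separators trimmed, a 60-character cap, and a `#n` suffix for repeats) are inferred only from the key/label pairs shown in this report, not from the walker source, and `slugifyLabel` and `assignKeys` are hypothetical names.

```typescript
// Hypothetical helpers; the rules are inferred from key/label pairs in this report.
function slugifyLabel(label: string, maxLen = 60): string {
  return label
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse punctuation and whitespace runs
    .replace(/^-+|-+$/g, "")     // trim leading/trailing separators
    .slice(0, maxLen);           // assumed 60-character cap
}

// Appends "#2", "#3", ... to repeated keys under the same parent and role.
function assignKeys(parent: string, role: string, labels: string[]): string[] {
  const seen = new Map<string, number>();
  return labels.map((label) => {
    const base = `${parent}.${role}.${slugifyLabel(label)}`;
    const count = (seen.get(base) ?? 0) + 1;
    seen.set(base, count);
    return count === 1 ? base : `${base}#${count}`;
  });
}
```

Under these assumptions, two "Connectors" links under `root.button.back` would yield `root.button.back.a.connectors` and `root.button.back.a.connectors#2`, which matches the pair listed in the table.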
#### Fingerprint potentially drifted
| Doc claim | Inventory says |
|-----------|----------------|
| `+`**Connectors** opens "Connectors menu" | Inventory: button is "Add files, connectors, and more" not "+"; menu item is "Connectors". Functionally the same surface |
---
### `ui/quick-entry.md`
**Inventory surfaces covered:** none — Quick Entry is a separate
`BrowserWindow` constructed in the main process (`index.js:515375`), not
part of claude.ai's renderer. The walker started at `https://claude.ai/new`
which never reaches it.
**Doc rows reconciled:** ~17
#### In docs but not in renderer
Every row, by design. Categories:
- Window appearance (frame, background, rounded corners, drop shadow,
position, always-on-top, lifecycle, persistence after main destroy) —
main-process BrowserWindow construction
- Input area (text input, placeholder, multi-line, Enter/Shift+Enter,
Esc, click-outside, paste, IME) — popup renderer (separate from
claude.ai)
- Submit feedback (transition, loading, error) — popup renderer + IPC
bridge
This entire file is correctly out of renderer scope. Doc rows are
already heavily annotated with `index.js:515xxx` references to upstream
main-process source — that's the right substrate.
#### In renderer but not in docs
N/A — surface mismatch.
---
### `ui/notifications.md`
**Inventory surfaces covered:** none. Notifications fire via libnotify
on the `org.freedesktop.Notifications` D-Bus interface; they are not DOM
elements.
**Doc rows reconciled:** ~17
#### In docs but not in renderer
Every row, by design. Categories:
- Notification sources (Scheduled fires, Catch-up, CI status, PR merged,
Dispatch handoff, Permission prompt) — main-process emitters
- Per-notification anatomy (App identity, icon, title, body, actions,
click target) — DBus payload
- Per-DE rendering (KDE/GNOME/Mako/Dunst/swaync/Niri) — daemon behavior
- Notification persistence (history, DND) — daemon behavior
This entire file is correctly out of renderer scope.
#### In renderer but not in docs
N/A — surface mismatch.
---
## Top-level findings
### Coverage by source-of-truth axis
- **OS-level / window-manager elements** (window-chrome rows for
title bar, close/min/max, resize edges, drop shadow) — never going to
appear in the renderer inventory. ~10 doc rows.
- **Main-process Electron windows** (Quick Entry popup, About dialog,
crash dialog, file pickers) — never going to appear in the renderer
inventory. ~25 doc rows.
- **Tray menu** (Show/Hide, Quick Entry, Settings, About, Quit, Open
at Login) — main-process `Menu.buildFromTemplate()`. ~12 doc rows.
- **libnotify notifications** — DBus, not DOM. ~17 doc rows.
- **Walker coverage gaps** (Settings overlay, Routines page, plugin
browser modal, all Code-tab panes, dialogs, slash menu, drag-drop
overlay) — would appear if the walker drilled them. ~70 doc rows.
- **Account-state-dependent surfaces** (CI bar, Dispatch badges, file
attachments, SSH connections panel) — would appear in some sessions
but didn't at capture. ~15 doc rows.
- **Conditional / hover / behavior** (right-click context menus, hover
archive icons, drag-drop overlays, tooltips) — wouldn't appear in a
static walker pass even if the surface was visited. ~10 doc rows.
The combined explanation: roughly half of the "in docs but not in
renderer" mismatches are unfixable (different source of truth), and
roughly half are walker coverage gaps that future passes can close.
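
As a quick check on the split described above, the approximate row counts can be summed per axis. This is only an arithmetic sketch over the "~N doc rows" figures listed in this section; the object keys are illustrative names, not identifiers from the inventory.

```typescript
// Approximate "in docs but not in renderer" row counts, copied from the list above.
const rows = {
  osWindowManager: 10,
  mainProcessWindows: 25,
  trayMenu: 12,
  libnotify: 17,
  walkerGaps: 70,
  accountState: 15,
  conditional: 10,
};

const totalRows = Object.values(rows).reduce((a, b) => a + b, 0);

// Axes whose source of truth is outside the renderer (not fixable by the walker).
const outOfRenderer =
  rows.osWindowManager + rows.mainProcessWindows + rows.trayMenu + rows.libnotify;

const pct = (n: number): number => Math.round((n / totalRows) * 100);

console.log({
  totalRows,                                  // 159
  outOfRendererPct: pct(outOfRenderer),       // 40
  walkerGapsPct: pct(rows.walkerGaps),        // 44
});
```

On these figures the two large groups are close in size (about 40% and 44%), with the account-state and conditional rows making up the remainder.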
### Top 3 surfaces with the most "in docs but not in renderer" mismatches
These are likely candidates for speculative claims or for un-walked
surfaces. Treat this list as a triage queue:
1. **`ui/code-tab-panes.md`** — ~50 unmatched rows. Almost entirely
walker-coverage gap (the walker landed in the Code-tab shell but
opened no panes). Until the walker drills diff/preview/terminal/file/
tasks panes, this doc is un-reconcilable.
2. **`ui/settings.md`** — ~28 unmatched rows. Settings opens as an
overlay; walker captured only the Settings entry-point button. Needs
targeted drill.
3. **`ui/routines-page.md`** — ~26 unmatched rows. Same shape as
Settings — entry-point captured, page contents unwalked.
### Top 3 surfaces with the most "in renderer but not in docs" surplus
These docs are the most incomplete relative to ground truth:
1. **`ui/sidebar.md`** — Inventory has 60+ Code-tab session-list entries
under `root.button.code.button.new-session-n`. Doc treats sessions as
a single category row. This is intentional doc behavior, but it means
the doc doesn't help when reasoning about the actual structural
buttons (Filter, Appearance, Routines, More navigation items, Show 5
more, etc.) that the walker found.
2. **`ui/prompt-area.md`** — Inventory has the entire Use-style picker
sub-tree (Normal / Learning / Concise / Explanatory / Formal / Create
& edit styles + 5 preset cards), the Press-and-hold-to-record voice
button, dictation settings, transcript view mode, scroll-to-bottom,
and the model picker's "Adaptive thinking" / "More models" entries —
none of which the doc enumerates.
3. **`ui/connectors-and-plugins.md`** — Inventory has the entire plugin
browser sub-tree (`root.button.back.*` — 12 entries: Skills, Add
plugin, Typescript lsp, Php lsp, Playwright, Browse plugins, Create
new skills, Connect your apps, Connectors×2, Back to Claude, Select
a folder), and connector-suggested prompt cards (10 entries under
`.gmail.button.*`). Doc treats these surfaces at a higher level of
abstraction.
## Acknowledged gaps in inventory itself
Not all inventory absences are doc errors. Known walker gaps as of v6:
- **Settings page deep content** — only the entry-point button
(`root.button.settings`) and the menu shortcut
(`...menuitem.settingsctrl`) captured. Settings opens as an overlay
the walker did not drill.
- **Dialogs** — 0 captured. claude.ai may not use `[role=dialog]` for
most modals, or the walker's drill paths didn't reach them.
- **Code tab panes** — only the Code-tab session shell was drilled;
diff, preview, terminal, file, tasks, subagent, plan, side chat, CI
bar are uncaptured.
- **Routines page** — only the entry-point link was captured.
- **Plugin browser modal anatomy** — surrounding list captured, the
per-plugin install modal wasn't.
- **Slash menu** — walker did not type `/` to trigger.
- **Hover/right-click/drag-only affordances** — static walker; no
context menus or drag-drop overlays.
- **Quick Entry / Tray / Notifications** — out of renderer scope.
These are walker tickets, not bugs against the v6 capture.
## Triage suggestions for `ui/*.md` cleanup
Aimed at humans editing the docs. Ordered by impact:
1. **Mark out-of-renderer surfaces explicitly.** `ui/tray.md`,
`ui/quick-entry.md`, `ui/notifications.md`, and the OS-frame section
of `ui/window-chrome-and-tabs.md` already reference main-process
source and DE behavior — add a header note that this surface
intentionally doesn't appear in `ui-inventory.json`.
2. **Annotate walker-coverage-gap surfaces.** `ui/code-tab-panes.md`,
`ui/settings.md`, `ui/routines-page.md` — header note that the
inventory does not yet drill these surfaces; rows reflect upstream
behavior and are unverified in the renderer.
3. **Add missing topbar/prompt-area elements** to `ui/window-chrome-and-tabs.md`
and `ui/prompt-area.md` from the "In renderer but not in docs" lists.
4. **Decide the doc/inventory boundary for sidebar session lists.** Doc
treats sessions as a category; inventory enumerates each. Pick one
shape and document it.
5. **Flag speculative Linux-conditional rows.** In `ui/settings.md`,
mark the SSH connections rows and the Computer Use rows "Denied apps" /
"Unhide apps when Claude finishes" as "may not render on Linux; verify
before assuming."


@@ -0,0 +1,12 @@
{
"capturedAt": "2026-05-03T07:13:20.024Z",
"appVersion": "1.5354.0",
"walkerVersion": "7",
"startUrl": "https://claude.ai/epitaxy",
"totalElements": 90,
"deniedActions": 6,
"partial": false,
"isolation": "launchClaude (test-harness path)",
"seededFromHost": true,
"allowlistEntries": []
}


@@ -0,0 +1,76 @@
# UI snapshots
Captured renderer state for the `claude.ai` web view, taken via the
`explore` CLI in [`tools/test-harness/explore/`](../../../tools/test-harness/explore/).
Use these to detect upstream UI drift before it breaks the harness.
The snapshot JSON files themselves are gitignored
(`docs/testing/ui-snapshots/*.json`) — they're noisy diffs and
specific to the moment of capture. This directory is checked in so the
path exists; the README + `.gitkeep` are the only tracked files.
## Capture
Requires a running `claude-desktop` build with the main-process
debugger attached on port 9229 (Developer menu → Enable Main Process
Debugger). Then, from `tools/test-harness/`:
```sh
npx tsx explore/explore.ts snapshot baseline-code-tab
# → wrote /…/docs/testing/ui-snapshots/baseline-code-tab.json
```
Snapshot names are restricted to `[a-zA-Z0-9._-]`.
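
The name restriction above can be expressed as a single anchored regular expression. This is a sketch only: `isValidSnapshotName` is a hypothetical helper, and the actual check in the `explore` CLI may differ.

```typescript
// Hypothetical validator; the character class comes from the README line above.
const SNAPSHOT_NAME = /^[a-zA-Z0-9._-]+$/;

function isValidSnapshotName(name: string): boolean {
  return SNAPSHOT_NAME.test(name);
}
```

Note that a name such as `..` still satisfies this character class, so an implementation that builds file paths from the name may want an additional guard.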
## Compare
```sh
npx tsx explore/explore.ts diff baseline-code-tab after-feature-x
```
Add `--json` for machine-readable output. Add `--exit-on-diff` to fail
the process (exit code 3) when there are any entries — useful inside a
CI guard.
`diff` arguments accept either a bare name (looked up in this dir,
`.json` appended) or an explicit path.
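
A rough TypeScript sketch of the argument handling described above. Both function names are hypothetical, and the rule for deciding that an argument is an explicit path (it contains `/` or already ends in `.json`) is an assumption made for illustration; only the `.json` suffix for bare names and exit code 3 come from this README.

```typescript
// Hypothetical resolver. Assumption: an argument counts as an explicit path
// when it contains "/" or already ends in ".json"; otherwise it is a bare name.
function resolveSnapshotArg(arg: string, snapshotDir: string): string {
  const looksLikePath = arg.includes("/") || arg.endsWith(".json");
  return looksLikePath ? arg : `${snapshotDir}/${arg}.json`;
}

// Exit code 3 on any diff entry mirrors the documented --exit-on-diff behavior.
function exitCodeFor(entryCount: number, exitOnDiff: boolean): number {
  return exitOnDiff && entryCount > 0 ? 3 : 0;
}
```

In a CI guard, a non-zero result from the second function is what would fail the job.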
### What counts as a diff
| Kind | Meaning |
|-----------|---------------------------------------------------------|
| `removed` | Element keyed in A absent from B (drift signal). |
| `changed` | Same key, different visible text or structural detail. |
| `added` | New key in B (informational only — surface gained). |
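
The three kinds in the table can be illustrated with a small keyed comparison. This is a sketch under stated assumptions: each snapshot is flattened to a hypothetical `Map` from element key to a single detail string, and `diffSnapshots` is an illustrative name, not the function in `explore/`.

```typescript
type DiffKind = "removed" | "changed" | "added";
interface DiffEntry { kind: DiffKind; key: string }

// Hypothetical flattened view: element key -> visible text / structural detail.
type KeyedSnapshot = Map<string, string>;

function diffSnapshots(a: KeyedSnapshot, b: KeyedSnapshot): DiffEntry[] {
  const entries: DiffEntry[] = [];
  // Keys in A: either gone from B (removed) or present with different detail (changed).
  a.forEach((detailA, key) => {
    const detailB = b.get(key);
    if (detailB === undefined) entries.push({ kind: "removed", key });
    else if (detailB !== detailA) entries.push({ kind: "changed", key });
  });
  // Keys only in B are informational additions.
  b.forEach((_detail, key) => {
    if (!a.has(key)) entries.push({ kind: "added", key });
  });
  return entries;
}
```

Reporting `removed` and `changed` before `added` keeps the drift signals at the top of the output; that ordering is a choice of this sketch.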
## Snapshot shape
```jsonc
{
"capturedAt": "2026-05-02T17:30:00Z",
"claudeAiUrl": "https://claude.ai/…",
"appVersion": "1.1.7714", // from app.getVersion(), null on failure
"pageState": { "url", "title", "readyState" },
"dfPills": [ /* Chat / Cowork / Code top-level tabs */ ],
"compactPills": [ /* env pill, Select-folder pill, */ ],
"ariaLabeledButtons":[ /* every <button[aria-label]>, capped at 200 */ ],
"openMenu": { "ariaLabelledBy", "ariaLabel", "items": [...] },
"modals": [ /* role=dialog with heading + buttons */ ]
}
```
Discovery is by **structural shape**, never by minified Tailwind class
names. See the why-block at the top of
[`tools/test-harness/explore/snapshot.ts`](../../../tools/test-harness/explore/snapshot.ts)
for the rationale.
## Other subcommands
```sh
npx tsx explore/explore.ts # full snapshot to stdout
npx tsx explore/explore.ts pills # df-pills + compact-pills + state
npx tsx explore/explore.ts menu # currently-open menu (or null)
npx tsx explore/explore.ts find <re> # regex search over text + aria-label
```
`find` regex is case-insensitive by default.
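
The search behavior described above might look roughly like this. The `Labeled` shape and `findElements` are hypothetical; the only details taken from this README are that the search covers text and `aria-label` and that matching is case-insensitive by default.

```typescript
interface Labeled { text: string; ariaLabel: string | null }

// Hypothetical search helper: case-insensitive unless other flags are passed.
function findElements(items: Labeled[], pattern: string, flags = "i"): Labeled[] {
  const re = new RegExp(pattern, flags);
  return items.filter((el) => re.test(el.text) || re.test(el.ariaLabel ?? ""));
}
```

Passing a `g` flag would make `RegExp.prototype.test` stateful across calls, so this sketch is only intended for non-global flags.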


@@ -0,0 +1,360 @@
{
"derivedAt": "2026-05-03T02:51:23.409Z",
"sourceInventory": {
"capturedAt": "2026-05-03T00:21:38.299Z",
"appVersion": "1.5354.0",
"walkerVersion": "6",
"totalElements": 383
},
"stable": [
"Accept edits",
"Add",
"Add connector",
"Add files",
"Add files or photosCtrl+U",
"Add files, connectors, and more",
"Add from GitHub",
"Add to project",
"All projects",
"Appearance",
"Ask",
"Back",
"Back to Claude",
"Chat",
"Clear active",
"Close",
"Close side chat",
"Close suggestions",
"Code",
"Completed: See Claude workTry a quick task — Claude does it, you watch",
"ConcisePreset",
"Connectors",
"Conversation ID reference",
"Copy invite",
"Cowork",
"Create custom style",
"Create engaging headlines",
"Create presentation scripts",
"Develop content templates",
"Develop storytelling frameworks",
"Dictation settings",
"Dismiss checklist",
"Dismiss guest pass",
"Draft PR visibility on GitHub",
"ELKO HRN-33 and HRN-31 manuals",
"Edit Instructions",
"Electron apps Linux users desperately want but can't have\nDespite Electron's cross-platform promise, several high-profil",
"Expand sidebar",
"ExplanatoryPreset",
"Feedback submission",
"Filter",
"Fine-tuning diffusion models with reinforcement learning",
"FormalPreset",
"Forward",
"From Calendar",
"From Gmail",
"Get apps and extensions",
"Gmail",
"Google Calendar",
"How to use ClaudeAaddrick Williams",
"Install",
"Invalid session description",
"Lamination plate position offsetsAaddrick Williams",
"Learn",
"Learn about styles",
"Learn how to use Cowork safely",
"Learn more about styles",
"Learning",
"LearningPreset",
"Local",
"Manage connectors",
"Menu",
"Model: Legacy Model",
"Model: Opus 4.7 Adaptive",
"Model: Sonnet 4.6 Adaptive",
"More navigation items",
"More options",
"More options for Fine-tuning diffusion models with reinforcement learning",
"More options for How to use Claude",
"New artifact",
"New project",
"Open session Audit for elementary-data supply chain vulnerability",
"Open session Find contact method for Claude Desktop issue",
"Open session Plan automated testing strategy for desktop app",
"Open session Test DNS query for Claude desktop package",
"Open session for PR #552",
"Pair your phoneSend tasks from your phone for Claude to run here",
"Pin project",
"Pinned",
"Plugins",
"Press and hold to record",
"Recents",
"Research",
"Research mode",
"Schedule a recurring taskGreat for reminders, reports, or regular check-ins",
"Scroll to bottom",
"Search",
"Search projects",
"Select folder…",
"Send",
"Settings",
"Show 5 more",
"Show more",
"Skills",
"Skip to content",
"Sort by",
"Start a task in Cowork",
"Style: Formal",
"Terms apply",
"Test",
"Testing and Quality Assurance",
"Tool accessLoad tools when needed",
"Transcript view mode",
"Untitled",
"Use style",
"View all",
"Web search",
"West Central Schools provincial takeover investigation",
"Work in a project",
"Write",
"Write something in the voice of my favorite historical figure",
"Your artifactsYour artifacts",
"about_tab.py, py, 60 lines",
"New chat⌘N",
"New session⌘N",
"New task⌘N",
"Artifacts",
"Live artifacts",
"Scheduled",
"DispatchBeta",
"Routines",
"How to use Claude",
"Projects",
"Customize"
],
"instanceShapes": [
{
"id": "plan-badge",
"regex": "^.+·(Free|Pro|Max|Team|Enterprise)[-\\s]*$",
"flags": "u",
"pattern": "\\w+·(Free|Pro|Max|Team|Enterprise)",
"matchedNames": [
"AWAaddrick·Max"
]
},
{
"id": "opus-version",
"regex": "^Opus \\d",
"flags": "",
"pattern": "^Opus \\d",
"matchedNames": [
"Opus 4.7 1M· Extra high",
"Opus 4.7Most capable for ambitious work"
]
},
{
"id": "sonnet-version",
"regex": "^Sonnet \\d",
"flags": "",
"pattern": "^Sonnet \\d",
"matchedNames": [
"Sonnet 4.6Most efficient for everyday tasks"
]
},
{
"id": "haiku-version",
"regex": "^Haiku \\d",
"flags": "",
"pattern": "^Haiku \\d",
"matchedNames": [
"Haiku 4.5Fastest for quick answers"
]
},
{
"id": "percentage",
"regex": "\\d{1,3}%$",
"flags": "",
"pattern": "\\d{1,3}%",
"matchedNames": [
"Usage: plan 11%"
]
},
{
"id": "relative-date",
"regex": "(Today|Yesterday|\\d+\\s(day|hour|minute|second|week|month|year)s?\\sago)",
"flags": "",
"pattern": "(Today|Yesterday|\\d+\\s(day|hour|minute|second|week|month|year)s?\\sago)(\\+\\d+)?",
"matchedNames": [
"Claude Desktop Debian1 year ago",
"Draft PR visibility on GitHubYesterday",
"ELKO HRN-33 and HRN-31 manualsYesterday",
"Feedback submissionYesterday",
"Find contact method for Claude Desktop issuePR #552 · Yesterday",
"Review PR 555 for issue 558 fixToday",
"Review and analyze issue 545Yesterday"
]
},
{
"id": "size-with-unit",
"regex": "^\\d+\\.\\d+\\s\\w+",
"flags": "",
"pattern": "^\\d+\\.\\d+\\s\\w+",
"matchedNames": []
},
{
"id": "user-handle",
"regex": "@\\w+",
"flags": "",
"pattern": "@\\w+",
"matchedNames": []
},
{
"id": "long-title",
"regex": "^[A-Z][a-z]+ [A-Z][a-z]+ [a-z]",
"flags": "",
"pattern": null,
"matchedNames": [
"Evaluate Terraform for infrastructure setup",
"Host Obsidian library in second database"
]
}
],
"suspect": [
"Adaptive thinkingThinks for more complex tasks",
"Add build instructions and patch toggle option",
"Add build instructions and quick menu patch toggle",
"Add plugin",
"Audit for elementary-data supply chain vulnerability",
"Automate",
"Browse pluginsAdd pre-built knowledge for your field.",
"Build adversarial resume review platform MVP",
"Change fonts to Lexend",
"Check Quad9 DNS resolution for package domain",
"Check flight map tile caching history",
"Check for Trivy supply chain vulnerability",
"Claude Desktop DebianAaddrick Williams",
"Claude Desktop DebianEnter",
"Claude is AI and can make mistakes. Please double-check responses.",
"Claude prompting guide.md, md, 413 lines",
"Clawdmartclawdmart.comClaudeCreate a shopping list, go on Chrome, and make an order",
"Collapse sidebar",
"Compare GPU options for gaming performance",
"Concise",
"Connect your appsLet Claude read and write to the tools you already use.",
"Copy",
"Create & edit styles",
"Create new skillsTeach Claude your processes, team norms, and expertise.",
"Create user documentation",
"Customer Email",
"Data",
"Develop editorial guidelines",
"Dispatch background conversation",
"Download",
"Draw",
"Edit",
"Educational Content",
"Evaluate productization viability of methodology",
"Explanatory",
"Find contact method for Claude Desktop issue",
"Fix Claude Desktop installation on Debian",
"Formal",
"Formulas",
"Give negative feedback",
"Give positive feedback",
"Help me develop a unique voice for an audience",
"Home",
"How to use ClaudeAn example project that also doubles as a how-to guide for using Claude. Chat with it to learn more abo",
"Identify tools for session start hook",
"Insert",
"Investigate GitHub Actions workflow failure",
"Investigate GitHub issue 394 comment",
"Investigate leaked crates.io API key",
"Investigate leaked crates.io token in repository",
"Lamination plate position offsetsAdjust existing code to just populate a table with original positions, new positions, a",
"Marketing Blog Post",
"More models",
"More options for Claude Desktop Debian",
"More options for Lamination plate position offsets",
"My downloads folder is a mess! Can you clean it up?",
"Normal",
"Open",
"Options",
"Page Layout",
"Php lsp",
"Plan automated testing strategy for desktop app",
"Playwright",
"Product Review",
"Read health data",
"Retry",
"Review",
"Review PR 555 for issue 558 fix",
"Review and address issue 88",
"Review and analyze issue 545",
"Review and close stale issues",
"Review and investigate GitHub issue 445",
"Review issue 156",
"Review issue 172 and document related history",
"Review issue 373",
"Review last three repository commits",
"Review path resolution issues and pull requests",
"Review project issues and pull requests",
"Review recent comments, issues, and pull requests",
"Select a folder",
"Share chat",
"Short Story",
"Start a new project",
"Start return",
"Style: Concise",
"Style: Explanatory",
"Style: Learning",
"Test DNS lookup with Quad9 resolver",
"Test DNS query for Claude desktop package",
"Test path resolution",
"Test startsession hook functionality",
"Troubleshoot modem downstream connection issue",
"Turn these receipts into an expense report",
"Typescript lsp",
"Unpin project",
"Untitled, rename chat",
"View",
"Write case studies",
"Write speech drafts",
"analyze_project.py, py, 220 lines",
"base_half_sheet.py, py, 32 lines",
"changelog_viewer_component.py, py, 113 lines",
"colors.py, py, 103 lines",
"compensation.py, py, 50 lines",
"components.py, py, 118 lines",
"components.py, py, 119 lines",
"config_reader.py, py, 120 lines",
"contraction_tab.py, py, 105 lines",
"contraction_tab.py, py, 82 lines",
"conversions.py, py, 28 lines",
"data_parser.py, py, 87 lines",
"dialogs.py, py, 34 lines",
"file_operations.py, py, 43 lines",
"log.py, py, 140 lines",
"log.py, py, 236 lines",
"machines.ini, ini, 2 lines",
"main.py, py, 203 lines",
"main.py, py, 264 lines",
"output_tab.py, py, 191 lines",
"output_tab.py, py, 246 lines",
"process_request.py, py, 632 lines",
"processing_format.ini, ini, 2 lines",
"setup_tab.py, py, 120 lines",
"setup_tab.py, py, 177 lines",
"sheet_dimensions.ini, ini, 3 lines",
"version 0.1.0.md, md, 42 lines",
"version 0.1.1.md, md, 31 lines",
"version 0.1.2.md, md, 18 lines",
"View all plans",
"Get apps and extensions",
"Gift Claude",
"Language",
"Get help",
"Learn more",
"Log out",
"SettingsCtrl,"
]
}

docs/testing/ui/README.md Normal file

@@ -0,0 +1,78 @@
# UI Element Inventory
This directory holds per-surface UI checklists. Where [`../cases/`](../cases/) tests verify *behavior end-to-end*, files here verify *every UI element renders and responds* on Linux.
## Why a separate directory
A functional test like [T17 — Folder picker opens](../cases/code-tab-foundations.md#t17--folder-picker-opens) verifies the folder picker works. A UI checklist asks the smaller, more granular questions:
- Is the **Select folder** button visually present?
- Does its hover state render?
- Is the icon next to it the correct shape on a HiDPI screen?
- Does it tab-focus correctly?
- Does it have an accessible name (a11y)?
Functional tests catch "the feature broke." UI checklists catch "the feature works but looks wrong." Both matter on Linux because Electron under different DEs / display servers / GTK theme combinations produces visual artifacts that aren't behavioral failures.
## Layout
| File | Surface | Notes |
|------|---------|-------|
| [`window-chrome-and-tabs.md`](./window-chrome-and-tabs.md) | OS window frame + hybrid in-app topbar + Chat/Cowork/Code tabs | Crosses with [T04](../cases/tray-and-window-chrome.md#t04--window-decorations-draw), [T07](../cases/tray-and-window-chrome.md#t07--in-app-topbar-renders--clickable) |
| [`tray.md`](./tray.md) | System tray icon + menu + theme variants | Crosses with [T03](../cases/tray-and-window-chrome.md#t03--tray-icon-present), [S08](../cases/tray-and-window-chrome.md#s08--tray-icon-doesnt-duplicate-after-nativetheme-update) |
| [`sidebar.md`](./sidebar.md) | Session sidebar in Code tab | Crosses with [T29](../cases/code-tab-workflow.md#t29--worktree-isolation), [T30](../cases/code-tab-workflow.md#t30--auto-archive-on-pr-merge), [S24](../cases/platform-integration.md#s24--dispatch-spawned-code-session-appears-with-badge-and-notification) |
| [`prompt-area.md`](./prompt-area.md) | Code-tab prompt input area | Crosses with [T18](../cases/code-tab-foundations.md#t18--drag-and-drop-files-into-prompt), [T32](../cases/code-tab-workflow.md#t32--slash-command-menu) |
| [`code-tab-panes.md`](./code-tab-panes.md) | Diff, preview, terminal, file, tasks, subagent, plan, side-chat | Crosses with [T19](../cases/code-tab-foundations.md#t19--integrated-terminal), [T20](../cases/code-tab-foundations.md#t20--file-pane-opens-and-saves), [T21](../cases/code-tab-workflow.md#t21--dev-server-preview-pane), [T22](../cases/code-tab-workflow.md#t22--pr-monitoring-via-gh), [T31](../cases/code-tab-workflow.md#t31--side-chat-opens) |
| [`settings.md`](./settings.md) | All Settings pages | Crosses with [S20](../cases/routines.md#s20--keep-computer-awake-inhibits-idle-suspend), [S22](../cases/platform-integration.md#s22--computer-use-toggle-is-absent-or-visibly-disabled-on-linux), [T30](../cases/code-tab-workflow.md#t30--auto-archive-on-pr-merge) |
| [`routines-page.md`](./routines-page.md) | Routines list + new-routine form + detail page | Crosses with [T26](../cases/routines.md#t26--routines-page-renders), [T27](../cases/routines.md#t27--scheduled-task-fires-and-notifies) |
| [`connectors-and-plugins.md`](./connectors-and-plugins.md) | Connector picker, connector list, plugin browser, plugin manager | Crosses with [T11](../cases/extensibility.md#t11--plugin-install-anthropic--partners), [T33](../cases/extensibility.md#t33--plugin-browser), [T34](../cases/code-tab-handoff.md#t34--connector-oauth-round-trip) |
| [`quick-entry.md`](./quick-entry.md) | Quick Entry popup window | Crosses with [T06](../cases/shortcuts-and-input.md#t06--quick-entry-global-shortcut-unfocused), [S10](../cases/shortcuts-and-input.md#s10--quick-entry-popup-is-transparent-no-opaque-square-frame) |
| [`notifications.md`](./notifications.md) | libnotify rendering for all notification sources | Crosses with [T23](../cases/code-tab-handoff.md#t23--desktop-notifications-fire), [T27](../cases/routines.md#t27--scheduled-task-fires-and-notifies), [S24](../cases/platform-integration.md#s24--dispatch-spawned-code-session-appears-with-badge-and-notification) |
## Standard checklist row
Each UI file uses tables of the form:
| Element | Selector / location | Expected | Notes |
|---------|---------------------|----------|-------|
| Close button | Top-right of titlebar | Renders, hover state visible, click hides to tray (see T08) | KDE-W: ✓ |
Columns:
- **Element** — human-readable name.
- **Selector / location** — DOM selector if known, otherwise plain-language pointer ("right-click menu, second item from top"). The selector column is what becomes a Playwright/CDP assertion when automation lands.
- **Expected** — what the user should see / what should happen on click. Concise.
- **Notes** — known issues, environment caveats, screenshot links.
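
As a rough illustration of how the four columns could feed automation later, the sketch below parses one table row into a typed record. The `ChecklistRow` interface and the `parseChecklistRow` / `splitTableCells` helpers are hypothetical names introduced here, not part of the repo, and the parser assumes exactly four cells per row.

```typescript
// Hypothetical record shape; field names mirror the four documented columns.
interface ChecklistRow {
  element: string
  selector: string
  expected: string
  notes: string
}

// Split the inside of a table row on "|" while keeping escaped "\|" inside the cell.
function splitTableCells(body: string): string[] {
  const cells: string[] = []
  let current = ''
  for (let i = 0; i < body.length; i++) {
    const ch = body.charAt(i)
    if (ch === '\\' && body.charAt(i + 1) === '|') {
      current += '\\|'
      i++
    } else if (ch === '|') {
      cells.push(current.trim())
      current = ''
    } else {
      current += ch
    }
  }
  cells.push(current.trim())
  return cells
}

// Returns null for separator rows and for lines that are not four-cell table rows.
// The header row parses like any other row; callers skip it by position.
function parseChecklistRow(line: string): ChecklistRow | null {
  const trimmed = line.trim()
  if (trimmed.length < 2) return null
  if (trimmed.charAt(0) !== '|' || trimmed.charAt(trimmed.length - 1) !== '|') return null
  const cells = splitTableCells(trimmed.slice(1, -1))
  if (cells.length !== 4) return null
  if (cells.every((cell) => /^:?-+:?$/.test(cell))) return null
  return { element: cells[0], selector: cells[1], expected: cells[2], notes: cells[3] }
}

const exampleRow = parseChecklistRow(
  '| Close button | Top-right of titlebar | Renders, hover state visible, click hides to tray (see T08) | KDE-W: ✓ |',
)
console.log(exampleRow ? exampleRow.selector : 'not a row') // "Top-right of titlebar"
```

Tables in the UI files that use three or five columns (the failure-mode and notification-source tables) would need their own record shapes.
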
## Sweep workflow
A UI sweep on one environment row of the test matrix (for example KDE-W):
1. Take a baseline screenshot of each surface (`scrot`, `gnome-screenshot`, `grim`, `flameshot`).
2. Walk each table top-to-bottom. For each row, look at the element, click/hover/tab to it, compare against Expected.
3. Mark anomalies in the **Notes** column or file an issue if the deviation is environment-specific.
4. Save screenshots of any failure to a dated folder; reference them inline.
UI rows don't have stable IDs (`T##` / `S##`) — they're append-only checkpoints. When something becomes a regression candidate worth tracking long-term, promote it to a functional test in [`../cases/`](../cases/).
## Automation roadmap
Each UI checklist row is a candidate Playwright (via [Electron driver](https://playwright.dev/docs/api/class-electron)) or `xdotool` assertion:
```typescript
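// Playwright shape (sketch: `app` is the launched ElectronApplication, `page` its first window)
await page.locator('[data-testid="close-button"]').click()
// Close hides to tray, so check BrowserWindow visibility from the main process
const visible = await app.evaluate(({ BrowserWindow }) => BrowserWindow.getAllWindows()[0].isVisible())
expect(visible).toBe(false)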
```
Or for pure visual diffing:
```bash
# scrot + perceptualdiff
scrot -u baseline.png
# ... interaction ...
scrot -u current.png
perceptualdiff baseline.png current.png
```
The structure here is intentionally diff-friendly: rows are stable, tables are append-only, selectors live in their own column.
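
The append-only property could be checked mechanically when reviewing a table edit. The sketch below is one possible check, under the assumption that a row is identified by its first cell (the Element name) so that edits to the Notes column do not count as a change; `rowKey` and `isAppendOnlyEdit` are hypothetical helper names.

```typescript
// Key a row by its first cell. Assumes the Element cell contains no escaped pipes.
function rowKey(line: string): string {
  const cell = line.split('|')[1]
  return cell === undefined ? '' : cell.trim()
}

// True when every old row is still present, in the same order, at the top of the new table.
function isAppendOnlyEdit(oldRows: string[], newRows: string[]): boolean {
  if (newRows.length < oldRows.length) return false
  return oldRows.every((row, i) => rowKey(row) === rowKey(newRows[i]))
}

const beforeEdit = ['| Pane header | Top of pane |', '| Drag handle | Pane header |']
const afterEdit = beforeEdit.concat(['| Resize handle | Edge between panes |'])
console.log(isAppendOnlyEdit(beforeEdit, afterEdit)) // true
```

Both arrays are expected to hold data rows only, without the header and separator lines.
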

@@ -0,0 +1,114 @@
# UI — Code Tab Panes
Drag-and-drop panes inside a Code-tab session: diff, preview, terminal, file editor, tasks, subagent, plan, side chat. Related functional tests: [T19](../cases/code-tab-foundations.md#t19--integrated-terminal), [T20](../cases/code-tab-foundations.md#t20--file-pane-opens-and-saves), [T21](../cases/code-tab-workflow.md#t21--dev-server-preview-pane), [T22](../cases/code-tab-workflow.md#t22--pr-monitoring-via-gh), [T31](../cases/code-tab-workflow.md#t31--side-chat-opens).
## Pane chrome (common)
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Pane header | Top of pane | Shows pane title, drag handle, close button | — |
| Drag handle | Pane header | Drag repositions the pane in the layout | — |
| Resize handle | Edge between panes | Drag resizes; double-click resets | — |
| Close pane button | Pane header right | Click closes the pane | `Cmd+\` / `Ctrl+\` shortcut equivalent |
| Views menu | Session toolbar | Lists all openable panes; click to add | — |
## Diff pane
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Diff stats indicator | Chat / sidebar (entry point) | Shows `+12 -1` style. Click opens diff pane | — |
| File list | Left side of pane | Lists changed files, click to navigate | — |
| Diff content | Right side | Side-by-side or unified diff renders cleanly | Theme-aware (dark/light) |
| Line click → comment box | Click any line | Opens inline comment input | — |
| Comment submit (`Cmd+Enter` / `Ctrl+Enter`) | Press the shortcut after writing | Submits all comments at once | — |
| Accept button | Per-file or per-hunk | Applies the change to disk | — |
| Reject button | Per-file or per-hunk | Discards the change | — |
| **Review code** button | Top-right of pane | Triggers Claude self-review of diff | — |
## Preview pane
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Preview dropdown | Session toolbar | Lists configured servers from `.claude/launch.json` | — |
| **Start** action | Per-server entry | Launches the dev server | — |
| **Stop** action | Per-server entry | Stops the dev server | — |
| **Stop all servers** | Dropdown bottom | Stops every running server | — |
| **Edit configuration** | Dropdown bottom | Opens `.claude/launch.json` in the file pane | — |
| **Persist sessions** toggle | Dropdown | Persists cookies / localStorage across server restarts | — |
| Embedded browser frame | Pane content | Renders the running app | Uses Electron `<webview>` or `BrowserView` |
| URL bar / address | Top of pane | Shows current URL; editable | — |
| Reload button | Top of pane | Reloads the embedded URL | — |
| DevTools toggle | Top of pane (right) | Opens Electron DevTools for the embedded view | — |
| Auto-verify screenshots | When Claude verifies a change | Brief overlay shows screenshot being captured | — |
## Terminal pane
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Terminal pane | Opened via `` Ctrl+` `` or Views menu | Bash/zsh/fish session in the working directory ([T19](../cases/code-tab-foundations.md#t19--integrated-terminal)) | Local sessions only |
| Cursor | Inside terminal | Blinks; cursor shape per shell | — |
| Resize | Drag pane edges | Terminal cols/rows update; `tput cols` reflects new width | SIGWINCH should fire |
| Scrollback | Type many lines | Scrollable history; mouse scroll wheel works | — |
| Color rendering | Run `ls --color=auto`, `tput colors` | 256-color or truecolor support; theme-aware | — |
| Copy / paste | Select + `Ctrl+Shift+C` / `Ctrl+Shift+V` | Standard terminal-emulator shortcuts | — |
| Working directory inheritance | Open pane in a session | Opens at the session's project folder | Confirm with `pwd` |
## File pane
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| File pane | Opened by clicking a file path | Shows file content, syntax-highlighted | — |
| Save button | Pane toolbar | Writes current content to disk | — |
| Path label | Pane header | Click copies absolute path | — |
| On-disk-changed warning | If file changed externally after open | Banner with Override / Discard options ([T20](../cases/code-tab-foundations.md#t20--file-pane-opens-and-saves)) | — |
| Discard button | When edits unsaved | Reverts to disk content | — |
| Cursor / selection | Inside content | Renders correctly; multi-cursor not supported | — |
| Find / replace | `Ctrl+F` | Opens find-in-file overlay | Verify scoped to current pane only |
## Tasks pane / subagent pane
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Tasks pane | Opened via Views menu | Lists subagents, background shell commands, workflows | — |
| Task entry click | Click any task | Opens the subagent pane with output | — |
| Stop task button | Per-task | Sends interrupt signal | — |
| Task status indicator | Per-task | Running / Completed / Failed | — |
| Output stream | Inside subagent pane | Live-updating stdout/stderr | — |
## Side chat overlay
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Side chat trigger | `Ctrl+;` or `/btw` in main prompt | Opens overlay attached to current session ([T31](../cases/code-tab-workflow.md#t31--side-chat-opens)) | — |
| Side chat content | Overlay body | Reads main thread context; replies stay in side chat | — |
| Close button | Overlay top-right | Closes side chat, returns focus to main session | — |
## CI status bar
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| CI status row | Below prompt area when PR open | Shows current check states | Crosses with [T22](../cases/code-tab-workflow.md#t22--pr-monitoring-via-gh) |
| **Auto-fix** toggle | Top of CI bar | Toggles automatic check-failure fixes | — |
| **Auto-merge** toggle | Top of CI bar | Toggles auto-merge on green | Requires GitHub repo setting |
| Per-check entries | Each CI check | Shows pass / fail / pending state | Click to see logs |
| CI completion notification | When all checks resolve | Desktop notification posted ([T23](../cases/code-tab-handoff.md#t23--desktop-notifications-fire)) | — |
## View modes
| Mode | Trigger | Expected | Notes |
|------|---------|----------|-------|
| Normal | Default; cycle via `Ctrl+O` | Tool calls collapsed into summaries, full text responses | — |
| Verbose | Cycle via `Ctrl+O` | Every tool call, file read, intermediate step | Use for debugging |
| Summary | Cycle via `Ctrl+O` | Only Claude's final responses + changes | Use when scanning many sessions |
| Transcript view dropdown | Next to send button | Same as `Ctrl+O` | — |
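
For an automated check of the `Ctrl+O` cycle, the expected mode after each press can be modeled as a small state machine. This is a sketch only: the cycle order below is assumed to follow the table order (Normal, Verbose, Summary), which the source does not confirm, and `nextViewMode` is a hypothetical helper.

```typescript
type ViewMode = 'Normal' | 'Verbose' | 'Summary'

// Assumed cycle order (table order); adjust if the app cycles differently.
const VIEW_MODE_ORDER: ViewMode[] = ['Normal', 'Verbose', 'Summary']

// Mode expected after one more Ctrl+O press.
function nextViewMode(current: ViewMode): ViewMode {
  const index = VIEW_MODE_ORDER.indexOf(current)
  return VIEW_MODE_ORDER[(index + 1) % VIEW_MODE_ORDER.length]
}

// Three presses from the default should land back on the default.
let mode: ViewMode = 'Normal'
for (let press = 0; press < 3; press++) mode = nextViewMode(mode)
console.log(mode) // "Normal"
```
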
## Failure modes to watch for
| Symptom | Likely cause | Notes |
|---------|--------------|-------|
| Pane drag doesn't snap to layout zones | Layout engine state corruption; restart session | — |
| Terminal cursor doesn't blink | `xterm-256color` not propagated; `TERM` env wrong | `echo $TERM` inside the pane |
| File pane "Save" silently no-ops | Read-only filesystem ([S28](../cases/extensibility.md#s28--worktree-creation-surfaces-clear-error-on-read-only-mounts)); permissions wrong | `stat <file>` for ownership |
| Preview pane embedded browser blank | Dev server didn't bind expected port; `autoPort` config | Check launcher log; `lsof -i :<port>` |
| Auto-verify screenshots fail | Headless screenshot in embedded view broken on Wayland | Test on X11 row; report to upstream |
| CI bar shows stale state | `gh` polling interval; rate-limited | `gh api rate_limit`; manual `gh pr checks <num>` |

@@ -0,0 +1,70 @@
# UI — Connectors & Plugins
Connector picker, connectors list, plugin browser, plugin manager. Related functional tests: [T11](../cases/extensibility.md#t11--plugin-install-anthropic--partners), [T33](../cases/extensibility.md#t33--plugin-browser), [T34](../cases/code-tab-handoff.md#t34--connector-oauth-round-trip), [S27](../cases/extensibility.md#s27--plugins-install-per-user-not-into-system-paths).
## Connector picker (in-session)
Triggered by `+` → **Connectors** in the prompt area.
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Connectors menu | Opened from `+` button | Lists configured connectors + "Manage connectors" entry | — |
| Per-connector row | Menu item | Name, status indicator (connected / not configured), action button | — |
| **Manage connectors** entry | Bottom of menu | Opens Settings → Connectors | Crosses with [`settings.md`](./settings.md#connectors) |
| Empty state | When no connectors configured | Helpful prompt with "Add connector" call to action | — |
## Connectors list (Settings → Connectors)
See [`settings.md`](./settings.md#connectors) for the surface.
## Add-connector flow
Triggered from the connector picker or Settings.
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Connector catalog | Modal body | Searchable list (Slack, GitHub, Linear, Notion, Google Calendar, etc.) | — |
| Per-connector tile | Catalog entry | Logo, name, short description | — |
| **Connect** button | Per tile | Initiates OAuth flow ([T34](../cases/code-tab-handoff.md#t34--connector-oauth-round-trip)) | Click → `xdg-open` to provider |
| OAuth in-app overlay (if used) | Replaces system browser handoff in some flows | Embedded login pane | — |
| Permission consent screen | OAuth provider side | Provider's UI; not under our control | — |
| Callback completion | After OAuth completes | Returns to Claude Desktop, connector now in list | If the URL scheme handler is broken, user is stranded in browser |
| Custom connector entry point | Catalog bottom | "Add custom connector via remote MCP" link | — |
## Plugin browser
Triggered by `+` → **Plugins** → **Add plugin**, or from sidebar **Customize** → **Plugins**.
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Plugin browser modal | Opened from menu | Searchable marketplace catalog | — |
| Marketplace selector | Top of modal | Default: Anthropic official; user-configured marketplaces also visible | — |
| Per-plugin tile | Catalog body | Name, author, description, install count | — |
| **Install** button | Per tile | Click installs to `~/.claude/plugins/` ([T11](../cases/extensibility.md#t11--plugin-install-anthropic--partners), [S27](../cases/extensibility.md#s27--plugins-install-per-user-not-into-system-paths)) | — |
| Plugin scope selector | Per install | User / Project / Local-only | — |
| Install progress indicator | During install | Spinner + "Installing X..." text | — |
| Install success state | After install | Confirmation; plugin now in **Manage plugins** | — |
| Install error state | On failure | Error message identifying the cause (network, signature, conflict) | — |
## Manage plugins
Triggered by `+` → **Plugins** → **Manage plugins**.
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Installed plugins list | Modal body | One row per installed plugin | — |
| Per-plugin row | List item | Name, version, scope (User / Project / Local), enable toggle, uninstall button | — |
| Enable toggle | Per row | Toggles plugin on/off without uninstall | — |
| **Uninstall** button | Per row | Removes plugin files from `~/.claude/plugins/` | Confirmation expected |
| Plugin skills sub-list | Expand row | Lists skills, agents, hooks, MCP servers, LSP configs the plugin contributes | — |
## Failure modes to watch for
| Symptom | Likely cause | Notes |
|---------|--------------|-------|
| Connect OAuth doesn't return to app | Custom URI scheme not registered ([T34](../cases/code-tab-handoff.md#t34--connector-oauth-round-trip)) | `xdg-mime query default x-scheme-handler/claude` |
| Plugin browser empty | Marketplace fetch failed; offline | DevTools network panel |
| Install progress stalls | Network / signature verification | Launcher log; check `~/.claude/plugins/.partial/` for incomplete downloads |
| Plugin installed but skills don't appear | Slash menu cache stale; restart session | — |
| Uninstall leaves files | Filesystem permissions; some plugin files owned by root | `find ~/.claude/plugins/ -not -user $USER` |
| Connector "Connected" but tools fail | Token expired; backend refuses; needs reconnect | Disconnect → reconnect |

@@ -0,0 +1,59 @@
# UI — Desktop Notifications
Notification rendering across DEs. The app dispatches notifications via `org.freedesktop.Notifications` (the freedesktop Desktop Notifications spec, which libnotify implements on the client side); each DE renders them differently. Related functional tests: [T23](../cases/code-tab-handoff.md#t23--desktop-notifications-fire), [T27](../cases/routines.md#t27--scheduled-task-fires-and-notifies), [S24](../cases/platform-integration.md#s24--dispatch-spawned-code-session-appears-with-badge-and-notification).
## Notification sources
The app posts notifications for the following events. Each should fire reliably on every supported DE.
| Source | Trigger | Expected text | Click action | Notes |
|--------|---------|---------------|--------------|-------|
| Scheduled task fires | When a routine starts a run | "Scheduled task `<name>` started" or similar | Focus the new session in sidebar | Crosses with [T27](../cases/routines.md#t27--scheduled-task-fires-and-notifies) |
| Catch-up run | When a missed run starts after wake | "Catching up on `<name>`" + missed-time hint | Focus the catch-up session | Crosses with [T28](../cases/routines.md#t28--scheduled-task-catch-up-after-suspend) |
| CI status change | When PR's CI state resolves | "CI passed for `<branch>`" or "CI failed: `<check>`" | Focus the session with CI bar | Crosses with [T22](../cases/code-tab-workflow.md#t22--pr-monitoring-via-gh) |
| PR merged (auto-archive trigger) | When watched PR merges | "PR `<title>` merged. Session archived" | — | Crosses with [T30](../cases/code-tab-workflow.md#t30--auto-archive-on-pr-merge) |
| Dispatch handoff | When a Dispatch task creates a Code session | "Dispatch session ready: `<task>`" | Focus the new Dispatch-badged session | Crosses with [S24](../cases/platform-integration.md#s24--dispatch-spawned-code-session-appears-with-badge-and-notification) |
| Permission prompt awaiting approval | When a session in Ask mode needs user approval | "Claude needs your approval" | Focus the awaiting session | Sessions in Ask mode stall until answered |
## Per-notification anatomy
Each notification should include:
| Element | Expected | Notes |
|---------|----------|-------|
| App identity | "Claude" or "Claude Desktop" as the source | DE-specific (Plasma shows the app name and icon prominently) |
| Notification icon | App icon (theme-aware) | Should match the same icon set as the tray |
| Title | Short event headline | One line, no truncation issues for typical lengths |
| Body | One or two short lines of context | Wrap correctly for the DE's notification width |
| Actions (if any) | Inline buttons (e.g. "Open", "Dismiss") | Some DEs show actions, some require expand |
| Click target | Activates the relevant session/window | — |
## Per-DE rendering
| DE / daemon | Expected render | Caveats |
|-------------|-----------------|---------|
| KDE Plasma | KDE notification daemon (KNotifications); appears top-right by default; inline action buttons supported | — |
| GNOME Shell | gnome-shell built-in; appears top-center; limited action support | — |
| Mako (wlroots) | Stacked notifications top-right by default; supports actions if config allows | — |
| Dunst | Lightweight; respects `~/.config/dunst/dunstrc`; actions via keybinds | — |
| swaync (Sway) | Notification center + popups | — |
| Niri | Rendered by whichever standalone daemon is running (usually mako or dunst) | niri does not ship its own notification daemon |
## Notification persistence
| Element | Expected | Notes |
|---------|----------|-------|
| Notification history | DE-dependent (KDE has notification panel; GNOME has Calendar drawer; mako/dunst can be configured) | Don't rely on persistence — assume fire-and-forget |
| Do-not-disturb mode | Respect DE's DND state | If user has DND on, notifications shouldn't fire — verify the daemon honors this |
## Failure modes to watch for
| Symptom | Likely cause | Diagnose with |
|---------|--------------|---------------|
| No notifications appear | No daemon running; service not registered | `gdbus call --session --dest=org.freedesktop.Notifications --object-path=/org/freedesktop/Notifications --method=org.freedesktop.DBus.Introspectable.Introspect`; `notify-send "test"` from terminal |
| Notification fires but no icon | Icon path resolution failed; theme strip | Inspect the dbus call body for `app_icon` value |
| Click does nothing | Action handler IPC missed; window already focused | Click while main window is hidden — does it appear? |
| Title/body cut off | DE truncation policy | Test with shorter strings to confirm content vs. layout |
| Notifications fire even in DND | Daemon ignoring DND, or our app sets `urgency=critical` inappropriately | Check `urgency` hint in the dbus call |
| Notification persists indefinitely | `expire_timeout=0` (never expires) used inappropriately; `-1` means the daemon's default | Confirm the timeout passed in the dbus call |
| Per-source duplicates | Multiple subscribers to the same event | Diagnose by isolating one source at a time |
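
When inspecting a captured `Notify` call, the two parameters most relevant to the DND and persistence rows are the `urgency` hint and `expire_timeout`. The sketch below flags the suspicious values. The numeric meanings follow the freedesktop Desktop Notifications spec (urgency 0/1/2 = low/normal/critical; `expire_timeout` of -1 = server default, 0 = never expire). The `NotifyCall` shape and `auditNotifyCall` are hypothetical names, and DND handling of critical notifications varies by daemon.

```typescript
// Hypothetical summary of the fields pulled out of a captured Notify call.
interface NotifyCall {
  urgency?: 0 | 1 | 2 // hints["urgency"]: 0 = low, 1 = normal, 2 = critical
  expireTimeout: number // expire_timeout in ms; -1 = server default, 0 = never expire
}

// Returns human-readable findings; an empty list means nothing looked suspicious.
function auditNotifyCall(call: NotifyCall): string[] {
  const findings: string[] = []
  if (call.urgency === 2) {
    findings.push('urgency=critical: some daemons show this even in do-not-disturb')
  }
  if (call.expireTimeout === 0) {
    findings.push('expire_timeout=0: notification never expires on its own')
  }
  if (call.expireTimeout < -1) {
    findings.push('expire_timeout below -1 is not defined by the spec')
  }
  return findings
}

console.log(auditNotifyCall({ urgency: 1, expireTimeout: -1 }).length) // 0
console.log(auditNotifyCall({ urgency: 2, expireTimeout: 0 }).length) // 2
```
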

@@ -0,0 +1,76 @@
# UI — Code Tab Prompt Area
The prompt input area is where users type messages, attach files, pick model and permission mode, and trigger send/stop. Related functional tests: [T18](../cases/code-tab-foundations.md#t18--drag-and-drop-files-into-prompt), [T32](../cases/code-tab-workflow.md#t32--slash-command-menu).
## Text input
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Input field | Bottom center of session pane | Single-line on focus, expands to multi-line as user types | — |
| Placeholder text | Empty state | Helpful hint ("Type to message Claude...") | — |
| Cursor caret | Inside input | Blinks; visible against any background | — |
| Multi-line autosize | Type a long message | Input grows up to a max height, then scrolls | — |
| Word wrap | Long text | Wraps at field width without horizontal scroll | — |
| Paste plain text | `Ctrl+V` after copying text | Inserts at cursor | — |
| Paste image | `Ctrl+V` after copying an image | Attaches as file (see attachments below) | — |
| `Enter` to send | Press Enter | Submits prompt | — |
| `Shift+Enter` for newline | Press Shift+Enter | Inserts newline, doesn't submit | — |
| `Esc` | Press Esc when prompt has content | DE-dependent; typically does nothing in input | — |
| IME composition | Compose a CJK character | Composition UI renders correctly above the input | Fcitx5/IBus integration |
## Attachments
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Attachment button | Left of input (paperclip icon) | Click opens native file chooser | Wayland: portal-backed |
| File-attached chip | Above or inside input | Shows filename + remove (X) button | — |
| Multiple attachments | Attach 3+ files | Each shows as a separate chip; stacked if needed | — |
| Image preview thumbnail | Image attachments | Shows small thumbnail | — |
| PDF preview | PDF attachments | Shows generic PDF icon + filename | — |
| Drag-drop overlay | Drag a file from file manager into the prompt | Overlay highlight indicates drop zone; release attaches ([T18](../cases/code-tab-foundations.md#t18--drag-and-drop-files-into-prompt)) | — |
| `@filename` autocomplete | Type `@` in prompt | Dropdown shows matching project files | Local and SSH only |
## `+` menu (skills, plugins, connectors)
| Element | Position in menu | Expected | Notes |
|---------|------------------|----------|-------|
| `+` button | Adjacent to attachment button | Click opens menu | — |
| **Slash commands** entry | Top of menu | Opens slash command picker (same as typing `/`) | Crosses with [T32](../cases/code-tab-workflow.md#t32--slash-command-menu) |
| **Skills** entry | Mid-menu | Opens skill browser | — |
| **Connectors** entry | Mid-menu | Opens connector picker / status | Crosses with [T34](../cases/code-tab-handoff.md#t34--connector-oauth-round-trip) |
| **Plugins** entry | Mid-menu | Opens installed plugin list | Crosses with [T11](../cases/extensibility.md#t11--plugin-install-anthropic--partners), [T33](../cases/extensibility.md#t33--plugin-browser) |
| **Add plugin** subentry | Under Plugins | Opens plugin browser | — |
## Slash menu (triggered by typing `/`)
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Menu container | Above prompt input | Modal-like overlay, scrollable | — |
| Built-in commands section | Top of list | Lists `/btw`, `/compact`, etc. | — |
| Project skills section | Mid-list | Lists skills from `.claude/skills/` | — |
| User skills section | Mid-list | Lists skills from `~/.claude/skills/` | — |
| Plugin skills section | Bottom-list | Lists skills from installed plugins | — |
| Filter by typing | Type after `/` | Narrows the list | — |
| Selected item insertion | `Enter` or click | Inserts highlighted token in prompt | — |
| `Esc` to dismiss | Press Esc | Closes menu, keeps `/` typed | — |
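
The "Filter by typing" row can be turned into an expected-list calculation for a given input. The sketch below keeps the section order from the table and narrows each section by the typed text. The substring, case-insensitive matching rule is an assumption (the real menu may use prefix or fuzzy matching), and `SlashSection`, `filterSlashMenu`, and the `/deploy-check` skill name are hypothetical.

```typescript
interface SlashSection {
  title: string
  commands: string[]
}

// Keep commands that contain the typed text (leading "/" ignored); drop sections left empty.
function filterSlashMenu(sections: SlashSection[], typed: string): SlashSection[] {
  const query = typed.replace(/^\//, '').toLowerCase()
  return sections
    .map((section) => ({
      title: section.title,
      commands: section.commands.filter((command) => command.toLowerCase().indexOf(query) !== -1),
    }))
    .filter((section) => section.commands.length > 0)
}

const slashSections: SlashSection[] = [
  { title: 'Built-in commands', commands: ['/btw', '/compact'] },
  { title: 'Project skills', commands: ['/deploy-check'] }, // hypothetical skill name
]
console.log(filterSlashMenu(slashSections, '/co').map((section) => section.title))
// [ 'Built-in commands' ]
```
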
## Pickers next to send button
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Model picker | Right of input | Dropdown of Sonnet, Opus, Haiku (per current plan availability) | `Cmd+Shift+I` opens |
| Permission mode picker | Right of input | Dropdown of Ask, Auto accept, Plan, Auto, Bypass | `Cmd+Shift+M` opens |
| Effort picker (when applicable) | Right of input | Dropdown of effort levels for adaptive-reasoning models | `Cmd+Shift+E` opens |
| Send button | Far right | Click submits prompt | — |
| Stop button | Replaces Send while Claude responding | Click interrupts current response | `Esc` shortcut equivalent |
| Usage ring | Adjacent to model picker | Shows context window usage + plan usage | Click for details |
## Failure modes to watch for
| Symptom | Likely cause | Notes |
|---------|--------------|-------|
| Drag-drop overlay doesn't appear | Electron drag-drop event not firing on Wayland | Try X11 fallback to isolate |
| `@filename` autocomplete returns empty | Project-folder access not granted; folder picker [T17](../cases/code-tab-foundations.md#t17--folder-picker-opens) failed silently | Verify env pill shows the right folder |
| Slash menu shows wrong skills | Settings shared between desktop and CLI ([T36](../cases/extensibility.md#t36--hooks-fire), [T37](../cases/extensibility.md#t37--claudemd-memory-loads)) | Check `~/.claude/skills/` content vs what's listed |
| Send button greyed out unexpectedly | Permission mode or model not loaded | Refresh; check model dropdown |
| IME composition broken | Electron IME pipeline regression | Test with simpler Electron app |

@@ -0,0 +1,49 @@
# UI — Quick Entry Popup
The Quick Entry popup is the global-shortcut-triggered prompt overlay. Related functional tests: [T06](../cases/shortcuts-and-input.md#t06--quick-entry-global-shortcut-unfocused), [S09](../cases/shortcuts-and-input.md#s09--quick-window-patch-runs-only-on-kde-post-406-gate), [S10](../cases/shortcuts-and-input.md#s10--quick-entry-popup-is-transparent-no-opaque-square-frame), [S29](../cases/shortcuts-and-input.md#s29--quick-entry-popup-is-created-lazily-on-first-shortcut-press-closed-to-tray-sanity), [S33](../cases/shortcuts-and-input.md#s33--quick-entry-transparent-rendering-tracked-against-bundled-electron-version), [S35](../cases/shortcuts-and-input.md#s35--quick-entry-popup-position-is-persisted-across-invocations-and-across-app-restarts), [S36](../cases/shortcuts-and-input.md#s36--quick-entry-popup-falls-back-to-primary-display-when-saved-monitor-is-gone), [S37](../cases/shortcuts-and-input.md#s37--quick-entry-popup-remains-functional-after-main-window-destroy).
## Window appearance
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Window frame | None (frameless popup) | No OS-titlebar; no close/min/max buttons | Upstream sets `frame: false` on the BrowserWindow (`index.js:515381`) |
| Background | Behind prompt UI | Transparent (no opaque square frame visible) on KDE Plasma Wayland ([S10](../cases/shortcuts-and-input.md#s10--quick-entry-popup-is-transparent-no-opaque-square-frame)) | Upstream already sets both `transparent: true` and `backgroundColor: "#00000000"` (`index.js:515380, 515383`). #370 regression is below the option-passing layer (Electron 41.0.4 CSD rework). KDE-W: pending; bug if opaque |
| Rounded corners | Outer edge of UI | Visible | Compositor must support corner rounding via shaders / clip mask |
| Drop shadow | Around popup | macOS-only at the Electron level; on Linux/Windows depends entirely on compositor | Upstream sets `hasShadow: Zr`, where `Zr` is `process.platform === "darwin"` (`index.js:515384`). Linux is expected to render via compositor shadow support; wlroots without server-side decorations will not show one |
| Position | Last-saved position, keyed on monitor; falls back to primary display if monitor is gone | Popup remembers its position across invocations and across app restarts ([S35](../cases/shortcuts-and-input.md#s35--quick-entry-popup-position-is-persisted-across-invocations-and-across-app-restarts), [S36](../cases/shortcuts-and-input.md#s36--quick-entry-popup-falls-back-to-primary-display-when-saved-monitor-is-gone)) | Upstream uses `an.get("quickWindowPosition")` (`index.js:515491-515526`) keyed on monitor label + resolution. Falls back to `cHn()` (`:515502`) when the saved monitor is gone. **Upstream does NOT place on cursor display or focused-window display** — it's last-position or primary, nothing else |
| Always-on-top | Window manager hint | Stays above other windows | Upstream sets `alwaysOnTop: true` with level `"pop-up-menu"` (`index.js:515399`). On macOS this is per-app; on Linux compositors the level hint is interpreted variably |
| Lifecycle | Lazy-created on first shortcut press | First shortcut press constructs the BrowserWindow; subsequent presses reuse it ([S29](../cases/shortcuts-and-input.md#s29--quick-entry-popup-is-created-lazily-on-first-shortcut-press-closed-to-tray-sanity)) | Upstream `if (!Ko \|\| ...) Ko = new BrowserWindow(...)` near `index.js:515375`. Means popup works in tray-only state with no main window mapped |
| Persistence after main window destroy | Popup survives `mainWindow.destroy()` | Popup remains functional; submit guards skip show/focus when `ut` is destroyed ([S37](../cases/shortcuts-and-input.md#s37--quick-entry-popup-remains-functional-after-main-window-destroy)) | Upstream `!ut \|\| ut.isDestroyed()` guard at `index.js:515595`. Likely unreachable on this project due to hide-to-tray override of X button |
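
The Position row describes a two-branch decision: reuse the saved position when the saved monitor (label + resolution) is still attached, otherwise fall back to the primary display. The sketch below models only that decision so a test can state the expected branch. All type and function names are hypothetical, and the coordinates used in the fallback branch are left out because the source does not describe what `cHn()` returns.

```typescript
interface DisplayInfo {
  label: string
  width: number
  height: number
  primary: boolean
}

interface SavedPosition {
  monitorLabel: string
  width: number
  height: number
  x: number
  y: number
}

type Placement =
  | { kind: 'saved'; display: string; x: number; y: number }
  | { kind: 'primary-fallback'; display: string }

// Mirrors the documented rule: last position if its monitor still exists, else primary.
function resolveQuickEntryPlacement(saved: SavedPosition | null, displays: DisplayInfo[]): Placement | null {
  if (saved) {
    const match = displays.filter(
      (d) => d.label === saved.monitorLabel && d.width === saved.width && d.height === saved.height,
    )[0]
    if (match) return { kind: 'saved', display: match.label, x: saved.x, y: saved.y }
  }
  const primary = displays.filter((d) => d.primary)[0]
  return primary ? { kind: 'primary-fallback', display: primary.label } : null
}

const laptopOnly: DisplayInfo[] = [{ label: 'eDP-1', width: 1920, height: 1080, primary: true }]
const savedOnExternal: SavedPosition = { monitorLabel: 'DP-2', width: 2560, height: 1440, x: 300, y: 200 }
const placement = resolveQuickEntryPlacement(savedOnExternal, laptopOnly)
console.log(placement ? placement.kind : 'none') // "primary-fallback"
```

The second call site worth checking in a sweep is the same saved position with `DP-2` attached, which should take the `saved` branch.
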
## Input area
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Text input field | Center of popup | Receives focus immediately on open; cursor blinks | — |
| Placeholder text | Empty input state | Shows guidance like "Ask Claude anything..." | — |
| Multi-line autosize | Type a long prompt | Input grows downward as text wraps; popup grows with it | — |
| `Enter` to submit | Press Enter | Sends prompt, closes popup. Prompt must be > 2 chars trimmed (`index.js:515530, 515533`); 1-2 char prompts are silently dropped | Renderer-side keymap; reaches main process via IPC `requestDismissWithPayload()` (`:515409`) |
| `Shift+Enter` for newline | Press Shift+Enter | Inserts newline, doesn't submit | Renderer-side |
| `Esc` to dismiss | Press Esc | Closes popup without submitting | Renderer-side; reaches main process via IPC `requestDismiss()` (`:515409`) |
| Click outside | Click outside the popup window | Closes popup without submitting | Wired in **main process** via the popup's `blur` handler (`Ko.on("blur", () => g3A(null))` at `index.js:515465`) |
| Paste behavior | Paste rich text | Text-only paste; no HTML residue | — |
| IME / dead-key composition | Type composed characters | Composition UI renders correctly above the input | Fcitx5/IBus integration is fragile under Electron |
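
The minimum-length rule on submit is easy to get wrong at the boundary, so a sweep may want explicit cases for it. A minimal sketch of the documented rule (trimmed prompt must be longer than 2 characters), using the hypothetical helper name `quickEntryShouldSubmit`:

```typescript
// True when the prompt would be sent; shorter prompts are silently dropped per the table above.
function quickEntryShouldSubmit(prompt: string): boolean {
  return prompt.trim().length > 2
}

console.log(quickEntryShouldSubmit('hi')) // false
console.log(quickEntryShouldSubmit('  ok  ')) // false
console.log(quickEntryShouldSubmit('hey')) // true
```
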
## Submit feedback
| Element | Trigger | Expected | Notes |
|---------|---------|----------|-------|
| Submit transition | Press Enter | Popup closes; main window navigates to a **new** chat session ([S31](../cases/shortcuts-and-input.md#s31--quick-entry-submit-makes-the-new-chat-reachable-from-any-main-window-state)). Quick Entry never appends to existing chats — `ynt(e)` at `index.js:515546` always creates new | Upstream calls `mainWin.show()` + `mainWin.focus()` only — no `restore()`, no workspace migration. Behavior on minimized / hidden / cross-workspace main is compositor-dependent |
| Loading indicator | While prompt is in flight | Brief spinner or fade-out — popup should not appear frozen | — |
| Error state | Submit when offline / API error | Inline error message; popup stays open so user can retry | — |
## Failure modes to watch for
| Symptom | Likely cause | Diagnose with |
|---------|--------------|---------------|
| Popup doesn't appear when shortcut pressed | Global shortcut not registered ([T06](../cases/shortcuts-and-input.md#t06--quick-entry-global-shortcut-unfocused), [S11](../cases/shortcuts-and-input.md#s11--quick-entry-shortcut-fires-from-any-focus-on-wayland-mutter-xwayland-key-grab), [S14](../cases/shortcuts-and-input.md#s14--global-shortcuts-via-xdg-portal-work-on-niri)) | Launcher log; portal `BindShortcuts` outcome |
| Opaque square frame visible behind UI | Transparent background not respected ([S10](../cases/shortcuts-and-input.md#s10--quick-entry-popup-is-transparent-no-opaque-square-frame)) | KDE compositor settings; BrowserWindow `transparent: true` arg |
| Popup appears but input doesn't auto-focus | Focus stealing prevention by compositor; race in BrowserWindow `show()` + `focus()` | Wayland focus-request semantics; mutter is most strict |
| IME composition cursor renders in wrong place | Electron IME integration bug | Try with simpler GTK app to isolate; report upstream Electron issue if reproducible |
| Popup persists after submit | Close-on-submit IPC missed | Launcher log; DevTools console (if reachable on the popup window) |
| Popup appears on wrong monitor / wrong workspace | Compositor places frameless windows differently | Test with `xdotool getactivewindow` (X11) before/after |

@@ -0,0 +1,72 @@
# UI — Routines Page
The Routines page hosts the list of scheduled tasks (local and remote), the new-routine form, and per-routine detail views. Related functional tests: [T26](../cases/routines.md#t26--routines-page-renders), [T27](../cases/routines.md#t27--scheduled-task-fires-and-notifies), [T28](../cases/routines.md#t28--scheduled-task-catch-up-after-suspend).
## Routines list
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Routines page link | Code-tab sidebar | Click opens the page ([T26](../cases/routines.md#t26--routines-page-renders)) | — |
| Page header | Top of page | Title "Routines" + description | — |
| **New routine** button | Top-right of page | Click shows Local / Remote selector | — |
| Routines list | Page body | Lists all configured routines | — |
| Per-routine row | List item | Name, schedule summary, last-run timestamp, status indicator | — |
| Run-now icon | Per row, hover-revealed | Click triggers immediate run ([T27](../cases/routines.md#t27--scheduled-task-fires-and-notifies)) | — |
| Pause / resume toggle | Per row | Pauses or resumes scheduled runs without deleting | — |
| Click row | Per row | Opens routine detail page | — |
## New routine form (Local)
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Routine type selector | Top of form | Local / Remote tabs or radio | — |
| **Name** field | Top of form | Required; converted to lowercase kebab-case for filesystem | — |
| **Description** field | Below name | Optional one-liner shown in list | — |
| **Instructions** textarea | Mid-form | Rich textarea for the prompt | — |
| Permission mode picker | Within Instructions area | Same options as session: Ask, Auto accept, Plan, Auto, Bypass | — |
| Model picker | Within Instructions area | Sonnet, Opus, Haiku per plan | — |
| **Working folder** picker | Below Instructions | Required; opens native file chooser | If folder not yet trusted, app prompts to trust |
| **Worktree** toggle | Below folder | When ON, each run gets its own isolated worktree | — |
| **Schedule** preset | Bottom of form | Manual / Hourly / Daily / Weekdays / Weekly | — |
| Time picker | Visible for Daily, Weekdays, Weekly | Defaults to 9:00 AM local | — |
| Day picker | Visible for Weekly only | Day-of-week selector | — |
| **Save** button | Bottom-right | Disabled until required fields filled | — |
| **Cancel** button | Bottom-left | Discards form, returns to list | — |
| Folder-trust prompt | Triggered when folder not trusted | Modal asking to trust the selected folder | Required before save |
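
The Name and Schedule rows above describe two small rules that are easy to check by hand with a sketch. The helper names below (`toKebabCase`, `visiblePickers`) are hypothetical and not taken from the app bundle, and the exact normalization the app applies (for example to non-ASCII characters) is an assumption here.

```typescript
// Hypothetical helpers for reasoning about the form rules; not the app's code.
type SchedulePreset = "Manual" | "Hourly" | "Daily" | "Weekdays" | "Weekly";

// Lowercase kebab-case: every run of characters outside [a-z0-9] becomes a
// single hyphen, and leading/trailing hyphens are dropped.
// Assumption: non-ASCII characters are treated as separators.
function toKebabCase(name: string): string {
  return name
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Which pickers should be visible for a preset, per the table:
// time picker for Daily / Weekdays / Weekly, day picker for Weekly only.
function visiblePickers(preset: SchedulePreset): { time: boolean; day: boolean } {
  const time = preset === "Daily" || preset === "Weekdays" || preset === "Weekly";
  const day = preset === "Weekly";
  return { time, day };
}

const exampleName = toKebabCase("Nightly Build Check"); // "nightly-build-check"
const weeklyPickers = visiblePickers("Weekly");         // { time: true, day: true }
```

When testing manually, compare the on-disk routine name against what a rule like this predicts, and treat any mismatch as a prompt to check the real normalization rather than as a failure by itself.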
## New routine form (Remote)
Per upstream docs, remote routines run on Anthropic-managed cloud infrastructure. The form has additional fields for connectors and trigger types (cron, API, GitHub event). On Linux, the Remote tab should function identically to other platforms.
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Trigger type selector | Top of form | Schedule / API call / GitHub event | — |
| Connectors picker | Per-routine basis (remote) | Configures connectors at routine creation | — |
| Network access controls | If applicable | Tied to cloud environment config | — |
## Routine detail page
The elements below follow the upstream docs.
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| **Run now** button | Top of page | Starts the task immediately | — |
| Status toggle (Active / Paused) | Top of page | Pauses or resumes without deleting | — |
| **Edit** button | Top of page | Opens the same form populated with current values | — |
| **Delete** button | Top of page (or footer) | Removes routine; archives all sessions it created | Confirmation dialog expected |
| **Review history** section | Page body | Lists every past run with timestamp and status | — |
| Per-history-entry hover | Hover skipped runs | Tooltip explains why skipped (asleep, prior run still running, other concurrent task) | — |
| **Show more** button | Bottom of history | Loads older entries | — |
| **Always allowed** panel | Page body | Lists tools auto-approved for this routine | — |
| Revoke approval | Per-tool entry | Removes the auto-approval | — |
## Failure modes to watch for
| Symptom | Likely cause | Notes |
|---------|--------------|-------|
| Folder-trust modal doesn't appear | Trust state cached incorrectly | Clear `~/.claude/trusted-folders` (or equivalent) and retry |
| Save button never enables | Required fields validation regression | DevTools console |
| Time picker truncates / clips | Modal sizing on small viewports | Resize the app window to reproduce |
| History tooltips don't render | Tooltip component regression | — |
| Run-now does nothing | Task runner thread not started | Launcher log; `pgrep -af claude` for runner subprocess |
| Routines page blank | Code-tab failure ([T16](../cases/code-tab-foundations.md#t16--code-tab-loads)) cascading | Confirm Code tab itself loads first |


@@ -0,0 +1,87 @@
# UI — Settings
The Settings window holds Desktop app preferences, Claude Code settings, connector management, and account controls. Related functional tests: [S20](../cases/routines.md#s20--keep-computer-awake-inhibits-idle-suspend), [S22](../cases/platform-integration.md#s22--computer-use-toggle-is-absent-or-visibly-disabled-on-linux), [T30](../cases/code-tab-workflow.md#t30--auto-archive-on-pr-merge).
## Settings root
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Settings window | Opened via app menu, tray menu, or in-app shortcut | Window opens with sidebar nav and content area | — |
| Window close button | Top-right (or top-left where the DE's button layout places it there) | Closes settings; main app continues running | — |
| Sidebar nav | Left of window | Lists every settings page | — |
## Desktop app → General
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| **Computer use** toggle | Top of page | Either absent on Linux, or rendered disabled with a "not supported on Linux" hint ([S22](../cases/platform-integration.md#s22--computer-use-toggle-is-absent-or-visibly-disabled-on-linux)) | Critical: must not appear functional |
| **Keep computer awake** toggle | Mid-page | Toggles `systemd-inhibit --what=idle:sleep` lock ([S20](../cases/routines.md#s20--keep-computer-awake-inhibits-idle-suspend)) | Verify with `systemd-inhibit --list` |
| **Denied apps** list | Computer-use related | Likely absent on Linux (computer use unsupported) | — |
| **Unhide apps when Claude finishes** toggle | Computer-use related | Likely absent on Linux | — |
| Theme picker (if exposed) | Mid-page | System / Light / Dark | Tray icon should respond ([S08](../cases/tray-and-window-chrome.md#s08--tray-icon-doesnt-duplicate-after-nativetheme-update)) |
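
The before/after check on the **Keep computer awake** row can be scripted loosely. The sketch below scans text captured from `systemd-inhibit --list` for a line that mentions a given holder and every expected lock type. The holder string `claude-desktop` and the sample output are illustrative assumptions, not captured from a real run, and the match is deliberately loose (plain substring checks per line).

```typescript
// Returns true when some line of the `systemd-inhibit --list` output mentions
// `who` and every token in `what`. Substring matching only; a sketch.
function hasInhibitor(listOutput: string, who: string, what: string[]): boolean {
  return listOutput.split("\n").some(
    (line) =>
      line.indexOf(who) !== -1 &&
      what.every((token) => line.indexOf(token) !== -1)
  );
}

// Illustrative text only; the real column layout and WHO string may differ.
const sampleBefore = [
  "WHO            UID  USER  PID  COMM           WHAT  WHY             MODE",
  "NetworkManager 0    root  812  NetworkManager sleep NetworkManager  delay",
].join("\n");

const sampleAfter =
  sampleBefore +
  "\n" +
  "claude-desktop 1000 alice 4242 claude-desktop idle:sleep Keep computer awake block";

const lockedBefore = hasInhibitor(sampleBefore, "claude-desktop", ["idle", "sleep"]); // false
const lockedAfter = hasInhibitor(sampleAfter, "claude-desktop", ["idle", "sleep"]);   // true
```

Capture the real output once with the toggle OFF and once with it ON, then substitute whatever holder name actually appears in the WHO column.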
## Desktop app → Account
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Account name / email | Top of page | Reflects signed-in identity | — |
| Plan badge | Below name | Shows Pro / Max / Team / Enterprise | — |
| Sign out button | Bottom of page | Signs out cleanly; subsequent launches show sign-in screen | — |
## Claude Code
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| **Worktree location** | Top of page | Default: `<project-root>/.claude/worktrees/`. Editable to a custom directory | Crosses with [T29](../cases/code-tab-workflow.md#t29--worktree-isolation) |
| **Branch prefix** | Mid-page | Optional prefix prepended to every worktree branch | — |
| **Auto-archive after PR merge or close** toggle | Mid-page | When ON, sessions archive on PR resolution ([T30](../cases/code-tab-workflow.md#t30--auto-archive-on-pr-merge)) | — |
| **Persist preview sessions** toggle | Mid-page | Toggles cookies/localStorage persistence in Preview pane | Crosses with [T21](../cases/code-tab-workflow.md#t21--dev-server-preview-pane) |
| **Preview** toggle | Mid-page | When OFF, preview pane and auto-verify are disabled | — |
| **Allow bypass permissions mode** toggle | Mid-page | When ON, exposes Bypass mode in mode picker | Enterprise admins can disable |
| **Auto** mode availability | Mid-page | Research preview; not on Pro plans | Per upstream docs |
## Connectors
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Connectors list | Page content | Lists connected services with status | Crosses with [T34](../cases/code-tab-handoff.md#t34--connector-oauth-round-trip) |
| Per-connector entry | List row | Name, last-connected timestamp, manage / disconnect buttons | — |
| **Manage** button | Per row | Opens connector-specific settings | — |
| **Disconnect** button | Per row | Revokes access; connector becomes unusable in subsequent sessions | — |
| **Add connector** button | Top of page | Opens the connector picker (same surface as `+ → Connectors`) | — |
## SSH connections
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| SSH connections list | Page content | Lists user-added + managed (read-only) connections | — |
| **Add SSH connection** button | Top of page | Opens dialog with Name / SSH Host / SSH Port / Identity File fields | — |
| Per-connection entry | List row | Edit / delete (user-added) or "Managed" badge (admin-distributed) | — |
## Keyboard shortcuts
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Shortcut list | Page content | Tabular list of all configurable shortcuts | — |
| Shortcut value | Per row | Click to rebind; shows current binding | — |
| Reset to default | Per row | Reverts to upstream default | — |
| Quick Entry shortcut | Specifically called out | Default `Ctrl+Alt+Space`; rebind here | Crosses with [T06](../cases/shortcuts-and-input.md#t06--quick-entry-global-shortcut-unfocused) |
## Local environment editor
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Env editor open | Environment dropdown → Local → gear icon | Opens encrypted env-var editor | Crosses with [S18](../cases/platform-integration.md#s18--local-environment-editor-persists-across-reboot) |
| Add variable | In editor | Name + value fields; save | — |
| Remove variable | Per row | Deletes the variable | — |
| **Apply to dev servers** indicator | Near save | Confirms vars also reach preview servers | — |
## Failure modes to watch for
| Symptom | Likely cause | Notes |
|---------|--------------|-------|
| Computer-use toggle visible and toggleable on Linux | [S22](../cases/platform-integration.md#s22--computer-use-toggle-is-absent-or-visibly-disabled-on-linux) regression | File a bug; users will be misled |
| Keep-computer-awake toggle has no effect | `systemd-inhibit` integration not wired ([S20](../cases/routines.md#s20--keep-computer-awake-inhibits-idle-suspend)) | Verify lock list before/after |
| Worktree location field rejects valid paths | Path validation too strict; absolute vs `~`-prefixed | Check both forms |
| SSH connection list missing managed entries | Managed-settings file not loaded; admin distribution failed | Confirm file exists at expected path |
| Env editor not encrypting | Linux secret-store not wired ([S18](../cases/platform-integration.md#s18--local-environment-editor-persists-across-reboot)) | `secret-tool search`; `kwallet5-query` |
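
For the worktree-location row above, it helps to know what a `~`-prefixed path should resolve to before deciding whether the field rejected a valid input. The following is a minimal sketch under the assumption that only a bare `~` and a leading `~/` are expanded; `expandTilde` is a hypothetical helper, and how the app itself treats these forms is exactly what the check is meant to find out.

```typescript
// Expands a leading "~" or "~/" against the given home directory.
// Other forms (for example "~bob/x") are returned unchanged.
function expandTilde(path: string, home: string): string {
  if (path === "~") return home;
  if (path.slice(0, 2) === "~/") {
    return home.replace(/\/+$/, "") + path.slice(1);
  }
  return path;
}

// True when the path is absolute once any leading tilde is expanded.
function isAbsoluteAfterExpansion(path: string, home: string): boolean {
  return expandTilde(path, home).charAt(0) === "/";
}

const expanded = expandTilde("~/worktrees", "/home/alice"); // "/home/alice/worktrees"
```

Enter both the `~/...` form and its expanded absolute form in the field; if only one is accepted, the validation is stricter than it needs to be.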


@@ -0,0 +1,55 @@
# UI — Code Tab Sidebar
The sidebar lists Code-tab sessions, lets you filter, group, archive, and rename. Related functional tests: [T29](../cases/code-tab-workflow.md#t29--worktree-isolation), [T30](../cases/code-tab-workflow.md#t30--auto-archive-on-pr-merge), [S24](../cases/platform-integration.md#s24--dispatch-spawned-code-session-appears-with-badge-and-notification).
## Top controls
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| **+ New session** button | Top of sidebar | Click opens a new session against the currently selected env. `Ctrl+N` shortcut equivalent | — |
| **Routines** link | Top of sidebar | Click opens the Routines page ([T26](../cases/routines.md#t26--routines-page-renders)) | — |
| **Customize** link | Top of sidebar | Click opens connectors / skills / plugins manager | — |
| Filter: status | Top of session list | Dropdown / tabs filter by Active / Archived / All | — |
| Filter: project | Top of session list | Dropdown filters by project (multi-select) | — |
| Filter: environment | Top of session list | Dropdown filters by Local / Remote / SSH / All | — |
| Group-by control | Top of session list | Toggle between flat list and grouped-by-project | — |
## Session row
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Session title | Row content | Shows session name (auto-generated or user-renamed) | Click row → switches to that session |
| Session status indicator | Left of title or as colored dot | Reflects state: idle, running, awaiting-approval, errored, archived | — |
| Project / branch label | Below title | Shows project folder name + branch | — |
| Diff stats badge (e.g. `+12 -1`) | Right of title | Visible when session has uncommitted changes | Click → opens diff view |
| **Dispatch** badge | Top-right of row | Visible on Dispatch-spawned sessions ([S24](../cases/platform-integration.md#s24--dispatch-spawned-code-session-appears-with-badge-and-notification)) | — |
| **Scheduled** badge | Top-right of row | Visible on scheduled-task-spawned sessions ([T27](../cases/routines.md#t27--scheduled-task-fires-and-notifies)) | Sessions group under "Scheduled" header |
| Hover archive icon | Right side, on row hover | Click archives the session and removes its worktree | — |
| Right-click context menu | Right-click on row | Standard menu: Rename, Archive, Open in Files, Copy path | — |
| Active session highlight | Selected row | Visually distinct from inactive rows | — |
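
The diff stats badge rule in the table (shown only when there are uncommitted changes, formatted like `+12 -1`) can be written down as a tiny function for comparison against what the row renders. `formatDiffStats` is a hypothetical name, and treating "no changes" as both counts being zero is an assumption.

```typescript
// Returns the badge text, or null when the badge should be hidden.
// Assumption: "no uncommitted changes" means both counts are zero.
function formatDiffStats(added: number, removed: number): string | null {
  if (added === 0 && removed === 0) return null;
  return "+" + added + " -" + removed;
}

const badge = formatDiffStats(12, 1);  // "+12 -1"
const hidden = formatDiffStats(0, 0);  // null
```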
## Sidebar layout
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Sidebar resize handle | Right edge of sidebar | Drag to resize; double-click to reset width | — |
| Sidebar collapse toggle | Top of sidebar (hamburger or arrow) | Collapse to icons-only or hide entirely | Crosses with topbar hamburger |
| Scrollbar | Right edge when content exceeds height | Renders, drags work | Theme-aware |
## Cycling shortcuts
| Shortcut | Expected | Notes |
|----------|----------|-------|
| `Ctrl+Tab` | Cycle to next session | Per upstream docs |
| `Ctrl+Shift+Tab` | Cycle to previous session | Per upstream docs |
| `Cmd+Shift+]` / `Cmd+Shift+[` | Same as above on macOS | N/A on Linux unless rebound |
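
When checking the cycling shortcuts, it is useful to state the expected index arithmetic up front. The sketch below assumes the list wraps around at both ends, which the table does not state; `cycleIndex` is a hypothetical helper, so confirm the wrap behavior against the app before relying on it.

```typescript
// Next (+1) or previous (-1) session index.
// Assumption: cycling wraps around at the ends of the list.
// Returns -1 when there are no sessions.
function cycleIndex(current: number, count: number, direction: 1 | -1): number {
  if (count <= 0) return -1;
  return (current + direction + count) % count;
}

const nextFromLast = cycleIndex(2, 3, 1);   // 0 (wraps to the first session)
const prevFromFirst = cycleIndex(0, 3, -1); // 2 (wraps to the last session)
```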
## Failure modes to watch for
| Symptom | Likely cause | Notes |
|---------|--------------|-------|
| Sidebar doesn't render | Code tab failed to load ([T16](../cases/code-tab-foundations.md#t16--code-tab-loads)) | Check DevTools console |
| Sessions appear but clicking does nothing | IPC between sidebar and session pane broken | Launcher log, DevTools console |
| Hover archive icon never appears | CSS hover state mis-applied; touch device might be assumed | Inspect element; check pointer events |
| Dispatch / Scheduled badges missing | Feature flag or state not reaching the renderer | Check session metadata in launcher log |
| Auto-archive doesn't fire | Session-archive logic bug ([T30](../cases/code-tab-workflow.md#t30--auto-archive-on-pr-merge)) | Confirm setting enabled; check PR state via `gh pr view` |

44
docs/testing/ui/tray.md Normal file

@@ -0,0 +1,44 @@
# UI — System Tray
Tray icon, menu, and theme variants. See [`../cases/tray-and-window-chrome.md`](../cases/tray-and-window-chrome.md) for related functional tests ([T03](../cases/tray-and-window-chrome.md#t03--tray-icon-present), [S08](../cases/tray-and-window-chrome.md#s08--tray-icon-doesnt-duplicate-after-nativetheme-update)).
## Tray icon
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Tray icon (light theme) | System tray / status area | Black icon (the "Template" variant) renders cleanly on a light tray | — |
| Tray icon (dark theme) | System tray / status area | White icon (the "Template-Dark" variant) renders cleanly on a dark tray | — |
| Theme switch | Trigger system theme change | Icon updates in place — no duplicate icons spawned ([S08](../cases/tray-and-window-chrome.md#s08--tray-icon-doesnt-duplicate-after-nativetheme-update)) | KDE-W ✓ via in-place fast-path |
| Icon resolution / sharpness | Inspect at native scale | Icon is crisp, not pixelated. Check on HiDPI screens | — |
| Position | Tray area | Appears among other SNI/tray icons | KDE Plasma sorts alphabetically by ID; adjusting position requires user config |
| Tooltip on hover | Hover over icon | Shows "Claude" or app name | — |
## Right-click menu
| Element | Position in menu | Expected | Notes |
|---------|------------------|----------|-------|
| Show / Hide window | Top item | Toggles main window visibility | Label may change between "Show" and "Hide" based on state |
| Quick Entry | Mid-menu | Opens Quick Entry popup ([T06](../cases/shortcuts-and-input.md#t06--quick-entry-global-shortcut-unfocused)) | — |
| Open at Login (toggle) | Mid-menu | Reflects current XDG autostart state ([T09](../cases/platform-integration.md#t09--autostart-via-xdg)) | Toggle should write `~/.config/autostart/*.desktop` |
| Settings | Mid-menu | Opens Settings window | — |
| About | Bottom area | Opens About dialog | — |
| Quit | Bottom item | Fully exits the app (no hide-to-tray) | — |
| Menu separators | Between item groups | Render cleanly | — |
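
To verify the **Open at Login** row, it helps to know which file to look for. The sketch below resolves the autostart entry path following the XDG convention that an unset or empty `XDG_CONFIG_HOME` falls back to `$HOME/.config`. The file name `claude-desktop.desktop` is an assumption here (the table only says `*.desktop`), and `autostartEntryPath` is a hypothetical helper that takes the environment as a plain object so it can be exercised without a live session.

```typescript
type Env = { [key: string]: string | undefined };

// Resolves the expected autostart entry path.
// Assumption: the entry is named "claude-desktop.desktop".
function autostartEntryPath(env: Env): string {
  const xdg = env["XDG_CONFIG_HOME"];
  const home = env["HOME"] || "";
  // Per the XDG Base Directory convention, empty or unset means $HOME/.config.
  const base = xdg && xdg.length > 0 ? xdg : home + "/.config";
  return base + "/autostart/claude-desktop.desktop";
}

const defaultPath = autostartEntryPath({ HOME: "/home/alice" });
// "/home/alice/.config/autostart/claude-desktop.desktop"
```

Toggle the menu item, then check that the file at the resolved path appears or disappears accordingly.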
## Left-click behavior
| Element | Trigger | Expected | Notes |
|---------|---------|----------|-------|
| Single left-click | Click tray icon once | Toggles main window visibility | KDE-W ✓ |
| Double left-click | Click twice quickly | DE-dependent; should not spawn duplicate windows | — |
| Middle-click | Middle mouse button on tray icon | DE-dependent (no documented behavior); should not crash | — |
## Failure modes to watch for
| Symptom | Likely cause | Diagnose with |
|---------|--------------|---------------|
| Tray icon never appears | No SNI watcher (e.g. GNOME without AppIndicator extension); Electron fallback to legacy XEmbed not registered | `gdbus call ... org.kde.StatusNotifierWatcher` — see [runbook](../runbook.md#tray--dbus-state-kde) |
| Two tray icons after theme switch | Tray rebuild race ([S08](../cases/tray-and-window-chrome.md#s08--tray-icon-doesnt-duplicate-after-nativetheme-update)) | SNI watcher state before/after; [`docs/learnings/tray-rebuild-race.md`](../../learnings/tray-rebuild-race.md) |
| Icon renders as a generic placeholder | Icon path resolution failed; theme mismatch | Check Electron `Tray` constructor args; check `~/.cache/claude-desktop-debian/launcher.log` |
| Menu items don't respond | IPC bridge to tray menu broken; main process busy | Click main window — does the rest of the app respond? `pgrep -af claude`; main process state |
| Tray icon disappears after some time | Tray daemon restarted; Claude didn't re-register | KDE Plasma: restart `plasmashell`; observe whether icon comes back without restarting Claude |


@@ -0,0 +1,58 @@
# UI — Window Chrome & Tabs
OS-level window frame plus the in-app tab strip and (PR #538) hybrid in-app topbar. See [`../cases/tray-and-window-chrome.md`](../cases/tray-and-window-chrome.md) for related functional tests.
## OS window frame
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Title bar | Top of window | Drawn by DE/compositor; shows app title; right-click opens window menu | KDE-W ✓; Hypr-N ✓ |
| Close button (X) | Top-right (or top-left where the DE's button layout places it there) | Renders, hover state visible, click hides-to-tray ([T08](../cases/tray-and-window-chrome.md#t08--hide-to-tray-on-close)) | — |
| Minimize button | Adjacent to close | Renders, hover state visible, click minimizes | — |
| Maximize / restore button | Adjacent to minimize | Renders, hover state visible, click toggles maximize | — |
| Resize edges (left, right, top, bottom, corners) | Window perimeter | Cursor changes to resize affordance on hover; drag resizes | Wlroots compositors may not show cursor change |
| Window menu (right-click titlebar) | Right-click anywhere on titlebar | Standard window menu (Move, Resize, Close, Always on Top, etc.) | DE-dependent |
## Hybrid in-app topbar (PR #538 builds)
Sits below the OS frame in hybrid mode. Crosses with [T07](../cases/tray-and-window-chrome.md#t07--in-app-topbar-renders--clickable) and [S13](../cases/tray-and-window-chrome.md#s13--hybrid-topbar-shim-survives-omarchys-ozone-wayland-env-exports).
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| Hamburger menu | Top-left of topbar | Renders, click opens sidebar | — |
| Sidebar toggle | Adjacent to hamburger | Renders, click collapses/expands sidebar | — |
| Search icon | Center-left | Renders, click opens search overlay | — |
| Back arrow | Center | Renders, greyed out when no history; click navigates back | — |
| Forward arrow | Adjacent to back | Same as back, but for forward history | — |
| Cowork ghost icon | Right of nav arrows | Renders, click opens Cowork tab | The icon is the canonical "is the topbar shim alive" indicator |
| Drag region (gaps between buttons) | Empty space between elements | Drag region behaves correctly — buttons remain clickable, no implicit drag region capturing button clicks | Critical: this is the regression mode in [T07](../cases/tray-and-window-chrome.md#t07--in-app-topbar-renders--clickable) |
## Tab strip (Chat / Cowork / Code)
Sits in the topbar (hybrid) or in the OS-frame area (legacy). Top center.
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| **Chat** tab | Left tab | Renders, click switches to Chat | — |
| **Cowork** tab | Center tab | Renders, click switches to Cowork; ghost icon may indicate Dispatch state | — |
| **Code** tab | Right tab | Renders, click switches to Code; on Linux, may show 403 / sign-in upsell ([T16](../cases/code-tab-foundations.md#t16--code-tab-loads)) | — |
| Active tab indicator | Underline / fill on active tab | Visually distinct from inactive tabs | — |
| Tab badges (e.g. unread count, Dispatch badge) | Top-right of each tab | Render when applicable, dismiss when state clears | — |
## Other window-level UI
| Element | Location | Expected | Notes |
|---------|----------|----------|-------|
| About dialog | App menu → About | Modal opens with app version, Electron version, license info; close button works | — |
| App menu (macOS-style) | macOS only — N/A on Linux | Not present on Linux; menu items are in window menu instead | — |
| Update prompt | Triggered by upstream update detection | On DEB/RPM, auto-update path is suppressed ([S26](../cases/distribution.md#s26--auto-update-is-disabled-when-installed-via-apt--dnf)). On AppImage, may surface a prompt | — |
| Crash report dialog | Shown after a crash | Dialog explains what happened, offers to file an issue | Capture for Linux specifics — wording may reference macOS Console / Windows Event Viewer paths only |
## Display-server cross-cuts
| Concern | X11 | Wayland (mutter) | Wayland (KWin) | Wayland (wlroots) |
|---------|-----|-------------------|----------------|---------------------|
| HiDPI scaling | `--force-device-scale-factor=N` works | Auto via fractional scaling | Auto via fractional scaling | Auto where compositor supports it |
| Drag-to-snap (Aero-style) | Works under most WMs | mutter snaps | KWin snaps | Compositor-dependent |
| Always-on-top | Window menu | Window menu | Window menu | Compositor-dependent |
| Cursor theme | Inherits from `gtk-cursor-theme-name` | Same | Same | Same |
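
Before filling in a column of the table above, record which display server the session reports. The sketch below classifies a session from the conventional `XDG_SESSION_TYPE`, `WAYLAND_DISPLAY`, and `DISPLAY` environment variables; `detectDisplayServer` is a hypothetical helper. It describes the session only: an app running through XWayland in a Wayland session would still be classified as `wayland` here, so the backend the app actually picked has to be confirmed separately.

```typescript
type DisplayServer = "wayland" | "x11" | "unknown";
type SessionEnv = { [key: string]: string | undefined };

// Classifies the session from environment variables.
// XDG_SESSION_TYPE wins; the display variables are a fallback.
function detectDisplayServer(env: SessionEnv): DisplayServer {
  const sessionType = (env["XDG_SESSION_TYPE"] || "").toLowerCase();
  if (sessionType === "wayland") return "wayland";
  if (sessionType === "x11") return "x11";
  if (env["WAYLAND_DISPLAY"]) return "wayland";
  if (env["DISPLAY"]) return "x11";
  return "unknown";
}

const session = detectDisplayServer({ XDG_SESSION_TYPE: "wayland", DISPLAY: ":0" });
// "wayland"
```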

5
tools/test-harness/.gitignore vendored Normal file

@@ -0,0 +1,5 @@
node_modules/
results/
*.log
.DS_Store
package-lock.json


@@ -0,0 +1,480 @@
# Linux Compatibility Test Harness
In-VM (or on-host) Playwright + DBus runner for the test cases under
[`docs/testing/cases/`](../../docs/testing/cases/). See
[`docs/testing/automation.md`](../../docs/testing/automation.md) for the
architecture, decisions, and rationale.
## Status
Seventy-four specs wired (36 cross-env T-tests, 33 env-specific S-tests,
5 H-prefix harness self-tests). See
[`docs/testing/runner-implementation-plan.md`](../../docs/testing/runner-implementation-plan.md)
for the tiered triage of remaining tests and the per-spec rationale
behind tier classification.
| Test | What it checks | Layer |
|------|----------------|-------|
| [T01](../../docs/testing/cases/launch.md#t01--app-launch) | X11 window with our pid appears within 15s; title matches `/claude/i` | L2 (xprop) |
| [T02](../../docs/testing/cases/launch.md#t02--doctor-health-check) | `claude-desktop --doctor` exits 0 | spawn probe |
| [T03](../../docs/testing/cases/tray-and-window-chrome.md#t03--tray-icon-present) | A `StatusNotifierItem` is registered by the claude-desktop pid AND exactly one (no rebuild-race duplicates) | L2 (DBus) |
| [T04](../../docs/testing/cases/tray-and-window-chrome.md#t04--window-decorations-draw) | Window has `_NET_FRAME_EXTENTS` (sum > 0) and a "Claude" title | L2 (xprop) |
| [T05](../../docs/testing/cases/shortcuts-and-input.md#t05--claude-url-handler) | `xdg-open 'claude://...'` delivers via `app.on('second-instance')` to the running app | spawn + L1 hook |
| [T06](../../docs/testing/cases/shortcuts-and-input.md#t06--quick-entry-global-shortcut-unfocused) | `globalShortcut.isRegistered('Ctrl+Alt+Space')` returns true after `mainVisible` | L1 |
| [T07](../../docs/testing/cases/tray-and-window-chrome.md#t07--in-app-topbar) | Five topbar buttons render with non-zero rects (uses `seedFromHost` for hermetic auth) | L1 + DOM |
| [T08](../../docs/testing/cases/tray-and-window-chrome.md#t08--close-x-hides-to-tray) | `win.close()` fires the wrapper interceptor; window hidden, proc alive | L1 |
| [T09](../../docs/testing/cases/platform-integration.md#t09--autostart-via-xdg) | `setLoginItemSettings({ openAtLogin })` writes/removes `$XDG_CONFIG_HOME/autostart/claude-desktop.desktop` | L1 + filesystem |
| [T10](../../docs/testing/cases/platform-integration.md#t10--cowork-integration) | After H04-style spawn detection, `kill -9` the daemon and confirm a *different* pid respawns within the 30s budget (Patch 6 cooldown + retry; the 10s cooldown is a hard floor) | pgrep delta + spawn delta |
| [T11](../../docs/testing/cases/extensibility.md#t11--plugin-install) | Plugin-install code path fingerprints present in bundled `index.js` | file probe |
| [T11_runtime](../../docs/testing/cases/extensibility.md#t11--plugin-install) | After `seedFromHost` + `userLoaded`, the install-flow eipc surface (`installPlugin`, `uninstallPlugin`, `updatePlugin`, `listInstalledPlugins`, `LocalPlugins/getPlugins` — five-suffix presence probe) is registered on the claude.ai webContents AND BOTH read-side handlers across the two impl objects are callable through the renderer-side wrapper: `CustomPlugins/listInstalledPlugins([])` returns array shape (drives Manage plugins panel), `LocalPlugins/getPlugins()` returns array shape (reads `~/.claude/plugins/installed_plugins.json` per case-doc :465822) — Tier 2 reframe of T11 (case-doc anchor :507181) | L1 (eipc registry + invoke) |
| [T12](../../docs/testing/cases/platform-integration.md#t12--webgl-warn-only) | `app.getGPUFeatureStatus()` returns a populated object; renderer reached visible | L1 |
| [T13](../../docs/testing/cases/launch.md#t13--doctor-reports-correct-package-format) | `--doctor` does not false-flag rpm/deb installs as missing-dpkg AppImage | spawn + stdout grep |
| [T14a](../../docs/testing/cases/launch.md#t14--multi-instance-behavior) | `requestSingleInstanceLock` + `'second-instance'` strings in bundled `index.js` (file probe) | file probe |
| [T14b](../../docs/testing/cases/launch.md#t14--multi-instance-behavior) | Second invocation under same isolation exits cleanly; primary pid stays alive (runtime probe) | spawn delta + pgrep |
| [T16](../../docs/testing/cases/code-tab-foundations.md#t16--code-tab-loads) | After `seedFromHost` + `userLoaded`, `CodeTab.activate()` resolves and ≥1 compact pill renders (env pill = Code-body mounted) | L1 + AX-tree |
| [T17](../../docs/testing/cases/code-tab-foundations.md#t17--folder-picker-opens) | After `seedFromHost` + `userLoaded`, Code df-pill → env pill → Local → Select folder → Open folder triggers `dialog.showOpenDialog` (mock installed via `installOpenDialogMock`); skips cleanly when host has no signed-in Claude config | L1 + AX-tree |
| [T18](../../docs/testing/cases/code-tab-foundations.md#t18--drag-and-drop-files-into-prompt) | Bundled `mainView.js` preload contains the path-resolution bridge fingerprints: `getPathForFile` (2× — property key + the `webUtils.getPathForFile(` call, both at case-doc :9267), `webUtils`, `filePickers`, and the `claudeAppSettings` `contextBridge.exposeInMainWorld` namespace (case-doc :9552) — pins the load-bearing wiring without faking OS-level XDND drag (xdotool can't put file URIs on the X11 selection; Wayland needs per-compositor IPC + libei) | file probe |
| [T19](../../docs/testing/cases/code-tab-foundations.md#t19--integrated-terminal) | After `seedFromHost` + `userLoaded`, the integrated-terminal eipc surface (`startShellPty`, `writeShellPty`, `stopShellPty`, `resizeShellPty`, `getShellPtyBuffer` — five-suffix presence probe) is registered on the claude.ai webContents AND the foundational `LocalSessions/getAll` returns array shape (Tier 2 reframe of the case-doc T19 case; case-doc anchors are write-side `startShellPty` etc. so reframe asserts the FULL terminal IPC surface registers + a stateless read-side surrogate is invocable) | L1 (eipc registry + invoke) |
| [T20](../../docs/testing/cases/code-tab-foundations.md#t20--file-pane-opens-and-saves) | After `seedFromHost` + `userLoaded`, the file-pane eipc surface (`readSessionFile`, `writeSessionFile`, `pickSessionFile` — three-suffix presence probe) is registered on the claude.ai webContents AND the foundational `LocalSessions/getAll` returns array shape (Tier 2 reframe of the case-doc T20 case; the case-doc's `readSessionFile` anchor is read-side but needs (sessionId, path) args not constructible from a fresh isolation, so the registration probe + foundational `getAll` invocation is the strongest non-destructive Tier 2 layer) | L1 (eipc registry + invoke) |
| [T21](../../docs/testing/cases/code-tab-workflow.md#t21--dev-server-preview-pane) | After `seedFromHost` + `userLoaded`, the preview-pane eipc surface (`getConfiguredServices`, `startFromConfig`, `stopServer`, `getAutoVerify`, `capturePreviewScreenshot` — five-suffix presence probe) is registered on the claude.ai webContents AND BOTH case-doc-anchored read-side handlers are callable through the renderer-side wrapper: `getConfiguredServices(cwd)` returns array shape, `getAutoVerify(cwd)` returns boolean shape (Tier 2 reframe of the case-doc T21 case; cwd validator is `typeof cwd === 'string'` only, smoke-tested session 11) | L1 (eipc registry + invoke) |
| [T22](../../docs/testing/cases/code-tab-workflow.md#t22--pr-monitoring-via-gh) | Bundled `index.js` contains `LocalSessions_$_getPrChecks` eipc channel name *and* `gh CLI not found in PATH` Linux-fallthrough throw site (Tier 1 fingerprint) | file probe |
| [T22b](../../docs/testing/cases/code-tab-workflow.md#t22--pr-monitoring-via-gh) | After `seedFromHost` + `userLoaded`, the `LocalSessions_$_getPrChecks` eipc handler is registered on the claude.ai webContents (`webContents.ipc._invokeHandlers` — Tier 2 runtime probe sibling of T22, strictly stronger than the bundle-string fingerprint) | L1 (eipc registry) |
| [T23](../../docs/testing/cases/code-tab-handoff.md#t23--desktop-notifications-fire) | Firing `new Notification({title})` from main reaches the session bus's `org.freedesktop.Notifications.Notify` (observed via `dbus-monitor`) | L1 + DBus subprocess |
| [T24](../../docs/testing/cases/code-tab-handoff.md#t24--open-in-external-editor) | After `installOpenExternalMock` mirroring T25's pattern, `evalInMain` calls `shell.openExternal('vscode://file/...')`; mock records the URL verbatim, no real editor launch | L1 (mocked egress) |
| [T25](../../docs/testing/cases/code-tab-handoff.md#t25--show-in-files--file-manager) | After `installShowItemInFolderMock` mirroring T17's dialog-mock pattern, `evalInMain` calls `shell.showItemInFolder(<synthetic path>)`; mock records the call verbatim, no throw — no host side effect | L1 (mocked egress) |
| [T26](../../docs/testing/cases/routines.md#t26--routines-page-renders) | After `seedFromHost` + `userLoaded`, click "Routines" sidebar AX button; assert "New routine" / "All" / "Calendar" anchor renders | L1 + AX-tree |
| [T27](../../docs/testing/cases/routines.md#t27--scheduled-task-fires-and-notifies) | After `seedFromHost` + `userLoaded`, both Cowork and CCD `getAllScheduledTasks` eipc handlers are registered AND callable through the renderer-side wrapper, returning array shape — Tier 2 reframe of the case-doc T27 case | L1 (eipc invoke) |
| [T30](../../docs/testing/cases/code-tab-workflow.md#t30--auto-archive-on-pr-merge) | Bundled `index.js` colocates the auto-archive sweep cadence (`300*1e3`, `3600*1e3`, `AutoArchiveEngine`) with the `ccAutoArchiveOnPrClose` gate key (single-regex multi-string fingerprint) | file probe |
| [T31](../../docs/testing/cases/code-tab-workflow.md#t31--side-chat-opens) | Bundled `index.js` contains all three side-chat eipc channel names (`startSideChat`, `sendSideChatMessage`, `stopSideChat`) — load-bearing trio | file probe |
| [T31b](../../docs/testing/cases/code-tab-workflow.md#t31--side-chat-opens) | After `seedFromHost` + `userLoaded`, all three side-chat eipc handlers (`startSideChat`, `sendSideChatMessage`, `stopSideChat`) are registered on the claude.ai webContents — load-bearing trio (Tier 2 runtime sibling of T31) | L1 (eipc registry) |
| [T32](../../docs/testing/cases/code-tab-workflow.md#t32--slash-command-menu) | Bundled `index.js` contains `LocalSessions_$_getSupportedCommands` eipc channel + `slashCommands` schema field | file probe |
| [T33](../../docs/testing/cases/extensibility.md#t33--plugin-browser) | Bundled `index.js` contains `CustomPlugins_$_listMarketplaces` and `CustomPlugins_$_listAvailablePlugins` eipc channel names (browser populate flow) | file probe |
| [T33b](../../docs/testing/cases/extensibility.md#t33--plugin-browser) | After `seedFromHost` + `userLoaded`, both plugin-browser eipc handlers (`listMarketplaces`, `listAvailablePlugins`) are registered on the claude.ai webContents — load-bearing pair (Tier 2 runtime sibling of T33) | L1 (eipc registry) |
| [T33c](../../docs/testing/cases/extensibility.md#t33--plugin-browser) | After `seedFromHost` + `userLoaded`, both plugin-browser eipc handlers (`listMarketplaces`, `listAvailablePlugins`) are callable through the renderer-side wrapper with `args = [[]]` (empty `egressAllowedDomains`), each returning array shape — Tier 2 invocation upgrade of T33b, strictly stronger than registration alone | L1 (eipc invoke) |
| [T35](../../docs/testing/cases/extensibility.md#t35--mcp-server-config-picked-up) | Bundled `index.js` contains the four-needle MCP-config separation fingerprint: `claude_desktop_config.json` (chat-tab path), `.claude.json` + `.mcp.json` (Code-tab loaders), `"user","project","local"` (settingSources triple Code-session passes to the agent SDK) — pins per-tab separation without launch | file probe |
| [T35b](../../docs/testing/cases/extensibility.md#t35--mcp-server-config-picked-up) | After `seedFromHost` + `userLoaded`, the `claude.settings/MCP/getMcpServersConfig` eipc handler is registered AND callable through the renderer-side wrapper, returning a non-array object (Tier 2 runtime sibling of T35, strictly stronger than the bundle-string fingerprint) | L1 (eipc invoke) |
| [T36](../../docs/testing/cases/extensibility.md#t36--hooks-fire) | Bundled `index.js` contains the hooks runtime fingerprint: `hook_started` / `hook_progress` / `hook_response` (single-occurrence Verbose-transcript runtime emits) plus `PreToolUse` / `UserPromptSubmit` registry tokens — pins the runtime hook-fire path the case-doc Verbose-transcript claim hangs on | file probe |
| [T37](../../docs/testing/cases/extensibility.md#t37--claudemd-memory-loads) | Bundled `index.js` contains `[GlobalMemory] Copied CLAUDE.md` log line + `CLAUDE.md` filename literal + `CLAUDE_CONFIG_DIR` env-var token (memory-loading wiring) | file probe |
| [T37b](../../docs/testing/cases/extensibility.md#t37--claudemd-memory-loads) | After `seedFromHost` + `userLoaded`, the `claude.web/CoworkMemory/readGlobalMemory` eipc handler is registered AND callable through the renderer-side wrapper, returning the documented `string \| null` shape (Tier 2 runtime sibling of T37) | L1 (eipc invoke) |
| [T38](../../docs/testing/cases/code-tab-handoff.md#t38--continue-in-ide) | Bundled `index.js` contains `LocalSessions_$_openInEditor` eipc channel name (Tier 1 fingerprint) | file probe |
| [T38b](../../docs/testing/cases/code-tab-handoff.md#t38--continue-in-ide) | After `seedFromHost` + `userLoaded`, the `LocalSessions_$_openInEditor` eipc handler is registered on the claude.ai webContents (Tier 2 runtime sibling of T38) | L1 (eipc registry) |
| H01 | CDP auth gate exits with code 1 when spawned with `--remote-debugging-port` and no `CLAUDE_CDP_AUTH` token | spawn probe |
| H02 | `frame-fix-wrapper.js` + `frame-fix-entry.js` injected into `app.asar` (Proxy + main-field reference) | file probe |
| H03 | Build-pipeline patch fingerprints all present in `app.asar` (KDE gate, frame-fix inject, tray, cowork, claude-code) | file probe |
| H04 | cowork daemon spawns under app and exits with app — soft-skips on rows where it isn't gated to spawn | pgrep delta |
| H05 | UI-drift canary against the AX-tree fingerprint walker (requires `CLAUDE_TEST_USE_HOST_CONFIG=1`) | L1 (AX) |
| [S01](../../docs/testing/cases/distribution.md#s01--appimage-launches-without-manual-libfuse2t64) | AppImage launches without `libfuse.so.2` complaint (skips on non-AppImage rows) | spawn + stderr grep |
| [S02](../../docs/testing/cases/distribution.md#s02--xdg_current_desktopubuntugnome-prefix-form-doesnt-break-de-detection) | No strict `==` equality against `XDG_CURRENT_DESKTOP` in launcher / patches (regression detector) | source-tree probe |
| [S03](../../docs/testing/cases/distribution.md#s03--deb-install-pulls-runtime-deps) | `dpkg-query Depends:` field non-empty (currently fails as upstream-contract regression detector) | dpkg-query |
| [S04](../../docs/testing/cases/distribution.md#s04--rpm-install-pulls-runtime-deps) | `rpm -qR` has at least one non-`rpmlib(...)` requirement (currently fails per #autoreqprov off) | rpm -qR |
| [S05](../../docs/testing/cases/distribution.md#s05--doctor-recognises-dnf-installed-package-doesnt-false-flag-as-appimage) | Doctor does not false-flag rpm-installed package (skips when `rpm -qf` doesn't claim the binary) | spawn + stdout grep |
| [S07](../../docs/testing/cases/shortcuts-and-input.md#s07--claude_use_waylandvar) | Under `CLAUDE_HARNESS_USE_WAYLAND=1`, spawned Electron has `--ozone-platform=wayland` on argv | argv probe |
| [S08](../../docs/testing/cases/tray-and-window-chrome.md#s08--tray-icon-doesnt-duplicate-after-nativetheme-update) | `setImage`-based in-place fast-path injected by `tray.sh` (KDE-only, file probe) | file probe |
| [S09](../../docs/testing/cases/shortcuts-and-input.md#s09--quick-window-patch-runs-only-on-kde-post-406-gate) | KDE-gate string present in bundled `index.js` (patch ran at build) | file probe |
| [S10](../../docs/testing/cases/shortcuts-and-input.md#s10--quick-entry-popup-is-transparent-no-opaque-square-frame) | KDE-W only — popup runtime `getBackgroundColor() === '#00000000'` after Quick Entry opens (regression-detector against electron#50213 if bundled Electron in 41.0.4-bisect-window) | L1 + ydotool |
| [S11](../../docs/testing/cases/shortcuts-and-input.md#s11--quick-entry-shortcut-fires-from-any-focus-on-wayland-mutter-xwayland-key-grab) | GNOME-X / Ubu-X only (X11-side regression detector) — spawn xterm marker, `xdotool windowfocus` to it, verify `_NET_ACTIVE_WINDOW` shifted, fire `Ctrl+Alt+Space` via ydotool, assert popup visible. Wayland-side mutter regression (#404) is a primitive gap — needs Wayland-native focus injection (libei) | L1 + xdotool focus + ydotool shortcut |
| S12 | `--enable-features=GlobalShortcutsPortal` in Electron argv (GNOME-W only — currently a known-failing regression detector) | argv probe |
| [S14](../../docs/testing/cases/shortcuts-and-input.md#s14--global-shortcuts-via-xdg-portal-work-on-niri) | Niri only — spawn `foot` marker, `niri msg action focus-window` to it, verify `niri msg --json focused-window` shifted, fire `Ctrl+Alt+Space` via ydotool, assert popup visible. Currently known-failing detector for the Niri portal `BindShortcuts` path (parallels S12's GNOME-W detector) | L1 + niri msg focus + ydotool shortcut |
| [S15](../../docs/testing/cases/distribution.md#s15--appimage-extraction---appimage-extract-works-as-documented-fallback) | `--appimage-extract` exits 0; `squashfs-root/AppRun --version` runs without FUSE error | spawn + filesystem |
| [S16](../../docs/testing/cases/distribution.md#s16--appimage-mount-cleans-up-on-app-exit) | `mount(8)` shows new `.mount_claude` while app is up; gone within 10s of close | mount delta |
| [S17](../../docs/testing/cases/platform-integration.md#s17--app-launched-from-desktop-inherits-shell-path) | Shell-path-worker overlays user's login-shell PATH onto a deliberately-scrubbed env | L1 + utilityProcess |
| [S19](../../docs/testing/cases/routines.md#s19--claude_config_dir-redirects-scheduled-task-storage) | `extraEnv: { CLAUDE_CONFIG_DIR }` reaches main-process `process.env`; `cE()`-equivalent resolves under the override path | L1 + extraEnv |
| [S21](../../docs/testing/cases/routines.md#s21--lid-close-still-suspends-per-os-policy) | No `handle-lid-switch` / `HandleLidSwitch` strings in bundle (lid policy deferred to OS) | asar absence probe |
| [S22](../../docs/testing/cases/platform-integration.md#s22--computer-use-toggle-absent-or-visibly-disabled-on-linux) | `new Set(["darwin","win32"])` platform gate present; no 2-element Set pairing linux (file-probe form) | asar regex |
| [S25](../../docs/testing/cases/platform-integration.md#s25--mobile-pairing-survives-linux-session-restart) | `safeStorage.encryptString → file → app restart → file → safeStorage.decryptString` round-trips the same plaintext (skips when `isEncryptionAvailable === false`) | L1 + shared isolation handle |
| [S26](../../docs/testing/cases/distribution.md#s26--auto-update-is-disabled-when-installed-via-aptdnf) | `setFeedURL` present + project suppression marker present (currently fails — gated on #567) | asar fingerprint |
| [S27](../../docs/testing/cases/extensibility.md#s27--plugins-install-per-user) | `installed_plugins.json` + homedir resolver present; no `*/plugins` system paths in bundle | asar fingerprint |
| [S28](../../docs/testing/cases/extensibility.md#s28--worktree-creation-surfaces-clear-error-on-read-only-mounts) | Bundled `index.js` contains the worktree permission classifier expression (`"Permission denied" \|\| "Access is denied" \|\| "could not lock config file" → "permission-denied"`) plus the `Failed to create git worktree:` log line | asar fingerprint |
| [S29](../../docs/testing/cases/shortcuts-and-input.md#s29--quick-entry-popup-is-created-lazily-on-first-shortcut-press-closed-to-tray-sanity) | Popup opens when main is hidden-to-tray (lazy-create sanity) | L1 |
| [S30](../../docs/testing/cases/shortcuts-and-input.md#s30--quick-entry-shortcut-becomes-a-no-op-after-full-app-exit) | No new claude-desktop pid spawns after post-exit shortcut press | pgrep delta + ydotool |
| [S31](../../docs/testing/cases/shortcuts-and-input.md#s31--quick-entry-submit-makes-the-new-chat-reachable-from-any-main-window-state) | Submit reaches new chat from visible / minimized / hidden-to-tray (QE-7/8/9) | L1 + ydotool |
| S32 | GNOME mutter stale-`isFocused()` regression (GNOME-W/Ubu-W only — known-failing today) | L1 + ydotool |
| [S33](../../docs/testing/cases/shortcuts-and-input.md#s33--quick-entry-transparent-rendering-tracked-against-bundled-electron-version) | Captures bundled Electron version against the #370 / electron#50213 bisect threshold | file read |
| [S34](../../docs/testing/cases/shortcuts-and-input.md#s34--quick-entry-shortcut-focuses-fullscreen-main-window-instead-of-showing-popup) | Popup does **not** appear when main is fullscreen (upstream contract) | L1 + ydotool |
| [S35](../../docs/testing/cases/shortcuts-and-input.md#s35--quick-entry-popup-position-is-persisted-across-invocations-and-across-app-restarts) | Popup position persists across invocations *and* across app restart (two-launch test) | L1 + shared isolation handle + ydotool |
| S36 | Multi-monitor fallback — skip-on-single-monitor with documented `fixme` for the disconnect orchestration | display probe |
| S37 | Main-window destroy unreachable on Linux per close-to-tray override — documented skip | — |
These specs exercise the substrate primitives in `lib/`: `xprop`
shell-outs (T01, T04), `dbus-next` (T03), `dbus-monitor` subprocess
eavesdrop (T23), Node-inspector runtime-attach
(T07/T16/T17/T26/S10/S29-S35/T05-T14b L1 specs), `app.asar` content reads
(S08/S09/S21/S22/S26/S27/S28/T11/T14a/T18/T22/T30/T31/T32/T33/T35/T36/T37/T38/H02/H03/S33 — mostly `index.js`; T18 reads `mainView.js`),
`/proc/$pid/cmdline` reads (S07/S12), pgrep-based pid deltas
(T10/T14b/H04/S16/S30), `mount(8)` parsing (S16), source-tree probes
against `scripts/launcher-common.sh` (S02), `dpkg-query` / `rpm -qR` /
`rpm -qf` calls (S03/S04/S05/T13), `safeStorage.encryptString`
round-trip across two launches (S25), `extraEnv` precedence over
isolation env (S19), the `lib/electron-mocks.ts` mock-then-call
helpers — `installOpenDialogMock` (T17), `installShowItemInFolderMock`
(T25), `installOpenExternalMock` (T24) — the `lib/input.ts`
focus-shifter (`focusOtherWindow` + `spawnMarkerWindow` for S11; X11
only — `WaylandFocusUnavailable` thrown on native Wayland) and its
Niri-native sibling `lib/input-niri.ts` (`niri msg --json` for the
focus-injection + readback chain, `foot --title` for the marker
window; `NiriIpcUnavailable` thrown off-Niri; consumed by S14), the
`lib/eipc.ts` registry walker (`getEipcChannels` /
`waitForEipcChannel` / `waitForEipcChannels` against
`webContents.ipc._invokeHandlers`; opaque on the UUID, suffix-matched
against case-doc anchors; consumed by T19 / T20 / T22b / T31b / T33b /
T38b) plus its session 8 invoke surface (`invokeEipcChannel` — calls
a registered handler through the renderer-side wrapper at
`window['claude.<scope>'].<Iface>.<method>`; consumed by T19 / T20 /
T27 / T33c / T35b / T37b), the `lib/ax.ts` AX-tree substrate
(`snapshotAx` for one-shot reads + `waitForAxNode` / `waitForAxNodes`
for predicate-based polling, plus re-exports of `RawElement` /
`AxNode` / `axTreeToSnapshot` / `waitForAxTreeStable` from
`explore/walker.ts` so consumers stay inside `lib/`; threshold-
driven extraction in session 13 once T26 had to duplicate the
formerly-private `snapshotAx` from `claudeai.ts`; consumed by
`claudeai.ts` page-objects + T26; session 14 migrated `activateTab`
from a one-shot snapshot to `waitForAxNode` polling — fixes the
T16 `no AX-tree button with accessibleName="Code" found` failure
mode where the Code button hadn't rendered yet at click time —
and converted `CodeTab.activate`'s post-click `findCompactPills`
retry loop to `waitForAxNodes`) — and the
`createIsolation({ seedFromHost: true })` primitive that lets login-
required tests run hermetically against a copy of the host's signed-
in auth state (T07, T11_runtime, T16, T17, T19, T20, T21, T22b, T26,
T27, T31b, T33b, T33c, T35b, T37b, T38b — session 15 migrated T17
from the legacy `CLAUDE_TEST_USE_HOST_CONFIG=1` / `isolation: null`
shape to `seedFromHost`, fixing a pre-existing 60s spec-timeout
flake where the unauth'd default isolation polled `userLoaded` past
Playwright's spec budget; session 16 verified the migration end-to-
end — `seedFromHost` clones the host's signed-in config,
`waitForReady('userLoaded')` resolves to a post-login URL, and the
session-14 `CodeTab.activate({ timeout: 15_000 })` succeeds; T17
now reaches a NEW failure mode at the next chain step
(`openFolderPicker` after `selectLocal`, `Select folder…` pill
doesn't render on `/epitaxy` workspace route — likely needs `/new`
context, deferred for a future session)).
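The `/proc/$pid/cmdline` reads used by S07 / S12 come down to splitting a NUL-separated buffer and checking for a flag. The sketch below is illustrative only: `parseCmdline`, `hasFlag`, and the sample Electron path are hypothetical names, not the contents of `lib/argv.ts`.

```typescript
// /proc/<pid>/cmdline stores argv as NUL-separated strings, usually with a
// trailing NUL after the last argument.
function parseCmdline(raw: string): string[] {
  const parts = raw.split('\0');
  // Drop the empty element produced by the trailing NUL, if present.
  if (parts.length > 0 && parts[parts.length - 1] === '') parts.pop();
  return parts;
}

// True when `flag` appears bare (`--flag`) or with a value (`--flag=value`).
function hasFlag(argv: readonly string[], flag: string): boolean {
  return argv.some((arg) => arg === flag || arg.startsWith(`${flag}=`));
}

// Hypothetical buffer; a real one would come from reading /proc/<pid>/cmdline.
const sampleCmdline =
  '/opt/app/electron\0--ozone-platform=wayland\0--enable-features=GlobalShortcutsPortal\0';
const sampleArgv = parseCmdline(sampleCmdline);
```

Matching on `flag` or `flag=` avoids false positives from flags that merely share a prefix.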
Note on eipc channels: the `LocalSessions_$_*` and `CustomPlugins_$_*`
channel names referenced in the case-doc Code anchors don't register
through Electron's *global* `ipcMain.handle()` registry (which only
carries 3 chat-tab MCP-bridge handlers). They DO register through
Electron's stdlib `IpcMainImpl` — just on the per-`webContents` IPC
scope (`webContents.ipc._invokeHandlers`, Electron 17+) rather than
the global one. The framing is
`$eipc_message$_<UUID>_$_<scope>_$_<iface>_$_<method>` (UUID stable
across builds at `c0eed8c9-…`); 117 `LocalSessions_*` + 16
`CustomPlugins_*` + 50+ other interfaces register on the claude.ai
webContents. T22 / T31 / T33 / T38 ship as Tier 1 fingerprints
against the bundled channel-name strings; T22b / T31b / T33b / T38b
are the runtime registry-presence siblings (strictly stronger,
require `seedFromHost`). T27 / T33c / T35b / T37b go one step
further — they invoke the resolved handlers through the renderer-
side wrapper at `window['claude.<scope>'].<Iface>.<method>`. T19 /
T20 are first-runtime-probe siblings of case-doc tests whose anchors
are write-side handlers (`startShellPty` / `writeSessionFile`); they
ship a five-suffix / three-suffix registration probe over the
case-doc-anchored write-side surface plus a single foundational
read-side `LocalSessions/getAll` invocation as the read-side
surrogate (case-doc connection: integrated terminal and file pane
both bind to LocalSessions; `getAll` proves the LocalSessions impl
object is reachable through the renderer wrapper). T21 and
T11_runtime extend the dual-invocation pattern: when a case-doc has
read-side anchors with resolvable arg shapes, invoke the case-doc-
anchored handlers directly rather than through a foundational
surrogate (T21: `getConfiguredServices` array + `getAutoVerify`
boolean on a single Launch impl object; T11_runtime: cross-impl-
object dual invocation — `CustomPlugins/listInstalledPlugins` array
+ `LocalPlugins/getPlugins` array — proves the install plumbing
crosses both interfaces intact, strictly stronger than single-
interface coverage). All invocations go through the wrapper that
`mainView.js` exposes via `contextBridge.exposeInMainWorld` after a
top-frame + origin gate (`Qc()`: claude.ai / claude.com / preview.* /
localhost). Calling through the wrapper carries an honest
`senderFrame` for the inlined `le()` / `Vi()` per-handler origin gate,
so the test surface matches the real attack surface. T33c also
demonstrates the schema-rev path: when invocation rejects with
`Argument "<name>" at position N ... failed to pass validation`,
the verbatim rejection string is the cheapest grep target back to
the inline hand-rolled validator block (bundle bytes 5013601 /
5018821 for the two CustomPlugins methods). See `lib/eipc.ts` for
both surfaces, and
[`runner-implementation-plan.md`](../../docs/testing/runner-implementation-plan.md)
session 7 / 8 / 9 / 10 status sections for the findings.
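The channel framing and suffix matching described above can be sketched as two pure functions. This is an assumption-laden illustration, not the `lib/eipc.ts` implementation: `parseEipcChannel` and `findBySuffix` are hypothetical names, and the all-zero UUID is a placeholder rather than the real build UUID.

```typescript
interface EipcChannel {
  uuid: string;
  scope: string;
  iface: string;
  method: string;
}

const EIPC_PREFIX = '$eipc_message$_';
const EIPC_SEP = '_$_';

// Split `$eipc_message$_<UUID>_$_<scope>_$_<iface>_$_<method>` into its parts.
// Returns null for anything that does not follow that framing.
function parseEipcChannel(channel: string): EipcChannel | null {
  if (!channel.startsWith(EIPC_PREFIX)) return null;
  const parts = channel.slice(EIPC_PREFIX.length).split(EIPC_SEP);
  if (parts.length !== 4 || parts.some((p) => p.length === 0)) return null;
  const [uuid, scope, iface, method] = parts as [string, string, string, string];
  return { uuid, scope, iface, method };
}

// UUID-opaque lookup: match on the trailing `<iface>_$_<method>` (or longer)
// suffix only, anchored on a separator so partial names cannot match.
function findBySuffix(channels: Iterable<string>, suffix: string): string[] {
  return [...channels].filter(
    (c) => c.startsWith(EIPC_PREFIX) && c.endsWith(EIPC_SEP + suffix),
  );
}

// Placeholder UUID, used only to build sample keys for this sketch.
const placeholderUuid = '00000000-0000-0000-0000-000000000000';
const sampleChannels = [
  EIPC_PREFIX + placeholderUuid + '_$_claude.web_$_CoworkMemory_$_readGlobalMemory',
  EIPC_PREFIX + placeholderUuid + '_$_claude.settings_$_MCP_$_getMcpServersConfig',
  'unrelated-channel',
];
```

In the harness the keys would come from `webContents.ipc._invokeHandlers`. Matching on the suffix keeps a probe working even if the UUID changes between builds.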
Per-row pass/skip counts depend on which sweep runs against the row;
see `runner-implementation-plan.md` for tier classification and
matrix-regen for the most-recent per-row outcomes. The Quick Entry
runners (S29-S35) all share the same primitive set (`installInterceptor()`
+ `openAndWaitReady()` + scenario-specific state setup).
## Prerequisites
On the host or VM running the sweep:
- Node.js ≥ 20
- `claude-desktop` installed (deb / rpm / AppImage), reachable via `claude-desktop` on `PATH` or `CLAUDE_DESKTOP_LAUNCHER` env var
- `xprop` (for L2 window queries — `dnf install xorg-x11-utils` on Fedora; `apt install x11-utils` on Debian/Ubuntu)
- `zstd` (optional — used to bundle results)
### Quick Entry runners (S29-S37, future QE-*)
Quick Entry tests inject the OS-level shortcut via `ydotool` /
`/dev/uinput`. One-time setup per host or VM:
```sh
# Install the binary + daemon
sudo dnf install -y ydotool # or: sudo apt install ydotool
# Make ydotoold's socket world-writable so the test runner reaches it
sudo mkdir -p /etc/systemd/system/ydotool.service.d
sudo tee /etc/systemd/system/ydotool.service.d/override.conf <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/ydotoold --socket-perm=0666
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now ydotool.service
```
After this, `ydotool key 29:1 29:0` (Ctrl tap) should exit 0. The
runner sets `YDOTOOL_SOCKET=/tmp/.ydotool_socket` automatically;
override the env var if your daemon binds elsewhere.
ydotool **cannot** drive portal-grabbed shortcuts (kernel uinput
events vs compositor portal grabs) — those tests stay manual until
libei adoption broadens. See [`docs/testing/automation.md`](../../docs/testing/automation.md#input-injection--ydotool-now-libei-next).
## Install
```sh
cd tools/test-harness
npm install
```
`package-lock.json` is gitignored for now; commit it once the dep set is settled.
## Run
```sh
# All specs against the locally installed claude-desktop (row-aware sweep)
ROW=KDE-W ./orchestrator/sweep.sh
# Single test
npx playwright test src/runners/T01_app_launch.spec.ts
# Headed (watch the app launch in front of you)
npx playwright test --headed
# Run the full suite under native Wayland instead of X11/XWayland
CLAUDE_HARNESS_USE_WAYLAND=1 npm test
# Grounding probe — dump runtime state for the case-doc grounding sweep
npm run grounding-probe -- --launch --include-synthetic \
--out ../../docs/testing/cases-grounding-runtime.json
```
Results land at `results/results-${ROW}-${DATE}/`:
```
results/results-KDE-W-20260430T143000Z/
├── junit.xml # JUnit summary (matrix-regen input)
├── html/ # Playwright HTML report
└── test-output/ # Per-test attachments (screenshots, logs, etc.)
```
A bundled `results-${ROW}-${DATE}.tar.zst` sits next to the dir if `zstd`
is installed.
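For tooling that needs to predict or parse the bundle name, the `results-${ROW}-${DATE}` convention can be reproduced as below. This is a sketch that assumes the stamp is compact UTC ISO 8601, as the `20260430T143000Z` example suggests; how `sweep.sh` actually derives it is not shown here, and `utcStamp` / `resultsDirName` are hypothetical helpers.

```typescript
// Compact UTC stamp: 2026-04-30T14:30:00.000Z -> 20260430T143000Z
function utcStamp(date: Date): string {
  return date
    .toISOString()
    .replace(/[-:]/g, '') // drop date and time separators
    .replace(/\.\d{3}Z$/, 'Z'); // drop milliseconds
}

// Directory name for one sweep of one matrix row.
function resultsDirName(row: string, date: Date): string {
  return `results-${row}-${utcStamp(date)}`;
}

// Months are zero-based in Date.UTC, so 3 is April.
const exampleDir = resultsDirName('KDE-W', new Date(Date.UTC(2026, 3, 30, 14, 30, 0)));
```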
## Environment variables
| Var | Default | Purpose |
|-----|---------|---------|
| `ROW` | `KDE-W` | Matrix row label, propagated into the bundle name and per-test annotations. Drives `skipUnlessRow()` in spec files |
| `CLAUDE_DESKTOP_LAUNCHER` | `claude-desktop` (PATH lookup) | Path to the launcher / Electron binary Playwright spawns |
| `CLAUDE_DESKTOP_ELECTRON` | probed | Override the resolved Electron binary path (skips deb/rpm install probing) |
| `CLAUDE_DESKTOP_APP_ASAR` | probed | Override the resolved `app.asar` path |
| `CLAUDE_TEST_USE_HOST_CONFIG` | unset | When `1`, opt out of per-test isolation and use the host's real `~/.config/Claude`. Required for tests that need a signed-in claude.ai (S31, future submit-side QE runners). **Side effect:** these tests write to your real account — chats / settings persist |
| `CLAUDE_HARNESS_USE_WAYLAND` | unset | When `1`, every runner spawns Electron with the native-Wayland backend (`--ozone-platform=wayland` + sibling flags from `launcher-common.sh`) instead of the default X11-via-XWayland. `CLAUDE_USE_WAYLAND=1` is also exported into the spawn env for in-app paths that read it. Per-launch overrides via `launchClaude({ extraEnv })` still win |
| `YDOTOOL_SOCKET` | `/tmp/.ydotool_socket` | Path to the `ydotoold` socket. Override only if the daemon binds elsewhere |
| `OUTPUT_DIR` | `./results` | Where bundles land |
| `RESULTS_DIR` | per-run derived | Single-run output dir (set by `sweep.sh`; usually you don't set this manually) |
### Per-test isolation default
`launchClaude()` creates a fresh `XDG_CONFIG_HOME` / `CLAUDE_CONFIG_DIR`
under `$TMPDIR/claude-test-*` for every launch and removes it on
`close()`. This is the default to prevent state leaks between tests
(SingletonLock collisions, persisted Quick Entry positions, etc. —
see Decision 1 in [`docs/testing/automation.md`](../../docs/testing/automation.md)).
Three isolation modes:
- **`launchClaude()`** — default, fresh per-launch isolation.
- **`launchClaude({ isolation })`** — pass a shared `Isolation` handle
to launch the same app twice with persistent state (e.g. S35
position-memory across restart).
- **`launchClaude({ isolation: null })`** — opt out entirely; share
the host's `~/.config/Claude`. Used by tests gated on
`CLAUDE_TEST_USE_HOST_CONFIG` for signed-in claude.ai access.
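The env layering behind these modes can be sketched as a pure function: isolation variables overlay the base env, and per-launch `extraEnv` wins over both. `buildLaunchEnv` and the `config` / `claude` subdirectory names are assumptions made for illustration; the real layout under `$TMPDIR/claude-test-*` lives in `lib/isolation.ts`.

```typescript
type Env = Record<string, string | undefined>;

// isolationRoot === null models the `isolation: null` opt-out: the host env
// passes through untouched, apart from explicit per-launch overrides.
function buildLaunchEnv(base: Env, isolationRoot: string | null, extraEnv: Env = {}): Env {
  const isolationEnv: Env =
    isolationRoot === null
      ? {}
      : {
          // Hypothetical subdirectory names inside the sandbox root.
          XDG_CONFIG_HOME: `${isolationRoot}/config`,
          CLAUDE_CONFIG_DIR: `${isolationRoot}/claude`,
        };
  // Later spreads win: extraEnv overrides isolation, isolation overrides base.
  return { ...base, ...isolationEnv, ...extraEnv };
}

const hostEnv: Env = { HOME: '/home/tester', XDG_CONFIG_HOME: '/home/tester/.config' };
const isolatedEnv = buildLaunchEnv(hostEnv, '/tmp/claude-test-abc');
const overriddenEnv = buildLaunchEnv(hostEnv, '/tmp/claude-test-abc', {
  CLAUDE_CONFIG_DIR: '/tmp/override',
});
const hostSharedEnv = buildLaunchEnv(hostEnv, null);
```

Spread order encodes the precedence, so an override such as the `CLAUDE_CONFIG_DIR` one that S19 relies on needs no special casing.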
## Layout
```
tools/test-harness/
├── package.json
├── tsconfig.json
├── playwright.config.ts
├── src/
│ ├── lib/ # shared helpers
│ │ ├── electron.ts # spawn + isolation + inspector attach
│ │ ├── inspector.ts # Node-inspector RPC client (SIGUSR1 path)
│ │ ├── dbus.ts # dbus-next session-bus + helpers
│ │ ├── sni.ts # StatusNotifierWatcher / Item
│ │ ├── wm.ts # xprop wrappers (X11 + XWayland)
│ │ ├── env.ts # XDG_CURRENT_DESKTOP / SESSION_TYPE branching
│ │ ├── row.ts # skipUnlessRow / skipOnRow primitives
│ │ ├── isolation.ts # per-test XDG_CONFIG_HOME sandbox
│ │ ├── argv.ts # /proc/$pid/cmdline reader + flag check
│ │ ├── asar.ts # in-place app.asar reads (no temp extract)
│ │ ├── quickentry.ts # Quick Entry domain wrapper (popup, MainWindow, ydotool)
│ │ ├── claudeai.ts # claude.ai renderer UI domain (CodeTab, dialog mock, atoms)
│ │ ├── electron-mocks.ts # mock-then-call helpers (dialog/showItemInFolder/openExternal)
│ │ ├── input.ts # focus-shifter primitive (X11 only — xdotool + xprop verify; spawnMarkerWindow xterm)
│ │ ├── input-niri.ts # focus-shifter primitive (Niri only — niri msg --json verify; spawnMarkerWindow foot)
│ │ ├── eipc.ts # eipc-channel registry walker (per-webContents IPC scope; suffix-matched, UUID-opaque)
│ │ ├── retry.ts # poll-until-true with timeout
│ │ └── diagnostics.ts # launcher log, --doctor, session env
│ └── runners/ # one .spec.ts per test ID
│ ├── T01_app_launch.spec.ts
│ ├── T03_tray_icon_present.spec.ts
│ ├── T04_window_decorations.spec.ts
│ ├── T17_folder_picker.spec.ts
│ ├── S09_quick_window_patch_only_kde.spec.ts
│ ├── S12_global_shortcuts_portal_flag.spec.ts
│ ├── S29_quick_entry_lazy_create_closed_to_tray.spec.ts
│ ├── S30_quick_entry_noop_after_app_exit.spec.ts
│ ├── S31_quick_entry_submit_reaches_new_chat.spec.ts
│ ├── S32_quick_entry_submit_gnome_stale_isfocused.spec.ts
│ ├── S33_electron_version_capture.spec.ts
│ ├── S34_shortcut_focuses_fullscreen_main.spec.ts
│ ├── S35_quick_entry_position_persisted_across_restarts.spec.ts
│ ├── S36_quick_entry_fallback_to_primary_display.spec.ts
│ ├── S37_quick_entry_popup_after_main_destroy.spec.ts
│ ├── H01_cdp_gate_canary.spec.ts
│ ├── H02_frame_fix_wrapper_present.spec.ts
│ ├── H03_patch_fingerprints.spec.ts
│ └── H04_cowork_daemon_lifecycle.spec.ts
├── probe.ts # one-off renderer-DOM probe (debugger on :9229)
├── grounding-probe.ts # case-grounding runtime capture (see "Grounding probe" below)
└── orchestrator/
└── sweep.sh # row-aware harness invocation
```
H-prefix specs are harness self-tests — they validate the harness's
preconditions and the build pipeline's invariants (CDP gate alive,
patches landed, daemon lifecycle clean). Cheap, run in <1s each
except H04 which launches the app.
## How L1 testing works (the SIGUSR1 path)
The shipped Electron has a CDP auth gate that exits the app whenever
`--remote-debugging-port` or `--remote-debugging-pipe` is on argv and a
valid `CLAUDE_CDP_AUTH` token isn't in env. Both Playwright's
`_electron.launch()` and `chromium.connectOverCDP()` inject the gated
flag, so both are blocked.
The gate doesn't check `--inspect` or runtime `SIGUSR1`, which is the
same code path as the in-app `Developer → Enable Main Process Debugger`
menu item. So:
1. `launchClaude()` spawns Electron with no debug-port flags (gate
asleep) and waits for the X11 window.
2. `app.attachInspector()` sends `SIGUSR1` to the pid; Node's inspector
opens on port 9229.
3. `lib/inspector.ts` connects via WebSocket and exposes
`evalInMain(body)` and `evalInRenderer(urlFilter, js)` for tests.
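The gate condition described above can be modelled as a pure predicate, which makes it easy to see why the `--inspect` / `SIGUSR1` path slips past it. This is a sketch of the described behaviour, not the shipped gate: `gateWouldExit` is a hypothetical name, and token validation is left as a caller-supplied function because the real check is not documented here.

```typescript
const GATED_FLAGS = ['--remote-debugging-port', '--remote-debugging-pipe'] as const;

// True when argv carries one of the gated flags, bare or in `--flag=value` form.
function carriesGatedFlag(argv: readonly string[]): boolean {
  return argv.some((arg) =>
    GATED_FLAGS.some((flag) => arg === flag || arg.startsWith(`${flag}=`)),
  );
}

// The app exits when a gated flag is present and no valid token is in env.
function gateWouldExit(
  argv: readonly string[],
  env: Record<string, string | undefined>,
  isValidToken: (token: string) => boolean,
): boolean {
  if (!carriesGatedFlag(argv)) return false;
  const token = env['CLAUDE_CDP_AUTH'];
  return token === undefined || !isValidToken(token);
}

// Stand-in validator for the sketch only.
const acceptAnyNonEmpty = (token: string): boolean => token.length > 0;
```

An argv containing only `--inspect=9229` never trips the predicate, matching the observation that the runtime-attach path leaves the gate asleep.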
From the inspector you can:
- Drive the renderer via `webContents.executeJavaScript()`
- Install main-process mocks (e.g. `dialog.showOpenDialog` for T17)
- Inspect any Electron API state
Two gotchas worth knowing:
- `BrowserWindow.getAllWindows()` returns an empty array because
frame-fix-wrapper substitutes the BrowserWindow class. Use
`webContents.getAllWebContents()` instead; it works correctly and
includes both the shell window and the embedded claude.ai BrowserView.
- `Runtime.evaluate` with `awaitPromise: true` returns empty objects for
awaited Promise resolutions. `inspector.evalInMain<T>()` returns
`JSON.stringify(value)` from the IIFE and parses on the caller side
to dodge this.
Full writeup with rationale and tradeoffs:
[`docs/testing/automation.md` "The CDP auth gate"](../../docs/testing/automation.md#the-cdp-auth-gate-and-the-runtime-attach-workaround-that-beats-it).
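The JSON round-trip from the second gotcha can be sketched as a pair of helpers: one wraps the test body in an async IIFE that returns `JSON.stringify(value)`, the other parses the string on the caller side. `wrapForEval` and `parseEvalResult` are hypothetical names; the actual wiring in `lib/inspector.ts` may differ.

```typescript
// Build the expression sent to the inspector. The inner IIFE runs the body;
// the outer one awaits it and serialises the result to a plain string, which
// survives the inspector boundary where a structured object may not.
function wrapForEval(body: string): string {
  return `(async () => JSON.stringify(await (async () => { ${body} })()))()`;
}

// Parse the string returned by the remote evaluation. JSON.stringify yields
// undefined for an undefined value, so non-string results map to undefined.
function parseEvalResult<T>(raw: unknown): T | undefined {
  if (typeof raw !== 'string') return undefined;
  return JSON.parse(raw) as T;
}

const sampleExpression = wrapForEval('return { count: 2 };');
const sampleParsed = parseEvalResult<{ count: number }>('{"count":2}');
```

Only JSON-serialisable values make it through this path, so handles such as `webContents` instances need to be reduced to ids or plain fields inside the body.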
## Grounding probe
`grounding-probe.ts` is a separate entry-point — not a Playwright spec —
that connects to a live Claude Desktop and dumps the runtime state
backing the load-bearing claims in
[`docs/testing/cases/`](../../docs/testing/cases/). It exists because
static grep against the 546k-line beautified bundle has known blind
spots (lazy `import()`s, dynamic handler tables, conditional wiring),
and some claims (S26 autoUpdater gate, S20 powerSaveBlocker path) can
only be verified at runtime.
```sh
# Self-contained: launchClaude() + capture + tear down
npm run grounding-probe -- --launch
# Plus the one synthetic probe (powerSaveBlocker start+stop)
npm run grounding-probe -- --launch --include-synthetic
# Attach to an already-running app (manual --inspect=9229 setup)
npm run grounding-probe -- --port 9229 --out /tmp/probe.json
```
Output is keyed by test ID — see the file's header comment for the
full table. Diff captures across upstream version bumps to spot
behavior drift the static sweep would miss. Surfaces inside modals
or popups (T22 PR toolbar, T26 preset list, T31 side chat, T32 slash
menu) need the surface open at probe time — the AX-tree fingerprint
is a snapshot of what's currently on screen.
## Known limitations
- **T04** uses `xprop` (no `xdotool` dependency — walks `_NET_CLIENT_LIST` + `_NET_WM_PID`). Works on X11 native and KDE Wayland (XWayland), **not** on native-Wayland sessions where the app is running through Ozone-Wayland directly. Per Decision 6, project default is X11; native-Wayland window-state queries are deferred until those tests get added.
- **T17** is shallow — it intercepts `dialog.showOpenDialog` at the Electron main process level. The integration question "does Claude make the right *portal* call?" is a v2 concern; portal-level mocking via `dbus-next` is sketched in [`docs/testing/automation.md`](../../docs/testing/automation.md) but requires displacing the running portal service or running under `dbus-run-session`.
- **`render-matrix.sh`** isn't here yet. `sweep.sh` prints a summary; the `matrix.md` regen step from JUnit is the next addition.
- **No CI wrapper.** Decision 4: the harness is invokable from CI but sweeps run from the dev box for the first ~20 tests.
## Adding a test
1. Pick the `T##` / `S##` from [`docs/testing/cases/`](../../docs/testing/cases/).
2. Drop `src/runners/T##_short_name.spec.ts`. Use the existing specs as templates, matching the layer (L1 / L2) to the test's assertion shape.
3. First line of the test body: `skipUnlessRow(testInfo, ['KDE-W', ...])`. JUnit `<skipped>` → matrix `-`, never `✗` for a row that doesn't apply.
4. Tag the test with `severity` and `surface` annotations so the JUnit output carries them.
5. Capture diagnostics via `testInfo.attach()` — these become Decision 7 "always-on" captures regardless of pass/fail. For tests that need richer state on failure, wrap your scenarios in a results-collector and attach a single JSON dump (S31's pattern).
6. No fixed `sleep`s. Use `retryUntil` or Playwright's auto-wait.
### Hooking Electron — read this before reaching for `BrowserWindow`
`scripts/frame-fix-wrapper.js` returns the `electron` module wrapped
in a `Proxy` whose `get` trap returns a closure-captured
`PatchedBrowserWindow`. **Constructor-level wraps don't work** — your
`electron.BrowserWindow = WrappedCtor` write lands on the underlying
module but the Proxy keeps returning `PatchedBrowserWindow` on
read, so the wrap is bypassed. The reliable hook is at the
**prototype-method level**:
```ts
// in inspector.evalInMain(...)
const proto = electron.BrowserWindow.prototype;
const orig = proto.loadFile;
proto.loadFile = function(filePath, ...rest) {
  // record `this` + filePath; identify popups by filePath suffix
  return orig.call(this, filePath, ...rest);
};
```
This captures every instance regardless of subclass identity.
Construction-time options (`transparent: true`, `frame: false`,
etc.) aren't observable through this hook — use runtime
equivalents instead (`getBackgroundColor()`, `getContentBounds()` vs
`getBounds()`, `isAlwaysOnTop()`). `lib/quickentry.ts` is the
worked example.
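
The Proxy behaviour described above can be reproduced without Electron. The sketch below uses stand-ins only: `fakeElectron`, `underlying`, and this `PatchedBrowserWindow` class are hypothetical and are not the code in `scripts/frame-fix-wrapper.js`. It shows a constructor-level write being bypassed on read, while a prototype-level patch still takes effect.

```typescript
// Stand-in class: mimics the closure-captured class the wrapper returns.
class PatchedBrowserWindow {
  loadFile(filePath: string): string {
    return `loaded:${filePath}`;
  }
}

interface ElectronLike {
  BrowserWindow: typeof PatchedBrowserWindow;
}

// Stand-in for the underlying module object.
const underlying: ElectronLike = { BrowserWindow: PatchedBrowserWindow };

const fakeElectron = new Proxy(underlying, {
  get(target, prop, receiver) {
    // The trap ignores whatever is stored on the target for this key.
    if (prop === 'BrowserWindow') return PatchedBrowserWindow;
    return Reflect.get(target, prop, receiver);
  },
});

// 1. Constructor-level wrap: the write lands on `underlying`, but every
//    read still goes through the get trap.
class WrappedCtor extends PatchedBrowserWindow {}
fakeElectron.BrowserWindow = WrappedCtor;
const ctorWrapBypassed = fakeElectron.BrowserWindow === PatchedBrowserWindow;

// 2. Prototype-level hook: patches the one prototype that instances share.
const seen: string[] = [];
const proto = fakeElectron.BrowserWindow.prototype;
const orig = proto.loadFile;
proto.loadFile = function (this: PatchedBrowserWindow, filePath: string): string {
  seen.push(filePath); // record the call, then defer to the original
  return orig.call(this, filePath);
};

const result = new PatchedBrowserWindow().loadFile('quick-entry.html');
```

After this runs, `ctorWrapBypassed` is `true` and `seen` holds the one recorded path, which is the asymmetry the paragraph above relies on.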


@@ -0,0 +1,309 @@
// Probe to verify whether the eipc channel registry (LocalSessions_$_*,
// CustomPlugins_$_*) is reachable from main via webContents.ipc._invokeHandlers
// instead of the empty-on-this-build globalThis.ipcMain._invokeHandlers.
//
// Run from tools/test-harness against a running claude-desktop with the
// main-process debugger enabled (Developer → Enable Main Process Debugger
// in the app menu, or `claude-desktop` was launched with --inspect):
// npx tsx eipc-registry-probe.ts
//
// Useful states to probe (re-run to compare):
// * fresh launch — whichever tab opens by default
// * /epitaxy with a Code session open
// * /chats with a chat thread open
// * cowork tab loaded
// The per-interface breakdown surfaces which interfaces register lazily
// vs eagerly — useful for designing the lib/eipc.ts primitive's wait
// semantics.
//
// Non-destructive — read-only enumeration of handler keys. Doesn't invoke
// anything, doesn't register anything, doesn't mutate state.
import { InspectorClient } from './src/lib/inspector.js';
import { writeFileSync } from 'node:fs';
interface InterfaceCount {
scope: string;
iface: string;
count: number;
sampleMethods: string[];
}
interface PerWcReport {
id: number;
url: string;
type: string;
hasIpc: boolean;
hasInvokeHandlers: boolean;
totalHandlers: number;
framedCount: number;
unframedCount: number;
scopes: string[];
byInterface: InterfaceCount[];
unframedSample: string[];
}
async function main() {
const client = await InspectorClient.connect(9229);
// Confirm globalThis.ipcMain._invokeHandlers is empty (or near-empty)
// — that's session 3's finding and we want it on the record alongside
// the per-wc reading for contrast.
const ipcMainReport = await client.evalInMain<{
hasIpcMain: boolean;
ipcMainKeys: string[];
ipcMainCount: number;
}>(`
const electron = process.mainModule.require('electron');
const ipcMain = electron.ipcMain;
const map = ipcMain && ipcMain._invokeHandlers;
if (!map) {
return { hasIpcMain: !!ipcMain, ipcMainKeys: [], ipcMainCount: 0 };
}
const keys = (typeof map.keys === 'function')
? Array.from(map.keys())
: Object.keys(map);
return {
hasIpcMain: true,
ipcMainKeys: keys,
ipcMainCount: keys.length,
};
`);
// Per-webContents enumeration with full framing parse:
// $eipc_message$_<UUID>_$_<scope>_$_<interface>_$_<method>
// Scope examples: claude.settings, claude.web, claude.app_internal.
// Interface examples: GlobalShortcut, LocalSessions, CustomPlugins.
// We group by scope.iface to show which feature areas are populated
// on each webContents — what registers eagerly vs on-tab-load.
const perWcReports = await client.evalInMain<PerWcReport[]>(`
const { webContents } = process.mainModule.require('electron');
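// Scope can contain underscores (e.g. claude.app_internal), so match
// lazily up to each _$_ delimiter rather than excluding underscores.
const re = /^\\$eipc_message\\$_[0-9a-f-]+_\\$_(.+?)_\\$_(.+?)_\\$_(.+)$/;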
const all = webContents.getAllWebContents();
const out = [];
for (const w of all) {
const ipc = w.ipc;
const invokeMap = ipc && ipc._invokeHandlers;
let keys = [];
let hasInvokeHandlers = false;
if (invokeMap) {
hasInvokeHandlers = true;
if (typeof invokeMap.keys === 'function') {
keys = Array.from(invokeMap.keys());
} else {
keys = Object.keys(invokeMap);
}
}
const groups = new Map();
const scopes = new Set();
let framedCount = 0;
let unframedCount = 0;
const unframedSample = [];
for (const k of keys) {
const m = re.exec(k);
if (!m) {
unframedCount++;
if (unframedSample.length < 8) unframedSample.push(k);
continue;
}
framedCount++;
const scope = m[1];
const iface = m[2];
const method = m[3];
scopes.add(scope);
const groupKey = scope + '/' + iface;
let g = groups.get(groupKey);
if (!g) {
g = { scope, iface, count: 0, sampleMethods: [] };
groups.set(groupKey, g);
}
g.count++;
if (g.sampleMethods.length < 4) g.sampleMethods.push(method);
}
const byInterface = Array.from(groups.values())
.sort((a, b) => b.count - a.count);
out.push({
id: w.id,
url: w.getURL(),
type: w.getType ? w.getType() : 'unknown',
hasIpc: !!ipc,
hasInvokeHandlers,
totalHandlers: keys.length,
framedCount,
unframedCount,
scopes: Array.from(scopes).sort(),
byInterface,
unframedSample,
});
}
return out;
`);
// For each case-doc anchored channel, find which webContents (if any)
// hosts it. The framing prefix `$eipc_message$_<UUID>_$_claude.web_$_`
// is build-stable per session 2's T38 finding, so we match by suffix.
const expected = [
// T22 — gh PR check monitoring
'LocalSessions_$_getPrChecks',
// T31 — side chat trio
'LocalSessions_$_startSideChat',
'LocalSessions_$_sendSideChatMessage',
'LocalSessions_$_stopSideChat',
// T33 — plugin browser
'CustomPlugins_$_listMarketplaces',
'CustomPlugins_$_listAvailablePlugins',
// T38 — Continue in IDE
'LocalSessions_$_openInEditor',
];
const expectedReport = await client.evalInMain<
Array<{ suffix: string; foundOn: number[]; matchedKeys: string[] }>
>(`
const { webContents } = process.mainModule.require('electron');
const expected = ${JSON.stringify(expected)};
const all = webContents.getAllWebContents();
const out = [];
for (const suffix of expected) {
const foundOn = [];
const matchedKeys = [];
for (const w of all) {
const ipc = w.ipc;
const invokeMap = ipc && ipc._invokeHandlers;
if (!invokeMap) continue;
const keys = (typeof invokeMap.keys === 'function')
? Array.from(invokeMap.keys())
: Object.keys(invokeMap);
for (const k of keys) {
if (k.endsWith(suffix)) {
if (!foundOn.includes(w.id)) foundOn.push(w.id);
if (!matchedKeys.includes(k)) matchedKeys.push(k);
}
}
}
out.push({ suffix, foundOn, matchedKeys });
}
return out;
`);
// Snapshot the framing UUID(s) — useful to confirm build-stability
// across the per-wc registries (session 2 noted it as build-stable
// `c0eed8c9-...`).
const framingReport = await client.evalInMain<{
uuidsSeen: string[];
samplesPerUuid: Record<string, string[]>;
}>(`
const { webContents } = process.mainModule.require('electron');
const re = /^\\$eipc_message\\$_([0-9a-f-]+)_\\$_/;
const uuidsSeen = new Set();
const samples = {};
for (const w of webContents.getAllWebContents()) {
const ipc = w.ipc;
const invokeMap = ipc && ipc._invokeHandlers;
if (!invokeMap) continue;
const keys = (typeof invokeMap.keys === 'function')
? Array.from(invokeMap.keys())
: Object.keys(invokeMap);
for (const k of keys) {
const m = re.exec(k);
if (!m) continue;
const uuid = m[1];
uuidsSeen.add(uuid);
if (!samples[uuid]) samples[uuid] = [];
if (samples[uuid].length < 3) samples[uuid].push(k);
}
}
return {
uuidsSeen: Array.from(uuidsSeen),
samplesPerUuid: samples,
};
`);
console.log('=== globalThis.ipcMain._invokeHandlers (session 3 baseline) ===');
console.log(JSON.stringify(ipcMainReport, null, 2));
console.log('\n=== Per-webContents IPC registries ===');
console.log(JSON.stringify(perWcReports, null, 2));
console.log('\n=== Expected case-doc-anchored channel resolution ===');
console.log(JSON.stringify(expectedReport, null, 2));
console.log('\n=== Framing UUID(s) observed ===');
console.log(JSON.stringify(framingReport, null, 2));
// Cross-webContents per-interface deltas — useful when comparing
// "fresh launch" vs "after navigating to /epitaxy" vs "after opening
// cowork tab". Lists every (scope, iface) seen anywhere with the
// per-wc breakdown of which has it.
const interfaceAcrossWcs = (() => {
const matrix = new Map<string, Map<number, number>>();
for (const wc of perWcReports) {
for (const g of wc.byInterface) {
const key = `${g.scope}/${g.iface}`;
let row = matrix.get(key);
if (!row) {
row = new Map();
matrix.set(key, row);
}
row.set(wc.id, g.count);
}
}
const out: Array<{
interfaceKey: string;
perWc: Record<string, number>;
total: number;
}> = [];
for (const [key, row] of matrix) {
const perWc: Record<string, number> = {};
let total = 0;
for (const [wcId, count] of row) {
perWc[`wc${wcId}`] = count;
total += count;
}
out.push({ interfaceKey: key, perWc, total });
}
out.sort((a, b) => b.total - a.total);
return out;
})();
console.log('\n=== Interface presence across webContents ===');
console.log(JSON.stringify(interfaceAcrossWcs, null, 2));
const totalAll = perWcReports.reduce((a, r) => a + r.totalHandlers, 0);
const totalFramed = perWcReports.reduce((a, r) => a + r.framedCount, 0);
const totalUnframed = perWcReports.reduce((a, r) => a + r.unframedCount, 0);
const expectedFound = expectedReport.filter((e) => e.foundOn.length > 0).length;
const totalDistinctInterfaces = new Set(
perWcReports.flatMap((r) => r.byInterface.map((g) => `${g.scope}/${g.iface}`)),
).size;
console.log('\n=== Summary ===');
console.log(JSON.stringify({
webContentsCount: perWcReports.length,
webContentsUrls: perWcReports.map((r) => `wc${r.id}: ${r.url}`),
ipcMainHandlerCount: ipcMainReport.ipcMainCount,
perWcTotalHandlerCount: totalAll,
perWcFramedCount: totalFramed,
perWcUnframedCount: totalUnframed,
distinctInterfacesAcrossAllWcs: totalDistinctInterfaces,
expectedSuffixesFound: `${expectedFound} / ${expected.length}`,
framingUuidsObserved: framingReport.uuidsSeen.length,
}, null, 2));
const out = {
ipcMainReport,
perWcReports,
expectedReport,
framingReport,
interfaceAcrossWcs,
};
writeFileSync('/tmp/eipc-registry-probe.json', JSON.stringify(out, null, 2));
console.log('\nFull dump → /tmp/eipc-registry-probe.json');
client.close();
process.exit(0);
}
main().catch((err) => {
console.error('probe failed:', err);
process.exit(1);
});
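
The channel framing documented in the probe (`$eipc_message$_<UUID>_$_<scope>_$_<interface>_$_<method>`) can also be parsed by splitting on the `_$_` delimiter, which tolerates underscores inside a scope such as `claude.app_internal`. This is a sketch under stated assumptions: `parseEipcChannel` is a hypothetical helper, the all-zero UUID in the usage note is made up, and it assumes `_$_` never occurs inside a scope or interface name.

```typescript
interface EipcChannel {
  uuid: string;
  scope: string;
  iface: string;
  method: string;
}

const EIPC_PREFIX = '$eipc_message$_';
const EIPC_SEP = '_$_';

// Hypothetical helper, not part of the harness. Returns null for unframed keys.
function parseEipcChannel(key: string): EipcChannel | null {
  if (!key.startsWith(EIPC_PREFIX)) return null;
  const parts = key.slice(EIPC_PREFIX.length).split(EIPC_SEP);
  if (parts.length < 4) return null;
  const [uuid, scope, iface, ...methodParts] = parts;
  if (!uuid || !scope || !iface || !/^[0-9a-f-]+$/.test(uuid)) return null;
  // Re-join so a method name containing the delimiter is kept whole.
  const method = methodParts.join(EIPC_SEP);
  if (!method) return null;
  return { uuid, scope, iface, method };
}
```

For example, `parseEipcChannel('$eipc_message$_00000000-0000-0000-0000-000000000000_$_claude.web_$_LocalSessions_$_getPrChecks')` yields scope `claude.web`, interface `LocalSessions`, and method `getPrChecks`.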


@@ -0,0 +1,280 @@
// Derives the stable-UI vocabulary corpus from an existing inventory.
// Output is committed at docs/testing/ui-vocabulary.json and consumed
// by the v7 walker (Phase 2) when classifying captured accessible-
// names. Re-run on each major upstream release.
//
// Rules (adapted from the v7 plan to the v6-collapsed inventory shape):
// - Persistent entries collapse to one inventory entry with a
// `surfaces[]` array recording every surface the element was
// observed on. Any persistent label whose surfaces[] has length
// >= 2 is stable by definition.
// - Structural / menu entries: stable if the label is shared by 3+
// entries OR appears on 2+ distinct surfaces. Either signal is
// enough — the plan's strict 3-and-2 conjunction over-rejects
// against a v6-collapsed inventory where most chrome already
// deduped to one entry.
// - Names matching any INSTANCE_SHAPES regex go to instanceShapes
// and are excluded from stable / suspect even if they would have
// qualified — the instance-shape pattern is the canonical
// representation for those at resolve time.
// - kind: instance entries are excluded from the stable corpus
// entirely — those labels by definition vary per session. (A
// label that appears in BOTH instance and structural entries
// follows the structural / menu rule.)
// - Everything else falls through to `suspect`, queued for human
// reconciliation.
import {
existsSync,
readFileSync,
renameSync,
writeFileSync,
} from 'node:fs';
import { dirname, resolve } from 'node:path';
import { fileURLToPath } from 'node:url';
import { INSTANCE_SHAPES } from '../src/lib/name-classifier.js';
import type { Inventory, InventoryEntry } from './walker.js';
const HERE = dirname(fileURLToPath(import.meta.url));
const TESTING_DIR = resolve(HERE, '..', '..', '..', 'docs', 'testing');
const DEFAULT_INVENTORY = resolve(TESTING_DIR, 'ui-inventory.json');
const DEFAULT_OUTPUT = resolve(TESTING_DIR, 'ui-vocabulary.json');
interface CliOpts {
inventory: string;
output: string;
help: boolean;
}
interface InstanceShapeOutput {
id: string;
regex: string;
flags: string;
pattern: string | null;
matchedNames: string[];
}
interface VocabularyOutput {
derivedAt: string;
sourceInventory: {
capturedAt: string;
appVersion: string;
walkerVersion: string;
totalElements: number;
};
stable: string[];
instanceShapes: InstanceShapeOutput[];
suspect: string[];
}
function parseCli(argv: string[]): CliOpts {
const opts: CliOpts = {
inventory: DEFAULT_INVENTORY,
output: DEFAULT_OUTPUT,
help: false,
};
for (let i = 0; i < argv.length; i += 1) {
const a = argv[i]!;
switch (a) {
case '-h':
case '--help':
opts.help = true;
break;
case '--inventory': {
const v = argv[++i];
if (!v) {
process.stderr.write('--inventory requires a path\n');
process.exit(1);
}
opts.inventory = resolve(v);
break;
}
case '--output': {
const v = argv[++i];
if (!v) {
process.stderr.write('--output requires a path\n');
process.exit(1);
}
opts.output = resolve(v);
break;
}
default:
process.stderr.write(
`derive-vocabulary: unknown argument: ${a}\n`,
);
printUsage();
process.exit(1);
}
}
return opts;
}
function printUsage(): void {
process.stdout.write(
'Usage: tsx explore/derive-vocabulary.ts [options]\n' +
'\n' +
'Derives docs/testing/ui-vocabulary.json from an existing\n' +
'inventory walk. Output records the stable-UI corpus, the\n' +
'instance-shape registry hits, and any names flagged for\n' +
'human triage.\n' +
'\n' +
'Options:\n' +
' --inventory <path> Override default inventory path\n' +
' (default: docs/testing/ui-inventory.json)\n' +
' --output <path> Override default vocabulary output path\n' +
' (default: docs/testing/ui-vocabulary.json)\n' +
' -h, --help Print this help and exit\n',
);
}
function loadInventory(path: string): Inventory {
if (!existsSync(path)) {
process.stderr.write(
`derive-vocabulary: inventory not found: ${path}\n`,
);
process.exit(1);
}
try {
return JSON.parse(readFileSync(path, 'utf8')) as Inventory;
} catch (err) {
const msg = err instanceof Error ? err.message : String(err);
process.stderr.write(
`derive-vocabulary: failed to parse inventory: ${msg}\n`,
);
process.exit(1);
}
}
interface LabelStats {
kinds: Set<InventoryEntry['kind']>;
surfaces: Set<string>;
entryCount: number;
maxPersistentSpan: number;
}
function aggregate(inv: Inventory): Map<string, LabelStats> {
const stats = new Map<string, LabelStats>();
for (const e of inv.entries) {
const lbl = e.label;
if (!lbl) continue;
let s = stats.get(lbl);
if (!s) {
s = {
kinds: new Set(),
surfaces: new Set(),
entryCount: 0,
maxPersistentSpan: 0,
};
stats.set(lbl, s);
}
s.kinds.add(e.kind);
s.surfaces.add(e.surface);
s.entryCount += 1;
if (e.kind === 'persistent' && e.surfaces) {
s.maxPersistentSpan = Math.max(
s.maxPersistentSpan,
e.surfaces.length,
);
}
}
return stats;
}
function classify(inv: Inventory): VocabularyOutput {
const stats = aggregate(inv);
const stable = new Set<string>();
const suspect = new Set<string>();
const instanceHits = new Map<string, Set<string>>();
for (const shape of INSTANCE_SHAPES) {
instanceHits.set(shape.id, new Set());
}
for (const [lbl, s] of stats) {
// Pure-instance label — exclude entirely.
if (s.kinds.size === 1 && s.kinds.has('instance')) {
continue;
}
// Instance-shape regex match — record + skip stable/suspect.
let shapeMatched = false;
for (const shape of INSTANCE_SHAPES) {
if (shape.regex.test(lbl)) {
instanceHits.get(shape.id)!.add(lbl);
shapeMatched = true;
break;
}
}
if (shapeMatched) continue;
// Persistent: surfaces[] >= 2 carries the proof that the chrome
// element actually spans surfaces.
if (s.maxPersistentSpan >= 2) {
stable.add(lbl);
continue;
}
// Structural / menu: 3+ entries OR 2+ distinct surfaces.
if (s.entryCount >= 3 || s.surfaces.size >= 2) {
stable.add(lbl);
continue;
}
suspect.add(lbl);
}
const instanceShapesOut: InstanceShapeOutput[] = INSTANCE_SHAPES.map(
(shape) => ({
id: shape.id,
regex: shape.regex.source,
flags: shape.regex.flags,
pattern: shape.pattern,
matchedNames: [...instanceHits.get(shape.id)!].sort(),
}),
);
return {
derivedAt: new Date().toISOString(),
sourceInventory: {
capturedAt: inv.capturedAt,
appVersion: inv.appVersion,
walkerVersion: inv.walkerVersion,
totalElements: inv.totalElements,
},
stable: [...stable].sort(),
instanceShapes: instanceShapesOut,
suspect: [...suspect].sort(),
};
}
function atomicWrite(path: string, body: string): void {
const tmp = `${path}.tmp`;
writeFileSync(tmp, body, 'utf8');
renameSync(tmp, path);
}
function main(): void {
const opts = parseCli(process.argv.slice(2));
if (opts.help) {
printUsage();
return;
}
const inv = loadInventory(opts.inventory);
const out = classify(inv);
const body = `${JSON.stringify(out, null, 2)}\n`;
atomicWrite(opts.output, body);
const shapeHitTotal = out.instanceShapes.reduce(
(n, s) => n + s.matchedNames.length,
0,
);
process.stdout.write(
`derive-vocabulary: wrote ${opts.output}\n` +
` source: ${opts.inventory} (${inv.totalElements} entries)\n` +
` stable: ${out.stable.length}, ` +
`instance-shaped: ${shapeHitTotal} (${out.instanceShapes.filter((s) => s.matchedNames.length > 0).length} shapes hit), ` +
`suspect: ${out.suspect.length}\n`,
);
}
main();
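
The rule order in `classify` (pure-instance exclusion, then instance-shape match, then the persistent span check, then the structural / menu thresholds, then `suspect`) can be condensed into a single decision function. This is an illustrative sketch, not the module's API: `bucketFor`, `LabelSummary`, and the `Kind` union are hypothetical, and the shape regex in the usage note is made up. It also assumes the shape regexes carry no `g` or `y` flag, since `RegExp.prototype.test` is stateful with those.

```typescript
// Assumed kind names; the real InventoryEntry['kind'] union is not shown here.
type Kind = 'persistent' | 'structural' | 'menu' | 'instance';

interface LabelSummary {
  kinds: Kind[]; // distinct kinds observed for this label
  surfaceCount: number; // distinct surfaces the label appeared on
  entryCount: number; // inventory entries sharing the label
  maxPersistentSpan: number; // longest surfaces[] among persistent entries
}

type Bucket = 'excluded' | 'instanceShape' | 'stable' | 'suspect';

function bucketFor(label: string, s: LabelSummary, shapes: RegExp[]): Bucket {
  // 1. Labels seen only on instance entries vary per session: drop them.
  if (s.kinds.length > 0 && s.kinds.every((k) => k === 'instance')) {
    return 'excluded';
  }
  // 2. An instance-shape match wins over stable / suspect.
  if (shapes.some((re) => re.test(label))) return 'instanceShape';
  // 3. Persistent chrome observed on two or more surfaces.
  if (s.maxPersistentSpan >= 2) return 'stable';
  // 4. Structural / menu: either signal is enough.
  if (s.entryCount >= 3 || s.surfaceCount >= 2) return 'stable';
  // 5. Everything else is queued for human reconciliation.
  return 'suspect';
}
```

With a made-up shape such as `/^Untitled \d+$/`, a label like `Untitled 3` lands in `instanceShape` even when its counts would otherwise qualify it as stable, which mirrors the precedence the header comment describes.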


@@ -0,0 +1,313 @@
// Snapshot comparator.
//
// Diff semantics, in priority order:
// - removed: an element keyed in A is absent from B → drift signal.
// - changed: same key, different visible text or aria-label → drift.
// - added: new key in B → informational only (UI gained surface).
//
// Keys are stable identity tokens chosen per element class:
// - df-pill: aria-label (Chat / Cowork / Code)
// - compactPill: inner text (env value, "Select folder…", …)
// - ariaButton: aria-label (sidebar "more" buttons share labels;
// we de-dup by counting; see compareCounts below)
// - modal: headingText ?? aria-label ?? aria-labelledby
// - openMenu: items diffed by `${role}::${text}`
//
// Pure module — no I/O, no process.exit. The dispatcher reads files
// and prints; this file just produces a Diff value.
import type {
AriaButton,
CompactPillSnap,
DfPill,
MenuItem,
ModalSnap,
OpenMenu,
Snapshot,
} from './snapshot.js';
export interface DiffEntry {
kind: 'removed' | 'changed' | 'added';
category: string;
key: string;
before?: string;
after?: string;
}
export interface DiffResult {
a: { capturedAt: string; url: string; appVersion: string | null };
b: { capturedAt: string; url: string; appVersion: string | null };
entries: DiffEntry[];
summary: { removed: number; changed: number; added: number };
}
export function diff(a: Snapshot, b: Snapshot): DiffResult {
const entries: DiffEntry[] = [];
entries.push(...diffDfPills(a.dfPills, b.dfPills));
entries.push(...diffCompactPills(a.compactPills, b.compactPills));
entries.push(...diffAriaButtons(a.ariaLabeledButtons, b.ariaLabeledButtons));
entries.push(...diffModals(a.modals, b.modals));
entries.push(...diffOpenMenu(a.openMenu, b.openMenu));
const summary = entries.reduce(
(acc, e) => {
acc[e.kind] += 1;
return acc;
},
{ removed: 0, changed: 0, added: 0 },
);
return {
a: {
capturedAt: a.capturedAt,
url: a.claudeAiUrl,
appVersion: a.appVersion,
},
b: {
capturedAt: b.capturedAt,
url: b.claudeAiUrl,
appVersion: b.appVersion,
},
entries,
summary,
};
}
// Human-readable formatter. Removed/changed first (they're failures
// in spirit), added last (informational). Empty diff prints a single
// line so CI logs stay tidy.
export function formatDiff(d: DiffResult): string {
const lines: string[] = [];
lines.push(`A: ${d.a.capturedAt} (${d.a.url}) app=${d.a.appVersion}`);
lines.push(`B: ${d.b.capturedAt} (${d.b.url}) app=${d.b.appVersion}`);
lines.push('');
if (d.entries.length === 0) {
lines.push('No differences.');
return lines.join('\n');
}
const order: DiffEntry['kind'][] = ['removed', 'changed', 'added'];
for (const kind of order) {
const group = d.entries.filter((e) => e.kind === kind);
if (group.length === 0) continue;
lines.push(`# ${kind.toUpperCase()} (${group.length})`);
for (const e of group) {
if (e.kind === 'changed') {
lines.push(
` [${e.category}] ${e.key}: ${e.before ?? ''} → ${e.after ?? ''}`,
);
} else if (e.kind === 'removed') {
lines.push(` [${e.category}] ${e.key}: ${e.before ?? ''}`);
} else {
lines.push(` [${e.category}] ${e.key}: ${e.after ?? ''}`);
}
}
lines.push('');
}
lines.push(
`Summary: ${d.summary.removed} removed, ` +
`${d.summary.changed} changed, ${d.summary.added} added`,
);
return lines.join('\n');
}
function diffDfPills(a: DfPill[], b: DfPill[]): DiffEntry[] {
const aMap = byKey(a, (p) => p.ariaLabel ?? p.text);
const bMap = byKey(b, (p) => p.ariaLabel ?? p.text);
return compareMaps(aMap, bMap, 'dfPill', (p) => p.text);
}
function diffCompactPills(
a: CompactPillSnap[],
b: CompactPillSnap[],
): DiffEntry[] {
// Compact pills can repeat by text in pathological cases, so we
// disambiguate by appending an ordinal when needed. The ordinal is
// stable as long as DOM order is — same approach `findCompactPills`
// callers rely on.
const aMap = byKeyOrdinal(a, (p) => p.text);
const bMap = byKeyOrdinal(b, (p) => p.text);
return compareMaps(aMap, bMap, 'compactPill', (p) => `maxW=${p.maxW}`);
}
// Aria-labeled buttons frequently repeat (sidebar's ~80 conversation-row
// "more" buttons all share a label). We compare by *count per label*
// instead of per-instance: a delta in count surfaces as a single
// changed entry, which is far more readable than 80 added/removed
// rows. Per-label text is omitted since duplicate labels mean text is
// not a stable identity.
function diffAriaButtons(a: AriaButton[], b: AriaButton[]): DiffEntry[] {
return compareCounts(
countBy(a, (x) => x.ariaLabel),
countBy(b, (x) => x.ariaLabel),
'ariaButton',
);
}
function diffModals(a: ModalSnap[], b: ModalSnap[]): DiffEntry[] {
const key = (m: ModalSnap) =>
m.headingText ?? m.ariaLabel ?? m.ariaLabelledBy ?? '<unlabeled-modal>';
const aMap = byKeyOrdinal(a, key);
const bMap = byKeyOrdinal(b, key);
return compareMaps(aMap, bMap, 'modal', (m) =>
`buttons=${m.buttonLabels.join('|')}`,
);
}
// Menu diff is special: the "key" is the menu identity, but a menu
// diff is really an item-set diff. We compare item lists, scoped under
// the menu's labelledBy/ariaLabel for context.
function diffOpenMenu(
a: OpenMenu | null,
b: OpenMenu | null,
): DiffEntry[] {
if (!a && !b) return [];
const scope =
(a?.ariaLabel ?? b?.ariaLabel) ||
(a?.ariaLabelledBy ?? b?.ariaLabelledBy) ||
'<menu>';
if (a && !b) {
return [
{
kind: 'removed',
category: 'openMenu',
key: scope,
before: a.items.map(itemKey).join(' | '),
},
];
}
if (!a && b) {
return [
{
kind: 'added',
category: 'openMenu',
key: scope,
after: b.items.map(itemKey).join(' | '),
},
];
}
if (!a || !b) return [];
const aMap = byKeyOrdinal(a.items, itemKey);
const bMap = byKeyOrdinal(b.items, itemKey);
return compareMaps(
aMap,
bMap,
`openMenu[${scope}]`,
(it) =>
`disabled=${it.disabled}` +
(it.ariaChecked !== null ? ` checked=${it.ariaChecked}` : ''),
);
}
function itemKey(it: MenuItem): string {
return `${it.role}::${it.text}`;
}
function byKey<T>(arr: T[], k: (t: T) => string): Map<string, T> {
const m = new Map<string, T>();
for (const it of arr) m.set(k(it), it);
return m;
}
// When keys collide, append `#2`, `#3`, … so the comparator can still
// detect "we used to have 3, now we have 2" (one #N drops out as
// removed). Ordinals are local to this snapshot — they don't cross
// snapshot boundaries.
function byKeyOrdinal<T>(arr: T[], k: (t: T) => string): Map<string, T> {
const m = new Map<string, T>();
const counts = new Map<string, number>();
for (const it of arr) {
const base = k(it);
const n = (counts.get(base) ?? 0) + 1;
counts.set(base, n);
m.set(n === 1 ? base : `${base}#${n}`, it);
}
return m;
}
function countBy<T>(arr: T[], k: (t: T) => string): Map<string, number> {
const m = new Map<string, number>();
for (const it of arr) {
const key = k(it);
m.set(key, (m.get(key) ?? 0) + 1);
}
return m;
}
function compareMaps<T>(
a: Map<string, T>,
b: Map<string, T>,
category: string,
describe: (t: T) => string,
): DiffEntry[] {
const out: DiffEntry[] = [];
for (const [k, v] of a) {
const bv = b.get(k);
if (bv === undefined) {
out.push({
kind: 'removed',
category,
key: k,
before: describe(v),
});
continue;
}
const before = describe(v);
const after = describe(bv);
if (before !== after) {
out.push({
kind: 'changed',
category,
key: k,
before,
after,
});
}
}
for (const [k, v] of b) {
if (!a.has(k)) {
out.push({
kind: 'added',
category,
key: k,
after: describe(v),
});
}
}
return out;
}
function compareCounts(
a: Map<string, number>,
b: Map<string, number>,
category: string,
): DiffEntry[] {
const out: DiffEntry[] = [];
for (const [k, n] of a) {
const m = b.get(k);
if (m === undefined) {
out.push({
kind: 'removed',
category,
key: k,
before: `count=${n}`,
});
} else if (m !== n) {
out.push({
kind: 'changed',
category,
key: k,
before: `count=${n}`,
after: `count=${m}`,
});
}
}
for (const [k, m] of b) {
if (!a.has(k)) {
out.push({
kind: 'added',
category,
key: k,
after: `count=${m}`,
});
}
}
return out;
}
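
The ordinal-suffix scheme used by `byKeyOrdinal` can be seen in isolation with plain string keys. The sketch below re-implements only the keying step for illustration; `keyWithOrdinals` and `removedKeys` are hypothetical names, not exports of this module.

```typescript
// Assigns `base`, `base#2`, `base#3`, … in input order, per snapshot.
function keyWithOrdinals(keys: string[]): string[] {
  const counts = new Map<string, number>();
  return keys.map((base) => {
    const n = (counts.get(base) ?? 0) + 1;
    counts.set(base, n);
    return n === 1 ? base : `${base}#${n}`;
  });
}

// Keys present in `before` but missing from `after`, after ordinal keying.
function removedKeys(before: string[], after: string[]): string[] {
  const afterSet = new Set(keyWithOrdinals(after));
  return keyWithOrdinals(before).filter((k) => !afterSet.has(k));
}
```

Going from three `Local` pills to two reports a single removal keyed `Local#3`. Because ordinals follow input order within each snapshot, the highest ordinal is always the one reported, regardless of which of the three elements actually disappeared.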


@@ -0,0 +1,640 @@
// Entry point for the explore CLI.
//
// Subcommand surface (matches docs/testing/claudeai-ui-mapping-plan.md
// Phase 1):
//
// explore full snapshot to stdout
// explore pills df-pills + compact-pills + state
// explore menu currently-open menu structure
// explore snapshot <name> write to docs/testing/ui-snapshots/<name>.json
// explore diff <a> <b> diff two snapshots
// explore find <regex> search renderer for matching text/aria-label
//
// Why a hand-rolled dispatcher: the surface is six cases. A flag parser
// adds a dependency and obscures which command takes which positional.
// Keep the routing visible.
//
// Exit codes:
// 0 success (including a clean diff)
// 1 caller error (bad args, missing file)
// 2 runtime error (no debugger, no claude.ai webContents)
// 3 diff non-empty AND `--exit-on-diff` was set — opt-in, off by
// default so `explore diff` from a script can read entries
// without conflating "drift" with "tool blew up".
import {
existsSync,
mkdirSync,
readFileSync,
renameSync,
writeFileSync,
} from 'node:fs';
import { dirname, resolve } from 'node:path';
import { fileURLToPath } from 'node:url';
import { InspectorClient } from '../src/lib/inspector.js';
import { capture, capturePills, captureOpenMenu } from './snapshot.js';
import type { Snapshot } from './snapshot.js';
import { diff, formatDiff } from './diff.js';
import { findInRenderer, formatHits } from './find.js';
import {
collapsePersistentEntries,
walkRenderer,
WALKER_VERSION,
} from './walker.js';
import type { Inventory } from './walker.js';
const INSPECTOR_PORT = 9229;
// Resolve relative to this source file so the CLI works regardless of
// cwd (npm script vs. ad-hoc tsx invocation from elsewhere).
const TESTING_DIR = resolve(
dirname(fileURLToPath(import.meta.url)),
'..',
'..',
'..',
'docs',
'testing',
);
const SNAPSHOT_DIR = resolve(TESTING_DIR, 'ui-snapshots');
const INVENTORY_PATH = resolve(TESTING_DIR, 'ui-inventory.json');
const INVENTORY_META_PATH = resolve(TESTING_DIR, 'ui-inventory.meta.json');
async function main(): Promise<void> {
const argv = process.argv.slice(2);
const cmd = argv[0];
const rest = argv.slice(1);
try {
switch (cmd) {
case undefined:
await runFullSnapshot();
return;
case 'pills':
await runPills();
return;
case 'menu':
await runMenu();
return;
case 'snapshot':
await runSnapshot(rest);
return;
case 'diff':
await runDiff(rest);
return;
case 'find':
await runFind(rest);
return;
case 'walk':
await runWalk(rest);
return;
case 'collapse':
await runCollapse(rest);
return;
case '-h':
case '--help':
case 'help':
printUsage();
return;
default:
console.error(`unknown subcommand: ${cmd}`);
printUsage();
process.exit(1);
}
} catch (err) {
const msg = err instanceof Error ? err.message : String(err);
console.error(`explore: ${msg}`);
process.exit(2);
}
}
async function runFullSnapshot(): Promise<void> {
const client = await connect();
try {
const snap = await capture(client);
console.log(JSON.stringify(snap, null, 2));
} finally {
client.close();
}
}
async function runPills(): Promise<void> {
const client = await connect();
try {
const pills = await capturePills(client);
console.log(JSON.stringify(pills, null, 2));
} finally {
client.close();
}
}
async function runMenu(): Promise<void> {
const client = await connect();
try {
const menu = await captureOpenMenu(client);
if (!menu) {
console.log('null');
return;
}
console.log(JSON.stringify(menu, null, 2));
} finally {
client.close();
}
}
async function runSnapshot(args: string[]): Promise<void> {
const name = args[0];
if (!name) {
console.error('snapshot: missing <name> argument');
console.error('usage: explore snapshot <name>');
process.exit(1);
}
if (!/^[a-zA-Z0-9._-]+$/.test(name)) {
console.error(
`snapshot: name ${JSON.stringify(name)} contains characters ` +
`outside [a-zA-Z0-9._-] — choose a slug-safe name`,
);
process.exit(1);
}
const client = await connect();
let snap: Snapshot;
try {
snap = await capture(client);
} finally {
client.close();
}
if (!existsSync(SNAPSHOT_DIR)) {
mkdirSync(SNAPSHOT_DIR, { recursive: true });
}
const outPath = resolve(SNAPSHOT_DIR, `${name}.json`);
writeFileSync(outPath, JSON.stringify(snap, null, 2) + '\n', 'utf8');
console.log(`wrote ${outPath}`);
}
async function runDiff(args: string[]): Promise<void> {
const opts = { json: false, exitOnDiff: false };
const positional: string[] = [];
for (const a of args) {
if (a === '--json') opts.json = true;
else if (a === '--exit-on-diff') opts.exitOnDiff = true;
else positional.push(a);
}
if (positional.length !== 2) {
console.error('diff: expected exactly two snapshot names or paths');
console.error('usage: explore diff <a> <b> [--json] [--exit-on-diff]');
process.exit(1);
}
const a = readSnapshot(positional[0]!);
const b = readSnapshot(positional[1]!);
const result = diff(a, b);
if (opts.json) {
console.log(JSON.stringify(result, null, 2));
} else {
console.log(formatDiff(result));
}
if (opts.exitOnDiff && result.entries.length > 0) {
process.exit(3);
}
}
// `walk` parses its own flags; --max-elements 0 prints usage and exits
// (a cheap dry-run for "is the CLI loadable" without touching CDP).
async function runWalk(args: string[]): Promise<void> {
const opts: {
maxElements: number;
maxDrillsPerSurface: number;
checkpointEvery: number;
allowlist: string | null;
output: string;
verbose: boolean;
help: boolean;
} = {
maxElements: 1000,
maxDrillsPerSurface: 50,
checkpointEvery: 100,
allowlist: null,
output: INVENTORY_PATH,
verbose: false,
help: false,
};
for (let i = 0; i < args.length; i += 1) {
const a = args[i]!;
if (a === '-h' || a === '--help') {
opts.help = true;
} else if (a === '--max-elements') {
const n = Number(args[i + 1]);
if (!Number.isFinite(n) || n < 0 || !Number.isInteger(n)) {
console.error('walk: --max-elements requires a non-negative integer');
process.exit(1);
}
opts.maxElements = n;
i += 1;
} else if (a === '--checkpoint-every') {
const n = Number(args[i + 1]);
if (!Number.isFinite(n) || n < 0 || !Number.isInteger(n)) {
console.error(
'walk: --checkpoint-every requires a non-negative integer (0 disables)',
);
process.exit(1);
}
opts.checkpointEvery = n;
i += 1;
} else if (
a === '--max-drills-per-surface' ||
a === '--max-elements-per-surface'
) {
// v4 renamed the flag from --max-elements-per-surface (which
// truncated emissions) to --max-drills-per-surface (which only
// caps queue pushes; all entries are still emitted). Keep the
// old name as a deprecated alias.
if (a === '--max-elements-per-surface') {
process.stderr.write(
'walk: --max-elements-per-surface is deprecated; ' +
'use --max-drills-per-surface (semantics changed: now ' +
'caps drilling fan-out, not emission count)\n',
);
}
const n = Number(args[i + 1]);
if (!Number.isFinite(n) || n < 0 || !Number.isInteger(n)) {
console.error(`walk: ${a} requires a non-negative integer`);
process.exit(1);
}
opts.maxDrillsPerSurface = n;
i += 1;
} else if (a === '--allowlist') {
const p = args[i + 1];
if (!p) {
console.error('walk: --allowlist requires a path');
process.exit(1);
}
opts.allowlist = p;
i += 1;
} else if (a === '--output') {
const p = args[i + 1];
if (!p) {
console.error('walk: --output requires a path');
process.exit(1);
}
opts.output = resolve(p);
i += 1;
} else if (a === '--verbose') {
opts.verbose = true;
} else {
console.error(`walk: unknown argument: ${a}`);
printWalkUsage();
process.exit(1);
}
}
if (opts.help || opts.maxElements === 0) {
printWalkUsage();
return;
}
let allowlist: string[] = [];
if (opts.allowlist) {
const raw = readFileSync(opts.allowlist, 'utf8');
try {
const parsed = JSON.parse(raw) as { exemptions?: string[] };
allowlist = parsed.exemptions ?? [];
} catch (err) {
const msg = err instanceof Error ? err.message : String(err);
console.error(`walk: allowlist ${opts.allowlist}: invalid JSON — ${msg}`);
process.exit(1);
}
}
const outDir = dirname(opts.output);
if (!existsSync(outDir)) mkdirSync(outDir, { recursive: true });
const metaPath =
opts.output === INVENTORY_PATH
? INVENTORY_META_PATH
: opts.output.replace(/\.json$/, '.meta.json');
// Atomic writer: write to <path>.tmp, then rename. Survives a kill
// between writes — readers always see either the prior complete file
// or the new one, never a half-written buffer. Used for both the
// in-flight checkpoint writes and the final write. `partial` is
// recorded in meta.json (true on intermediate writes, false on the
// final write) so downstream readers can tell whether the inventory
// is complete; the inventory file itself stays shape-compatible.
const writeCheckpoint = (
inventory: Inventory,
isPartial: boolean,
): void => {
const invTmp = `${opts.output}${INVENTORY_TMP_SUFFIX}`;
writeFileSync(
invTmp,
JSON.stringify(inventory, null, 2) + '\n',
'utf8',
);
renameSync(invTmp, opts.output);
const meta = {
capturedAt: inventory.capturedAt,
appVersion: inventory.appVersion,
walkerVersion: WALKER_VERSION,
startUrl: inventory.startUrl,
totalElements: inventory.totalElements,
deniedActions: inventory.deniedActions,
partial: isPartial,
denylistDescription:
'Default destructive-action labels (see DEFAULT_DENYLIST in walker.ts) ' +
'plus optional allowlist exemptions.',
allowlistEntries: allowlist,
};
const metaTmp = `${metaPath}${INVENTORY_TMP_SUFFIX}`;
writeFileSync(metaTmp, JSON.stringify(meta, null, 2) + '\n', 'utf8');
renameSync(metaTmp, metaPath);
};
const client = await connect();
let inventory: Inventory;
try {
inventory = await walkRenderer(client, {
maxElements: opts.maxElements,
maxDrillsPerSurface: opts.maxDrillsPerSurface,
allowlist,
verbose: opts.verbose,
checkpointEvery: opts.checkpointEvery,
checkpointWriter:
opts.checkpointEvery > 0
? (inv) => writeCheckpoint(inv, true)
: undefined,
});
} finally {
client.close();
}
writeCheckpoint(inventory, false);
console.log(
`wrote ${opts.output} (${inventory.totalElements} entries, ` +
`${inventory.deniedActions} denylisted)`,
);
console.log(`wrote ${metaPath}`);
}
// Suffix used by the atomic-write helper. Kept module-level so tooling
// and .gitignore rules have one place to look up which temp files to ignore.
const INVENTORY_TMP_SUFFIX = '.tmp';
// `collapse [<path>]` re-runs the post-walk persistent-element
// collapse against an existing inventory file. Use case: a partial
// checkpoint (walker aborted mid-walk) skipped the in-loop collapse
// and so has 0 persistent entries — this command salvages it without
// re-running the walker. Also useful if collapse heuristics change
// and we want to refresh an existing inventory.
async function runCollapse(args: string[]): Promise<void> {
let path = INVENTORY_PATH;
let help = false;
for (let i = 0; i < args.length; i += 1) {
const a = args[i]!;
if (a === '-h' || a === '--help') help = true;
else if (!a.startsWith('-')) path = resolve(a);
else {
console.error(`collapse: unknown argument: ${a}`);
printCollapseUsage();
process.exit(1);
}
}
if (help) {
printCollapseUsage();
return;
}
if (!existsSync(path)) {
console.error(`collapse: inventory not found: ${path}`);
process.exit(1);
}
let inventory: Inventory;
try {
inventory = JSON.parse(readFileSync(path, 'utf8')) as Inventory;
} catch (err) {
const msg = err instanceof Error ? err.message : String(err);
console.error(`collapse: invalid JSON in ${path}: ${msg}`);
process.exit(1);
}
// v7-only gate. The v6 → v7 fingerprint cutover invalidated all
// older inventory shapes; re-running the persistent collapse on a
// v6 inventory would mint v7-key collisions against v6 selectors
// and drop unrelated entries. Re-walk first.
const wv = inventory.walkerVersion;
if (wv !== '7') {
console.error(
`collapse: walkerVersion ${wv} is not supported (need v7; ` +
`re-walk after the v6 → v7 fingerprint cutover)`,
);
process.exit(1);
}
const before = inventory.entries.length;
const result = collapsePersistentEntries(inventory.entries);
const after = result.entries.length;
const dropped = before - after;
const collapsedAt = new Date().toISOString();
const updated: Inventory = {
...inventory,
walkerVersion: WALKER_VERSION,
totalElements: after,
entries: result.entries,
capturedAt: inventory.capturedAt,
};
// Atomic write inventory + meta. Mirror the walk subcommand: write
// to .tmp, rename. Meta gets `partial: false` (collapse closes out
// a partial checkpoint) and `collapsedAt`; everything else carries
// through from the existing meta where present.
const invTmp = `${path}${INVENTORY_TMP_SUFFIX}`;
writeFileSync(invTmp, JSON.stringify(updated, null, 2) + '\n', 'utf8');
renameSync(invTmp, path);
const metaPath =
path === INVENTORY_PATH
? INVENTORY_META_PATH
: path.replace(/\.json$/, '.meta.json');
let existingMeta: Record<string, unknown> = {};
if (existsSync(metaPath)) {
try {
existingMeta = JSON.parse(readFileSync(metaPath, 'utf8')) as Record<
string,
unknown
>;
} catch {
// Carry the inventory through even if meta is malformed; meta
// is recoverable, the entries are not.
}
}
const meta = {
...existingMeta,
capturedAt: updated.capturedAt,
appVersion: updated.appVersion,
walkerVersion: WALKER_VERSION,
startUrl: updated.startUrl,
totalElements: updated.totalElements,
deniedActions: updated.deniedActions,
partial: false,
collapsedAt,
};
const metaTmp = `${metaPath}${INVENTORY_TMP_SUFFIX}`;
writeFileSync(metaTmp, JSON.stringify(meta, null, 2) + '\n', 'utf8');
renameSync(metaTmp, metaPath);
console.log(
`collapse: read ${before} entries → wrote ${after} entries ` +
`(${dropped} dropped via persistent collapse, ` +
`${result.persistentSurvivors} shells emitted)`,
);
console.log(`wrote ${path}`);
console.log(`wrote ${metaPath}`);
}
function printCollapseUsage(): void {
console.log(
[
'usage: explore collapse [<path>]',
'',
'Re-run the post-walk persistent-element collapse against an',
'existing inventory file. Useful for salvaging a partial',
'checkpoint that aborted before the in-loop collapse step.',
'',
' <path> inventory file to collapse in place (default:',
' docs/testing/ui-inventory.json). Must be v7.',
' -h, --help print this help',
'',
'Writes the collapsed inventory and updated meta.json',
'atomically (.tmp + rename). Meta gains `collapsedAt` and',
'clears `partial` to false.',
].join('\n'),
);
}
function printWalkUsage(): void {
console.log(
[
'usage: explore walk [options]',
'',
'options:',
' --max-elements N safety cap on total entries',
' (default 1000; 0 prints this help',
' and exits)',
' --max-drills-per-surface N max number of children to drill into',
' from one surface (default 50). All',
' children are still emitted to the',
' inventory; this only bounds the BFS',
' queue fan-out per surface.',
' (Alias: --max-elements-per-surface,',
' deprecated — v3 truncated emissions,',
' v4 only caps drilling.)',
' --checkpoint-every N atomically write the inventory every N',
' newly-emitted entries (default 100;',
' 0 disables). Intermediate writes set',
' meta.json `partial: true`; the final',
' write clears it to false.',
' --allowlist PATH JSON file:',
' {"exemptions": ["entry.id", ...]} to',
' remove from the default denylist',
' --output PATH write inventory to PATH (default',
' docs/testing/ui-inventory.json)',
' --verbose log every click + surface to stderr',
' -h, --help print this help',
].join('\n'),
);
}
async function runFind(args: string[]): Promise<void> {
const opts = { json: false, limit: 100 };
const positional: string[] = [];
for (let i = 0; i < args.length; i += 1) {
const a = args[i]!;
if (a === '--json') opts.json = true;
else if (a === '--limit') {
const n = Number(args[i + 1]);
if (!Number.isFinite(n) || n <= 0 || !Number.isInteger(n)) {
console.error('find: --limit requires a positive integer');
process.exit(1);
}
opts.limit = n;
i += 1;
} else positional.push(a);
}
const pat = positional[0];
if (!pat) {
console.error('find: missing <regex> argument');
console.error('usage: explore find <regex> [--json] [--limit N]');
process.exit(1);
}
let re: RegExp;
try {
re = new RegExp(pat, 'i');
} catch (err) {
const msg = err instanceof Error ? err.message : String(err);
console.error(`find: invalid regex: ${msg}`);
process.exit(1);
}
const client = await connect();
try {
const hits = await findInRenderer(client, re, { limit: opts.limit });
if (opts.json) {
console.log(JSON.stringify(hits, null, 2));
} else {
console.log(formatHits(hits));
}
} finally {
client.close();
}
}
// Snapshot resolver: accept either a bare name (looked up in the
// snapshot dir, .json appended) or an explicit path. Bare names are
// the common case from CI / the README; explicit paths help when
// diffing a snapshot against an out-of-tree fixture.
function readSnapshot(nameOrPath: string): Snapshot {
const candidates = [
nameOrPath,
resolve(SNAPSHOT_DIR, nameOrPath),
resolve(SNAPSHOT_DIR, `${nameOrPath}.json`),
];
const found = candidates.find((p) => existsSync(p));
if (!found) {
console.error(`snapshot not found: tried ${candidates.join(', ')}`);
process.exit(1);
}
const raw = readFileSync(found, 'utf8');
try {
return JSON.parse(raw) as Snapshot;
} catch (err) {
const msg = err instanceof Error ? err.message : String(err);
console.error(`snapshot ${found}: invalid JSON — ${msg}`);
process.exit(1);
}
}
async function connect(): Promise<InspectorClient> {
try {
return await InspectorClient.connect(INSPECTOR_PORT);
} catch (err) {
const msg = err instanceof Error ? err.message : String(err);
throw new Error(
`could not attach to debugger on :${INSPECTOR_PORT}: ${msg}. ` +
`Enable the main-process debugger via the in-app menu first.`,
);
}
}
function printUsage(): void {
console.log(
[
'usage:',
' explore full snapshot to stdout',
' explore pills df-pills + compact-pills + state',
' explore menu currently-open menu structure',
' explore snapshot <name> write snapshot to ui-snapshots/<name>.json',
' explore diff <a> <b> [--json] [--exit-on-diff]',
' compare two snapshots',
' explore find <regex> [--json] [--limit N]',
' search renderer text + aria-label',
' explore walk [options] BFS walker → docs/testing/ui-inventory.json',
' (see `explore walk --help` for options)',
' explore collapse [<path>] re-run persistent-element collapse against',
' an existing inventory (salvages partial',
' checkpoints; see `explore collapse --help`)',
].join('\n'),
);
}
main().catch((err) => {
const msg = err instanceof Error ? err.message : String(err);
console.error(`explore: ${msg}`);
process.exit(2);
});
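
The atomic-writer comment in `runWalk` (write to `<path>.tmp`, then rename) and the `.json` to `.meta.json` sidecar derivation can be sketched in isolation. This is a minimal sketch, not the harness's code: `sidecarPathFor` and `atomicWriteJson` are hypothetical names. It also illustrates one edge case: `path.replace(/\.json$/, '.meta.json')` returns its input unchanged when the path has no `.json` suffix, so the sidecar would land on top of the inventory. The guard below assumes that appending `.meta.json` is an acceptable fallback.

```typescript
import {
  existsSync,
  mkdtempSync,
  readFileSync,
  renameSync,
  writeFileSync,
} from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical helper: derive the meta sidecar path without ever
// returning the inventory path itself.
function sidecarPathFor(inventoryPath: string): string {
  return inventoryPath.endsWith('.json')
    ? inventoryPath.replace(/\.json$/, '.meta.json')
    : `${inventoryPath}.meta.json`;
}

// Hypothetical helper: write the full payload to <path>.tmp, then
// rename it over the target so readers never see a half-written file.
function atomicWriteJson(path: string, value: unknown): void {
  const tmp = `${path}.tmp`;
  writeFileSync(tmp, JSON.stringify(value, null, 2) + '\n', 'utf8');
  renameSync(tmp, path);
}

// Usage: an output path without a .json suffix still gets a distinct sidecar.
const sketchDir = mkdtempSync(join(tmpdir(), 'inventory-sketch-'));
const sketchInventory = join(sketchDir, 'inventory');
atomicWriteJson(sketchInventory, { totalElements: 2 });
atomicWriteJson(sidecarPathFor(sketchInventory), { partial: false });
```

The rename step relies on the temp file and the target living in the same directory, which holds here because the temp path is the target path plus a suffix.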

@@ -0,0 +1,86 @@
// Renderer search by regex over text content + aria-label.
//
// Why text+aria together: a "Send" button might have aria-label="Send"
// but textContent="" (icon child); a heading might be the inverse.
// Searching both lets the human ask "where does the word X appear?"
// without first guessing which surface labels it.
//
// We restrict the candidate set to interactive + landmark elements
// (button, [role], a, h1-h6, [aria-label]) rather than walking the
// entire document — claude.ai's chat history dumps thousands of
// <span>/<p> nodes that swamp signal. If a future need wants the
// broader sweep, add a `--all` flag here rather than expanding the
// default.
import type { InspectorClient } from '../src/lib/inspector.js';
export interface FindHit {
tag: string;
role: string | null;
ariaLabel: string | null;
text: string;
matchedField: 'text' | 'ariaLabel' | 'both';
visible: boolean;
}
// Regex source + flags travel as JSON strings into the renderer eval —
// same encoding pattern as openPill / clickMenuItem in lib/claudeai.ts.
export async function findInRenderer(
client: InspectorClient,
pattern: RegExp,
opts: { limit?: number } = {},
): Promise<FindHit[]> {
const limit = opts.limit ?? 100;
const reSrc = JSON.stringify(pattern.source);
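// Drop g/y: a stateful lastIndex would make re.test() skip matches when
// the same regex is reused across elements.
const reFlags = JSON.stringify(pattern.flags.replace(/[gy]/g, ''));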
return await client.evalInRenderer<FindHit[]>(
'claude.ai',
`(() => {
const re = new RegExp(${reSrc}, ${reFlags});
const sel = 'button, a, h1, h2, h3, h4, h5, h6, ' +
'[role], [aria-label]';
const nodes = Array.from(document.querySelectorAll(sel));
const hits = [];
for (const el of nodes) {
const text = (el.textContent || '').trim().slice(0, 200);
const aria = el.getAttribute('aria-label');
const textHit = text.length > 0 && re.test(text);
const ariaHit = aria !== null && re.test(aria);
if (!textHit && !ariaHit) continue;
hits.push({
tag: el.tagName.toLowerCase(),
role: el.getAttribute('role'),
ariaLabel: aria,
text,
matchedField: textHit && ariaHit
? 'both'
: (textHit ? 'text' : 'ariaLabel'),
visible: !!el.getClientRects().length,
});
if (hits.length >= ${limit}) break;
}
return hits;
})()`,
);
}
export function formatHits(hits: FindHit[]): string {
if (hits.length === 0) return 'No matches.';
const lines: string[] = [];
for (const h of hits) {
const vis = h.visible ? '' : ' [hidden]';
const role = h.role ? ` role=${h.role}` : '';
const aria = h.ariaLabel !== null ? ` aria-label=${q(h.ariaLabel)}` : '';
lines.push(
`${h.tag}${role}${aria} (${h.matchedField})${vis}` +
(h.text ? `\n text: ${h.text}` : ''),
);
}
lines.push('');
lines.push(`${hits.length} match(es).`);
return lines.join('\n');
}
function q(s: string): string {
return JSON.stringify(s);
}
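
The comment on `findInRenderer` notes that the regex source and flags travel as JSON strings into the renderer eval. A minimal sketch of that encoding follows, assuming a string-only eval boundary. `regexToExpression` is a hypothetical name, and `new Function` stands in for `evalInRenderer` so the round trip can be checked without a live inspector.

```typescript
// Hypothetical helper: build the source text of an expression that
// rebuilds `pattern` on the far side of a string-only eval boundary.
function regexToExpression(pattern: RegExp): string {
  // JSON.stringify yields a valid JS string literal, so quotes and
  // backslashes in the pattern survive the trip unchanged.
  const src = JSON.stringify(pattern.source);
  // Drop g/y so repeated .test() calls never depend on lastIndex.
  const flags = JSON.stringify(pattern.flags.replace(/[gy]/g, ''));
  return `new RegExp(${src}, ${flags})`;
}

const originalPattern = /say "hi"\s+\\o/gi;
const expression = regexToExpression(originalPattern);

// Stand-in for the renderer eval: evaluate the generated expression text.
const rebuiltPattern = new Function(`return ${expression};`)() as RegExp;

console.log(expression);
console.log(rebuiltPattern.test('Say "HI"   \\o'));
```

Passing `source` and `flags` separately, rather than interpolating `String(pattern)`, avoids having to re-parse the `/.../flags` literal form on the other side.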

@@ -0,0 +1,523 @@
// Generate the U01 UI-visibility Playwright spec from the captured
// inventory at docs/testing/ui-inventory.json. Reads the inventory +
// its meta sidecar offline (no live app needed), groups entries by
// canonical surface, and emits a single .spec.ts file with one
// `test()` per inventory entry under one `test.describe()` per
// surface.
//
// The generated spec asserts each entry's recorded fingerprint still
// resolves to a visible element on the live signed-in renderer. It's
// the inventory's "do these things still render" sibling — H05
// detects shape drift across snapshots, U01 detects per-entry render
// failures across the whole inventory.
//
// Pure file in/out: no network, no inspector. The spec it emits is
// where the live app gets touched. Run via `npm run gen:render-specs`.
//
// Refuses to operate on a stale walker version or a partial inventory
// — generating a passing spec from a half-walked DOM would silently
// shrink the assertion surface to whatever the walker happened to
// reach before crashing.
import {
existsSync,
readFileSync,
renameSync,
writeFileSync,
} from 'node:fs';
import { dirname, resolve } from 'node:path';
import { fileURLToPath } from 'node:url';
import { WALKER_VERSION } from './walker.js';
import type { Inventory, InventoryEntry, NavStep } from './walker.js';
const HERE = dirname(fileURLToPath(import.meta.url));
const TESTING_DIR = resolve(HERE, '..', '..', '..', 'docs', 'testing');
const DEFAULT_INVENTORY = resolve(TESTING_DIR, 'ui-inventory.json');
const DEFAULT_META = resolve(TESTING_DIR, 'ui-inventory.meta.json');
const DEFAULT_OUTPUT = resolve(
HERE,
'..',
'src',
'runners',
'U01_ui_visibility.spec.ts',
);
interface MetaSidecar {
walkerVersion: string;
partial: boolean;
capturedAt: string;
appVersion: string;
}
interface CliOpts {
inventory: string;
output: string;
help: boolean;
}
function parseCli(argv: string[]): CliOpts {
const opts: CliOpts = {
inventory: DEFAULT_INVENTORY,
output: DEFAULT_OUTPUT,
help: false,
};
for (let i = 0; i < argv.length; i += 1) {
const a = argv[i]!;
switch (a) {
case '-h':
case '--help':
opts.help = true;
break;
case '--inventory': {
const v = argv[++i];
if (!v) {
process.stderr.write('--inventory requires a path\n');
process.exit(1);
}
opts.inventory = resolve(v);
break;
}
case '--output': {
const v = argv[++i];
if (!v) {
process.stderr.write('--output requires a path\n');
process.exit(1);
}
opts.output = resolve(v);
break;
}
default:
process.stderr.write(`gen-render-specs: unknown argument: ${a}\n`);
printUsage();
process.exit(1);
}
}
return opts;
}
function printUsage(): void {
process.stdout.write(
'Usage: tsx explore/gen-render-specs.ts [options]\n' +
'\n' +
'Generates src/runners/U01_ui_visibility.spec.ts from\n' +
'docs/testing/ui-inventory.json. Refuses to run if the inventory\n' +
'is partial or was produced by a walker older than v' +
WALKER_VERSION +
'.\n' +
'\n' +
'Options:\n' +
' --inventory <path> Override default inventory path\n' +
' (default: docs/testing/ui-inventory.json)\n' +
' --output <path> Override default spec output path\n' +
' (default: src/runners/U01_ui_visibility.spec.ts)\n' +
' -h, --help Print this help and exit\n',
);
}
function loadInventory(path: string): Inventory {
if (!existsSync(path)) {
process.stderr.write(`gen-render-specs: inventory not found: ${path}\n`);
process.exit(1);
}
try {
return JSON.parse(readFileSync(path, 'utf8')) as Inventory;
} catch (err) {
const msg = err instanceof Error ? err.message : String(err);
process.stderr.write(`gen-render-specs: failed to parse inventory: ${msg}\n`);
process.exit(1);
}
}
function loadMeta(invPath: string): MetaSidecar {
const metaPath = invPath.replace(/\.json$/, '.meta.json');
const fallbackPath =
invPath === DEFAULT_INVENTORY ? DEFAULT_META : metaPath;
const path = existsSync(metaPath) ? metaPath : fallbackPath;
if (!existsSync(path)) {
process.stderr.write(
`gen-render-specs: meta sidecar not found at ${metaPath} ` +
'(needed for partial/walkerVersion gating)\n',
);
process.exit(1);
}
try {
return JSON.parse(readFileSync(path, 'utf8')) as MetaSidecar;
} catch (err) {
const msg = err instanceof Error ? err.message : String(err);
process.stderr.write(`gen-render-specs: failed to parse meta: ${msg}\n`);
process.exit(1);
}
}
// Refuse on stale walker versions or partial inventories. The point of
// this generator is to emit a spec that asserts the FULL inventory
// renders; gating on these two flags is what stops a half-walked
// checkpoint from quietly shrinking the assertion set.
function validate(inv: Inventory, meta: MetaSidecar): void {
const seen = Number.parseInt(inv.walkerVersion, 10);
const required = Number.parseInt(WALKER_VERSION, 10);
if (Number.isNaN(seen) || seen < required) {
process.stderr.write(
`gen-render-specs: walkerVersion ${inv.walkerVersion} < ${WALKER_VERSION}; ` +
'inventory shape may be incompatible. Re-walk with the current ' +
'explore CLI before regenerating the spec.\n',
);
process.exit(1);
}
if (meta.partial === true) {
process.stderr.write(
'gen-render-specs: inventory meta reports partial=true (walk did ' +
'not finish). Refusing to generate a spec from a half-walked DOM ' +
'— complete the walk first or pass --inventory to a known-good file.\n',
);
process.exit(1);
}
}
// Deterministic surface→entries grouping. Sort surfaces alphabetically
// and entries within each surface by id, so a re-run produces an
// identical spec file when the inventory hasn't changed (the file is
// checked in; no-op regeneration shouldn't mint diffs).
function groupBySurface(
entries: InventoryEntry[],
): { surface: string; entries: InventoryEntry[] }[] {
const buckets = new Map<string, InventoryEntry[]>();
for (const e of entries) {
const list = buckets.get(e.surface) ?? [];
list.push(e);
buckets.set(e.surface, list);
}
const surfaces = [...buckets.keys()].sort();
return surfaces.map((surface) => {
const list = buckets.get(surface)!.slice();
list.sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
return { surface, entries: list };
});
}
// Strip any navigationPath step that would CLICK the entry under
// test, when that entry is denylisted. Per the spec brief: never click
// denylisted controls, just assert they exist. In practice the
// recorded path's last click is the surface-opener (entry's own id is
// `surface.role.label`, distinct from any path step), so this filter
// usually no-ops — but it's the safety net the brief calls for.
function safeNavigationPath(entry: InventoryEntry): NavStep[] {
if (!entry.denylisted) return entry.navigationPath;
return entry.navigationPath.filter(
(s) => !(s.action === 'click' && s.id === entry.id),
);
}
// JS string literal for embedding in generated source. Use JSON.stringify
// — handles all the escapes (backslash, quotes, newlines, unicode) that
// hand-rolling would miss on entries with weird labels.
function js(value: unknown): string {
return JSON.stringify(value);
}
// Sanitize a surface name into a `test.describe()` block label that
// reads cleanly. Surfaces are dot-separated paths like
// `root.button.search.option.x`; the raw form is fine for grouping
// but we annotate the count so the report shows scope at a glance.
function describeLabel(surface: string, count: number): string {
return `surface: ${surface} (${count} ${count === 1 ? 'entry' : 'entries'})`;
}
function testTitle(entry: InventoryEntry): string {
const tags: string[] = [entry.kind];
if (entry.denylisted) tags.push('denylist');
const tagStr = tags.length ? ` [${tags.join(',')}]` : '';
return `${entry.id}${tagStr} - ${entry.role}: ${entry.label}`;
}
function generateSpec(
inv: Inventory,
meta: MetaSidecar,
groups: { surface: string; entries: InventoryEntry[] }[],
): string {
const out: string[] = [];
out.push(
'// AUTO-GENERATED FROM docs/testing/ui-inventory.json',
'// DO NOT EDIT — regenerate with `npm run gen:render-specs`',
`// Source inventory: walker v${inv.walkerVersion} (account-portable ariaPath ` +
`fingerprints), captured ${inv.capturedAt}, app ${inv.appVersion}`,
`// Entries: ${inv.totalElements} ` +
`(${inv.deniedActions} denylisted), ` +
`${groups.length} surfaces`,
`// Meta: partial=${meta.partial}`,
'',
"import { test, expect } from '@playwright/test';",
'',
"import { launchClaude } from '../lib/electron.js';",
"import type { ClaudeApp } from '../lib/electron.js';",
"import { createIsolation } from '../lib/isolation.js';",
"import { InspectorClient } from '../lib/inspector.js';",
"import { captureSessionEnv } from '../lib/diagnostics.js';",
'import {',
'\tcurrentUrl,',
'\tfindByFingerprint,',
'\tredrivePath,',
'\twaitForStable,',
"} from '../../explore/walker.js';",
"import type { InventoryEntry } from '../../explore/walker.js';",
'',
'// U01 — UI visibility sweep.',
'//',
'// One Playwright test per inventory entry. Each test re-drives the',
"// entry's recorded navigationPath against the live signed-in",
"// renderer, then asserts the entry's fingerprint resolves to a",
'// visible element. The full inventory acts as a render contract:',
'// any entry that no longer renders (selector drift, route change,',
'// permission change) shows up as exactly one failed test, with the',
'// triage payload (entry JSON + observed DOM neighbourhood)',
'// attached to that test only.',
'//',
'// Skip semantics mirror H05: the suite skips cleanly if the host',
"// isn't signed in (claude.ai webContents never reaches the",
"// userLoaded level). Default path: kill any running host Claude,",
"// copy the auth-relevant subset of ~/.config/Claude into a",
"// hermetic tmpdir, and launch against that copy. Host config is",
"// left untouched after the kill+seed. CLAUDE_TEST_USE_HOST_CONFIG=1",
"// opts out and shares the host's actual config directory (no",
"// kill+seed) — use only when you've manually closed the host first.",
'//',
"// Denylisted entries: we still assert they render, but the",
"// generator strips any navigationPath step that would CLICK the",
'// denylisted entry itself. Per the spec brief: never trigger',
'// destructive controls from a render check.',
'//',
'// Persistent entries: each persistent entry is asserted on its',
'// canonical surface only (the `surface` field). The cross-surface',
'// `surfaces[]` list is intentionally unused here — a strict',
'// "renders on every surface it was observed" mode is a future',
'// follow-up.',
'//',
'// Instance entries: assert that AT LEAST ONE element matching the',
"// fingerprint exists. We don't assert the recorded instanceCount",
'// — list lengths legitimately fluctuate across sessions.',
'',
"// Per-test budget covers a path redrive (~1 nav + ~N clicks * 1.5s)",
'// plus a fingerprint resolve. Generous to ride out a slow first',
'// route load; later tests in the same suite reuse the warmed app.',
'test.setTimeout(120_000);',
'',
'const useHostConfig = process.env.CLAUDE_TEST_USE_HOST_CONFIG === \'1\';',
'',
"// Single shared launch + inspector across the whole suite. N",
'// tests at one launch each would burn 30+ minutes on cold-start',
'// alone. We pay for setup once, then each test re-drives from the',
'// recorded startUrl so prior-test side effects (open menus, route',
'// changes) get reset before the next assertion runs.',
'let app: ClaudeApp | null = null;',
'let sharedInspector: InspectorClient | null = null;',
'let sharedStartUrl: string | null = null;',
'let suiteSkipReason: string | null = null;',
'',
"test.describe('U01 — UI visibility sweep (auto-generated)', () => {",
'\ttest.beforeAll(async () => {',
'\t\t// Default path: kill any host Claude, copy auth-relevant',
"\t\t// subset of ~/.config/Claude into a hermetic tmpdir, launch",
"\t\t// against that copy. Host config is left untouched after the",
"\t\t// kill+seed. CLAUDE_TEST_USE_HOST_CONFIG=1 opts out — shares",
"\t\t// the host's actual config directory (no kill+seed); use only",
"\t\t// when you've manually closed the host first.",
'\t\tif (useHostConfig) {',
'\t\t\tapp = await launchClaude({ isolation: null });',
'\t\t} else {',
'\t\t\tconst seeded = await createIsolation({ seedFromHost: true });',
'\t\t\tapp = await launchClaude({ isolation: seeded });',
'\t\t}',
"\t\tconst ready = await app.waitForReady('userLoaded');",
'\t\tif (!ready.postLoginUrl) {',
"\t\t\tsuiteSkipReason = 'claude.ai never reached a post-login URL — host ' +",
"\t\t\t\t'profile is not signed in. Sign in via the host app first.';",
'\t\t\treturn;',
'\t\t}',
'\t\tsharedInspector = ready.inspector;',
'\t\tsharedStartUrl = await currentUrl(sharedInspector);',
'\t\tawait waitForStable(sharedInspector);',
'\t});',
'',
'\ttest.afterAll(async () => {',
'\t\tif (sharedInspector) {',
'\t\t\ttry {',
'\t\t\t\tsharedInspector.close();',
'\t\t\t} catch {',
'\t\t\t\t// inspector may already be closed by app.close()',
'\t\t\t}',
'\t\t\tsharedInspector = null;',
'\t\t}',
'\t\tif (app) {',
'\t\t\tawait app.close();',
'\t\t\tapp = null;',
'\t\t}',
'\t});',
'',
'\t// why: shared per-test runner. Each generated `test()` packs the',
'\t// entry as a literal and calls this — keeps the file scannable',
'\t// (one block per entry) without duplicating the assertion logic',
"\t// in every test. Skips when beforeAll recorded a skip reason and",
"\t// throws when setup never produced an inspector, so each test's",
'\t// status reflects the actual render check, not a mis-attributed',
'\t// setup failure.',
'\tasync function runEntry(',
'\t\tentry: InventoryEntry,',
"\t\ttestInfo: import('@playwright/test').TestInfo,",
'\t): Promise<void> {',
'\t\tif (suiteSkipReason) {',
'\t\t\ttestInfo.skip(true, suiteSkipReason);',
'\t\t\treturn;',
'\t\t}',
'\t\tif (!sharedInspector || !sharedStartUrl) {',
'\t\t\tthrow new Error(',
"\t\t\t\t'U01: beforeAll did not initialize the inspector — check the ' +",
"\t\t\t\t\t'session-env attachment for the launch failure.',",
'\t\t\t);',
'\t\t}',
"\t\ttestInfo.annotations.push({ type: 'severity', description: 'Should' });",
'\t\ttestInfo.annotations.push({',
"\t\t\ttype: 'surface',",
'\t\t\tdescription: entry.surface,',
'\t\t});',
'\t\ttestInfo.annotations.push({',
"\t\t\ttype: 'kind',",
'\t\t\tdescription: entry.kind,',
'\t\t});',
'',
'\t\ttry {',
'\t\t\tawait redrivePath(sharedInspector, sharedStartUrl, entry.navigationPath);',
'\t\t} catch (err) {',
'\t\t\tconst msg = err instanceof Error ? err.message : String(err);',
"\t\t\tawait testInfo.attach('redrive-failure', {",
'\t\t\t\tbody: JSON.stringify(',
'\t\t\t\t\t{',
'\t\t\t\t\t\tentry,',
'\t\t\t\t\t\terror: msg,',
'\t\t\t\t\t\tnote:',
"\t\t\t\t\t\t\t'redrivePath threw before we could assert visibility — ' +",
"\t\t\t\t\t\t\t'usually a stale fingerprint along the path. Re-walk the ' +",
"\t\t\t\t\t\t\t'inventory and regenerate.',",
'\t\t\t\t\t},',
'\t\t\t\t\tnull,',
'\t\t\t\t\t2,',
'\t\t\t\t),',
"\t\t\t\tcontentType: 'application/json',",
'\t\t\t});',
'\t\t\tthrow err;',
'\t\t}',
'\t\tawait waitForStable(sharedInspector);',
'',
'\t\tconst result = await findByFingerprint(',
'\t\t\tsharedInspector,',
'\t\t\tentry.fingerprint,',
'\t\t\tentry.kind,',
'\t\t);',
'\t\tif (!result.found) {',
"\t\t\tawait testInfo.attach('fingerprint-miss', {",
'\t\t\t\tbody: JSON.stringify(',
'\t\t\t\t\t{',
'\t\t\t\t\t\tentry,',
'\t\t\t\t\t\treason: result.reason,',
'\t\t\t\t\t\tobservedOuterHTML: result.outerHTMLSnippet,',
'\t\t\t\t\t},',
'\t\t\t\t\tnull,',
'\t\t\t\t\t2,',
'\t\t\t\t),',
"\t\t\t\tcontentType: 'application/json',",
'\t\t\t});',
'\t\t}',
"\t\t// Soft drift: primary aria-tree match failed but a relaxed-",
"\t\t// scope fallback recovered. Test still passes — but a",
"\t\t// drift-warning attachment surfaces it so the sweep summary",
"\t\t// can flag re-walk before drift compounds.",
'\t\tif (result.found && result.drift) {',
"\t\t\tawait testInfo.attach('drift-warning', {",
'\t\t\t\tbody: JSON.stringify(',
'\t\t\t\t\t{',
'\t\t\t\t\t\tentryId: entry.id,',
'\t\t\t\t\t\texpected: entry.fingerprint.ariaPath,',
'\t\t\t\t\t\tmatchedVia: result.strategy,',
'\t\t\t\t\t\tdrift: result.drift,',
"\t\t\t\t\t\tnote:",
"\t\t\t\t\t\t\t'primary aria-tree match failed; recovered via fallback. ' +",
"\t\t\t\t\t\t\t'Re-walk inventory before drift compounds.',",
'\t\t\t\t\t},',
'\t\t\t\t\tnull,',
'\t\t\t\t\t2,',
'\t\t\t\t),',
"\t\t\t\tcontentType: 'application/json',",
'\t\t\t});',
"\t\t\ttestInfo.annotations.push({",
"\t\t\t\ttype: 'drift',",
'\t\t\t\tdescription: result.strategy ?? \'unknown\',',
'\t\t\t});',
'\t\t}',
'\t\texpect(',
'\t\t\tresult.found,',
'\t\t\t`fingerprint did not resolve: ${result.reason ?? \'unknown\'}`,',
'\t\t).toBe(true);',
'\t}',
'',
'\ttest.beforeAll(async ({}, testInfo) => {',
"\t\tawait testInfo.attach('session-env', {",
'\t\t\tbody: JSON.stringify(captureSessionEnv(), null, 2),',
"\t\t\tcontentType: 'application/json',",
'\t\t});',
'\t});',
'',
);
// One describe per surface, one test per entry. Strings are
// JSON-encoded so labels with quotes/backticks/unicode survive.
for (const group of groups) {
out.push(
`\ttest.describe(${js(describeLabel(group.surface, group.entries.length))}, () => {`,
);
for (const entry of group.entries) {
const safe: InventoryEntry = {
...entry,
navigationPath: safeNavigationPath(entry),
};
out.push(
`\t\ttest(${js(testTitle(entry))}, async ({}, testInfo) => {`,
`\t\t\tconst entry: InventoryEntry = ${js(safe)};`,
'\t\t\tawait runEntry(entry, testInfo);',
'\t\t});',
);
}
out.push('\t});', '');
}
out.push('});', '');
return out.join('\n');
}
function atomicWrite(path: string, body: string): void {
const tmp = `${path}.tmp`;
writeFileSync(tmp, body, 'utf8');
renameSync(tmp, path);
}
function main(): void {
const opts = parseCli(process.argv.slice(2));
if (opts.help) {
printUsage();
return;
}
const inv = loadInventory(opts.inventory);
const meta = loadMeta(opts.inventory);
validate(inv, meta);
const groups = groupBySurface(inv.entries);
const body = generateSpec(inv, meta, groups);
atomicWrite(opts.output, body);
const testCount = inv.entries.length;
process.stdout.write(
`gen-render-specs: wrote ${opts.output}\n` +
` ${testCount} test() across ${groups.length} test.describe() ` +
`(${inv.deniedActions} denylisted)\n`,
);
}
main();
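
The `groupBySurface` comment says a no-op regeneration should not mint diffs. The property behind it can be sketched as follows: sorting both the surface keys and the ids within each surface makes the output independent of the order in which the walker emitted entries. `SketchEntry` and `groupSorted` are hypothetical, trimmed-down stand-ins for `InventoryEntry` and `groupBySurface`, and the ids are made up for illustration.

```typescript
// Hypothetical, trimmed-down entry shape: only the fields grouping needs.
interface SketchEntry {
  id: string;
  surface: string;
}

function groupSorted(
  entries: SketchEntry[],
): { surface: string; ids: string[] }[] {
  const buckets = new Map<string, string[]>();
  for (const e of entries) {
    const ids = buckets.get(e.surface) ?? [];
    ids.push(e.id);
    buckets.set(e.surface, ids);
  }
  // Sort surfaces, then ids within each surface (code-unit order).
  return [...buckets.keys()].sort().map((surface) => ({
    surface,
    ids: buckets.get(surface)!.slice().sort(),
  }));
}

const walkA: SketchEntry[] = [
  { id: 'root.button.search', surface: 'root' },
  { id: 'menu.menuitem.rename', surface: 'menu' },
  { id: 'root.button.new-chat', surface: 'root' },
];
// Same entries, different emission order.
const walkB: SketchEntry[] = [...walkA].reverse();

const identical =
  JSON.stringify(groupSorted(walkA)) === JSON.stringify(groupSorted(walkB));
console.log(identical);
```

Because the generated spec is checked in, any ordering that depended on walk order would show up as churn in review even when nothing in the UI changed.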

@@ -0,0 +1,202 @@
// Live AX-tree probe for the claudeai.ts migration. Connects to the
// host's main-process Node inspector on :9229 (must be enabled via
// "Developer → Enable Main Process Debugger"), pulls the claude.ai
// AX tree, and reports what the page-object discrimination shapes
// will actually see.
//
// Read-only — no clicks, no state mutation.
//
// Run: cd tools/test-harness && npx tsx explore/probe-claudeai-ax.ts
import { InspectorClient } from '../src/lib/inspector.js';
import { axTreeToSnapshot, type RawElement } from './walker.js';
const INSPECTOR_PORT = 9229;
const ROW_MORE_OPTIONS_RE = /^More options for /;
const MENU_ITEM_ROLES = new Set([
'menuitem',
'menuitemradio',
'menuitemcheckbox',
]);
function landmarkTrail(el: RawElement): string {
const trail = el.ancestors
.filter((a) => a.role !== null)
.map((a) => (a.name ? `${a.role}[${a.name}]` : (a.role as string)));
return trail.join(' ') || '<no ancestors>';
}
function fmtElement(el: RawElement): string {
const name = el.accessibleName ?? '<no-name>';
const popup = el.hasPopup ?? '-';
return (
` • role=${el.computedRole} hasPopup=${popup} ` +
`name=${JSON.stringify(name).slice(0, 90)}\n` +
` landmarks: ${landmarkTrail(el)}`
);
}
async function main(): Promise<void> {
const inspector = await InspectorClient.connect(INSPECTOR_PORT);
try {
// What URL is the renderer on right now?
const url = await inspector.evalInRenderer<string>(
'claude.ai',
'(() => location.href)()',
);
process.stdout.write(`renderer URL: ${url}\n\n`);
const nodes = await inspector.getAccessibleTree('claude.ai');
process.stdout.write(`raw AX nodes: ${nodes.length}\n`);
const elements = axTreeToSnapshot(nodes);
process.stdout.write(
`interactive elements (post-filter): ${elements.length}\n\n`,
);
// Bucket by role for a quick overall shape.
const byRole = new Map<string, number>();
for (const el of elements) {
byRole.set(el.computedRole, (byRole.get(el.computedRole) ?? 0) + 1);
}
process.stdout.write('role histogram:\n');
for (const [role, n] of [...byRole.entries()].sort()) {
process.stdout.write(` ${role}: ${n}\n`);
}
process.stdout.write('\n');
// THE KEY QUESTION: do any buttons report hasPopup === 'menu'?
// If yes, the migration's discrimination shape is sound. If no,
// claude.ai exposes the popover trigger via a different AX
// signal and we need a different filter.
const buttonsWithPopup = elements.filter(
(el) => el.computedRole === 'button' && el.hasPopup !== null,
);
process.stdout.write(
`buttons with hasPopup set (any value): ${buttonsWithPopup.length}\n`,
);
const popupValues = new Map<string, number>();
for (const b of buttonsWithPopup) {
const v = b.hasPopup ?? '<null>';
popupValues.set(v, (popupValues.get(v) ?? 0) + 1);
}
for (const [v, n] of [...popupValues.entries()].sort()) {
process.stdout.write(` hasPopup="${v}": ${n}\n`);
}
process.stdout.write('\n');
// What findCompactPills() would return.
const compactPills = elements.filter(
(el) =>
el.computedRole === 'button' &&
el.hasPopup === 'menu' &&
el.accessibleName !== null &&
el.accessibleName.length > 0 &&
!ROW_MORE_OPTIONS_RE.test(el.accessibleName),
);
process.stdout.write(
`findCompactPills() would return ${compactPills.length} candidate(s):\n`,
);
for (const el of compactPills) process.stdout.write(`${fmtElement(el)}\n`);
process.stdout.write('\n');
// What the row-more-options filter is dropping.
const rowMore = elements.filter(
(el) =>
el.computedRole === 'button' &&
el.hasPopup === 'menu' &&
el.accessibleName !== null &&
ROW_MORE_OPTIONS_RE.test(el.accessibleName),
);
process.stdout.write(
`row-more-options filter dropped ${rowMore.length} button(s) ` +
`(showing first 5):\n`,
);
for (const el of rowMore.slice(0, 5)) {
process.stdout.write(`${fmtElement(el)}\n`);
}
process.stdout.write('\n');
// Top-level tabs: activateTab() looks for `role: 'button'` with
// accessibleName === 'Chat' | 'Cowork' | 'Code'. Probe each one.
process.stdout.write('top-level tab probe:\n');
for (const name of ['Chat', 'Cowork', 'Code']) {
const matches = elements.filter(
(el) =>
el.computedRole === 'button' && el.accessibleName === name,
);
process.stdout.write(` "${name}": ${matches.length} match(es)\n`);
for (const el of matches) {
process.stdout.write(
` landmarks: ${landmarkTrail(el)} hasPopup=${el.hasPopup ?? '-'}\n`,
);
}
}
process.stdout.write('\n');
// Open menu? Anything in MENU_ITEM_ROLES right now would mean a
// menu happens to be open at probe time — useful context for
// callers reading the output.
const items = elements.filter((el) =>
MENU_ITEM_ROLES.has(el.computedRole),
);
process.stdout.write(
`menuitem* elements currently in tree: ${items.length}` +
(items.length > 0 ? ' (a menu is open — surprise context)' : '') +
'\n\n',
);
// Diagnostic: is `properties[]` even being returned? Dump the
// raw shape of the first button node and any node that has a
// non-empty properties array, so we can tell whether
// (a) Chromium isn't surfacing aria-haspopup, or
// (b) properties[] is just absent from the response.
const firstButton = nodes.find((n) => n.role?.value === 'button');
if (firstButton) {
process.stdout.write('first raw button AxNode (full JSON):\n');
process.stdout.write(`${JSON.stringify(firstButton, null, 2)}\n\n`);
}
const nodesWithProps = nodes.filter(
(n) => Array.isArray(n.properties) && n.properties.length > 0,
);
process.stdout.write(
`raw nodes with non-empty properties[]: ${nodesWithProps.length}\n`,
);
// Histogram of property names actually present.
const propNames = new Map<string, number>();
for (const n of nodesWithProps) {
const props = n.properties as { name?: string }[];
for (const p of props) {
if (typeof p.name === 'string') {
propNames.set(p.name, (propNames.get(p.name) ?? 0) + 1);
}
}
}
for (const [name, n] of [...propNames.entries()].sort()) {
process.stdout.write(` property "${name}": ${n}\n`);
}
process.stdout.write('\n');
// Spot-check the model picker if visible — it should be the
// canonical "menu trigger" on every surface.
const modelLikely = elements.filter(
(el) =>
el.accessibleName !== null &&
/^(Opus|Sonnet|Haiku|Claude)\b/i.test(el.accessibleName),
);
process.stdout.write(
`model-picker-like elements (name starts with Opus/Sonnet/Haiku/Claude): ` +
`${modelLikely.length}\n`,
);
for (const el of modelLikely.slice(0, 5)) {
process.stdout.write(`${fmtElement(el)}\n`);
}
} finally {
inspector.close();
}
}
main().catch((err) => {
process.stderr.write(`probe failed: ${err}\n`);
process.exit(1);
});
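
The probe above reports what `findCompactPills()` would return by filtering on role, `hasPopup`, and accessible name. A minimal sketch of that same discrimination as a pure function, so it can be exercised without a live inspector. The `AxElement` interface and the sample rows are hypothetical stand-ins; the real `RawElement` type lives in `walker.js` and may carry more fields.

```typescript
// Hypothetical element shape: only the three fields the filter reads.
interface AxElement {
	computedRole: string;
	accessibleName: string | null;
	hasPopup: string | null;
}

const ROW_MORE_OPTIONS = /^More options for /;

// Buttons that open a menu, have a non-empty name, and are not the
// per-row "More options for ..." triggers.
function findCompactPillCandidates(elements: AxElement[]): AxElement[] {
	return elements.filter(
		(el) =>
			el.computedRole === 'button' &&
			el.hasPopup === 'menu' &&
			el.accessibleName !== null &&
			el.accessibleName.length > 0 &&
			!ROW_MORE_OPTIONS.test(el.accessibleName),
	);
}

// Made-up sample data, not captured from the app.
const sample: AxElement[] = [
	{ computedRole: 'button', accessibleName: 'Select folder', hasPopup: 'menu' },
	{ computedRole: 'button', accessibleName: 'More options for Draft', hasPopup: 'menu' },
	{ computedRole: 'button', accessibleName: 'Send', hasPopup: null },
	{ computedRole: 'link', accessibleName: 'Help', hasPopup: 'menu' },
	{ computedRole: 'button', accessibleName: '', hasPopup: 'menu' },
];
const pills = findCompactPillCandidates(sample).map((el) => el.accessibleName);
```

Only the first sample row passes every clause; each of the other rows fails exactly one, which makes the function a convenient fixture if the filter is later changed.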

@@ -0,0 +1,276 @@
// Renderer-state capture for the explore CLI.
//
// Why a separate module: the snapshot shape is the contract diff.ts
// reads against. Keeping the capture here (rather than inline in the
// dispatcher) means a future format bump only touches two files and
// the schema lives next to its sole producer.
//
// All discovery is by structural shape — never by minified Tailwind
// class names. We anchor on:
// - df-pills: button.df-pill[aria-label] (3 expected: Chat/Cowork/Code)
// - compact pills: button[aria-haspopup="menu"] containing
// span.truncate.max-w-[Npx] (env pill, Select-folder pill, …)
// - aria-labeled buttons: any <button[aria-label]> for general drift
// visibility (sidebar "more" buttons, header actions, modals).
// - open menu: the role=menu currently in the DOM, plus its items.
// - modals: role=dialog elements with aria-label/aria-labelledby.
//
// All renderer evals run in a single round-trip to keep snapshots
// deterministic — async work between probes can shift the DOM.
import type { InspectorClient } from '../src/lib/inspector.js';
export interface DfPill {
ariaLabel: string | null;
text: string;
visible: boolean;
}
export interface CompactPillSnap {
ariaLabel: string | null;
text: string;
maxW: string;
expanded: boolean;
}
export interface AriaButton {
ariaLabel: string;
text: string;
expanded: boolean | null;
hasPopup: string | null;
visible: boolean;
}
export interface MenuItem {
role: string;
text: string;
ariaChecked: string | null;
disabled: boolean;
}
export interface OpenMenu {
ariaLabelledBy: string | null;
ariaLabel: string | null;
items: MenuItem[];
}
export interface ModalSnap {
ariaLabel: string | null;
ariaLabelledBy: string | null;
headingText: string | null;
buttonLabels: string[];
}
export interface PageState {
url: string;
title: string;
readyState: string;
}
export interface Snapshot {
capturedAt: string;
claudeAiUrl: string;
appVersion: string | null;
pageState: PageState;
dfPills: DfPill[];
compactPills: CompactPillSnap[];
ariaLabeledButtons: AriaButton[];
openMenu: OpenMenu | null;
modals: ModalSnap[];
}
// Capture the renderer DOM into the canonical snapshot shape.
// `claudeAiUrl` is recorded separately from pageState.url because the
// pageState reflects the moment of capture and is useful for diff
// triage; the top-level url anchors which webContents we hit.
export async function capture(client: InspectorClient): Promise<Snapshot> {
const target = await pickClaudeAiWebContents(client);
const appVersion = await readAppVersion(client);
const dom = await client.evalInRenderer<{
pageState: PageState;
dfPills: DfPill[];
compactPills: CompactPillSnap[];
ariaLabeledButtons: AriaButton[];
openMenu: OpenMenu | null;
modals: ModalSnap[];
}>('claude.ai', RENDERER_CAPTURE_BODY);
return {
capturedAt: new Date().toISOString(),
claudeAiUrl: target,
appVersion,
pageState: dom.pageState,
dfPills: dom.dfPills,
compactPills: dom.compactPills,
ariaLabeledButtons: dom.ariaLabeledButtons,
openMenu: dom.openMenu,
modals: dom.modals,
};
}
// Just the pills slice — used by `explore pills`. Reuses the same eval
// body to avoid drift between subcommands.
export async function capturePills(
client: InspectorClient,
): Promise<{
dfPills: DfPill[];
compactPills: CompactPillSnap[];
pageState: PageState;
}> {
const dom = await client.evalInRenderer<{
pageState: PageState;
dfPills: DfPill[];
compactPills: CompactPillSnap[];
ariaLabeledButtons: AriaButton[];
openMenu: OpenMenu | null;
modals: ModalSnap[];
}>('claude.ai', RENDERER_CAPTURE_BODY);
return {
dfPills: dom.dfPills,
compactPills: dom.compactPills,
pageState: dom.pageState,
};
}
// Just the open menu — used by `explore menu`.
export async function captureOpenMenu(
client: InspectorClient,
): Promise<OpenMenu | null> {
const dom = await client.evalInRenderer<{ openMenu: OpenMenu | null }>(
'claude.ai',
`(() => { ${OPEN_MENU_FN} return { openMenu: openMenu() }; })()`,
);
return dom.openMenu;
}
async function pickClaudeAiWebContents(
client: InspectorClient,
): Promise<string> {
const list = await client.evalInMain<Array<{ url: string }>>(`
const { webContents } = process.mainModule.require('electron');
return webContents.getAllWebContents().map(w => ({ url: w.getURL() }));
`);
const target = list.find((w) => w.url.includes('claude.ai'));
if (!target) {
throw new Error(
'snapshot: no claude.ai webContents — open the app to a ' +
'logged-in state first',
);
}
return target.url;
}
// app.getVersion() is the cleanest source of truth — same value the
// app.asar serves at runtime. Returns null if the call shape ever
// changes upstream rather than failing the whole snapshot.
async function readAppVersion(
client: InspectorClient,
): Promise<string | null> {
try {
return await client.evalInMain<string>(`
const { app } = process.mainModule.require('electron');
return app.getVersion();
`);
} catch {
return null;
}
}
// Shared renderer-eval sources. OPEN_MENU_FN is a plain function
// declaration spliced into each IIFE body, so every capture is still a
// single round-trip. Truncation limits (text 200, list 200) are wide
// enough for current claude.ai but bounded so a future infinite-scroll
// regression doesn't blow up the JSON file.
const OPEN_MENU_FN = `
function openMenu() {
const menu = document.querySelector('[role=menu][data-open]')
|| document.querySelector('[role=menu]');
if (!menu) return null;
const items = Array.from(menu.querySelectorAll(
'[role=menuitem], [role=menuitemradio], [role=menuitemcheckbox]'
)).slice(0, 200).map(el => ({
role: el.getAttribute('role') || '',
text: (el.textContent || '').trim().slice(0, 200),
ariaChecked: el.getAttribute('aria-checked'),
disabled: el.hasAttribute('data-disabled')
|| el.getAttribute('aria-disabled') === 'true',
}));
return {
ariaLabelledBy: menu.getAttribute('aria-labelledby'),
ariaLabel: menu.getAttribute('aria-label'),
items,
};
}
`;
const RENDERER_CAPTURE_BODY = `
(() => {
${OPEN_MENU_FN}
const buttons = Array.from(document.querySelectorAll('button'));
const dfPills = buttons
.filter(b => b.classList.contains('df-pill'))
.map(b => ({
ariaLabel: b.getAttribute('aria-label'),
text: (b.textContent || '').trim().slice(0, 200),
visible: !!b.getClientRects().length,
}));
const compactPills = buttons.flatMap(b => {
if (b.getAttribute('aria-haspopup') !== 'menu') return [];
const span = b.querySelector('span.truncate');
if (!span) return [];
const m = span.className.match(/max-w-\\[[^\\]]+\\]/);
if (!m) return [];
return [{
ariaLabel: b.getAttribute('aria-label'),
text: (span.textContent || '').trim().slice(0, 200),
maxW: m[0],
expanded: b.getAttribute('aria-expanded') === 'true',
}];
});
const ariaLabeledButtons = buttons
.filter(b => b.hasAttribute('aria-label'))
.slice(0, 200)
.map(b => ({
ariaLabel: b.getAttribute('aria-label') || '',
text: (b.textContent || '').trim().slice(0, 200),
expanded: b.hasAttribute('aria-expanded')
? b.getAttribute('aria-expanded') === 'true'
: null,
hasPopup: b.getAttribute('aria-haspopup'),
visible: !!b.getClientRects().length,
}));
const modals = Array.from(
document.querySelectorAll('[role=dialog]')
).slice(0, 20).map(d => {
const heading = d.querySelector(
'h1, h2, h3, [role=heading]'
);
const btnLabels = Array.from(d.querySelectorAll('button'))
.slice(0, 50)
.map(b => {
const al = b.getAttribute('aria-label');
if (al) return al;
return (b.textContent || '').trim().slice(0, 80);
})
.filter(s => s.length > 0);
return {
ariaLabel: d.getAttribute('aria-label'),
ariaLabelledBy: d.getAttribute('aria-labelledby'),
headingText: heading
? (heading.textContent || '').trim().slice(0, 200)
: null,
buttonLabels: btnLabels,
};
});
return {
pageState: {
url: location.href,
title: document.title,
readyState: document.readyState,
},
dfPills,
compactPills,
ariaLabeledButtons,
openMenu: openMenu(),
modals,
};
})()
`;
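
The compact-pill capture above pulls the `max-w-[...]` token out of the span's class string, and because the code is embedded in a template literal, every backslash in the regex is doubled. A small sketch of both points, assuming nothing beyond standard JavaScript; the helper name `extractMaxW` and the sample class strings are hypothetical.

```typescript
// The regex as it reads once the template literal has been evaluated.
const MAX_W_RE = /max-w-\[[^\]]+\]/;

// Returns the arbitrary-value max-width token, or null when absent.
function extractMaxW(className: string): string | null {
	const m = className.match(MAX_W_RE);
	return m ? m[0] : null;
}

// Inside an ordinary template literal each backslash must be doubled to
// survive; String.raw keeps backslashes as typed, so the two should agree.
const embedded = `/max-w-\\[[^\\]]+\\]/`;
const intended = String.raw`/max-w-\[[^\]]+\]/`;
const sameSource = embedded === intended;

// Made-up class strings for illustration.
const hit = extractMaxW('truncate max-w-[120px] text-sm');
const miss = extractMaxW('truncate max-w-full');
```

If the eval body ever grows more regexes, building it with `String.raw` is one way to avoid the doubled escapes, at the cost of no longer being able to write `\n` style escapes in the same literal.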

@@ -0,0 +1,240 @@
// Drive a v7 walk inside the test harness's launch-with-isolation
// path so the run lives in a per-launch tmpdir (auth seeded from the
// host config) rather than the running host app's own profile.
//
// Why a separate driver instead of `explore walk`: the standalone CLI
// connects to whatever Node inspector is already on :9229 — i.e. the
// running host Claude Desktop. That path mutates the host profile
// (visited surfaces, navigation history, route changes) and races
// with the human at the keyboard. The launchClaude path here mirrors
// what H05 / U01 do: kill any running host instance, copy auth into
// a tmpdir, spawn a fresh Electron with isolated XDG_CONFIG_HOME,
// attach the inspector via SIGUSR1, and tear everything down on
// exit.
//
// Usage (matches `explore walk` flag set):
// npx tsx explore/walk-isolated.ts --verbose --max-elements 2000
//
// Flags:
// --max-elements N global cap (default 1000)
// --max-drills-per-surface N per-surface drilling fan-out cap (default 50)
// --checkpoint-every N write inventory every N entries (default 100)
// --output PATH inventory output (default docs/testing/
// ui-inventory.json)
// --allowlist PATH JSON file with `exemptions: string[]`
// --no-seed don't copy host auth — fresh sign-in
// required (rare; default seeds from host)
// --verbose walker chatter to stderr
import {
existsSync,
mkdirSync,
readFileSync,
renameSync,
writeFileSync,
} from 'node:fs';
import { dirname, resolve } from 'node:path';
import { fileURLToPath } from 'node:url';
import { launchClaude } from '../src/lib/electron.js';
import { createIsolation } from '../src/lib/isolation.js';
import { walkRenderer, WALKER_VERSION } from './walker.js';
import type { Inventory } from './walker.js';
const TESTING_DIR = resolve(
dirname(fileURLToPath(import.meta.url)),
'..',
'..',
'..',
'docs',
'testing',
);
const INVENTORY_PATH = resolve(TESTING_DIR, 'ui-inventory.json');
const INVENTORY_META_PATH = resolve(TESTING_DIR, 'ui-inventory.meta.json');
const INVENTORY_TMP_SUFFIX = '.tmp';
interface Options {
maxElements: number;
maxDrillsPerSurface: number;
checkpointEvery: number;
allowlist: string | null;
output: string;
verbose: boolean;
seed: boolean;
help: boolean;
}
function parseArgs(args: string[]): Options {
const opts: Options = {
maxElements: 1000,
maxDrillsPerSurface: 50,
checkpointEvery: 100,
allowlist: null,
output: INVENTORY_PATH,
verbose: false,
seed: true,
help: false,
};
for (let i = 0; i < args.length; i += 1) {
const a = args[i]!;
if (a === '-h' || a === '--help') opts.help = true;
else if (a === '--verbose') opts.verbose = true;
else if (a === '--no-seed') opts.seed = false;
else if (a === '--max-elements') {
const n = Number(args[++i]);
if (!Number.isInteger(n) || n < 0) die('--max-elements N (N≥0)');
opts.maxElements = n;
} else if (a === '--max-drills-per-surface') {
const n = Number(args[++i]);
if (!Number.isInteger(n) || n < 0) die('--max-drills-per-surface N');
opts.maxDrillsPerSurface = n;
} else if (a === '--checkpoint-every') {
const n = Number(args[++i]);
if (!Number.isInteger(n) || n < 0) die('--checkpoint-every N');
opts.checkpointEvery = n;
} else if (a === '--allowlist') {
const p = args[++i];
if (!p) die('--allowlist PATH');
opts.allowlist = p;
} else if (a === '--output') {
const p = args[++i];
if (!p) die('--output PATH');
opts.output = resolve(p);
} else {
die(`unknown flag: ${a}`);
}
}
return opts;
}
function die(msg: string): never {
process.stderr.write(`walk-isolated: ${msg}\n`);
process.exit(1);
}
function printUsage(): void {
process.stdout.write(
[
'usage: npx tsx explore/walk-isolated.ts [flags]',
'',
'flags:',
' --max-elements N global cap (default 1000)',
' --max-drills-per-surface N drilling fan-out cap (default 50)',
' --checkpoint-every N partial-write cadence (default 100; 0 disables)',
' --output PATH inventory output path',
' --allowlist PATH JSON { exemptions: string[] }',
' --no-seed skip host-config auth seeding',
' --verbose walker chatter on stderr',
'',
].join('\n'),
);
}
async function main(): Promise<void> {
const opts = parseArgs(process.argv.slice(2));
if (opts.help) {
printUsage();
return;
}
let allowlist: string[] = [];
if (opts.allowlist) {
const raw = readFileSync(opts.allowlist, 'utf8');
const parsed = JSON.parse(raw) as { exemptions?: string[] };
allowlist = parsed.exemptions ?? [];
}
const outDir = dirname(opts.output);
if (!existsSync(outDir)) mkdirSync(outDir, { recursive: true });
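// Append the suffix when --output lacks a .json extension; a bare
// replace() would return the path unchanged and the meta write would
// clobber the inventory.
const metaPath =
opts.output === INVENTORY_PATH
? INVENTORY_META_PATH
: opts.output.endsWith('.json')
? opts.output.replace(/\.json$/, '.meta.json')
: `${opts.output}.meta.json`;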
const writeCheckpoint = (inventory: Inventory, isPartial: boolean): void => {
const invTmp = `${opts.output}${INVENTORY_TMP_SUFFIX}`;
writeFileSync(invTmp, JSON.stringify(inventory, null, 2) + '\n', 'utf8');
renameSync(invTmp, opts.output);
const meta = {
capturedAt: inventory.capturedAt,
appVersion: inventory.appVersion,
walkerVersion: WALKER_VERSION,
startUrl: inventory.startUrl,
totalElements: inventory.totalElements,
deniedActions: inventory.deniedActions,
partial: isPartial,
isolation: 'launchClaude (test-harness path)',
seededFromHost: opts.seed,
allowlistEntries: allowlist,
};
const metaTmp = `${metaPath}${INVENTORY_TMP_SUFFIX}`;
writeFileSync(metaTmp, JSON.stringify(meta, null, 2) + '\n', 'utf8');
renameSync(metaTmp, metaPath);
};
process.stderr.write(
`walk-isolated: creating isolation (seedFromHost=${opts.seed})\n`,
);
const isolation = await createIsolation({ seedFromHost: opts.seed });
let app: Awaited<ReturnType<typeof launchClaude>> | null = null;
try {
process.stderr.write('walk-isolated: spawning Claude Desktop\n');
app = await launchClaude({ isolation });
process.stderr.write(
'walk-isolated: waiting for claude.ai webContents (90s budget)\n',
);
const { inspector, claudeAiUrl } = await app.waitForReady('claudeAi');
if (!claudeAiUrl) {
throw new Error(
'claude.ai webContents never loaded — host likely not signed in. ' +
'Open Claude Desktop, sign in, fully close, and re-run.',
);
}
process.stderr.write(`walk-isolated: at ${claudeAiUrl}\n`);
const inventory = await walkRenderer(inspector, {
maxElements: opts.maxElements,
maxDrillsPerSurface: opts.maxDrillsPerSurface,
allowlist,
verbose: opts.verbose,
checkpointEvery: opts.checkpointEvery,
checkpointWriter:
opts.checkpointEvery > 0
? (inv) => writeCheckpoint(inv, true)
: undefined,
});
writeCheckpoint(inventory, false);
process.stdout.write(
`wrote ${opts.output} (${inventory.totalElements} entries, ` +
`${inventory.deniedActions} denylisted)\n`,
);
process.stdout.write(`wrote ${metaPath}\n`);
} finally {
if (app) {
try {
await app.close();
} catch (err) {
process.stderr.write(
`walk-isolated: app.close() failed: ${
err instanceof Error ? err.message : String(err)
}\n`,
);
}
}
try {
await isolation.cleanup();
} catch (err) {
process.stderr.write(
`walk-isolated: isolation.cleanup() failed: ${
err instanceof Error ? err.message : String(err)
}\n`,
);
}
}
}
main().catch((err) => {
const msg = err instanceof Error ? err.message : String(err);
process.stderr.write(`walk-isolated: ${msg}\n`);
process.exit(2);
});
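
The driver above derives the meta sidecar path from the inventory output path by swapping the `.json` suffix. A sketch of that derivation as a pure function follows. The helper name `metaPathFor` is hypothetical, and the fallback branch for outputs without a `.json` suffix is an assumption made for illustration so the two files can never share a path.

```typescript
// Hypothetical helper; the fallback branch is an assumption, not
// behavior taken from the source.
function metaPathFor(output: string): string {
	if (output.endsWith('.json')) {
		// Swap the trailing ".json" for ".meta.json".
		return `${output.slice(0, -'.json'.length)}.meta.json`;
	}
	// No .json suffix: append, so the sidecar path always differs
	// from the inventory path.
	return `${output}.meta.json`;
}

// Made-up paths for illustration.
const withSuffix = metaPathFor('/tmp/ui-inventory.json');
const withoutSuffix = metaPathFor('/tmp/inventory');
```

Both results differ from their inputs, which is the property the checkpoint writer depends on when it renames the inventory and the meta file in sequence.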

File diff suppressed because it is too large

@@ -0,0 +1,468 @@
// Grounding probe — dumps Claude Desktop runtime state that backs the
// load-bearing claims in docs/testing/cases/. Output is keyed by
// test-ID so the next grounding sweep can diff captures across
// upstream versions.
//
// Two modes:
// - attach (default): connect to an already-running app on port 9229
// (manual `--inspect=9229` run, or a launchClaude() instance that
// called attachInspector()).
// - --launch: spin up a fresh isolated instance via launchClaude(),
// capture, tear down. Self-contained — usable in CI.
//
// Mostly read-only; --include-synthetic enables short-lived state
// changes (powerSaveBlocker start+stop) to close API-only gaps.
//
// Captures, keyed by test ID:
// T01 app metadata, webContents count
// T03 SNI / tray registration via DBus (KDE StatusNotifierWatcher)
// T06 globalShortcut.isRegistered() for known accelerators
// T09 app.getLoginItemSettings()
// T22 AX fingerprint (PR toolbar — open the surface before probing)
// T23 Notification.isSupported()
// T24 IPC channels matching /external|editor|openIn/i
// T26 AX fingerprint (Routines page — open before probing)
// T31 AX fingerprint (side chat — open before probing)
// T32 AX fingerprint (slash menu — type "/" before probing)
// T38 IPC channels matching /external|editor|openIn/i (editor handoff)
// S18 safeStorage.isEncryptionAvailable() + backend
// S20 powerSaveBlocker (gated by --include-synthetic)
// S22 process.platform (Computer Use gate)
// S25 safeStorage (cowork trusted-device token)
// S26 autoUpdater.getFeedURL() — empirical answer to the structural-
// open claim that static analysis couldn't resolve
//
// Usage:
// cd tools/test-harness
// npx tsx grounding-probe.ts # attach :9229
// npx tsx grounding-probe.ts --launch # self-contained
// npx tsx grounding-probe.ts --launch --include-synthetic
// npx tsx grounding-probe.ts --out ../../docs/testing/cases-grounding-runtime.json
// npx tsx grounding-probe.ts --port 9229 --out path/to/file.json
//
// Extending: add a section in capture() with a `client.evalInMain`
// dump targeting whatever runtime state your new test cares about,
// then map the result into `tests[<id>]`.
import { writeFileSync } from 'node:fs';
import { InspectorClient } from './src/lib/inspector.js';
import { launchClaude } from './src/lib/electron.js';
// dbus-next is loaded lazily inside captureSni() — importing here would
// pull in a session-bus connection on environments without one (CI
// containers, sshfs, etc.) and break the probe before it ever runs.
// Accelerators we expect to be registered on Linux. T06 = Quick Entry
// default. S31/S32 — fullscreen + cmd-K dispatch. Extend per case docs.
const KNOWN_ACCELERATORS = [
'Alt+Space',
'Ctrl+Alt+Space',
'CommandOrControl+Shift+L',
];
interface AxFingerprintNode {
role: string;
name: string;
hasPopup: boolean;
}
interface GroundingCapture {
capturedAt: string;
appVersion: string;
appPath: string;
isPackaged: boolean;
platform: string;
// Cross-test corpus — useful as a denormalized source the per-test
// entries reference by index/key. Keep these flat so jq queries
// don't need to walk a nested tree.
ipcInvokeChannels: string[];
ipcOnChannels: string[];
webContents: Array<{ id: number; url: string; type: string }>;
// Reduced AX tree of the current claude.ai webContents, shared by
// every test entry that names a renderer-side surface. Stored once
// at the top level rather than copied per-test — diff stability
// matters more than per-test isolation here.
axFingerprint: AxFingerprintNode[];
// Per-test bag — extend as new probes land. Each entry is the
// runtime state the test's load-bearing claim depends on, in a
// shape that's easy to diff across captures. Renderer-side tests
// reference $.axFingerprint via { axFingerprintRef: true }.
tests: Record<string, unknown>;
// Probe-level diagnostics — what we tried and couldn't capture.
// Surfaced so the grounding sweep can flag uncovered surfaces.
gaps: string[];
}
interface CaptureOptions {
includeSynthetic: boolean;
}
async function capture(
client: InspectorClient,
opts: CaptureOptions,
): Promise<GroundingCapture> {
const gaps: string[] = [];
// App metadata — every test references at least one of these.
const appMeta = await client.evalInMain<{
appVersion: string;
appPath: string;
isPackaged: boolean;
appReady: boolean;
platform: string;
}>(`
const { app } = process.mainModule.require('electron');
return {
appVersion: app.getVersion(),
appPath: app.getAppPath(),
isPackaged: app.isPackaged,
appReady: app.isReady(),
platform: process.platform,
};
`);
// IPC handler registry. Every claude.web_* channel registers via
// ipcMain.handle() (invoke side) or ipcMain.on() (fire-and-forget).
// Private API — surfaces shift across Electron versions; tolerate
// both shapes.
const ipc = await client.evalInMain<{ invoke: string[]; on: string[] }>(`
const { ipcMain } = process.mainModule.require('electron');
const invoke = ipcMain._invokeHandlers
? Array.from(ipcMain._invokeHandlers.keys())
: [];
const on = ipcMain.eventNames ? ipcMain.eventNames().map(String) : [];
return { invoke, on };
`);
// WebContents inventory — proves which BrowserViews / BrowserWindows
// exist at probe time. Note: BrowserWindow.getAllWindows() returns
// 0 because frame-fix-wrapper substitutes the class (see
// inspector.ts header comment) — webContents registry stays intact.
const webContents = await client.evalInMain<
Array<{ id: number; url: string; type: string }>
>(`
const { webContents } = process.mainModule.require('electron');
return webContents.getAllWebContents().map(w => ({
id: w.id,
url: w.getURL(),
type: w.getType ? w.getType() : 'unknown',
}));
`);
// Global shortcuts — T06, S31/S32 reference these. isRegistered()
// is the canonical runtime probe; matches the case-doc claim about
// what's bound at startup.
const accelerators = await client.evalInMain<
Array<{ accelerator: string; registered: boolean }>
>(`
const { globalShortcut } = process.mainModule.require('electron');
const list = ${JSON.stringify(KNOWN_ACCELERATORS)};
return list.map(a => ({
accelerator: a,
registered: globalShortcut.isRegistered(a),
}));
`);
// Autostart resolution — T09. On Linux Electron's openAtLogin is a
// documented no-op; our wrapper installs an XDG Autostart shim
// (frame-fix-wrapper.js:376). The empirical check confirms which
// path is active.
const loginItems = await client.evalInMain<{
openAtLogin: boolean;
wasOpenedAtLogin?: boolean;
executableWillLaunchAtLogin?: boolean;
}>(`
const { app } = process.mainModule.require('electron');
return app.getLoginItemSettings();
`);
// safeStorage — S18 (env-config encryption) + S25 (cowork trusted-
// device token). Linux backend is libsecret; availability gates
// whether tokens persist or stall.
const safeStorage = await client.evalInMain<{
available: boolean;
backend: string;
}>(`
const { safeStorage } = process.mainModule.require('electron');
let backend = 'unknown';
try {
if (safeStorage.getSelectedStorageBackend) {
backend = safeStorage.getSelectedStorageBackend();
}
} catch (_) { /* older Electron — backend not exposed */ }
return {
available: safeStorage.isEncryptionAvailable(),
backend,
};
`);
// autoUpdater feedURL — S26. The case doc claims the gate is open
// by construction (lii() returns true on Linux when packaged).
// Accidental coverage from Electron's Linux autoUpdater being
// unimplemented saves us from real download attempts. This probe
// puts that on the record empirically.
const autoUpdater = await client.evalInMain<{
feedURL: string | null;
feedURLError: string | null;
}>(`
const { autoUpdater } = process.mainModule.require('electron');
let feedURL = null, feedURLError = null;
try {
feedURL = autoUpdater.getFeedURL ? autoUpdater.getFeedURL() : null;
} catch (e) {
feedURLError = String(e && e.message);
}
return { feedURL, feedURLError };
`);
// Notifications (T23). Notification.isSupported() confirms the
// notification path is alive. Tray instances (T03) can't be enumerated
// via public API, so tray registration is probed separately via SNI.
const notifications = await client.evalInMain<{ supported: boolean }>(`
const { Notification } = process.mainModule.require('electron');
return { supported: Notification.isSupported() };
`);
// Powermonitor / suspend inhibit — S20. powerSaveBlocker has no
// public enumeration API. Synthetic probe (gated behind
// --include-synthetic) starts a blocker, reads isStarted, stops
// immediately. Brief inhibit (~ms) is harmless; what we get back
// is empirical proof the API path is alive on this host. Doesn't
// verify the case-doc claim that `keepAwakeEnabled` setting toggles
// trigger this — that requires correlating settings IO with the
// `PhA` Set at index.js:241897, which depends on minified-name
// stability and is left to the next sweep.
let powerSaveBlocker: {
apiAvailable: boolean;
startWorks: boolean;
idType: string;
probeError: string | null;
} | null = null;
if (opts.includeSynthetic) {
powerSaveBlocker = await client.evalInMain(`
const { powerSaveBlocker } = process.mainModule.require('electron');
let id = null, started = false, probeError = null;
try {
id = powerSaveBlocker.start('prevent-app-suspension');
started = powerSaveBlocker.isStarted(id);
} catch (e) {
probeError = String(e && e.message);
} finally {
if (id !== null) {
try { powerSaveBlocker.stop(id); } catch (_) {}
}
}
return {
apiAvailable: true,
startWorks: started,
idType: typeof id,
probeError,
};
`);
} else {
gaps.push(
'S20: powerSaveBlocker not probed (skip-synthetic). ' +
'Re-run with --include-synthetic to confirm API path.',
);
}
// Editor handoff scheme registry — T24/T38. Static case anchor
// (`Mtt` at index.js:463902) names the registry; variable is
// minified, so we identify by IPC handler name pattern instead.
// The case doc claims schemes vscode/cursor/zed/windsurf are wired
// up on Linux (xcode is darwin-only). The IPC channel that calls
// `shell.openExternal('<scheme>://file/<encoded-path>:<line>')`
// will be one of these matches.
const editorIpcChannels = [
...ipc.invoke.filter((c) => /external|editor|openIn/i.test(c)),
...ipc.on.filter((c) => /external|editor|openIn/i.test(c)),
];
// Renderer AX fingerprint — T22/T26/T31/T32. `getAccessibleTree`
// snapshots whatever's *currently on screen*. To anchor surfaces
// inside modals/popups (preset list, slash menu, side chat, PR
// toolbar), open the surface in the running app before probe time.
// Reduced form (role+name+hasPopup) keeps the output grep-able and
// avoids re-shipping ui-inventory.json's full schema.
const claudeAi = webContents.find((w) => w.url.includes('claude.ai'));
let axFingerprint: AxFingerprintNode[] = [];
if (claudeAi) {
try {
const tree = await client.getAccessibleTree('claude.ai');
axFingerprint = tree
.filter((n) => !n.ignored && n.role && n.name)
.map((n) => ({
role: n.role!.value,
name: n.name!.value,
// CDP reports this AX property as camelCase `hasPopup`.
hasPopup: !!n.properties?.find((p) => p.name === 'hasPopup'),
}))
.filter((n) => n.name.length > 0);
} catch (e) {
gaps.push(
`renderer-ax: getAccessibleTree threw: ${e instanceof Error ? e.message : String(e)}`,
);
}
} else {
gaps.push(
'renderer-ax: no claude.ai webContents at probe time. ' +
'Sign in to the app before re-running to capture renderer state.',
);
}
// Tray / SNI registration — T03. Linux tray icons register against
// org.kde.StatusNotifierWatcher (KDE protocol used by GNOME's
// AppIndicator extension too). We can attribute an SNI item to the
// app's pid via `findItemByPid`. Lazily imported because dbus-next
// connects on first call to getSessionBus(), and we want
// non-DBus environments to still get a partial probe rather than
// hard-fail.
const ourPid = await client.evalInMain<number>('return process.pid;');
let sni: {
ourPid: number;
registeredItem: { service: string; objectPath: string } | null;
probeError: string | null;
} = { ourPid, registeredItem: null, probeError: null };
try {
const sniLib = await import('./src/lib/sni.js');
const dbusLib = await import('./src/lib/dbus.js');
try {
sni.registeredItem = await sniLib.findItemByPid(ourPid);
} finally {
await dbusLib.disconnectBus();
}
} catch (e) {
sni.probeError = e instanceof Error ? e.message : String(e);
}
// T22 PR toolbar / T31 side chat / T32 slash menu — these surfaces
// are now captured if the user has the relevant view open at probe
// time (see `axFingerprint` above). Empty fingerprint at idle is
// expected; flag here only if the renderer was reachable but the
// captured tree was empty (which would suggest the AX walker hit
// a permission gate or was disabled).
if (claudeAi && axFingerprint.length === 0) {
gaps.push(
'renderer-ax: claude.ai webContents present but AX tree empty. ' +
'Either Accessibility was not enabled or the page is mid-load.',
);
}
gaps.push(
'T39 /desktop: lives in the upstream `claude` CLI binary, not the ' +
'Electron asar — not reachable from this probe.',
);
return {
capturedAt: new Date().toISOString(),
appVersion: appMeta.appVersion,
appPath: appMeta.appPath,
isPackaged: appMeta.isPackaged,
platform: appMeta.platform,
ipcInvokeChannels: ipc.invoke,
ipcOnChannels: ipc.on,
webContents,
axFingerprint,
tests: {
T01: { appReady: appMeta.appReady, webContentsCount: webContents.length },
T03: sni,
T06: { accelerators },
T09: loginItems,
T22: { axFingerprintRef: true, count: axFingerprint.length },
T23: notifications,
T24: { editorIpcChannels },
T26: { axFingerprintRef: true, count: axFingerprint.length },
T31: { axFingerprintRef: true, count: axFingerprint.length },
T32: { axFingerprintRef: true, count: axFingerprint.length },
T38: { editorIpcChannels },
S18: safeStorage,
S20: powerSaveBlocker,
S22: {
platform: appMeta.platform,
expectedDisabledOnLinux: appMeta.platform === 'linux',
},
S25: safeStorage,
S26: {
...autoUpdater,
isPackaged: appMeta.isPackaged,
platform: appMeta.platform,
note: 'Gate is structurally open; saved by Electron autoUpdater being unimplemented on Linux.',
},
},
gaps,
};
}
interface ParsedArgs {
port: number;
out: string;
launch: boolean;
includeSynthetic: boolean;
}
function parseArgs(argv: string[]): ParsedArgs {
const flags = new Set<string>();
const args = new Map<string, string>();
for (let i = 2; i < argv.length; i++) {
const tok = argv[i];
if (!tok || !tok.startsWith('--')) continue;
const key = tok.replace(/^--/, '');
const next = argv[i + 1];
if (next && !next.startsWith('--')) {
args.set(key, next);
i++;
} else {
flags.add(key);
}
}
return {
port: Number(args.get('port') ?? 9229),
out: args.get('out') ?? '/tmp/grounding-probe.json',
launch: flags.has('launch'),
includeSynthetic: flags.has('include-synthetic'),
};
}
async function main() {
const parsed = parseArgs(process.argv);
const { out, launch, includeSynthetic } = parsed;
let client: InspectorClient;
let cleanup: () => Promise<void>;
if (launch) {
// Self-contained: fresh isolation per run, tear down on exit.
// 'mainVisible' is the lowest level that gives us the inspector
// without waiting on claude.ai network load. Sufficient for
// every probe in capture(): none require renderer DOM, and the AX
// fingerprint falls back to a `gaps` entry when no claude.ai
// webContents has loaded yet.
const app = await launchClaude();
const ready = await app.waitForReady('mainVisible');
client = ready.inspector;
cleanup = async () => {
client.close();
await app.close();
};
} else {
client = await InspectorClient.connect(parsed.port);
cleanup = async () => {
client.close();
};
}
try {
const result = await capture(client, { includeSynthetic });
writeFileSync(out, JSON.stringify(result, null, 2));
console.log(
`grounding-probe: wrote ${out} ` +
`(${result.ipcInvokeChannels.length} invoke channels, ` +
`${result.webContents.length} webContents, ` +
`${result.axFingerprint.length} ax nodes, ` +
`${result.gaps.length} gaps` +
`${launch ? ', --launch' : ''}` +
`${includeSynthetic ? ', synthetic' : ''})`,
);
} finally {
await cleanup();
}
}
main().catch((err) => {
console.error('grounding-probe failed:', err);
process.exit(1);
});
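
Note on `parseArgs` above: it treats `--key value` as a key/value pair and a bare `--key` as a boolean flag. The following is a minimal standalone sketch of that rule, not harness code; `splitFlags` and `SplitResult` are hypothetical names used only for this illustration.

```typescript
// Hypothetical illustration of the `--key value` / `--flag` split that
// parseArgs performs. Not part of the harness.
interface SplitResult {
  flags: Set<string>;
  args: Map<string, string>;
}

function splitFlags(tokens: string[]): SplitResult {
  const flags = new Set<string>();
  const args = new Map<string, string>();
  for (let i = 0; i < tokens.length; i++) {
    const tok = tokens[i];
    // Positional tokens (no leading `--`) are ignored.
    if (!tok || !tok.startsWith('--')) continue;
    const key = tok.slice(2);
    const next = tokens[i + 1];
    if (next && !next.startsWith('--')) {
      args.set(key, next); // e.g. `--port 9333`
      i++; // consume the value token
    } else {
      flags.add(key); // e.g. `--launch`
    }
  }
  return { flags, args };
}

const demo = splitFlags(['--port', '9333', '--launch', '--out', '/tmp/p.json']);
console.log(demo.args.get('port'), demo.flags.has('launch'));
```

One consequence of this rule: a boolean flag followed by a positional token (`--launch foo`) is recorded as the pair `launch=foo` rather than as a flag, so boolean flags are safest when passed last or directly before another `--` option.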

@@ -0,0 +1,108 @@
#!/usr/bin/env bash
# sweep.sh — run a test sweep for a row.
#
# Usage:
# ROW=KDE-W ./orchestrator/sweep.sh
# CLAUDE_DESKTOP_LAUNCHER=/usr/bin/claude-desktop ROW=KDE-W ./orchestrator/sweep.sh
#
# Output bundle layout:
# results/results-${ROW}-${DATE}/
# ├── junit.xml
# ├── html/ (Playwright HTML report)
# └── test-output/ (per-test attachments)
set -uo pipefail
script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
readonly script_dir
harness_dir="$(dirname "$script_dir")"
readonly harness_dir
readonly row="${ROW:-KDE-W}"
date_str="$(date -u +%Y%m%dT%H%M%SZ)"
readonly date_str
readonly bundle_id="results-${row}-${date_str}"
readonly results_root="${OUTPUT_DIR:-${harness_dir}/results}"
readonly bundle_dir="${results_root}/${bundle_id}"
mkdir -p "$bundle_dir"
cd "$harness_dir" || exit 1
# Backend banner. CLAUDE_HARNESS_USE_WAYLAND=1 flips every runner from
# the default X11/XWayland backend to native Wayland — see the
# "Environment variables" table in tools/test-harness/README.md.
if [[ "${CLAUDE_HARNESS_USE_WAYLAND:-}" == '1' ]]; then
printf 'sweep: native Wayland backend (CLAUDE_HARNESS_USE_WAYLAND=1)\n' >&2
fi
# Fast-fail prereq checks — only matter when the sweep includes
# Quick Entry runners (S31, future S29/S30/S32/S34/S35/S37 +
# T06 / QE-* additions). Skip with QE_PREREQ_CHECK=0 if running
# a sweep that excludes those.
if [[ "${QE_PREREQ_CHECK:-1}" == "1" ]]; then
if ! command -v ydotool >/dev/null 2>&1; then
printf 'sweep: ydotool not on PATH — Quick Entry runners will skip.\n' >&2
printf ' install: dnf install ydotool / apt install ydotool\n' >&2
printf ' to suppress this check: QE_PREREQ_CHECK=0\n' >&2
fi
socket="${YDOTOOL_SOCKET:-/tmp/.ydotool_socket}"
if [[ ! -S "$socket" ]]; then
printf 'sweep: ydotoold socket missing at %s — daemon not running.\n' \
"$socket" >&2
printf ' start: sudo systemctl start ydotool.service\n' >&2
printf ' see tools/test-harness/README.md "Quick Entry runners" for one-time setup\n' >&2
fi
fi
ROW="$row" \
RESULTS_DIR="$bundle_dir" \
npx playwright test
rc=$?
# Bundle into tar.zst for orchestrator pickup. Best-effort — keep the
# uncompressed dir even if zstd is unavailable.
if command -v zstd >/dev/null 2>&1; then
tar --zstd -cf "${results_root}/${bundle_id}.tar.zst" \
-C "$results_root" "$bundle_id" 2>/dev/null \
&& printf 'bundle: %s/%s.tar.zst\n' "$results_root" "$bundle_id"
fi
printf 'row=%s exit=%d dir=%s\n' "$row" "$rc" "$bundle_dir"
# Quick summary if junit.xml landed. Prefer Node so we sum across all
# <testsuite> elements (grep+head only saw the first suite, undercounting
# multi-suite reports). Fall back to the legacy grep path when node isn't
# on PATH so the harness stays usable on minimal images.
if [[ -f "${bundle_dir}/junit.xml" ]]; then
if command -v node >/dev/null 2>&1; then
read -r tests failures errors skipped \
< <(node -e "$(cat <<'EOF'
const fs = require('fs');
const xml = fs.readFileSync(process.argv[1], 'utf8');
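// Require whitespace after "testsuite" so a <testsuites> root, which
// can carry its own totals, is not added on top of the per-suite sums.
const sumAttr = (a) => Array.from(
xml.matchAll(new RegExp(`<testsuite\\s[^>]*\\b${a}="(\\d+)"`, 'g'))
).reduce((s, m) => s + parseInt(m[1], 10), 0);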
console.log([
sumAttr('tests'), sumAttr('failures'),
sumAttr('errors'), sumAttr('skipped'),
].join(' '));
EOF
)" "${bundle_dir}/junit.xml")
printf 'summary: tests=%s failures=%s errors=%s skipped=%s\n' \
"$tests" "$failures" "$errors" "$skipped"
elif command -v grep >/dev/null 2>&1; then
tests="$(grep -oP 'tests="\K\d+' "${bundle_dir}/junit.xml" \
| head -1 || printf '?')"
failures="$(grep -oP 'failures="\K\d+' "${bundle_dir}/junit.xml" \
| head -1 || printf '?')"
errors="$(grep -oP 'errors="\K\d+' "${bundle_dir}/junit.xml" \
| head -1 || printf '?')"
skipped="$(grep -oP 'skipped="\K\d+' "${bundle_dir}/junit.xml" \
| head -1 || printf '?')"
printf 'summary: tests=%s failures=%s errors=%s skipped=%s\n' \
"$tests" "$failures" "$errors" "$skipped"
fi
fi
exit "$rc"
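
Note on the junit summary in `sweep.sh`: the Node branch sums each counter across `<testsuite>` elements. Below is a minimal TypeScript sketch of that summation, assuming the report may wrap the suites in a `<testsuites>` root that repeats the totals; `sumSuiteAttr` and the sample XML are made up for illustration.

```typescript
// Sum one numeric attribute across every <testsuite> element.
// Assumption: a <testsuites> root may carry its own totals, which must
// not be added on top of the per-suite numbers.
function sumSuiteAttr(xml: string, attr: string): number {
  // `\s` right after "testsuite" keeps "<testsuites ...>" from matching.
  const re = new RegExp(`<testsuite\\s[^>]*\\b${attr}="(\\d+)"`, 'g');
  let total = 0;
  for (const m of Array.from(xml.matchAll(re))) {
    total += parseInt(m[1] ?? '0', 10);
  }
  return total;
}

// Made-up report: two suites plus a root element repeating the totals.
const sample = [
  '<testsuites tests="3" failures="1" skipped="0" errors="0">',
  '  <testsuite name="a.spec.ts" tests="2" failures="1" skipped="0" errors="0"></testsuite>',
  '  <testsuite name="b.spec.ts" tests="1" failures="0" skipped="0" errors="0"></testsuite>',
  '</testsuites>',
].join('\n');

console.log(sumSuiteAttr(sample, 'tests'), sumSuiteAttr(sample, 'failures'));
```

Without the `\s` anchor the root element would match as well, and every counter in this sample would come out doubled.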

@@ -0,0 +1,32 @@
{
"name": "claude-desktop-debian-test-harness",
"version": "0.0.1",
"private": true,
"description": "Linux compatibility test harness for claude-desktop-debian",
"type": "module",
"engines": {
"node": ">=20"
},
"scripts": {
"test": "playwright test",
"sweep": "bash orchestrator/sweep.sh",
"typecheck": "tsc --noEmit",
"explore": "npx tsx explore/explore.ts",
"grounding-probe": "npx tsx grounding-probe.ts",
"explore:snapshot": "npx tsx explore/explore.ts snapshot",
"explore:diff": "npx tsx explore/explore.ts diff",
"explore:walk": "npx tsx explore/explore.ts walk",
"derive:vocabulary": "npx tsx explore/derive-vocabulary.ts",
"gen:render-specs": "npx tsx explore/gen-render-specs.ts"
},
"devDependencies": {
"@playwright/test": "^1.48.0",
"@types/node": "^20.16.0",
"playwright": "^1.48.0",
"typescript": "^5.6.0"
},
"dependencies": {
"@electron/asar": "^3.2.10",
"dbus-next": "^0.10.2"
}
}

@@ -0,0 +1,25 @@
/// <reference types="node" />
import { defineConfig } from '@playwright/test';
const resultsDir = process.env.RESULTS_DIR ?? './results/local';
export default defineConfig({
testDir: './src/runners',
testMatch: /.*\.spec\.ts$/,
fullyParallel: false,
workers: 1,
retries: process.env.CI ? 1 : 0,
forbidOnly: !!process.env.CI,
timeout: 60_000,
expect: { timeout: 10_000 },
outputDir: `${resultsDir}/test-output`,
reporter: [
['list'],
['junit', { outputFile: `${resultsDir}/junit.xml` }],
['html', { outputFolder: `${resultsDir}/html`, open: 'never' }],
],
use: {
trace: 'retain-on-failure',
screenshot: 'only-on-failure',
},
});

tools/test-harness/probe.ts

@@ -0,0 +1,163 @@
// Standalone probe that connects to a running claude-desktop with the
// main process debugger enabled (port 9229) and dumps renderer-DOM
// shapes useful for designing reusable abstractions in lib/claudeai.ts.
//
// Run from tools/test-harness:
// npx tsx probe.ts
//
// Non-destructive — observes only, doesn't click anything.
import { InspectorClient } from './src/lib/inspector.js';
import { writeFileSync } from 'node:fs';
async function main() {
const client = await InspectorClient.connect(9229);
const webContentsList = await client.evalInMain<
Array<{ id: number; url: string; type: string }>
>(`
const { webContents } = process.mainModule.require('electron');
return webContents.getAllWebContents().map(w => ({
id: w.id,
url: w.getURL(),
type: w.getType ? w.getType() : 'unknown',
}));
`);
const target = webContentsList.find((w) => w.url.includes('claude.ai'));
if (!target) {
console.error('No claude.ai webContents — open the app to a logged-in state first.');
console.error('webContents observed:', webContentsList);
process.exit(1);
}
console.log('=== webContents ===');
console.log(JSON.stringify(webContentsList, null, 2));
console.log('Targeting:', target.url, `(id=${target.id})`);
// All "pill"-shape buttons on the page.
const pills = await client.evalInRenderer<{
dfPills: Array<{ ariaLabel: string | null; text: string; visible: boolean; classSig: string }>;
menuButtons: Array<{
ariaLabel: string | null;
text: string;
expanded: boolean;
truncateMaxW: string | null;
classSig: string;
}>;
summary: { totalButtons: number; ariaHaspopupMenu: number; dfPills: number };
}>(
'claude.ai',
`
(() => {
const buttons = Array.from(document.querySelectorAll('button'));
const dfPills = buttons
.filter(b => /\\bdf-pill\\b/.test(b.className))
.map(b => ({
ariaLabel: b.getAttribute('aria-label'),
text: (b.textContent || '').trim().slice(0, 80),
visible: !!b.getClientRects().length,
classSig: b.className.slice(0, 120),
}));
const menuButtons = buttons
.filter(b => b.getAttribute('aria-haspopup') === 'menu')
.map(b => {
const truncSpan = b.querySelector('span.truncate');
const maxW = truncSpan
? (truncSpan.className.match(/max-w-\\[[^\\]]+\\]/) || [null])[0]
: null;
return {
ariaLabel: b.getAttribute('aria-label'),
text: (b.textContent || '').trim().slice(0, 80),
expanded: b.getAttribute('aria-expanded') === 'true',
truncateMaxW: maxW,
classSig: b.className.slice(0, 120),
};
});
return {
dfPills,
menuButtons,
summary: {
totalButtons: buttons.length,
ariaHaspopupMenu: menuButtons.length,
dfPills: dfPills.length,
},
};
})()
`,
);
console.log('\n=== Pills summary ===');
console.log(JSON.stringify(pills.summary, null, 2));
console.log('\n=== df-pill buttons ===');
console.log(JSON.stringify(pills.dfPills, null, 2));
console.log('\n=== aria-haspopup=menu buttons (sample) ===');
console.log(JSON.stringify(pills.menuButtons.slice(0, 10), null, 2));
// Currently open menu (if any) — items, structure.
const openMenu = await client.evalInRenderer<{
menuPresent: boolean;
ariaLabelledBy: string | null;
items: Array<{ role: string; text: string; ariaChecked: string | null; disabled: boolean }>;
} | null>(
'claude.ai',
`
(() => {
const menu = document.querySelector('[role=menu][data-open]') || document.querySelector('[role=menu]');
if (!menu) return null;
const items = Array.from(menu.querySelectorAll('[role=menuitem], [role=menuitemradio], [role=menuitemcheckbox]'))
.map(el => ({
role: el.getAttribute('role') || '',
text: (el.textContent || '').trim().slice(0, 80),
ariaChecked: el.getAttribute('aria-checked'),
disabled: el.hasAttribute('data-disabled') || el.getAttribute('aria-disabled') === 'true',
}));
return {
menuPresent: true,
ariaLabelledBy: menu.getAttribute('aria-labelledby'),
items,
};
})()
`,
);
console.log('\n=== Currently open menu ===');
console.log(openMenu ? JSON.stringify(openMenu, null, 2) : 'no menu open');
// URL and basic page state.
const pageState = await client.evalInRenderer<{
url: string;
title: string;
readyState: string;
hasComposer: boolean;
hasSidebar: boolean;
}>(
'claude.ai',
`
(() => ({
url: location.href,
title: document.title,
readyState: document.readyState,
hasComposer: !!document.querySelector('[data-testid*=composer], textarea[placeholder*=Reply], textarea[placeholder*=Message]'),
hasSidebar: !!document.querySelector('nav, [role=navigation]'),
}))()
`,
);
console.log('\n=== Page state ===');
console.log(JSON.stringify(pageState, null, 2));
const out = { webContentsList, pills, openMenu, pageState };
writeFileSync('/tmp/claude-probe.json', JSON.stringify(out, null, 2));
console.log('\nFull dump → /tmp/claude-probe.json');
client.close();
process.exit(0);
}
main().catch((err) => {
console.error('probe failed:', err);
process.exit(1);
});
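
Note on the renderer snippets in `probe.ts`: they are passed to `evalInRenderer` as template-literal strings, which is why the regex backslashes are doubled (`\\b`, `\\[`). Outside a template literal the same pattern takes single backslashes. A small sketch of the `max-w-[...]` extraction on its own follows; `truncateMaxWidth` is a hypothetical name and the class strings are invented samples.

```typescript
// Hypothetical helper mirroring the in-page expression: return the first
// Tailwind arbitrary-value `max-w-[...]` class in a class string, or null.
function truncateMaxWidth(className: string): string | null {
  const m = className.match(/max-w-\[[^\]]+\]/);
  return m ? m[0] : null;
}

console.log(truncateMaxWidth('truncate max-w-[200px] text-sm'));
console.log(truncateMaxWidth('truncate text-sm'));
```

Keeping the pattern in a named function like this makes it easy to check against sample class strings before embedding the escaped form in an eval string.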

@@ -0,0 +1,44 @@
// Read a process's argv from /proc/<pid>/cmdline.
//
// /proc/<pid>/cmdline is a single string of NUL-terminated args, so a
// trailing NUL is normally present (a process that rewrites its argv
// may drop it; trim defensively). Used by QE-6 / S12
// to verify the launcher appended the right Electron flags, and by
// future flag-presence tests (Decision 6 Wayland-default Smoke, S07
// CLAUDE_USE_WAYLAND, etc.).
//
// readPidArgv returns null if the process is gone — callers usually
// want to retry until the pid stabilizes.
import { readFile } from 'node:fs/promises';
export async function readPidArgv(pid: number): Promise<string[] | null> {
try {
const raw = await readFile(`/proc/${pid}/cmdline`, 'utf8');
// Strip trailing NUL if present, then split. Empty argv is
// theoretically possible (kernel threads); preserve it.
const trimmed = raw.endsWith('\0') ? raw.slice(0, -1) : raw;
return trimmed.length === 0 ? [] : trimmed.split('\0');
} catch {
return null;
}
}
export function argvHasFlag(argv: string[], flag: string): boolean {
  // For flag `--enable-features`: matches the bare flag (value in the
  // next argv slot) and `--enable-features=<anything>`.
  // For flag `--enable-features=Foo`: matches exact equality, or
  // membership in a comma-separated value such as
  // `--enable-features=Foo,Bar`.
  // Split on the first `=` only; `split('=', 2)` would silently drop
  // everything after a second `=` in the value.
  const eq = flag.indexOf('=');
  const key = eq === -1 ? null : flag.slice(0, eq);
  const val = eq === -1 ? null : flag.slice(eq + 1);
  for (const arg of argv) {
    if (arg === flag) return true;
    if (arg.startsWith(`${flag}=`)) return true;
    // Comma-separated --enable-features value: match any subkey.
    if (key !== null && val !== null && arg.startsWith(`${key}=`)) {
      const values = arg.slice(key.length + 1).split(',');
      if (values.includes(val)) return true;
    }
  }
  return false;
}
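
Note on the `readPidArgv` contract: the header comment says callers usually retry until the pid stabilizes. A minimal sketch of such a retry loop follows. `waitForArgv` and `ArgvReader` are hypothetical names, the reader is injected so the sketch runs without `/proc`, and treating an empty argv as "not ready yet" is an assumption of this sketch rather than harness behaviour.

```typescript
// Hypothetical retry wrapper around a readPidArgv-style reader.
type ArgvReader = (pid: number) => Promise<string[] | null>;

async function waitForArgv(
  read: ArgvReader,
  pid: number,
  attempts = 10,
  delayMs = 100,
): Promise<string[] | null> {
  for (let attempt = 0; attempt < attempts; attempt++) {
    const argv = await read(pid);
    // null means the process is gone; an empty array is treated as
    // "not ready" here (assumption of this sketch).
    if (argv !== null && argv.length > 0) return argv;
    if (attempt < attempts - 1) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return null;
}

// Stub reader: fails twice, then returns a made-up argv.
let stubCalls = 0;
const stubReader: ArgvReader = async () =>
  ++stubCalls < 3 ? null : ['electron', '--ozone-platform=wayland'];

waitForArgv(stubReader, 1234, 5, 1).then((argv) => console.log(argv));
```

Returning null after the last attempt, instead of throwing, keeps the same "null means gone" contract as the underlying reader.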

@@ -0,0 +1,55 @@
// Read files out of the installed app.asar without on-disk extraction.
//
// Used by QE-19 / S09 (verify the KDE-gate string is in the bundled
// JS) and by future patch-sanity tests for tray.sh / cowork.sh /
// claude-code.sh patches. Reading via @electron/asar avoids the
// `npx asar extract /tmp/inspect-installed` dance — same outcome, no
// temp tree, JSON-grepable from inside a TS spec.
//
// Path resolution mirrors lib/electron.ts:resolveInstall(): respect
// CLAUDE_DESKTOP_APP_ASAR if set, otherwise probe the deb and rpm
// install locations.
import { extractFile, listPackage } from '@electron/asar';
import { existsSync } from 'node:fs';
const DEFAULT_ASAR_PATHS = [
'/usr/lib/claude-desktop/app.asar',
'/opt/Claude/resources/app.asar',
'/usr/lib/claude-desktop/node_modules/electron/dist/resources/app.asar',
'/opt/Claude/node_modules/electron/dist/resources/app.asar',
];
export function resolveAsarPath(): string {
const env = process.env.CLAUDE_DESKTOP_APP_ASAR;
if (env) return env;
for (const candidate of DEFAULT_ASAR_PATHS) {
if (existsSync(candidate)) return candidate;
}
throw new Error(
'Could not locate app.asar. Set CLAUDE_DESKTOP_APP_ASAR or install ' +
'the deb/rpm package.',
);
}
export function readAsarFile(filename: string, asarPath?: string): string {
const archive = asarPath ?? resolveAsarPath();
const buf = extractFile(archive, filename);
return buf.toString('utf8');
}
export function asarContains(
filename: string,
needle: string | RegExp,
asarPath?: string,
): boolean {
const contents = readAsarFile(filename, asarPath);
return typeof needle === 'string'
? contents.includes(needle)
: needle.test(contents);
}
export function listAsar(asarPath?: string): string[] {
const archive = asarPath ?? resolveAsarPath();
return listPackage(archive, { isPack: false });
}
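
Note on `asarContains` with a RegExp needle: a regex created with the `g` or `y` flag keeps `lastIndex` between `.test()` calls, so a needle object reused across calls can alternate between true and false on the same input. A defensive variant might look like the sketch below; `containsNeedle` is a hypothetical name and the sample source string is invented, not taken from the bundle.

```typescript
// Hypothetical variant of the string-or-RegExp check that is safe to
// call repeatedly with the same /g or /y regex object.
function containsNeedle(contents: string, needle: string | RegExp): boolean {
  if (typeof needle === 'string') return contents.includes(needle);
  // Clear state left behind by earlier .test()/.exec() calls.
  needle.lastIndex = 0;
  const hit = needle.test(contents);
  needle.lastIndex = 0;
  return hit;
}

// Invented sample contents; the real bundled JS is not shown here.
const bundled = 'const isKde = (process.env.XDG_CURRENT_DESKTOP || "").includes("KDE");';
const gate = /XDG_CURRENT_DESKTOP/g;

console.log(containsNeedle(bundled, gate), containsNeedle(bundled, gate));
```

Regexes without `g` or `y` are unaffected, so the reset is a no-op for them.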

@@ -0,0 +1,255 @@
// AX-tree loading + traversal primitives — shared substrate for any
// test that reads from Chromium's accessibility tree.
//
// Why this exists
// ---------------
// Sessions 1-12 grew two parallel AX consumers without consolidating
// the loading shape:
//
// 1. `lib/claudeai.ts` page-objects (CodeTab.activate, openPill,
// clickMenuItem, findCompactPills) carry a private `snapshotAx`
// that gates on `waitForAxTreeStable` then calls
// `inspector.getAccessibleTree('claude.ai')` and converts via
// `axTreeToSnapshot`. Every page-object that polls for a node
// rolls its own retryUntil/while loop around that helper.
//
// 2. `src/runners/T26_routines_page_renders.spec.ts` re-implemented
// the same `snapshotAx` shape inline because the claudeai.ts
// version isn't exported. Its leading comment explicitly noted
// this was "premature abstraction" at 1 consumer; with 2 it is
// threshold-driven extraction.
//
// Plus the user reports recurring flake in tests that use the AX tree:
// queries fire before the relevant subtree is mounted, and individual
// specs each pick their own retryUntil budget. The proposed
// `waitForAxNode` primitive collapses the snapshot+find+retry shape
// into one helper with a single tunable budget per consumer, reducing
// both the surface area for budget drift and the duplication.
//
// What this primitive does
// ------------------------
// - `snapshotAx(inspector, opts)` — single AX tree read with the
// stability gate. Replaces the duplicated implementations in
// `claudeai.ts` (private) and `T26_routines_page_renders.spec.ts`
// (inlined). `opts.fast` skips the stability gate for inside-poll
// callers (matches the existing claudeai.ts contract).
// - `waitForAxNode(inspector, predicate, opts)` — repeatedly snapshot
// the AX tree and return the first element matching `predicate`,
// subject to a timeout. Built against the loops in `CodeTab.activate`
// (poll for compact pills), `openPill` (poll for menu items),
// `clickMenuItem` (poll for matching menuitem), and T26's pre/post-
// click anchor scans. The predicate carries the discrimination
// logic the caller already had inline; the primitive owns the
// stability-gate + retry loop.
// - Re-exports `RawElement`, `axTreeToSnapshot`, `waitForAxTreeStable`
// from `explore/walker.ts` so consumers don't need to reach across
// the lib/explore boundary themselves. The walker stays the source
// of truth for the AX-snapshot shape; this file is the runner-
// facing surface.
//
// Scope boundaries
// ----------------
// This is NOT a "wait for surface rendered" registry. The plan-doc
// proposal mentioned `waitForRenderedSurface(client, surfaceKey)`
// with a registry of named surface anchors — that's still
// speculative (no consumer asks for it). When a third consumer
// emerges that already knows it wants a named surface anchor (e.g.
// "the Code tab body has mounted"), promote the relevant claudeai.ts
// page-object into a registry entry. Today, `waitForAxNode` with a
// predicate covers every observed callsite.
//
// This is also NOT a CSS-querySelector primitive. T07 polls the DOM
// via `document.querySelector('[data-testid=...]')` for the topbar;
// that's a different abstraction (DOM, not AX) with no extraction
// signal yet — leave it inline in T07 until a second consumer
// surfaces.
import type { AxNode, InspectorClient } from './inspector.js';
import {
type RawElement,
axTreeToSnapshot,
waitForAxTreeStable,
} from '../../explore/walker.js';
import { retryUntil } from './retry.js';
// Re-exports for consumer convenience. Anything that today imports
// `RawElement` / `axTreeToSnapshot` / `waitForAxTreeStable` from
// `../../explore/walker.js` can switch to this file as the import
// path. Keeping the walker as the source of truth — these are the
// runner-facing aliases.
export type { AxNode } from './inspector.js';
export {
type RawElement,
axTreeToSnapshot,
waitForAxTreeStable,
} from '../../explore/walker.js';
// Re-export the AxNode -> RawElement[] conversion as a single import
// point. (Kept distinct from `axTreeToSnapshot`'s walker-side export
// so future renames in `explore/walker.ts` don't churn the runner-
// facing API.)
export interface SnapshotAxOptions {
// Skip the upfront `waitForAxTreeStable` gate. Default false —
// i.e. callers gate by default. Pass true inside polling loops
// where the gate fights the loop: each iteration would block
// waiting for "no node-count change" even when the change we're
// polling for is exactly the AX tree updating.
//
// `waitForAxNode` itself uses fast=true on every iteration after
// gating once at the start; consumers calling `snapshotAx` from
// inside a hand-rolled loop should do the same.
fast?: boolean;
// AX-stability gate budget when `fast` is false. Default 10000ms
// — matches the existing claudeai.ts/T26 inline implementations.
// Increase for cold-cache cases on slow machines.
stabilityTimeoutMs?: number;
// Renderer URL filter for `inspector.getAccessibleTree`. Default
// 'claude.ai'. Tests against a different webContents (find_in_page,
// main_window) can override but the AX tree on those is much
// simpler — `claude.ai` is the only one current consumers care
// about.
urlFilter?: string;
}
// Single AX-tree read, returning the walker's flat RawElement[]
// snapshot. Identical contract to the private `snapshotAx` formerly in
// `claudeai.ts` and the inlined one formerly in T26 — extracted here
// so both consumers share an implementation.
//
// Cost: ~800ms when the stability gate hits "stable" on the first
// pair of reads (interior-loop fast=true callers skip this); a few
// seconds on cold-cache. The AX tree itself is comparatively cheap
// to fetch and convert (~50-100ms).
export async function snapshotAx(
inspector: InspectorClient,
opts: SnapshotAxOptions = {},
): Promise<RawElement[]> {
if (!opts.fast) {
await waitForAxTreeStable(inspector, {
minNodes: 1,
timeoutMs: opts.stabilityTimeoutMs ?? 10_000,
});
}
const url = opts.urlFilter ?? 'claude.ai';
const nodes: AxNode[] = await inspector.getAccessibleTree(url);
return axTreeToSnapshot(nodes);
}
export interface WaitForAxNodeOptions {
// Total budget for the polling loop. Default 5000ms — matches the
// claudeai.ts / T26 callsites that the primitive replaces. Override
// upward for cold-cache or post-click cases (T26 uses 10s post-
// click; CodeTab.activate uses 5s default but T16 passes 15s).
timeoutMs?: number;
// Per-iteration interval. Default 200ms — matches the existing
// inline retryUntil({ interval: 200 }) calls. The AX tree fetch
// itself dominates the loop cost; a shorter interval gives no
// throughput benefit and a longer one delays the resolution.
intervalMs?: number;
// Renderer URL filter passed through to `snapshotAx`. Default
// 'claude.ai'.
urlFilter?: string;
// Whether to gate on `waitForAxTreeStable` once before entering
// the poll loop. Default true. When the caller has just mutated
// the page (e.g. clicked a button and is waiting for the
// resulting menu to render) the upfront stability gate is what
// keeps the first iteration from racing the in-flight render.
// After the upfront gate, every iteration uses fast=true so the
// loop iterates without re-blocking on stability.
stabilityGate?: boolean;
// AX-stability gate budget for the upfront `waitForAxTreeStable`
// when `stabilityGate` is true. Default 5000ms. Independent from
// the outer poll budget — the gate is a hard precondition, not
// part of the find loop.
stabilityTimeoutMs?: number;
}
// Poll the AX tree until the predicate matches a node, or the budget
// runs out. Returns the matched RawElement on success, null on
// timeout.
//
// The predicate runs over RawElement (the walker-snapshot shape) so
// callers can use the same `el.computedRole === 'button' &&
// el.accessibleName === 'Code'` form they already have inline. The
// helper does NOT click the matched node — callers receive the
// RawElement and can pass `el.backendDOMNodeId` to
// `inspector.clickByBackendNodeId` if a click follows. Keeping click
// out of the find primitive lets composite consumers (e.g. "find then
// click then poll for the menu") chain cleanly.
//
// On timeout, returns null. Callers that want a hard fail with a
// diagnostic should pattern-match `if (!found) throw new Error(...)`
// — the primitive doesn't throw because some specs surface
// missing-node as a clean fail with a JSON snapshot attachment
// rather than an uncaught timeout.
//
// There is no separate label parameter: a caller that wraps a throw
// around the null return supplies its own diagnostic text, e.g. the
// role and accessible name it was looking for.
export async function waitForAxNode(
inspector: InspectorClient,
predicate: (el: RawElement) => boolean,
opts: WaitForAxNodeOptions = {},
): Promise<RawElement | null> {
const stabilityGate = opts.stabilityGate ?? true;
if (stabilityGate) {
await waitForAxTreeStable(inspector, {
minNodes: 1,
timeoutMs: opts.stabilityTimeoutMs ?? 5_000,
});
}
return retryUntil(
async () => {
const elements = await snapshotAx(inspector, {
fast: true,
urlFilter: opts.urlFilter,
});
return elements.find(predicate) ?? null;
},
{
timeout: opts.timeoutMs ?? 5_000,
interval: opts.intervalMs ?? 200,
},
);
}
// Same shape as `waitForAxNode` but returns every match rather than
// the first. Useful for consumers that want to enumerate all menu
// items or all compact pills after a stability point — the
// findCompactPills caller in claudeai.ts is a one-shot snapshot
// today, but if a consumer needs to wait for "at least one compact
// pill" plus enumerate the resulting set, this avoids a second
// round-trip.
//
// Returns the non-empty array of matches on success, null on timeout
// when no element ever matched. A successful call with zero matches
// is impossible by construction: the loop only resolves once the
// post-filter array is non-empty.
export async function waitForAxNodes(
inspector: InspectorClient,
predicate: (el: RawElement) => boolean,
opts: WaitForAxNodeOptions = {},
): Promise<RawElement[] | null> {
const stabilityGate = opts.stabilityGate ?? true;
if (stabilityGate) {
await waitForAxTreeStable(inspector, {
minNodes: 1,
timeoutMs: opts.stabilityTimeoutMs ?? 5_000,
});
}
return retryUntil(
async () => {
const elements = await snapshotAx(inspector, {
fast: true,
urlFilter: opts.urlFilter,
});
const matches = elements.filter(predicate);
return matches.length > 0 ? matches : null;
},
{
timeout: opts.timeoutMs ?? 5_000,
interval: opts.intervalMs ?? 200,
},
);
}
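
Note on the null-on-timeout contract of `waitForAxNode`: callers that want a hard failure are expected to wrap the null return themselves. One possible shape for that wrapper is sketched below. `orThrow`, `FakeElement`, and `fakeFind` are hypothetical; `fakeFind` stands in for `waitForAxNode` so the sketch runs without an inspector.

```typescript
// Hypothetical wrapper: turn a "T | null" finder result into a hard
// failure that names what was being looked for.
async function orThrow<T>(pending: Promise<T | null>, label: string): Promise<T> {
  const found = await pending;
  if (found === null) {
    throw new Error(`AX node not found: ${label}`);
  }
  return found;
}

// Stand-in element shape and finder; the real RawElement has more fields.
interface FakeElement {
  computedRole: string;
  accessibleName: string;
}

async function fakeFind(
  elements: FakeElement[],
  predicate: (el: FakeElement) => boolean,
): Promise<FakeElement | null> {
  return elements.find(predicate) ?? null;
}

const tree: FakeElement[] = [{ computedRole: 'button', accessibleName: 'Code' }];

orThrow(
  fakeFind(tree, (el) => el.computedRole === 'button' && el.accessibleName === 'Code'),
  'Code tab button',
).then((el) => console.log(el.accessibleName));
```

Keeping the throw outside the find primitive leaves specs free to attach a JSON snapshot and fail cleanly instead.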

@@ -0,0 +1,397 @@
// claude.ai renderer-UI domain wrapper — single point of coupling to
// upstream's accessibility tree for tests that drive the renderer.
//
// Why centralize: claude.ai's UI ships from a different release train
// than the Electron shell, so any cross-spec drift would be an N-file
// fix. Confining the discovery here means the rest of the harness can
// speak in domain verbs (`activate('Code')`, `openEnvPill()`, …) and
// we only retune one file when upstream drifts.
//
// Discovery substrate is Chromium's accessibility tree
// (`Accessibility.getFullAXTree` over CDP), shared with the v7 walker.
// Reading from AX rather than the DOM means the page-objects survive
// tailwind class regeneration and React-tree restructuring as long as
// the platform-computed role + accessible name + ancestor landmarks
// stay stable. See docs/learnings/test-harness-ax-tree-walker.md for
// the gotchas (AX-enable async lag, post-click stability gating, list
// virtualization).
//
// Discrimination shapes used:
// - Top-level tabs: `role: 'button'` whose accessibleName matches
// the literal tab label ('Chat' | 'Cowork' | 'Code'). The
// `df-pill` tailwind anchor and `aria-label` selector are gone —
// the AX-computed name is the durable contract.
// - Compact pills (the env pill on Code, the "Select folder…" pill
// after Local is chosen): `role: 'button'` with
// `hasPopup === 'menu'`, scoped away from the cowork sidebar by
// filtering out per-row `^More options for ` triggers. The visible
// label is the button's accessibleName.
// - Menu items: any of `menuitem` / `menuitemradio` /
// `menuitemcheckbox` (collected as MENU_ITEM_ROLES below).
import type { InspectorClient } from './inspector.js';
import {
snapshotAx,
waitForAxNode,
waitForAxNodes,
waitForAxTreeStable,
} from './ax.js';
import { retryUntil, sleep } from './retry.js';
// All three CDP-exposed menu-item variants. Caller code wants to treat
// them uniformly — radios and checkboxes are still "items in an open
// menu the user can pick".
const MENU_ITEM_ROLES = new Set<string>([
'menuitem',
'menuitemradio',
'menuitemcheckbox',
]);
// AccessibleName patterns that indicate a per-row trigger button on
// the cowork sidebar (~70+ of them on a busy account). They share the
// same `hasPopup: 'menu'` signal as the compact pills we actually
// want, so excluding them by name is the load-bearing discriminator.
const ROW_MORE_OPTIONS_RE = /^More options for /;
// `snapshotAx` and the stability gate are now in `lib/ax.ts` —
// extracted there in session 13 once T26 had to redefine the same
// helper inline (two consumers = threshold-driven extraction). Page-
// objects below import via the lib aliases; consumers outside this
// file should reach for `lib/ax.ts` directly rather than re-importing
// through `lib/claudeai.ts`.
// One of the three top-level pills. Click is fire-and-forget — the
// router rerenders the tab body inline (no URL change on Code), so
// callers must poll for whatever signal indicates *their* next step is
// ready (e.g. CodeTab.activate polls for the env pill).
//
// AX-tree match: `role: 'button'` with the literal tab name as the
// accessible name. The visible label and aria-label happen to coincide
// today, and the AX-computed name follows the same cascade — pinning
// to the name keeps the page-object durable across the tailwind
// regenerations that motivated the migration.
//
// Pre-click polling budget. Up to session 13, this was a one-shot
// snapshot: if the tab button hadn't rendered yet when activateTab
// was called, the function returned `{ clicked: false }` immediately.
// Session 13's `waitForAxNode` substrate makes "wait for the button to
// appear" a one-line shape-only change. The 5000ms default matches the
// `lib/ax.ts` defaults; a caller that wants the old no-retry shape can
// pass `timeout: 0` (forwarded as `waitForAxNode`'s timeoutMs), though
// no caller currently does so. T16 passes 15s through
// `CodeTab.activate({ timeout })`, which forwards the same value both
// to this pre-click wait and to the post-click pill poll, so each
// phase gets up to 15s on its own.
export async function activateTab(
inspector: InspectorClient,
name: 'Chat' | 'Cowork' | 'Code',
opts: { timeout?: number } = {},
): Promise<{ clicked: boolean }> {
const target = await waitForAxNode(
inspector,
(el) =>
el.computedRole === 'button' && el.accessibleName === name,
{ timeoutMs: opts.timeout ?? 5_000 },
);
if (!target || target.backendDOMNodeId === null) {
return { clicked: false };
}
await inspector.clickByBackendNodeId('claude.ai', target.backendDOMNodeId);
return { clicked: true };
}
// A "compact pill" — the React component used by both the env pill and
// the "Select folder…" pill. AX shape: `role: 'button'` with
// `hasPopup === 'menu'`, scoped away from cowork sidebar row triggers
// (`/^More options for /`). The tailwind `max-w-[Npx]` field used to
// be carried as a diagnostic in v6; that signal isn't in the AX tree
// (and it was tailwind-specific, exactly the kind of thing the
// migration was meant to drop), so it's gone — callers only used it
// in error messages.
export interface CompactPill {
text: string;
}
export async function findCompactPills(
inspector: InspectorClient,
): Promise<CompactPill[]> {
const elements = await snapshotAx(inspector);
return elements
.filter(
(el) =>
el.computedRole === 'button' &&
el.hasPopup === 'menu' &&
el.accessibleName !== null &&
el.accessibleName.length > 0 &&
!ROW_MORE_OPTIONS_RE.test(el.accessibleName),
)
.map((el) => ({ text: el.accessibleName as string }));
}
// Open a compact pill whose accessibleName matches `labelPattern`.
// Discrimination: `role: 'button'` AND `hasPopup === 'menu'` AND the
// AX-computed name passes the regex. The hasPopup gate is what stops
// us trial-clicking action buttons that happen to share text with a
// pill — the pill always carries an aria-haspopup contract (it opens
// a popover) while a same-named action button does not.
//
// Polls the AX tree post-click for the menu to render (any role in
// MENU_ITEM_ROLES). Returns the rendered menu item names so the caller
// can validate without a second snapshot round-trip.
export async function openPill(
inspector: InspectorClient,
labelPattern: RegExp,
opts: { timeout?: number } = {},
): Promise<{ opened: boolean; items: string[] }> {
const timeout = opts.timeout ?? 5000;
const elements = await snapshotAx(inspector);
const target = elements.find(
(el) =>
el.computedRole === 'button' &&
el.hasPopup === 'menu' &&
el.accessibleName !== null &&
labelPattern.test(el.accessibleName),
);
if (!target || target.backendDOMNodeId === null) {
return { opened: false, items: [] };
}
await inspector.clickByBackendNodeId('claude.ai', target.backendDOMNodeId);
// Menu render is async and the AX tree lags DOM by hundreds of ms
// (see docs/learnings/test-harness-ax-tree-walker.md §1). Gate
// once on stability post-click, then poll fast — re-gating on every
// iteration would burn 800ms+ each cycle waiting for "no change"
// when what we want is "menuitems appear".
await waitForAxTreeStable(inspector, { minNodes: 1, timeoutMs: 5_000 });
const deadline = Date.now() + timeout;
while (Date.now() < deadline) {
const post = await snapshotAx(inspector, { fast: true });
const items = post.filter((el) => MENU_ITEM_ROLES.has(el.computedRole));
if (items.length > 0) {
return {
opened: true,
items: items.map((el) => (el.accessibleName ?? '').slice(0, 80)),
};
}
await sleep(100);
}
return { opened: false, items: [] };
}
// Click any menuitem (any of MENU_ITEM_ROLES) whose accessibleName
// matches `textPattern`. Caller opens the menu first. Polls the AX
// snapshot — menu render is async and the AX tree lags DOM by
// hundreds of ms.
//
// Returns the matched item's text and the full item list at the time
// of the match — the second is useful for diagnostics when `clicked`
// is null.
export async function clickMenuItem(
inspector: InspectorClient,
textPattern: RegExp,
opts: { timeout?: number } = {},
): Promise<{ clicked: string | null; items: string[] }> {
const timeout = opts.timeout ?? 1500;
// Caller has just opened a menu — gate once on stability so the
// first iteration sees the populated tree, then poll fast for the
// match. Same shape as openPill's post-click handling.
await waitForAxTreeStable(inspector, { minNodes: 1, timeoutMs: 5_000 });
const deadline = Date.now() + timeout;
let lastItemNames: string[] = [];
while (Date.now() < deadline) {
const elements = await snapshotAx(inspector, { fast: true });
const items = elements.filter((el) =>
MENU_ITEM_ROLES.has(el.computedRole),
);
lastItemNames = items.map((el) => (el.accessibleName ?? '').slice(0, 80));
const match = items.find(
(el) =>
el.accessibleName !== null && textPattern.test(el.accessibleName),
);
if (match && match.backendDOMNodeId !== null) {
const text = (match.accessibleName ?? '').slice(0, 80);
await inspector.clickByBackendNodeId(
'claude.ai',
match.backendDOMNodeId,
);
return { clicked: text, items: lastItemNames };
}
await sleep(100);
}
return { clicked: null, items: lastItemNames };
}
// Dispatch an Escape keydown to the document. Used by openEnvPill's
// trial-click loop to dismiss the menu when the wrong pill was hit.
// We dispatch on document because the popover trigger may not have
// retained focus.
export async function pressEscape(inspector: InspectorClient): Promise<void> {
await inspector.evalInRenderer<null>(
'claude.ai',
`(() => {
document.dispatchEvent(new KeyboardEvent('keydown', {
key: 'Escape', code: 'Escape', keyCode: 27, which: 27,
bubbles: true, cancelable: true,
}));
return null;
})()`,
);
}
// Code tab domain operations. Instance-shaped (carries the inspector)
// to match QuickEntry / MainWindow in quickentry.ts.
//
// Only valid after the renderer has loaded a logged-in claude.ai page;
// callers should `app.waitForReady('userLoaded')` first. activate()
// itself doesn't repeat that check — it would just fail to find the
// Code button on /login, which surfaces as a clear error.
export class CodeTab {
constructor(private readonly inspector: InspectorClient) {}
// Click the Code tab, then poll up to `timeout` for at least one
// compact pill to render. The env pill rendering is the cheapest
// signal that the Code-tab body has mounted and is interactive —
// the URL doesn't change (route stays `/new` etc.), so we can't
// anchor on navigation. Throws on miss with the candidate count for
// triage.
//
// Session 14 migration: the pre-click `activateTab` call now polls
// up to `opts.timeout` for the Code button itself to appear (it was a
// one-shot snapshot before, which was the T16 failure mode). The same
// `timeout` value is passed to both phases, so each phase can spend
// up to `timeout` on its own. In practice the click resolves in well
// under a second when the Code button is present, so the post-click
// pill poll accounts for most of the wall-clock time.
async activate(opts: { timeout?: number } = {}): Promise<void> {
const timeout = opts.timeout ?? 5000;
const result = await activateTab(this.inspector, 'Code', { timeout });
if (!result.clicked) {
throw new Error(
'CodeTab.activate: no AX-tree button with accessibleName="Code" found',
);
}
// Post-click: poll the AX tree for at least one compact pill.
// `waitForAxNodes` carries the snapshot+filter+sleep loop
// formerly hand-rolled here, with the same per-iteration cadence
// (200ms) and overall budget. Predicate matches `findCompactPills`
// — `role: 'button'` + `hasPopup: 'menu'` + non-empty
// accessibleName + not a per-row "More options for X" trigger.
const ready = await waitForAxNodes(
this.inspector,
(el) =>
el.computedRole === 'button' &&
el.hasPopup === 'menu' &&
el.accessibleName !== null &&
el.accessibleName.length > 0 &&
!ROW_MORE_OPTIONS_RE.test(el.accessibleName),
{ timeoutMs: timeout, intervalMs: 200 },
);
if (!ready) {
throw new Error(
`CodeTab.activate: no compact pill rendered within ${timeout}ms ` +
`after clicking Code — tab body may not have mounted`,
);
}
}
// Open the env pill (the compact pill whose menu contains a `^Local`
// menuitemradio). Trial-click strategy: for each compact pill, try
// opening it and check for the Local item. If absent, dismiss with
// Escape and try the next. Necessary because nothing in the DOM
// distinguishes the env pill from a future second compact pill at
// rest — only the menu contents disambiguate.
//
// Returns the matched pill's label text and the rendered menu
// items. Throws if no candidate yields a Local-bearing menu.
async openEnvPill(): Promise<{ pillText: string; items: string[] }> {
const pills = await findCompactPills(this.inspector);
if (pills.length === 0) {
throw new Error(
'CodeTab.openEnvPill: no compact pills on the page — ' +
'did you call activate() first?',
);
}
// Iterate by label rather than DOM index so we can use openPill
// with an exact-text anchor — avoids re-querying ordinals after
// each Escape (the DOM may shift).
for (const pill of pills) {
const labelRe = new RegExp(`^${escapeRegExp(pill.text)}$`);
const opened = await openPill(this.inspector, labelRe, { timeout: 1500 });
if (!opened.opened) continue;
const hasLocal = opened.items.some((t) => /^Local\b/.test(t));
if (hasLocal) {
return { pillText: pill.text, items: opened.items };
}
await pressEscape(this.inspector);
// Brief settle so the next openPill doesn't race the popover
// teardown. 150ms matches the original T17 implementation.
await sleep(150);
}
throw new Error(
`CodeTab.openEnvPill: probed ${pills.length} compact pill(s), ` +
`none yielded a menu containing /^Local\\b/`,
);
}
// Click the `^Local` menuitemradio inside the (already-open) env-pill
// menu. textContent reads "Local, environment settings, right arrow"
// because of the SR-only suffix; we anchor on /^Local\b/.
async selectLocal(): Promise<void> {
const result = await clickMenuItem(this.inspector, /^Local\b/);
if (!result.clicked) {
throw new Error(
`CodeTab.selectLocal: no /^Local\\b/ item in the open menu. ` +
`Items: ${JSON.stringify(result.items)}`,
);
}
}
// Full chain: open env pill → Local → wait for the "Select folder…"
// pill to render → open it → click "Open folder…". After this
// resolves, dialog.showOpenDialog has been invoked (the caller
// installs the mock first and polls getOpenDialogCalls to confirm).
//
// Each step throws on its own miss with enough metadata to tell
// which selector decayed; the caller can wrap the whole chain in
// try/catch for partial-state attachment.
async openFolderPicker(): Promise<void> {
await this.openEnvPill();
await this.selectLocal();
// The Select-folder pill renders after Local is chosen. Same
// CompactPill shape — anchor on the leading "Select folder"
// text. 4s budget matches the T17 wait that proved sufficient
// in practice on KDE-W.
const selectOpened = await retryUntil(
async () => {
const r = await openPill(this.inspector, /^Select folder/, {
timeout: 1000,
});
return r.opened ? r : null;
},
{ timeout: 4000, interval: 200 },
);
if (!selectOpened) {
throw new Error(
'CodeTab.openFolderPicker: "Select folder…" pill did not ' +
'open within 4s after Local was clicked',
);
}
// The Select-folder menu has a "Recent" group (radios; clicking one
// reuses the past path silently, no dialog) followed by
// "Open folder…" (menuitem; fires the picker). We want the menuitem,
// but clickMenuItem matches all menuitem* roles, so the leading-text
// anchor is what disambiguates here.
const openClicked = await clickMenuItem(this.inspector, /^Open folder/);
if (!openClicked.clicked) {
throw new Error(
`CodeTab.openFolderPicker: no /^Open folder/ menuitem in ` +
`the Select-folder menu. Items: ${JSON.stringify(openClicked.items)}`,
);
}
}
}
// Standard "escape regex special chars in a literal string" helper.
// Used to build an exact-match RegExp from a captured pill label.
function escapeRegExp(s: string): string {
return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}
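
The compact-pill discrimination and the exact-label anchor used by `openEnvPill` can be tried in isolation. The following is a minimal sketch, not the harness code: `AxNodeSketch` is a hypothetical, trimmed-down stand-in for the node type in `lib/ax.ts`, and the snapshot contents (including the `Local (dev)` label) are invented for illustration.

```typescript
// Hypothetical, trimmed-down AX node shape. The real type lives in
// lib/ax.ts and carries more fields (e.g. backendDOMNodeId).
interface AxNodeSketch {
  computedRole: string;
  accessibleName: string | null;
  hasPopup: string | null;
}

// Same pattern as ROW_MORE_OPTIONS_RE in the file above.
const ROW_MORE_OPTIONS = /^More options for /;

// Compact-pill predicate: a button with a menu popup, a non-empty
// name, and not one of the per-row sidebar triggers.
function isCompactPill(el: AxNodeSketch): boolean {
  return (
    el.computedRole === 'button' &&
    el.hasPopup === 'menu' &&
    el.accessibleName !== null &&
    el.accessibleName.length > 0 &&
    !ROW_MORE_OPTIONS.test(el.accessibleName)
  );
}

// Escape regex metacharacters so a captured label matches literally.
function escapeRegExpSketch(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build an exact-match RegExp from a label, as openEnvPill does.
function exactLabel(label: string): RegExp {
  return new RegExp(`^${escapeRegExpSketch(label)}$`);
}

// Invented snapshot: one tab button, one pill, one sidebar row
// trigger, and one menu item.
const snapshot: AxNodeSketch[] = [
  { computedRole: 'button', accessibleName: 'Code', hasPopup: null },
  { computedRole: 'button', accessibleName: 'Local (dev)', hasPopup: 'menu' },
  {
    computedRole: 'button',
    accessibleName: 'More options for Project X',
    hasPopup: 'menu',
  },
  { computedRole: 'menuitem', accessibleName: 'Local', hasPopup: null },
];

const pillLabels = snapshot
  .filter(isCompactPill)
  .map((el) => el.accessibleName as string);
```

Only the `Local (dev)` button survives the filter. The escaping step matters for labels like this one: an unescaped `^Local (dev)$` would treat the parentheses as a capture group and fail to match the literal label.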


@@ -0,0 +1,40 @@
import { sessionBus, type MessageBus, type ClientInterface } from 'dbus-next';
let cached: MessageBus | null = null;
export function getSessionBus(): MessageBus {
if (!cached) {
cached = sessionBus();
}
return cached;
}
export async function disconnectBus(): Promise<void> {
if (cached) {
cached.disconnect();
cached = null;
}
}
// dbus-next exposes interface methods as dynamic properties typed loosely. Cast
// at the call site rather than re-typing every D-Bus interface we touch.
type DynamicMethod = (...args: unknown[]) => Promise<unknown>;
export function method(iface: ClientInterface, name: string): DynamicMethod {
const fn = (iface as unknown as Record<string, DynamicMethod | undefined>)[name];
if (typeof fn !== 'function') {
throw new Error(`D-Bus method ${name} not found on interface`);
}
return fn.bind(iface);
}
export async function getConnectionPid(connectionName: string): Promise<number> {
const bus = getSessionBus();
const proxy = await bus.getProxyObject(
'org.freedesktop.DBus',
'/org/freedesktop/DBus',
);
const iface = proxy.getInterface('org.freedesktop.DBus');
const result = await method(iface, 'GetConnectionUnixProcessID')(connectionName);
return result as number;
}
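
The "cast at the call site" lookup in `method()` can be exercised without a session bus. This is a sketch under stated assumptions: `FakeBusInterface`, its connection name and its PID are invented, and `lookupMethod` only mirrors the shape of `method()` above rather than using `dbus-next`.

```typescript
// Loosely typed async method, mirroring the DynamicMethod alias above.
type LooseMethod = (...args: unknown[]) => Promise<unknown>;

// Look a method up by name on an arbitrary object and bind it, so the
// returned function keeps working when called detached from the object.
function lookupMethod(target: object, name: string): LooseMethod {
  const fn = (target as unknown as Record<string, unknown>)[name];
  if (typeof fn !== 'function') {
    throw new Error(`method ${name} not found on interface`);
  }
  return (fn as LooseMethod).bind(target);
}

// Hypothetical stand-in for a D-Bus proxy interface. The method reads
// instance state, so it only works if `this` is bound correctly.
class FakeBusInterface {
  private readonly pids = new Map<string, number>([[':1.42', 4242]]);

  async GetConnectionUnixProcessID(name: unknown): Promise<unknown> {
    const pid = this.pids.get(String(name));
    if (pid === undefined) {
      throw new Error(`unknown connection ${String(name)}`);
    }
    return pid;
  }
}

const fakeIface = new FakeBusInterface();
const getPid = lookupMethod(fakeIface, 'GetConnectionUnixProcessID');

// Called as a bare function; the bind inside lookupMethod supplies `this`.
const pidPromise = getPid(':1.42').then((pid) => pid as number);
```

Without the `bind`, the detached call would run with `this` undefined and fail on `this.pids`, which is the reason the helper returns a bound function rather than the raw property.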


@@ -0,0 +1,65 @@
import { readFile } from 'node:fs/promises';
import { homedir } from 'node:os';
import { join } from 'node:path';
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';
const exec = promisify(execFile);
const LAUNCHER_LOG = join(
homedir(),
'.cache/claude-desktop-debian/launcher.log',
);
export async function readLauncherLog(): Promise<string | null> {
try {
return await readFile(LAUNCHER_LOG, 'utf8');
} catch {
return null;
}
}
export interface DoctorResult {
output: string;
exitCode: number | null;
}
export async function runDoctor(launcher?: string): Promise<DoctorResult> {
const bin = launcher ?? process.env.CLAUDE_DESKTOP_LAUNCHER ?? 'claude-desktop';
try {
const { stdout, stderr } = await exec(bin, ['--doctor'], { timeout: 15_000 });
return {
output: `${stdout}\n${stderr}`.trim(),
exitCode: 0,
};
} catch (err) {
// --doctor may exit non-zero if checks fail; still return the output
// and the actual exit code so T02/T13/S05 can assert against it.
const e = err as { stdout?: string; stderr?: string; code?: number };
const combined = `${e.stdout ?? ''}\n${e.stderr ?? ''}`.trim();
return {
output: combined,
exitCode: typeof e.code === 'number' ? e.code : null,
};
}
}
export function captureSessionEnv(): Record<string, string> {
const keys = [
'XDG_SESSION_TYPE',
'XDG_CURRENT_DESKTOP',
'WAYLAND_DISPLAY',
'DISPLAY',
'GDK_BACKEND',
'QT_QPA_PLATFORM',
'OZONE_PLATFORM',
'ELECTRON_OZONE_PLATFORM_HINT',
'CLAUDE_DESKTOP_LAUNCHER',
];
const out: Record<string, string> = {};
for (const k of keys) {
const v = process.env[k];
if (v !== undefined) out[k] = v;
}
return out;
}
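
The catch branch of `runDoctor` maps whatever `execFile` rejected with onto a `DoctorResult`. A minimal sketch of that mapping as a pure function follows; `toDoctorResult` and the sample error objects are hypothetical, shaped after the fields Node attaches to an `execFile` rejection (a numeric `code` for a non-zero exit, a string such as `'ENOENT'` when the binary cannot be spawned, `null` when the child was killed by a signal, for instance on timeout).

```typescript
interface DoctorResultSketch {
  output: string;
  exitCode: number | null;
}

// Subset of the fields found on an execFile rejection.
interface ExecFailure {
  stdout?: string;
  stderr?: string;
  code?: number | string | null;
}

// Normalise a rejection into { output, exitCode }. Only a numeric
// `code` counts as an exit code; anything else becomes null.
function toDoctorResult(err: unknown): DoctorResultSketch {
  const e = (err ?? {}) as ExecFailure;
  const output = `${e.stdout ?? ''}\n${e.stderr ?? ''}`.trim();
  return {
    output,
    exitCode: typeof e.code === 'number' ? e.code : null,
  };
}

// Invented samples covering the three cases described above.
const failedChecks = toDoctorResult({
  stdout: 'check A: FAIL',
  stderr: '',
  code: 2,
});
const missingBinary = toDoctorResult({ code: 'ENOENT' });
const timedOut = toDoctorResult({ stdout: 'partial', stderr: '', code: null });
```

A spec asserting on `exitCode` can therefore tell "doctor ran and reported failures" (a number) apart from "doctor never produced an exit status" (`null`).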


@@ -0,0 +1,413 @@
// "eipc" channel-registry primitive — runtime discovery of the custom
// `$eipc_message$_<UUID>_$_<scope>_$_<iface>_$_<method>` handlers
// registered on each per-webContents IPC scope.
//
// Why this exists
// ---------------
// Sessions 2-6 of the runner-implementation work treated the eipc
// registry as unreachable from main: the standard Electron
// `ipcMain._invokeHandlers` map only carries 3 chat-tab MCP-bridge
// handlers (`list-mcp-servers`, `connect-to-mcp-server`,
// `request-open-mcp-settings`); the 700+ `claude.web_$_*` /
// `claude.settings_$_*` etc. channels were assumed to be closure-
// local. Session 3's `globalThis` walk came up empty, which kept
// T22/T31/T33/T38 stuck as Tier 1 asar fingerprints rather than
// runtime registry probes.
//
// Session 7 found the missing piece: handlers DO go through
// Electron's stdlib `IpcMainImpl` — just not the GLOBAL `ipcMain`
// instance. Each `webContents` has its own `webContents.ipc` (a per-
// `WebContents` IPC scope, introduced in Electron 21), and that's
// where every `e.ipc.handle("$eipc_message$_..._$_<scope>_$_<iface>_$_<method>", fn)`
// call lands. Verified empirically against a debugger-attached
// running Claude:
// - find_in_page wc: 78 handlers (settings/find-in-page only)
// - main_window wc: 79 handlers (settings/title-bar only)
// - claude.ai wc: 490 handlers (full surface — including
// 117 LocalSessions, 16 CustomPlugins)
// - global ipcMain: 3 handlers (the chat-tab MCP-bridge trio)
//
// All `claude.web_$_*` interfaces (LocalSessions, CustomPlugins,
// CoworkSpaces, CoworkArtifacts, CoworkMemory, ClaudeCode, etc.)
// register on the claude.ai webContents. They're sticky across route
// changes — once registered (during webContents init), they don't
// deregister when the user navigates between /chats and /epitaxy.
// So the wait-for-channel poll just needs claude.ai to be alive +
// finished initial handler registration, NOT a specific route.
//
// What this primitive does
// ------------------------
// Read-only enumeration via `getEipcChannels` / `findEipcChannel` /
// `waitForEipcChannel(s)`. Handler PRESENCE checks (T22b / T31b / T33b
// / T38b) — that's strictly stronger than the asar fingerprint (a
// handler registered at runtime is a handler that actually wired up,
// not just a string in the bundle).
//
// Plus `invokeEipcChannel` (session 8 addition) — calls a registered
// handler through the renderer-side wrapper at `window['claude.<scope>']
// .<Iface>.<method>(...args)`. The wrapper is exposed by `mainView.js`
// preload via `contextBridge.exposeInMainWorld` after a frame + origin
// gate (top-level frame, origin in `{claude.ai, claude.com,
// preview.claude.ai, preview.claude.com, localhost}`). Because the
// `inspector.evalInRenderer('claude.ai', ...)` path runs inside the
// claude.ai renderer, the wrapper is present and the
// `IpcMainInvokeEvent` that Electron builds for the call carries an
// honest `senderFrame`. The alternative, pulling the function out of
// `_invokeHandlers` and calling it with a fake event whose
// `senderFrame.url` is `'https://claude.ai/'`, works (the gates are
// duck-typed structural checks) but spoofs a security-relevant claim.
// Going through the wrapper keeps the test surface aligned with the
// real attack surface.
//
// `invokeEipcChannel` is read-by-default but doesn't enforce a
// read-only allowlist — the safety property is that consumers pass
// case-doc-anchored suffixes verbatim, which limits the blast radius
// to whatever the case doc said the test should poke. Don't pass
// `start*` / `set*` / `write*` / `run*` / `openIn*` suffixes; those
// mutate user state.
//
// Framing opacity
// ---------------
// The `$eipc_message$_<UUID>_$_<scope>_$_<iface>_$_<method>` framing
// has been UUID-stable across builds (session 2 noted
// `c0eed8c9-c94a-4931-8cc3-3a08694e9863`; session 7 confirmed it's
// still that, single UUID across all 647 per-wc handlers). The
// primitive does not pin the UUID — match by suffix so a future
// build that rotates the UUID doesn't silently break every consuming
// spec. Suffix matching is also what the case-doc anchors use
// (`LocalSessions_$_getPrChecks` etc.), so consumers can pass the
// case-doc string verbatim.
import { retryUntil } from './retry.js';
import type { InspectorClient } from './inspector.js';
// One handler entry on a webContents. `suffix` is the part after the
// UUID — `<scope>_$_<iface>_$_<method>` — useful for dedup / display.
// `fullKey` is the full registry key including the framing prefix and
// UUID, kept for diagnostic attachments where the raw form matters
// (drift detection, regression triage). `webContentsId` lets a caller
// disambiguate when a future scope registers the same suffix on
// multiple webContents (today only `claude.settings/*` does this and
// every wc gets the same set; non-issue for current consumers).
export interface EipcChannel {
suffix: string;
fullKey: string;
webContentsId: number;
webContentsUrl: string;
}
export interface GetEipcChannelsOptions {
// Substring match on `webContents.getURL()`. Default: 'claude.ai'.
// Pass an empty string to enumerate every webContents.
urlFilter?: string;
// Optional scope filter — e.g. 'claude.web' to drop settings-
// scope handlers. Matched against the segment immediately after
// the UUID. Empty / undefined returns all scopes.
scope?: string;
// Optional interface filter — e.g. 'LocalSessions'. Matched
// against the segment after the scope. Empty / undefined returns
// all interfaces.
iface?: string;
}
// Internal: shape returned by the inspector eval below. Kept private
// so the `EipcChannel` interface above is the public type contract.
interface RawEntry {
wcId: number;
wcUrl: string;
fullKey: string;
}
// Enumerate every eipc-framed handler key registered on every matching
// webContents. The UUID is opaque to the caller — only the suffix
// (`<scope>_$_<iface>_$_<method>`) is exposed via the EipcChannel
// type. Filtering by `scope` / `iface` happens after the inspector
// eval (the eval keeps its filter set minimal so a single eval call
// covers every consumer's needs).
//
// Returns an empty array when no matching webContents exists (e.g.
// the spec called this before claude.ai loaded). Callers that need
// a "wait until present" semantic should use `waitForEipcChannel`
// instead.
export async function getEipcChannels(
inspector: InspectorClient,
opts: GetEipcChannelsOptions = {},
): Promise<EipcChannel[]> {
const urlFilter = opts.urlFilter ?? 'claude.ai';
const raw = await inspector.evalInMain<RawEntry[]>(`
const { webContents } = process.mainModule.require('electron');
const urlFilter = ${JSON.stringify(urlFilter)};
const out = [];
for (const wc of webContents.getAllWebContents()) {
const url = wc.getURL();
if (urlFilter && !url.includes(urlFilter)) continue;
const ipc = wc.ipc;
const map = ipc && ipc._invokeHandlers;
if (!map) continue;
const keys = (typeof map.keys === 'function')
? Array.from(map.keys())
: Object.keys(map);
for (const k of keys) {
out.push({ wcId: wc.id, wcUrl: url, fullKey: k });
}
}
return out;
`);
// Match the framing prefix and capture the suffix. Anything that
// doesn't match (e.g. a non-eipc handler that snuck onto a wc
// scope) gets filtered out — only eipc-framed entries are part of
// this primitive's contract.
const re = /^\$eipc_message\$_[0-9a-f-]+_\$_(.+)$/;
const out: EipcChannel[] = [];
for (const entry of raw) {
const m = re.exec(entry.fullKey);
if (!m) continue;
const suffix = m[1]!;
if (opts.scope) {
// Suffix shape: `<scope>_$_<iface>_$_<method>`. Anchor at
// the start so 'claude.web' matches but 'web' doesn't
// match `claude.settings` etc.
if (!suffix.startsWith(`${opts.scope}_$_`)) continue;
}
if (opts.iface) {
// Interface segment is after the scope — search for
// `_$_<iface>_$_` in the suffix. Anchored separators
// avoid accidentally matching a method name that happens
// to contain the iface string.
if (!suffix.includes(`_$_${opts.iface}_$_`)) continue;
}
out.push({
suffix,
fullKey: entry.fullKey,
webContentsId: entry.wcId,
webContentsUrl: entry.wcUrl,
});
}
return out;
}
export interface FindEipcChannelOptions {
// Substring match on `webContents.getURL()`. Default: 'claude.ai'.
urlFilter?: string;
}
// Locate the first registered handler whose suffix ends with
// `caseDocSuffix`. Designed so callers can pass the case-doc-anchored
// string verbatim — e.g. `LocalSessions_$_getPrChecks`. Returns null
// when no match exists (caller decides whether to fail, skip, or
// retry).
//
// This is a one-shot lookup (a single registry read, no polling); for
// the populate-on-init wait, use `waitForEipcChannel`, which wraps
// this in a retryUntil.
export async function findEipcChannel(
inspector: InspectorClient,
caseDocSuffix: string,
opts: FindEipcChannelOptions = {},
): Promise<EipcChannel | null> {
const channels = await getEipcChannels(inspector, {
urlFilter: opts.urlFilter,
});
for (const ch of channels) {
if (ch.suffix.endsWith(caseDocSuffix)) return ch;
}
return null;
}
export interface WaitForEipcChannelOptions {
urlFilter?: string;
// Total budget for the poll. Default 15s — the claude.ai
// webContents' initial handler registration completes within a
// second of `userLoaded` on the dev box, so 15s leaves wide
// margin for slow-cache cases.
timeoutMs?: number;
intervalMs?: number;
}
// Poll until the named channel is registered, or the budget runs out.
// Use this when the spec just reached `waitForReady('userLoaded')` —
// the claude.ai webContents may exist but its handlers might not have
// finished registering yet. The poll is cheap (one inspector eval per
// tick + a string scan) so the default interval can be aggressive.
//
// Returns the EipcChannel on success, null on timeout. Callers that
// want a hard fail on timeout should `expect(channel, '...').not.toBeNull()`
// — the primitive doesn't throw because some specs want to surface
// missing-handler as a clean fail with diagnostics rather than an
// uncaught timeout.
export async function waitForEipcChannel(
inspector: InspectorClient,
caseDocSuffix: string,
opts: WaitForEipcChannelOptions = {},
): Promise<EipcChannel | null> {
return retryUntil(
() => findEipcChannel(inspector, caseDocSuffix, opts),
{
timeout: opts.timeoutMs ?? 15_000,
interval: opts.intervalMs ?? 250,
},
);
}
// Convenience: resolve a list of case-doc suffixes in one round-trip.
// Returns a Map keyed by the input suffix so callers can iterate the
// expected list and report per-suffix presence. Missing suffixes have
// `null` values.
//
// Single inspector call by design — the `getEipcChannels` cost is
// dominated by the eval round-trip, not the in-process filtering, so
// batching is strictly cheaper than N calls to `findEipcChannel`.
export async function findEipcChannels(
inspector: InspectorClient,
caseDocSuffixes: readonly string[],
opts: FindEipcChannelOptions = {},
): Promise<Map<string, EipcChannel | null>> {
const channels = await getEipcChannels(inspector, {
urlFilter: opts.urlFilter,
});
const out = new Map<string, EipcChannel | null>();
for (const suffix of caseDocSuffixes) {
const hit = channels.find((c) => c.suffix.endsWith(suffix));
out.set(suffix, hit ?? null);
}
return out;
}
// Wait until ALL of the listed suffixes are registered, or the budget
// runs out. Useful for trios like T31's side-chat (start/send/stop) —
// the trio is load-bearing as a unit; partial registration is a fail.
//
// Returns the resolved Map on full success. On timeout, returns the
// last-observed Map (some entries may be null) so callers can surface
// the partial state in their diagnostic attachment before failing.
export async function waitForEipcChannels(
inspector: InspectorClient,
caseDocSuffixes: readonly string[],
opts: WaitForEipcChannelOptions = {},
): Promise<Map<string, EipcChannel | null>> {
let lastSnapshot = new Map<string, EipcChannel | null>();
const result = await retryUntil(
async () => {
const snap = await findEipcChannels(
inspector,
caseDocSuffixes,
opts,
);
lastSnapshot = snap;
for (const v of snap.values()) if (v === null) return null;
return snap;
},
{
timeout: opts.timeoutMs ?? 15_000,
interval: opts.intervalMs ?? 250,
},
);
return result ?? lastSnapshot;
}
export interface InvokeEipcChannelOptions {
// Renderer URL filter. Default 'claude.ai' — the only webContents
// whose origin passes the wrapper-exposure gate (`Qc()` in
// `mainView.js`: `https://claude.ai`, `https://claude.com`,
// preview.*, localhost). The `find_in_page` and `main_window`
// webContents register `claude.settings/*` handlers in their
// per-wc IPC scope but their renderers run from `file://`, so
// `window['claude.settings']` is never exposed there and invocation
// through them would need a different (main-side, fake-event)
// approach not implemented in this primitive.
urlFilter?: string;
// Inspector eval timeout. Default = InspectorClient.defaultTimeoutMs
// (30s). Read-only handlers like `getMcpServersConfig` /
// `readGlobalMemory` / `getAllScheduledTasks` return well within
// 1s on a warm app; the 30s budget is for cold-cache cases.
timeoutMs?: number;
}
// Invoke an eipc handler through the renderer-side wrapper at
// `window['claude.<scope>'].<Iface>.<method>(...args)`. The suffix is
// resolved against the per-wc registry first (same matching rules as
// `findEipcChannel` — accepts both fully-qualified
// `claude.web_$_LocalSessions_$_getPrChecks` and the more concise
// `LocalSessions_$_getPrChecks`) and the scope/iface/method triplet is
// pulled from the resolved full suffix.
//
// Why through the renderer wrapper, not a direct main-side call:
// handlers register via `e.ipc.handle(framedName, async (event, args)
// => { if (!le(event)) throw ...; return A.<method>(args); })` — the
// origin gate is inlined at registration time (variants `le`/`Vi`/`mm`
// in the bundle, all duck-typed structural checks against
// `event.senderFrame.url` and `event.senderFrame.parent === null`).
// Pulling the function out of `_invokeHandlers` and calling it with a
// synthesized event whose `senderFrame.url` is `'https://claude.ai/'`
// works (the gate is structural, not `instanceof`-checked) but spoofs
// the gate's security claim. The wrapper IS at claude.ai, so the event
// Electron builds for the call carries an honest senderFrame and the
// test surface matches the real attack surface.
//
// Errors:
// - "no handler registered with suffix": the registry walk returned
// nothing matching. Same shape as `findEipcChannel` returning null;
// call `waitForEipcChannel` first if your spec needs the
// populate-on-init poll.
// - "eipc namespace missing in renderer: claude.<scope>": the wrapper
// isn't exposed on this renderer. Either the urlFilter selected a
// webContents whose origin failed `Qc()`, or the build flipped the
// scope's exposure gate. Check `evalInRenderer(urlFilter,
// 'Object.keys(window).filter(k => k.startsWith("claude."))')`.
// - String-form rejection from the renderer eval: the gate / arg-
// validator / result-validator inside the handler closure rejected.
// The framed channel name appears in the error message — use it to
// pinpoint which handler rejected.
//
// Args are JSON-marshaled into the renderer eval. Return value is
// JSON-deserialized via `evalInRenderer`'s `executeJavaScript` path.
// Non-JSON-serializable handler returns (Date, Buffer, circular refs)
// would be mangled by this primitive. None of the current Tier 2
// case-doc consumers return such shapes; flag if a future one does.
export async function invokeEipcChannel<T = unknown>(
inspector: InspectorClient,
caseDocSuffix: string,
args: readonly unknown[] = [],
opts: InvokeEipcChannelOptions = {},
): Promise<T> {
const urlFilter = opts.urlFilter ?? 'claude.ai';
const channel = await findEipcChannel(inspector, caseDocSuffix, {
urlFilter,
});
if (!channel) {
throw new Error(
`invokeEipcChannel: no handler registered with suffix ` +
`'${caseDocSuffix}' on a webContents matching ` +
`'${urlFilter}'`,
);
}
// Full suffix is `<scope>_$_<iface>_$_<method>`. Scope contains a
// dot (e.g. claude.web) but the `_$_` separator is unambiguous —
// a 3-part split gives [scope, iface, method] cleanly.
const parts = channel.suffix.split('_$_');
if (parts.length !== 3) {
throw new Error(
`invokeEipcChannel: bad suffix shape '${channel.suffix}' ` +
`(expected '<scope>_$_<iface>_$_<method>')`,
);
}
const [scope, iface, method] = parts;
const argsJson = JSON.stringify(args);
const js = `(async () => {
const ns = window[${JSON.stringify(scope)}];
if (!ns) throw new Error(
'eipc namespace missing in renderer: ' + ${JSON.stringify(scope)}
);
const ifaceObj = ns[${JSON.stringify(iface)}];
if (!ifaceObj) throw new Error(
'eipc interface missing: ' + ${JSON.stringify(iface)} +
' (under ' + ${JSON.stringify(scope)} + ')'
);
const fn = ifaceObj[${JSON.stringify(method)}];
if (typeof fn !== 'function') throw new Error(
'eipc method not a function: ' + ${JSON.stringify(method)} +
' (under ' + ${JSON.stringify(scope)} + '.' + ${JSON.stringify(iface)} + ')'
);
return await fn.apply(ifaceObj, ${argsJson});
})()`;
return inspector.evalInRenderer<T>(urlFilter, js, opts.timeoutMs);
}
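
The JSON caveat in the header comment of `invokeEipcChannel` is easy to see in isolation. A minimal sketch, assuming only the standard `JSON` global; `roundTrip` is a hypothetical stand-in for the marshal/unmarshal pair, not a harness function, and the real eval path may differ in detail:

```typescript
// Hypothetical stand-in for what a JSON-based eval boundary does to a
// handler's return value.
function roundTrip(value: unknown): unknown {
  return JSON.parse(JSON.stringify(value));
}

// Plain data survives unchanged.
const plain = roundTrip({ running: true, pid: 42 }) as {
  running: boolean;
  pid: number;
};

// A Date comes back as an ISO string, not a Date instance.
const dated = roundTrip({ startedAt: new Date(0) }) as { startedAt: unknown };
const dateBecameString = typeof dated.startedAt === 'string';

// A Map serializes as {} and loses its entries.
const mapped = roundTrip({ m: new Map([['k', 1]]) }) as { m: object };
const mapIsEmptyObject = Object.keys(mapped.m).length === 0;

// A circular reference throws at stringify time.
const cyc: { self?: unknown } = {};
cyc.self = cyc;
let circularThrew = false;
try {
  roundTrip(cyc);
} catch {
  circularThrew = true;
}
```

A consumer that needs a richer shape would have to encode it explicitly (for example, a timestamp number instead of a `Date`) on the handler side.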


@@ -0,0 +1,206 @@
// Mock-then-call helpers for side-effecting Electron module APIs.
//
// Tests that exercise an Electron egress whose real invocation would
// touch the host system (open a file manager, launch an editor, show a
// dialog) install a recorder mock first, then invoke the API via
// `inspector.evalInMain` and assert against the recorded calls. The
// pattern strengthens "didn't throw" probes into "the egress was
// reached + the args flowed through verbatim", with no host side
// effect.
//
// Each helper:
// - is idempotent within an Electron lifecycle (guarded by a
// globalThis flag so re-installation in retry loops is a no-op),
// - records `{ ts, ...args }` into a globalThis call list,
// - returns a value matching the real API's documented contract
// (void / Promise<boolean> / canned dialog result).
//
// The companion `get*Calls()` reader returns `[]` if the mock was
// never installed (rather than throwing) so pre-install reads in
// retry loops are cheap.
//
// Extracted from `lib/claudeai.ts` once the third helper landed
// (T17 dialog → T25 showItemInFolder → T24 openExternal). These
// helpers are not claude.ai-domain — they're generic Electron module
// patches — so the extraction keeps `claudeai.ts` focused on the AX-
// tree page-objects and gives future mock-then-call tests an obvious
// home to add to.
//
// Caller pattern: see `runners/T17_folder_picker.spec.ts`,
// `runners/T25_show_item_in_folder_no_throw.spec.ts`,
// `runners/T24_open_in_editor_no_throw.spec.ts`.
import type { InspectorClient } from './inspector.js';
// ----- dialog.showOpenDialog -----------------------------------------
// Replace dialog.showOpenDialog with a mock that records every call
// and returns a canned result. Idempotent — re-installing within the
// same Electron lifecycle is a no-op (guarded by
// globalThis.__claudeAiDialogMockInstalled). Mirrors the shape of
// QuickEntry.installInterceptor (quickentry.ts:86) so callers across
// libs feel consistent.
//
// The first BrowserWindow positional arg is optional in Electron's
// API, so the mock handles both `showOpenDialog(opts)` and
// `showOpenDialog(window, opts)` shapes.
export async function installOpenDialogMock(
inspector: InspectorClient,
cannedResult: { canceled: boolean; filePaths: string[] } = {
canceled: false,
filePaths: ['/tmp/claude-test-folder'],
},
): Promise<void> {
const canned = JSON.stringify(cannedResult);
await inspector.evalInMain<null>(`
if (globalThis.__claudeAiDialogMockInstalled) return null;
const { dialog } = process.mainModule.require('electron');
globalThis.__claudeAiDialogCalls = [];
const original = dialog.showOpenDialog.bind(dialog);
dialog.showOpenDialog = async function(...args) {
const browserWindowArg = args[0]
&& typeof args[0] === 'object'
&& args[0].constructor
&& args[0].constructor.name === 'BrowserWindow';
const opts = browserWindowArg ? args[1] : args[0];
globalThis.__claudeAiDialogCalls.push({
ts: Date.now(),
nargs: args.length,
title: opts && opts.title,
properties: opts && opts.properties,
});
return ${canned};
};
void original;
globalThis.__claudeAiDialogMockInstalled = true;
return null;
`);
}
export interface OpenDialogCall {
ts: number;
nargs: number;
title?: string;
properties?: string[];
}
// Read the recorded call list. Returns [] if the mock was never
// installed (rather than throwing) — pre-install reads in retry
// loops stay cheap.
export async function getOpenDialogCalls(
inspector: InspectorClient,
): Promise<OpenDialogCall[]> {
return await inspector.evalInMain<OpenDialogCall[]>(
`return globalThis.__claudeAiDialogCalls || []`,
);
}
// ----- shell.showItemInFolder ----------------------------------------
// Replace electron.shell.showItemInFolder with a mock that records
// every call without performing the underlying DBus FileManager1 /
// xdg-open dispatch. Same idempotency-flag pattern as
// installOpenDialogMock.
//
// Why mock vs. invoke real: `showItemInFolder` is fire-and-forget on
// Linux (returns void, no success signal). Invoking it for real opens
// the host's actual file manager — fine in a click-chain test, but
// disruptive when the assertion is just "the JS-level call is
// reachable + accepts a path arg + the IPC layer terminates here".
// The mock keeps the same assertion shape with no host side effect.
export async function installShowItemInFolderMock(
inspector: InspectorClient,
): Promise<void> {
await inspector.evalInMain<null>(`
if (globalThis.__claudeAiShowItemMockInstalled) return null;
const { shell } = process.mainModule.require('electron');
globalThis.__claudeAiShowItemCalls = [];
const original = shell.showItemInFolder.bind(shell);
shell.showItemInFolder = function(fullPath) {
globalThis.__claudeAiShowItemCalls.push({
ts: Date.now(),
path: typeof fullPath === 'string' ? fullPath : String(fullPath),
});
// Return undefined like the real method — callers don't
// inspect the return value.
};
void original;
globalThis.__claudeAiShowItemMockInstalled = true;
return null;
`);
}
export interface ShowItemInFolderCall {
ts: number;
path: string;
}
export async function getShowItemInFolderCalls(
inspector: InspectorClient,
): Promise<ShowItemInFolderCall[]> {
return await inspector.evalInMain<ShowItemInFolderCall[]>(
`return globalThis.__claudeAiShowItemCalls || []`,
);
}
// ----- shell.openExternal --------------------------------------------
// Replace electron.shell.openExternal with a mock that records every
// call without performing the underlying xdg-open / scheme-handler
// dispatch. Same idempotency-flag pattern as installOpenDialogMock /
// installShowItemInFolderMock.
//
// Why mock vs. invoke real: `shell.openExternal` is the single egress
// for all URL-scheme handoffs (browser, OAuth callback, editor URL
// schemes like `vscode://file/<path>`). Invoking it for real on a
// host with the matching scheme handler installed launches the target
// app (e.g. a full VS Code window) — fine in a click-chain test,
// disruptive when the assertion is just "the JS-level call is
// reachable + the URL flowed through verbatim". The mock keeps the
// same assertion shape with no host side effect.
//
// Unlike `showItemInFolder`, `openExternal` returns `Promise<boolean>`
// (true on success, false otherwise — see Electron docs), so the mock
// must return a resolved Promise with the canned boolean rather than
// undefined, otherwise callers that `await` the result would observe
// `undefined` instead of the documented contract.
export async function installOpenExternalMock(
inspector: InspectorClient,
cannedResult: boolean = true,
): Promise<void> {
const canned = JSON.stringify(cannedResult);
await inspector.evalInMain<null>(`
if (globalThis.__claudeAiOpenExternalMockInstalled) return null;
const { shell } = process.mainModule.require('electron');
globalThis.__claudeAiOpenExternalCalls = [];
const original = shell.openExternal.bind(shell);
shell.openExternal = async function(url, options) {
globalThis.__claudeAiOpenExternalCalls.push({
ts: Date.now(),
url: typeof url === 'string' ? url : String(url),
options: options,
});
// Return a resolved Promise<boolean> like the real method —
// callers that await the result expect the documented
// contract (true on success, false otherwise).
return ${canned};
};
void original;
globalThis.__claudeAiOpenExternalMockInstalled = true;
return null;
`);
}
export interface OpenExternalCall {
ts: number;
url: string;
options?: unknown;
}
export async function getOpenExternalCalls(
inspector: InspectorClient,
): Promise<OpenExternalCall[]> {
return await inspector.evalInMain<OpenExternalCall[]>(
`return globalThis.__claudeAiOpenExternalCalls || []`,
);
}
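
The three helpers in this file share one shape: a flag-guarded install, a call list on `globalThis`, and a reader that tolerates a missing install. A minimal sketch of that shape against a plain object instead of Electron; `fakeShell`, `installRevealRecorder`, `getRevealCalls`, and the `__demoReveal*` globals are hypothetical names used only for illustration:

```typescript
interface RevealCall {
  ts: number;
  path: string;
}

// Globals used by this sketch only; the real helpers use their own
// __claudeAi* names.
const demoGlobals = globalThis as unknown as {
  __demoRevealMockInstalled?: boolean;
  __demoRevealCalls?: RevealCall[];
};

// Stand-in for a side-effecting module API. The unpatched version
// throws so an accidental real call is obvious.
const fakeShell: { showItemInFolder: (fullPath: string) => void } = {
  showItemInFolder: () => {
    throw new Error('real egress reached');
  },
};

function installRevealRecorder(): void {
  // Idempotent: a second install keeps the existing call list.
  if (demoGlobals.__demoRevealMockInstalled) return;
  const calls: RevealCall[] = [];
  demoGlobals.__demoRevealCalls = calls;
  fakeShell.showItemInFolder = (fullPath: string): void => {
    calls.push({ ts: Date.now(), path: String(fullPath) });
  };
  demoGlobals.__demoRevealMockInstalled = true;
}

function getRevealCalls(): RevealCall[] {
  // [] before install, never throws.
  return demoGlobals.__demoRevealCalls ?? [];
}
```

Reading before installing returns an empty list, and installing twice does not reset what was already recorded, which is the property the retry loops rely on.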


@@ -0,0 +1,515 @@
import { spawn, execFile, type ChildProcess } from 'node:child_process';
import { existsSync, readlinkSync, rmSync } from 'node:fs';
import { homedir } from 'node:os';
import { dirname, join } from 'node:path';
import { promisify } from 'node:util';
import { sleep, retryUntil } from './retry.js';
import { findX11WindowByPid } from './wm.js';
import { InspectorClient } from './inspector.js';
import { createIsolation, type Isolation } from './isolation.js';
import { MainWindow, waitForUserLoaded } from './quickentry.js';
const exec = promisify(execFile);
export interface LaunchOptions {
extraEnv?: Record<string, string>;
args?: string[];
// Pass an existing Isolation to share config across multiple
// launches in one test (e.g. S35 position-memory across restart).
// Pass `null` to opt out of isolation entirely (legacy: shares
// ~/.config/Claude with the host). Default: a fresh isolation per
// launch, cleaned up on close().
isolation?: Isolation | null;
}
// Tiered readiness levels for waitForReady(). Higher levels include
// every check from lower levels. Pick the lowest level a test
// actually needs:
// - 'window' X11 window mapped (no inspector, no renderer state)
// - 'mainVisible' main shell BrowserWindow.isVisible() === true
// - 'claudeAi' any claude.ai webContents reachable (may be /login)
// - 'userLoaded' claude.ai URL past /login (lHn() precondition; the
// tightest gate before exercising QE submit paths)
export type ReadyLevel = 'window' | 'mainVisible' | 'claudeAi' | 'userLoaded';
export interface WaitForReadyOptions {
// Overall budget across all levels. Each step consumes from the
// remaining budget. Default 90_000ms covers the userLoaded path
// (~5-10s startup + main visible + 30s claude.ai load + login
// nav) with margin. Override down for cheaper levels.
timeout?: number;
}
export interface WindowReady {
wid: string;
}
export interface MainVisibleReady extends WindowReady {
inspector: InspectorClient;
}
export interface ClaudeAiReady extends MainVisibleReady {
// First claude.ai webContents URL observed. Absent if claude.ai
// never loaded within the budget — caller can treat as a skip
// (host likely not signed in).
claudeAiUrl?: string;
}
export interface UserLoadedReady extends ClaudeAiReady {
// claude.ai URL past /login. Absent if the renderer never
// navigated past the login page within the budget.
postLoginUrl?: string;
}
// Maps each level to the precise return shape its callers see.
// Conditional type rather than overloads because the implementation
// is a single closure with a union return — overloads would require
// either an unsafe cast or function-declaration overloads, both
// noisier than this.
export type ReadyResultFor<L extends ReadyLevel> =
L extends 'window' ? WindowReady :
L extends 'mainVisible' ? MainVisibleReady :
L extends 'claudeAi' ? ClaudeAiReady :
L extends 'userLoaded' ? UserLoadedReady :
never;
export interface ClaudeApp {
process: ChildProcess;
pid: number;
isolation: Isolation | null;
// Populated on close(). When the spawned Electron exits with
// non-zero `code` and was NOT killed by us (`signal === null`),
// this carries the data so a runner can `testInfo.attach()` the
// crash info without us coupling electron.ts to Playwright APIs
// or breaking the existing `await app.close()` sites that ignore
// the return value. Stays null while the proc is still running.
lastExitInfo: { code: number | null; signal: NodeJS.Signals | null } | null;
close(): Promise<void>;
waitForX11Window(timeoutMs?: number): Promise<string>;
attachInspector(timeoutMs?: number): Promise<InspectorClient>;
// Tiered "is the app ready for the kind of work this test does"
// helper. See ReadyLevel for what each level checks. Throws on
// timeout for 'window' / 'mainVisible' (hard-fail levels). For
// 'claudeAi' / 'userLoaded', returns with the corresponding field
// (claudeAiUrl, postLoginUrl) absent on timeout so callers can
// `testInfo.skip()` rather than fail when the host isn't signed in.
waitForReady<L extends ReadyLevel>(
level: L,
opts?: WaitForReadyOptions,
): Promise<ReadyResultFor<L>>;
}
// CDP auth gate: index.pre.js has
// uF(process.argv) && !qL() && process.exit(1);
// where uF matches --remote-debugging-port / --remote-debugging-pipe on argv
// and qL validates a token in CLAUDE_CDP_AUTH against a hardcoded ed25519
// public key (signed payload `${timestamp_ms}.${base64(userDataDir)}`,
// 5-minute TTL). Both Playwright's _electron.launch() and
// chromium.connectOverCDP() inject --remote-debugging-port=0 and trip the
// gate. Signing key is upstream's; we can't forge tokens.
//
// Workaround: the gate doesn't check --inspect or runtime SIGUSR1 (the
// "Developer → Enable Main Process Debugger" menu's code path). So we
// spawn without any debug-port flags (gate stays asleep), wait for the
// X11 window to appear, then send SIGUSR1 to attach the Node inspector at
// runtime. From there lib/inspector.ts gives us main-process JS eval,
// which reaches the renderer via webContents.executeJavaScript() and
// supports main-process mocks (e.g. dialog.showOpenDialog for T17).
// Default backend: X11 via XWayland. Mirrors launcher-common.sh's
// build_electron_args() X11 branch (the launcher itself isn't invoked
// because we spawn Electron directly to keep CLAUDE_CDP_AUTH out of
// the picture — see the SIGUSR1 attach comment above).
const LAUNCHER_INJECTED_FLAGS_X11 = [
'--disable-features=CustomTitlebar',
'--ozone-platform=x11',
'--no-sandbox',
];
// Native-Wayland backend, opted into by CLAUDE_HARNESS_USE_WAYLAND=1.
// Mirrors launcher-common.sh's Wayland branch (lines 132-135). Tests
// that need to drive the app under native Wayland (#226 follow-ups,
// future S07 sweep) flip the harness-level switch and every runner
// inherits this without per-spec changes.
const LAUNCHER_INJECTED_FLAGS_WAYLAND = [
'--disable-features=CustomTitlebar',
'--enable-features=UseOzonePlatform,WaylandWindowDecorations',
'--ozone-platform=wayland',
'--enable-wayland-ime',
'--wayland-text-input-version=3',
'--no-sandbox',
];
const LAUNCHER_INJECTED_ENV: Record<string, string> = {
ELECTRON_FORCE_IS_PACKAGED: 'true',
ELECTRON_USE_SYSTEM_TITLE_BAR: '1',
};
// Top-level opt-in: when CLAUDE_HARNESS_USE_WAYLAND=1, every
// launchClaude() call swaps the X11 flag set for the Wayland one and
// also exports CLAUDE_USE_WAYLAND=1 into the spawn env (so any in-app
// path that reads the launcher var stays consistent). Caller-supplied
// extraEnv still wins — a single test can override per-launch.
function harnessUseWayland(): boolean {
return process.env.CLAUDE_HARNESS_USE_WAYLAND === '1';
}
const DEFAULT_INSTALL_PATHS = [
{
electron: '/usr/lib/claude-desktop/node_modules/electron/dist/electron',
asar: '/usr/lib/claude-desktop/node_modules/electron/dist/resources/app.asar',
},
{
electron: '/opt/Claude/node_modules/electron/dist/electron',
asar: '/opt/Claude/node_modules/electron/dist/resources/app.asar',
},
];
interface AppPaths {
electron: string;
asar: string;
}
// Per-launch state needed by the SIGINT/SIGTERM cleanup. Tracks the
// child proc + isolation root so a Ctrl-C through Playwright doesn't
// leak Electron processes or the per-launch tmpdir. Stored separately
// from ClaudeApp so the signal handler doesn't reach into closure
// internals — `proc` and `root` are everything cleanup needs.
interface ActiveLaunch {
proc: ChildProcess;
// Isolation root to remove on signal. null when caller opted out
// (`isolation: null`) or supplied a shared handle (`ownsIsolation`
// false — that handle's lifetime is the test's, not ours).
root: string | null;
}
const activeLaunches = new Set<ActiveLaunch>();
let signalHandlersInstalled = false;
// Install once across every launch in the test process. Handler is
// synchronous: SIGKILL each spawned proc, rmSync each owned isolation
// root, then re-emit the signal so Playwright's own teardown still
// runs and the process actually exits. Registering a listener removes
// Node's default exit-on-signal behavior, so without the re-emit the
// process would stay alive.
//
// Only owns processes/dirs from this module, not anything Playwright
// itself spawned, so the cleanup is safe to run in parallel with
// Playwright's teardown.
function ensureSignalHandlers(): void {
if (signalHandlersInstalled) return;
signalHandlersInstalled = true;
const cleanup = (signal: NodeJS.Signals) => {
for (const launch of activeLaunches) {
try {
launch.proc.kill('SIGKILL');
} catch {
// proc may already be dead
}
if (launch.root) {
try {
rmSync(launch.root, { recursive: true, force: true });
} catch {
// best-effort — tmpdir cleanup is not load-bearing
}
}
}
activeLaunches.clear();
// Re-emit so default disposition runs. Removing our handler
// first prevents an infinite loop.
process.removeListener('SIGINT', sigintHandler);
process.removeListener('SIGTERM', sigtermHandler);
process.kill(process.pid, signal);
};
const sigintHandler = () => cleanup('SIGINT');
const sigtermHandler = () => cleanup('SIGTERM');
process.on('SIGINT', sigintHandler);
process.on('SIGTERM', sigtermHandler);
}
function resolveInstall(): AppPaths {
const envBin = process.env.CLAUDE_DESKTOP_ELECTRON;
const envAsar = process.env.CLAUDE_DESKTOP_APP_ASAR;
if (envBin && envAsar) return { electron: envBin, asar: envAsar };
for (const candidate of DEFAULT_INSTALL_PATHS) {
if (existsSync(candidate.electron) && existsSync(candidate.asar)) {
return candidate;
}
}
throw new Error(
'Could not locate claude-desktop install. Set CLAUDE_DESKTOP_ELECTRON ' +
'and CLAUDE_DESKTOP_APP_ASAR, or install the deb/rpm package.',
);
}
// Mirrors the pre-launch cleanup in launcher-common.sh
// (cleanup_orphaned_cowork_daemon + cleanup_stale_lock +
// cleanup_stale_cowork_socket).
//
// When `configDir` is provided (isolated test mode), the SingletonLock
// path is relative to that dir rather than ~/.config/Claude — the host
// config is left untouched.
export async function cleanupPreLaunch(configDir?: string): Promise<void> {
try {
await exec('pkill', ['-f', 'cowork-vm-service\\.js']);
} catch {
// pkill returns non-zero when no matches; that's fine.
}
const lockPath = configDir
? join(configDir, 'SingletonLock')
: join(homedir(), '.config/Claude/SingletonLock');
try {
const target = readlinkSync(lockPath);
const pidMatch = target.match(/-(\d+)$/);
if (pidMatch && !existsSync(`/proc/${pidMatch[1]}`)) {
rmSync(lockPath, { force: true });
}
} catch {
// Lock doesn't exist or isn't a symlink — both fine.
}
const sockPath = join(
process.env.XDG_RUNTIME_DIR ?? '/tmp',
'cowork-vm-service.sock',
);
if (existsSync(sockPath)) {
try {
rmSync(sockPath, { force: true });
} catch {
// Stale socket may already be gone.
}
}
}
export async function launchClaude(opts: LaunchOptions = {}): Promise<ClaudeApp> {
// Isolation default: create a fresh per-launch sandbox unless the
// caller passed `null` (legacy ~/.config/Claude) or supplied a
// pre-existing handle (shared across multiple launches in one test).
let isolation: Isolation | null;
let ownsIsolation = false;
if (opts.isolation === null) {
isolation = null;
} else if (opts.isolation) {
isolation = opts.isolation;
} else {
isolation = await createIsolation();
ownsIsolation = true;
}
await cleanupPreLaunch(isolation?.configDir);
const { electron: electronBin, asar } = resolveInstall();
const appDir = dirname(dirname(dirname(dirname(electronBin))));
const useWayland = harnessUseWayland();
const launcherFlags = useWayland
? LAUNCHER_INJECTED_FLAGS_WAYLAND
: LAUNCHER_INJECTED_FLAGS_X11;
// CLAUDE_USE_WAYLAND only when the harness-level gate is on.
// Spread BEFORE opts.extraEnv so a single test can override.
const waylandEnv: Record<string, string> = useWayland
? { CLAUDE_USE_WAYLAND: '1', GDK_BACKEND: 'wayland' }
: {};
const proc = spawn(
electronBin,
[...launcherFlags, asar, ...(opts.args ?? [])],
{
cwd: appDir,
env: {
...process.env,
...LAUNCHER_INJECTED_ENV,
...(isolation?.env ?? {}),
...waylandEnv,
...opts.extraEnv,
CI: '1',
} as Record<string, string>,
stdio: 'ignore',
detached: false,
},
);
if (!proc.pid) {
if (ownsIsolation && isolation) await isolation.cleanup();
throw new Error('Failed to spawn Electron — no pid');
}
// Register signal handlers + add this launch to the active set so a
// Ctrl-C through Playwright SIGKILLs the Electron child and (if we
// own the tmpdir) rmSync's the isolation root. Owned-isolation
// signal cleanup uses dirname(configHome) — Isolation doesn't
// expose `root`, but createIsolation builds configHome as
// `<root>/config`, so the parent dir is the tmpdir to remove.
ensureSignalHandlers();
const isolationRoot =
ownsIsolation && isolation ? dirname(isolation.configHome) : null;
const launchEntry: ActiveLaunch = { proc, root: isolationRoot };
activeLaunches.add(launchEntry);
// Single-slot inspector tracking. Only one inspector ever attaches
// per launch (SIGUSR1 opens port 9229; reusing the port across
// re-attaches isn't supported). Stored so close() can release the
// WebSocket even if the runner forgets — previously every runner
// did `inspector.close(); finally app.close();` and the WS leaked
// when an `expect()` between those threw.
let trackedInspector: InspectorClient | null = null;
const waitForX11Window = async (timeoutMs = 15_000): Promise<string> => {
const wid = await retryUntil(
async () => findX11WindowByPid(proc.pid!),
{ timeout: timeoutMs, interval: 250 },
);
if (!wid) {
throw new Error(
`X11 window for pid ${proc.pid} did not appear within ${timeoutMs}ms`,
);
}
return wid;
};
const attachInspector = async (timeoutMs = 15_000): Promise<InspectorClient> => {
// Send SIGUSR1 to open the Node inspector at runtime — same code
// path as Developer → Enable Main Process Debugger menu item.
// Then poll http://127.0.0.1:9229/json/list until it answers.
process.kill(proc.pid!, 'SIGUSR1');
const start = Date.now();
let lastErr: unknown = null;
while (Date.now() - start < timeoutMs) {
try {
const client = await InspectorClient.connect(9229);
trackedInspector = client;
return client;
} catch (err) {
lastErr = err;
await sleep(250);
}
}
throw new Error(
`Inspector did not become ready on port 9229 within ${timeoutMs}ms: ${
lastErr instanceof Error ? lastErr.message : String(lastErr)
}`,
);
};
const waitForReady = async (
level: ReadyLevel,
opts: WaitForReadyOptions = {},
): Promise<WindowReady | MainVisibleReady | ClaudeAiReady | UserLoadedReady> => {
const overall = opts.timeout ?? 90_000;
const start = Date.now();
// Each step uses the remaining overall budget rather than
// a fixed per-step timeout. If startup is slow, downstream
// steps still get whatever's left; if startup is fast, the
// later steps inherit the unused margin.
const remaining = () => Math.max(0, overall - (Date.now() - start));
const wid = await waitForX11Window(remaining());
if (level === 'window') return { wid };
const inspector = await attachInspector(remaining());
// 'mainVisible' — the main shell BrowserWindow has been
// shown. MainWindow.getState() resolves the window via
// claude.ai webContents, so this poll implicitly also
// requires that webContents to exist; the explicit
// 'claudeAi' step below is for the URL-list signal that
// some tests want even when window visibility is incidental.
const mainWin = new MainWindow(inspector);
const visibleState = await retryUntil(
async () => {
const s = await mainWin.getState();
return s && s.visible ? s : null;
},
{ timeout: remaining(), interval: 250 },
);
if (!visibleState) {
throw new Error(
`waitForReady('${level}'): main window did not become ` +
`visible within ${overall}ms`,
);
}
if (level === 'mainVisible') return { wid, inspector };
// 'claudeAi' — a claude.ai-domain webContents exists in
// the registry. May still be on /login. Soft-fails on
// timeout: returns without claudeAiUrl so the caller
// can skip (host likely not signed in).
const claudeAiUrl = await retryUntil(
async () => {
const all = await inspector.evalInMain<{ url: string }[]>(`
const { webContents } = process.mainModule.require('electron');
return webContents.getAllWebContents().map(w => ({ url: w.getURL() }));
`);
return all.find((w) => w.url.includes('claude.ai'))?.url ?? null;
},
{ timeout: remaining(), interval: 500 },
);
if (!claudeAiUrl) {
return { wid, inspector };
}
if (level === 'claudeAi') return { wid, inspector, claudeAiUrl };
// 'userLoaded' — URL past /login. Necessary precondition
// for upstream's lHn() (`!user.isLoggedOut`) returning
// true, which gates Ko.show() in the shortcut handler.
// NOT sufficient on its own — main-process user state
// loads on a separate timeline from the renderer URL,
// so QE submit paths still need openAndWaitReady's
// retry loop on top of this.
const postLoginUrl =
(await waitForUserLoaded(inspector, remaining())) ?? undefined;
return { wid, inspector, claudeAiUrl, postLoginUrl };
};
const app: ClaudeApp = {
process: proc,
pid: proc.pid,
isolation,
lastExitInfo: null,
async close() {
// Drop the inspector first — InspectorClient.close() is now
// idempotent (see lib/inspector.ts) so the runner-side
// `inspector.close()` calls keep working even when this
// fires too. Wrapped in try/catch because a thrown ws.close
// shouldn't block the proc/iso cleanup below.
if (trackedInspector) {
try {
trackedInspector.close();
} catch {
// already closed
}
trackedInspector = null;
}
if (proc.exitCode === null && proc.signalCode === null) {
proc.kill('SIGTERM');
await Promise.race([
new Promise<void>((resolve) => proc.once('exit', () => resolve())),
sleep(5000),
]);
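if (proc.exitCode === null && proc.signalCode === null) {
proc.kill('SIGKILL');
// Wait briefly for the exit event so exitCode / signalCode
// are populated before lastExitInfo is captured below.
await Promise.race([
new Promise<void>((resolve) => proc.once('exit', () => resolve())),
sleep(1000),
]);
}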
}
// Capture exit info BEFORE iso cleanup. Runners can attach
// app.lastExitInfo to testInfo when non-null + signal === null
// (we didn't kill it, so a non-zero code means a real crash).
app.lastExitInfo = {
code: proc.exitCode,
signal: proc.signalCode,
};
activeLaunches.delete(launchEntry);
if (ownsIsolation && isolation) {
await isolation.cleanup();
}
},
waitForX11Window,
attachInspector,
// TS can't verify a closure with a union return matches the
// generic conditional signature, even though the runtime
// branches do produce the right shape per level. The cast
// preserves the public contract.
waitForReady: waitForReady as ClaudeApp['waitForReady'],
};
return app;
}
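
`waitForReady` hands each step whatever is left of one overall budget rather than a fixed per-step timeout. A minimal sketch of that bookkeeping with an injectable clock so it can be exercised without sleeping; `makeBudget` and its `now` parameter are hypothetical and not part of this module:

```typescript
// Returns a function reporting how many ms of `overallMs` are left,
// clamped at zero. `now` is injectable for deterministic runs.
function makeBudget(
  overallMs: number,
  now: () => number = Date.now,
): () => number {
  const start = now();
  return () => Math.max(0, overallMs - (now() - start));
}

// Deterministic walk-through with a fake clock.
let fakeNow = 1_000;
const budgetLeft = makeBudget(90_000, () => fakeNow);

const atStart = budgetLeft(); // full budget
fakeNow += 7_500; // startup took 7.5s
const afterStartup = budgetLeft(); // later steps inherit the rest
fakeNow += 100_000; // a step overran the budget
const afterOverrun = budgetLeft(); // clamped, never negative
```

The clamp matters because a negative timeout passed to a polling helper is easy to misread as "no timeout"; zero makes the next step fail fast instead.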


@@ -0,0 +1,30 @@
export interface DesktopEnv {
desktop: string;
sessionType: string;
isWayland: boolean;
isX11: boolean;
isKDE: boolean;
isGNOME: boolean;
isSWAY: boolean;
isHYPR: boolean;
isNIRI: boolean;
row: string;
}
export function getEnv(): DesktopEnv {
const desktop = process.env.XDG_CURRENT_DESKTOP ?? '';
const sessionType = process.env.XDG_SESSION_TYPE ?? '';
const upper = desktop.toUpperCase();
return {
desktop,
sessionType,
isWayland: sessionType === 'wayland',
isX11: sessionType === 'x11',
isKDE: upper.includes('KDE'),
isGNOME: upper.includes('GNOME'),
isSWAY: upper.includes('SWAY'),
isHYPR: upper.includes('HYPRLAND'),
isNIRI: upper.includes('NIRI'),
row: process.env.ROW ?? 'KDE-W',
};
}
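
`getEnv()` uses case-insensitive substring checks, so a colon-separated `XDG_CURRENT_DESKTOP` value still matches. A minimal sketch of the same checks over an explicit record, so the result does not depend on the machine running it; `detectDesktop` and `DesktopFlags` are hypothetical names and cover only a subset of the real fields:

```typescript
interface DesktopFlags {
  isWayland: boolean;
  isX11: boolean;
  isKDE: boolean;
  isGNOME: boolean;
  isNIRI: boolean;
}

// Same substring logic as getEnv(), but reading from `vars` instead of
// process.env.
function detectDesktop(
  vars: Record<string, string | undefined>,
): DesktopFlags {
  const upper = (vars['XDG_CURRENT_DESKTOP'] ?? '').toUpperCase();
  const sessionType = vars['XDG_SESSION_TYPE'] ?? '';
  return {
    isWayland: sessionType === 'wayland',
    isX11: sessionType === 'x11',
    isKDE: upper.includes('KDE'),
    isGNOME: upper.includes('GNOME'),
    isNIRI: upper.includes('NIRI'),
  };
}

const niriSession = detectDesktop({
  XDG_CURRENT_DESKTOP: 'niri',
  XDG_SESSION_TYPE: 'wayland',
});
const ubuntuSession = detectDesktop({
  XDG_CURRENT_DESKTOP: 'ubuntu:GNOME',
  XDG_SESSION_TYPE: 'x11',
});
const unsetSession = detectDesktop({});
```

With both variables unset every flag is false, so a spec that branches on these flags needs its own fallback for headless or unusual sessions.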


@@ -0,0 +1,111 @@
// Detect-and-kill any running Claude Desktop process owned by the
// current user. Used before seeding a hermetic isolation from the
// host config, because Cookies (SQLite) and Local Storage / IndexedDB
// (LevelDB) all hold writer locks while the host app is running — a
// naive cp would either copy a torn page or fail outright on the
// LevelDB LOCK file.
//
// SIGTERM first, wait up to 5s for graceful exit, SIGKILL survivors.
// Loud stderr output: the user needs to know we're force-quitting
// their app so they can blame us, not Claude Desktop, when their
// unsaved chat draft disappears.
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';
import { sleep } from './retry.js';
const exec = promisify(execFile);
// Patterns that match host installs (deb, rpm, AppImage, dev tree).
// argv-based via `pgrep -f`: matches the installed binary path or
// the mounted AppImage path. The harness's own launches always set
// XDG_CONFIG_HOME to a tmpdir, so they wouldn't be confused with
// the host even if the patterns overlapped — but kill runs BEFORE
// our launch, so at this moment there's nothing of ours to confuse.
const HOST_PROCESS_PATTERNS = [
'/usr/lib/claude-desktop/',
'/opt/Claude/',
'\\.mount_[Cc]laude',
'/usr/bin/claude-desktop',
];
// Graceful-exit budget, shared by all matched pids (one deadline, not
// one per pid). Electron flushes LevelDB + checkpoints the SQLite WAL
// on SIGTERM; 5s covers a typical shutdown with margin.
const SIGTERM_GRACE_MS = 5_000;
const POLL_INTERVAL_MS = 200;
interface HostProcess {
pid: number;
argv: string;
}
async function findHostProcesses(): Promise<HostProcess[]> {
const pattern = HOST_PROCESS_PATTERNS.join('|');
try {
const { stdout } = await exec('pgrep', ['-af', pattern]);
return stdout
.split('\n')
.filter(Boolean)
.map((line) => {
const space = line.indexOf(' ');
const pid = Number(space === -1 ? line : line.slice(0, space));
const argv = space === -1 ? '' : line.slice(space + 1);
return { pid, argv };
})
.filter((p) => Number.isFinite(p.pid) && p.pid !== process.pid);
} catch {
// pgrep returns 1 when nothing matches — happy path.
return [];
}
}
function isAlive(pid: number): boolean {
try {
// Signal 0: existence check, no signal delivered.
process.kill(pid, 0);
return true;
} catch {
return false;
}
}
export async function killHostClaude(): Promise<void> {
const procs = await findHostProcesses();
if (procs.length === 0) return;
process.stderr.write(
`host-claude: ${procs.length} running Claude process(es) found; ` +
'sending SIGTERM (auth-state seed needs writer-lock release):\n',
);
for (const { pid, argv } of procs) {
process.stderr.write(` pid=${pid} ${argv.slice(0, 120)}\n`);
try {
process.kill(pid, 'SIGTERM');
} catch {
// Race: already exited between pgrep and now.
}
}
const deadline = Date.now() + SIGTERM_GRACE_MS;
while (Date.now() < deadline) {
if (!procs.some((p) => isAlive(p.pid))) return;
await sleep(POLL_INTERVAL_MS);
}
const survivors = procs.filter((p) => isAlive(p.pid));
if (survivors.length === 0) return;
process.stderr.write(
`host-claude: ${survivors.length} survived SIGTERM; sending SIGKILL:\n`,
);
for (const { pid } of survivors) {
process.stderr.write(` pid=${pid}\n`);
try {
process.kill(pid, 'SIGKILL');
} catch {
// Race: already exited.
}
}
// Final beat so /proc entries clear before the seed copy starts.
await sleep(POLL_INTERVAL_MS);
}
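
The line parsing inside `findHostProcesses` can be checked without running `pgrep`. A minimal sketch that lifts it into a pure function; `parsePgrepOutput`, `ParsedProcess`, and the sample text are hypothetical, and the sample assumes the `<pid> <command line>` layout that `pgrep -a` prints:

```typescript
interface ParsedProcess {
  pid: number;
  argv: string;
}

// Split `pgrep -af` style output into pid + argv pairs, dropping blank
// lines, unparsable pids, and the caller's own pid.
function parsePgrepOutput(stdout: string, selfPid: number): ParsedProcess[] {
  return stdout
    .split('\n')
    .filter(Boolean)
    .map((line) => {
      const space = line.indexOf(' ');
      const pid = Number(space === -1 ? line : line.slice(0, space));
      const argv = space === -1 ? '' : line.slice(space + 1);
      return { pid, argv };
    })
    .filter((p) => Number.isFinite(p.pid) && p.pid !== selfPid);
}

const sampleOutput =
  '1234 /usr/lib/claude-desktop/node_modules/electron/dist/electron app.asar\n' +
  '5678\n' +
  'oops not-a-pid\n' +
  '42 /opt/Claude/node_modules/electron/dist/electron\n';

// Pretend the harness itself is pid 42 so that row is filtered out.
const parsedProcs = parsePgrepOutput(sampleOutput, 42);
```

Keeping the parser separate from the `exec` call also makes the "pgrep exits 1 when nothing matches" branch the only part that needs a real process to exercise.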


@@ -0,0 +1,393 @@
// Focus-shifter primitive for "Quick Entry shortcut fires from any
// focus" (S14) on Niri sessions — the Wayland-native sibling of
// lib/input.ts. The runner needs to (a) spawn a sacrificial window
// with a known title, (b) shove keyboard focus to it, then (c) press
// the global shortcut and observe whether the QE popup appears
// regardless of focus.
//
// Niri only — by design.
// - There is no portable focus-injection on native Wayland. Each
// compositor exposes a different IPC: niri msg here, swaymsg for
// Sway, hyprctl for Hyprland, riverctl for River. The libei-based
// "input emulation" portal is the long-term cross-compositor
// answer but isn't widely deployed (KDE/GNOME are getting it,
// niri/sway/hypr are not yet). We pay one file per compositor
// until a second consumer surfaces the dispatcher need; a
// hypothetical lib/input-wayland.ts would just switch on
// XDG_CURRENT_DESKTOP and delegate. With only S14 consuming this,
// a dispatcher would be ceremony.
// - lib/input.ts (X11) and this file are independent: they don't
// share a focus-id type — niri window IDs are u64 numerics, X11
// WIDs are hex strings. Callers handle one or the other based on
// session detection; nothing crosses the boundary.
//
// Why niri msg --json over plain text: the niri wiki explicitly
// contracts the JSON output as stable while the plain-text form is
// described as unstable / human-readable-only. A test harness that
// regex-greps human-readable IPC output is one niri release away
// from a quiet break.
//
// Why we verify post-focus via niri msg focused-window: niri msg
// action focus-window exits 0 even when the focus didn't actually
// land (the action queues into the compositor and a competing input
// event or a closing window can race it). The only honest answer is
// to read focused-window back out and compare IDs. This mirrors
// lib/input.ts's xprop-readback paragraph but for niri's IPC. ~3s
// budget covers slow compositor paths; anything beyond is a refusal
// not a slow ack — surface as an error so S14 sees it.
//
// Why foot for the marker terminal: it's the niri-default in many
// distros (Fedora niri spin, several Arch derivatives), accepts
// --title <T> verbatim with no de-escaping surprises, and ships in
// most niri setups so a single binary covers the common case. We
// deliberately don't fall back to alacritty / kitty: the X11
// primitive uses xterm only, and the simplicity is worth more than
// the marginal robustness; an environment without foot can install
// it the same way an X11 environment without xterm installs xterm.
//
// Why detached:false on the marker spawn: keep the foot child in the
// parent's process group so a signal sent to the group (Ctrl-C, a
// runner killing the job) also takes the marker down.
// (Session 5 recon sketched detached:true; lib/input.ts uses
// detached:false and is the safer pattern — a leaked terminal past a
// crashed test run is worse than a marker that dies cleanly with its
// parent.)
//
// No fixed sleeps. The verification poll uses retryUntil so a fast
// compositor finishes in ~50ms while a slow one gets the full budget.
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';
import { retryUntil } from './retry.js';
const exec = promisify(execFile);
// Caller catches this and calls test.skip() — it's an environment
// gap (not a Niri session, or niri msg not on PATH), not a
// regression. Subclassing Error gives consumers a clean
// `instanceof` check without parsing message strings.
export class NiriIpcUnavailable extends Error {
constructor(message?: string) {
super(
message ??
'niri msg IPC unavailable: either this is not a Niri ' +
'session (XDG_CURRENT_DESKTOP !== "niri") or the ' +
'`niri` binary is missing from PATH. Install the ' +
'`niri-ipc` / `niri` package, or skip on this row.',
);
this.name = 'NiriIpcUnavailable';
}
}
// Mirrors lib/input.ts's XdotoolUnavailable — the install command is
// the actually-useful part of the error. Consumers should usually
// skip rather than fail; the absence of foot is an environment
// configuration issue, not a Claude Desktop regression.
export class FootUnavailable extends Error {
constructor(message?: string) {
super(
message ??
'foot binary not found on PATH. Install with ' +
'`dnf install foot` / `apt install foot`.',
);
this.name = 'FootUnavailable';
}
}
// Single source of truth for the Niri / not-Niri branch. Pure env
// check, no process spawn — matches the simplicity of isX11Session()
// in lib/input.ts. A `niri msg version` probe would be more
// authoritative (catches the case where someone manually overrides
// XDG_CURRENT_DESKTOP) but adds a fork-per-call cost that's
// disproportionate to how rare the override is in practice.
//
// The literal string 'niri' is the value niri itself sets in
// XDG_CURRENT_DESKTOP per its own documentation; we trust that and
// nothing else (no case-folding, no startswith).
export function isNiriSession(): boolean {
return process.env.XDG_CURRENT_DESKTOP === 'niri';
}
// Niri's --json output for several IPC calls is wrapped in a
// Result-style envelope: `{"Ok": <payload>}`. Newer/older niri
// versions sometimes return the bare payload. Defensively unwrap one
// layer of `.Ok` if present, then return the payload as-is. Returns
// null if the input is null/undefined.
function unwrapOk(value: unknown): unknown {
if (value === null || value === undefined) return null;
if (typeof value === 'object' && value !== null && 'Ok' in value) {
return (value as { Ok: unknown }).Ok;
}
return value;
}
// Shape of a niri window row, restricted to the fields we use. The
// real schema has more (workspace_id, is_floating, etc.) — we don't
// commit to those.
interface NiriWindow {
id: number;
title: string | null;
app_id: string | null;
is_focused?: boolean;
}
// Read the currently-focused niri window via `niri msg --json
// focused-window`.
//
// Returns null on:
// - Non-Niri session (gated out by isNiriSession()).
// - niri binary missing / spawn ENOENT — analogous to lib/input.ts
// returning null on xprop spawn failure rather than throwing.
// focusOtherWindow's poll fails through to its own timeout.
// - JSON parse failure or unexpected shape (defensive — should
// not happen against a healthy niri but the cost of a null
// return is one re-poll).
// - No focused window (e.g. all workspaces empty).
export async function getFocusedWindowId(): Promise<number | null> {
if (!isNiriSession()) return null;
let stdout: string;
try {
({ stdout } = await exec('niri', [
'msg',
'--json',
'focused-window',
]));
} catch {
return null;
}
const trimmed = stdout.trim();
if (!trimmed) return null;
let parsed: unknown;
try {
parsed = JSON.parse(trimmed);
} catch {
return null;
}
// Two known wrappings: `{Ok: {FocusedWindow: <window>}}` (older)
// and the bare window object (newer). Try unwrapping in order.
const okUnwrapped = unwrapOk(parsed);
let candidate: unknown = okUnwrapped;
if (
typeof okUnwrapped === 'object' &&
okUnwrapped !== null &&
'FocusedWindow' in okUnwrapped
) {
candidate = (okUnwrapped as { FocusedWindow: unknown }).FocusedWindow;
}
if (
typeof candidate !== 'object' ||
candidate === null ||
!('id' in candidate)
) {
return null;
}
const id = (candidate as { id: unknown }).id;
if (typeof id !== 'number' || !Number.isFinite(id)) return null;
return id;
}
// Resolve a window title to its niri ID via `niri msg --json
// windows`. The list is `Vec<Window>`; we filter on title match AND
// app_id !== 'Claude' so we never accidentally pick the test target
// itself. Returns null on zero matches; returns the first match's
// ID on multi-match (mirrors xdotool's first-match behavior in
// lib/input.ts).
async function resolveWindowIdByTitle(
title: string,
): Promise<number | null> {
const { stdout } = await exec('niri', ['msg', '--json', 'windows']);
const trimmed = stdout.trim();
if (!trimmed) return null;
let parsed: unknown;
try {
parsed = JSON.parse(trimmed);
} catch {
return null;
}
// Same Ok-wrapping defense as getFocusedWindowId.
const unwrapped = unwrapOk(parsed);
if (!Array.isArray(unwrapped)) return null;
for (const row of unwrapped as NiriWindow[]) {
if (
row &&
typeof row === 'object' &&
typeof row.id === 'number' &&
row.title === title &&
row.app_id !== 'Claude'
) {
return row.id;
}
}
return null;
}
// Shift Niri focus to the first window whose title matches `title`
// and whose app_id is not 'Claude' (so we never target Claude's own
// window), then verify the shift actually took.
//
// Throws:
// - NiriIpcUnavailable when not a Niri session, or niri binary
// missing.
// - Plain Error when no window matches (caller's bug — forgot to
// spawn the marker, or used the wrong title).
// - Plain Error when niri msg returns 0 but focused-window never
// reflects the focus change within ~3s (compositor refused the
// activation; this is the diagnostic path S14 wants surfaced,
// not swallowed).
export async function focusOtherWindow(title: string): Promise<void> {
if (!isNiriSession()) {
throw new NiriIpcUnavailable();
}
let targetId: number | null;
try {
targetId = await resolveWindowIdByTitle(title);
} catch (err) {
const e = err as { code?: string | number };
if (e.code === 'ENOENT') throw new NiriIpcUnavailable();
throw err;
}
if (targetId === null) {
throw new Error(
`focusOtherWindow: no Niri window matches title ${JSON.stringify(title)} ` +
'(with app_id != "Claude"). Did the marker window finish ' +
'mapping? Caller should await spawnMarkerWindow + a short ' +
'readiness poll before calling focusOtherWindow.',
);
}
try {
await exec('niri', [
'msg',
'action',
'focus-window',
'--id',
String(targetId),
]);
} catch (err) {
const e = err as { code?: string | number };
if (e.code === 'ENOENT') throw new NiriIpcUnavailable();
throw err;
}
const matched = await retryUntil(
async () => {
const active = await getFocusedWindowId();
return active === targetId ? true : null;
},
{ timeout: 3_000, interval: 100 },
);
if (!matched) {
throw new Error(
'focusOtherWindow: niri msg action focus-window returned 0 ' +
`but focused-window never settled to id=${targetId} ` +
`for title ${JSON.stringify(title)}. Compositor may have ` +
'refused the activation request.',
);
}
}
// Handle returned from spawnMarkerWindow. Lifecycle is owned by the
// caller — the test that spawned it must kill() in afterEach (or
// equivalent), otherwise the foot terminal leaks past the test run.
export interface MarkerWindow {
pid: number;
title: string;
kill(): Promise<void>;
}
// Spawn a long-lived foot terminal with a known title, suitable as
// a focus target on a Niri session. Backgrounded with detached:false
// so the child stays in the parent test process's group: a signal
// sent to the group (Ctrl-C, a runner killing the job) also takes
// the foot terminal down.
//
// Throws FootUnavailable if foot isn't on PATH (both at spawn-throw
// time AND via the 'error' event, mirroring lib/input.ts's redundant
// ENOENT handling — Node delivers ENOENT through different paths
// across versions).
export async function spawnMarkerWindow(
title: string,
): Promise<MarkerWindow> {
const { spawn } = await import('node:child_process');
let child: import('node:child_process').ChildProcess;
try {
// `sleep 600` keeps the foot terminal alive for 10min — longer
// than any reasonable single test, short enough that a leaked
// terminal self-cleans within the sweep. foot's --title sets
// the window title field that niri's windows list reports.
child = spawn('foot', ['--title', title, '-e', 'sleep', '600'], {
detached: false,
stdio: 'ignore',
});
} catch (err) {
const e = err as { code?: string | number };
if (e.code === 'ENOENT') {
throw new FootUnavailable();
}
throw err;
}
const earlyError = await new Promise<Error | null>((resolve) => {
const onError = (err: Error) => {
child.removeListener('spawn', onSpawn);
resolve(err);
};
const onSpawn = () => {
child.removeListener('error', onError);
resolve(null);
};
child.once('error', onError);
child.once('spawn', onSpawn);
});
if (earlyError) {
const e = earlyError as Error & { code?: string | number };
if (e.code === 'ENOENT') {
throw new FootUnavailable();
}
throw earlyError;
}
const pid = child.pid;
if (typeof pid !== 'number') {
throw new Error(
'spawnMarkerWindow: child.pid was undefined after spawn',
);
}
let killed = false;
const kill = async (): Promise<void> => {
if (killed) return;
killed = true;
if (child.exitCode !== null || child.signalCode !== null) {
return;
}
// SIGTERM with a short grace period before SIGKILL. foot
// honors SIGTERM cleanly; the SIGKILL fallback is for the
// pathological "child wedged in a syscall" case.
const exited = new Promise<void>((resolve) => {
child.once('exit', () => resolve());
});
try {
child.kill('SIGTERM');
} catch {
// Process may have died between the check and the kill.
}
const graceMs = 500;
const timedOut = await Promise.race([
exited.then(() => false),
new Promise<boolean>((resolve) =>
setTimeout(() => resolve(true), graceMs),
),
]);
if (timedOut) {
try {
child.kill('SIGKILL');
} catch {
// Already dead.
}
await exited;
}
};
return { pid, title, kill };
}
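
The envelope handling in `getFocusedWindowId` can be exercised without a compositor by separating the parsing from the `niri msg` call. A minimal sketch, assuming the wrapped and bare payload shapes described in the comments above: `extractFocusedWindowId` is a hypothetical name, the sample JSON strings are illustrative rather than captured niri output, and `Number.isSafeInteger` is a stricter check than the `Number.isFinite` used above (niri IDs are u64, so values past 2^53 would not survive as a JS number).

```typescript
// Hypothetical pure helper; not part of the change under review.
function extractFocusedWindowId(stdout: string): number | null {
  const trimmed = stdout.trim();
  if (!trimmed) return null;

  let parsed: unknown;
  try {
    parsed = JSON.parse(trimmed);
  } catch {
    return null;
  }

  // Peel off at most one `Ok` layer, then at most one `FocusedWindow` layer.
  let candidate: unknown = parsed;
  for (const key of ['Ok', 'FocusedWindow']) {
    if (
      typeof candidate === 'object' &&
      candidate !== null &&
      key in candidate
    ) {
      candidate = (candidate as Record<string, unknown>)[key];
    }
  }

  if (typeof candidate !== 'object' || candidate === null) return null;
  const id = (candidate as { id?: unknown }).id;
  return typeof id === 'number' && Number.isSafeInteger(id) ? id : null;
}

// Illustrative inputs only.
console.log(extractFocusedWindowId('{"Ok":{"FocusedWindow":{"id":7}}}')); // 7
console.log(extractFocusedWindowId('{"id":12,"title":"marker"}')); // 12
console.log(extractFocusedWindowId('{"Ok":{"FocusedWindow":null}}')); // null
```

Keeping the parser pure means the defensive branches (empty output, malformed JSON, missing `id`) can each be covered by a unit test rather than by a live Niri session.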


@@ -0,0 +1,346 @@
// Focus-shifter primitive for "Quick Entry shortcut fires from any focus"
// (S11, S14). The runner needs to (a) spawn a sacrificial window with
// a known title, (b) shove keyboard focus to it, then (c) press the
// global shortcut and observe whether the QE popup appears regardless
// of focus.
//
// X11 only — by design.
// - There is no portable focus-injection on native Wayland. Each
// compositor exposes its own IPC (swaymsg, riverctl, hyprctl,
// niri msg) and the libei-based "input emulation" portal isn't
// universally honored. Rather than bake a per-compositor matrix
// into the harness, runners on native Wayland rows must skip
// this test entirely. WaylandFocusUnavailable is the signal.
// - Wayland-with-XWayland (KDE-W default, Ubu-W default, GNOME-W
// when XDG_SESSION_TYPE=x11 is forced) is *not* an X11 session
// for our purposes — the WAYLAND-SIDE windows xdotool can't see
// are exactly the windows S11/S14 care about. The single source
// of truth is XDG_SESSION_TYPE === 'x11'. Anything else: skip.
//
// Why xdotool over xprop+wmctrl-equivalent: xdotool ships
// `search --name <regex> windowfocus` as one atomic call. Doing it
// with raw xprop means walking _NET_CLIENT_LIST, fetching _NET_WM_NAME
// per WID, picking a match, then sending an _NET_ACTIVE_WINDOW
// ClientMessage — which xprop can't generate, only read. wmctrl can,
// but adds a second binary dependency for no win.
//
// Why we verify post-focus via xprop: xdotool exits 0 even when
// focus didn't actually shift. Some compositors (mutter under
// XWayland-forced mode notably) accept the WM_TAKE_FOCUS / SetInputFocus
// pair and then quietly refuse the activation. The only honest
// answer is to read _NET_ACTIVE_WINDOW back out and compare WIDs.
// xdotool prints decimal WIDs; xprop prints `0x...` hex. We
// normalize to lowercase 0x-prefixed hex with leading zeros stripped.
//
// No fixed sleeps. The verification poll uses retryUntil so a fast
// compositor finishes in ~50ms while a slow one gets the full budget.
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';
import { retryUntil } from './retry.js';
const exec = promisify(execFile);
// Caller catches this and calls test.skip() — it's an environment gap,
// not a regression. Subclassing Error gives consumers a clean
// `instanceof` check without parsing message strings.
export class WaylandFocusUnavailable extends Error {
constructor(message?: string) {
super(
message ??
'focusOtherWindow: native Wayland session — no portable ' +
'focus-injection path. Skip on this row.',
);
this.name = 'WaylandFocusUnavailable';
}
}
// Mirrors quickentry.ts's ensureYdotool message style — the install
// command is the actually-useful part of the error. Consumers should
// usually skip rather than fail; the absence of xdotool is an
// environment configuration issue, not a Claude Desktop regression.
export class XdotoolUnavailable extends Error {
constructor(message?: string) {
super(
message ??
'xdotool binary not found on PATH. Install with ' +
'`dnf install xdotool` / `apt install xdotool`.',
);
this.name = 'XdotoolUnavailable';
}
}
// Single source of truth for the X11/Wayland branch. Every other
// function in this file calls this — do not duplicate the env check.
//
// XDG_SESSION_TYPE is set by logind. Possible values per spec are
// `x11`, `wayland`, `tty`, `mir`, `unspecified`. We only trust the
// literal string `x11` — anything else, including missing, returns
// false. That means an unset env var on a real X11 box returns false
// here; that's the correct conservative default since we can't
// verify the assumption.
export function isX11Session(): boolean {
return process.env.XDG_SESSION_TYPE === 'x11';
}
// Normalize a WID to lowercase 0x-prefixed hex with leading zeros
// stripped after the prefix. Accepts decimal (xdotool stdout) or
// 0x-prefixed hex (xprop stdout); unprefixed input is parsed as
// decimal. Returns null on parse failure.
//
// Examples:
// '94371842' → '0x5a00002'
// '0x05a00002' → '0x5a00002'
// '0X5A00002' → '0x5a00002'
function normalizeWid(raw: string): string | null {
const s = raw.trim();
if (!s) return null;
const isHex = /^0x/i.test(s);
const n = isHex ? parseInt(s, 16) : parseInt(s, 10);
if (!Number.isFinite(n) || n <= 0) return null;
return '0x' + n.toString(16);
}
// Read the currently-focused X11 window via _NET_ACTIVE_WINDOW.
//
// Returns null on:
// - Native Wayland (xprop may still respond via XWayland but the
// value is meaningless for native-Wayland clients — they don't
// appear in the X11 active-window list at all). Returning null
// here lets focusOtherWindow's poll fail through to its own
// timeout, but in practice native-Wayland rows are gated out
// earlier by isX11Session().
// - xprop missing / spawn failure.
// - Output that doesn't match the documented format (defensive —
// this should never happen on a real EWMH-compliant WM but the
// cost of a null return is one re-poll).
export async function getFocusedWindowId(): Promise<string | null> {
if (!isX11Session()) return null;
let stdout: string;
try {
({ stdout } = await exec('xprop', [
'-root',
'_NET_ACTIVE_WINDOW',
]));
} catch {
return null;
}
// Documented format:
// _NET_ACTIVE_WINDOW(WINDOW): window id # 0x5a00002
const m = stdout.match(/window id #\s*(0x[0-9a-fA-F]+)/);
if (!m || !m[1]) return null;
return normalizeWid(m[1]);
}
// Resolve a window title to its WID via xdotool. xdotool prints one
// decimal WID per matching line; we take the first. Empty output
// returns null (the caller turns that into a thrown Error);
// multi-match is silently resolved to the first, mirroring xdotool's
// own windowfocus behavior.
async function resolveWindowIdByTitle(
title: string,
): Promise<string | null> {
const { stdout } = await exec('xdotool', ['search', '--name', title]);
const lines = stdout
.split('\n')
.map((l) => l.trim())
.filter(Boolean);
if (lines.length === 0) return null;
const first = lines[0];
if (!first) return null;
return normalizeWid(first);
}
// Shift X11 focus to the first window whose title matches `title`,
// then verify the shift actually took.
//
// Throws:
// - WaylandFocusUnavailable on native Wayland.
// - XdotoolUnavailable when xdotool isn't on PATH.
// - Plain Error when no window matches the title (caller's bug —
// forgot to spawn the marker, or used the wrong title).
// - Plain Error when xdotool reports success but xprop never
// reflects the focus change within ~3s (compositor refused the
// activation; this is the diagnostic path S11/S14 actually want
// to surface, not swallow).
export async function focusOtherWindow(title: string): Promise<void> {
if (!isX11Session()) {
throw new WaylandFocusUnavailable();
}
// Resolve target WID first so we know what to verify against.
// Combining this with `windowfocus` would save a roundtrip but
// would also make the post-focus comparison impossible.
let targetWid: string | null;
try {
targetWid = await resolveWindowIdByTitle(title);
} catch (err) {
const e = err as { code?: string | number };
if (e.code === 'ENOENT') throw new XdotoolUnavailable();
throw err;
}
if (!targetWid) {
throw new Error(
`focusOtherWindow: no X11 window matches title ${JSON.stringify(title)}. ` +
'Did the marker window finish mapping? Caller should ' +
'await spawnMarkerWindow + a short readiness poll before ' +
'calling focusOtherWindow.',
);
}
// Send the focus request. xdotool's windowfocus issues a
// SetInputFocus, which is best-effort; the verify-via-xprop
// step below is the actual assertion.
try {
await exec('xdotool', ['search', '--name', title, 'windowfocus']);
} catch (err) {
const e = err as { code?: string | number };
if (e.code === 'ENOENT') throw new XdotoolUnavailable();
throw err;
}
// Poll _NET_ACTIVE_WINDOW until it matches the target. ~3s budget
// covers slow compositor activation paths (mutter cold-path is
// the worst observed, ~800ms). Anything beyond 3s is a refusal,
// not a slow ack — surface as an error so S11/S14 see it.
const matched = await retryUntil(
async () => {
const active = await getFocusedWindowId();
return active === targetWid ? true : null;
},
{ timeout: 3_000, interval: 100 },
);
if (!matched) {
throw new Error(
`focusOtherWindow: xdotool windowfocus returned 0 but ` +
`_NET_ACTIVE_WINDOW never settled to ${targetWid} ` +
`for title ${JSON.stringify(title)}. Compositor may ` +
'have refused the activation request.',
);
}
}
// Handle returned from spawnMarkerWindow. Lifecycle is owned by the
// caller — the test that spawned it must kill() in afterEach (or
// equivalent), otherwise the xterm leaks past the test run.
export interface MarkerWindow {
pid: number;
title: string;
kill(): Promise<void>;
}
// Spawn a long-lived xterm with a known title, suitable as a focus
// target. Backgrounded with detached:false so the child stays in the
// parent test process's group: a signal sent to the group (Ctrl-C, a
// runner killing the job) also takes the xterm down.
//
// Why xterm: it's the lowest-common-denominator X11 terminal — every
// X11 row has it (or can install it via the standard package). It
// honors -title verbatim (no de-escaping surprises) and -e accepts
// a single command without argv parsing quirks. Alternatives like
// `xclock` / `xeyes` either don't accept arbitrary titles or are
// missing on minimal Fedora installs.
//
// Throws if xterm isn't on PATH. Caller's responsibility to fall
// back or skip; we don't carry an `XtermUnavailable` class because
// the consumer decision tree is identical to "skip on missing
// xdotool" and the message is self-explanatory.
export async function spawnMarkerWindow(
title: string,
): Promise<MarkerWindow> {
// Lazy import so the module loads cleanly on Wayland rows that
// never call this function. (Top-level imports of node:child_process
// are already paid for by execFile, so this is mostly stylistic.)
const { spawn } = await import('node:child_process');
let child: import('node:child_process').ChildProcess;
try {
// `sleep 600` keeps the xterm alive for 10min — longer than
// any reasonable single test, short enough that a leaked
// xterm self-cleans within the sweep. -hold not used: we
// want the window to die when sleep dies.
child = spawn('xterm', ['-title', title, '-e', 'sleep', '600'], {
detached: false,
stdio: 'ignore',
});
} catch (err) {
const e = err as { code?: string | number };
if (e.code === 'ENOENT') {
throw new Error(
'xterm binary not found on PATH. Install with ' +
'`dnf install xterm` / `apt install xterm`. ' +
'Required by the focus-shift test path; consumers ' +
'should skip when this throws.',
);
}
throw err;
}
// Surface asynchronous spawn failures (ENOENT on some Node
// versions arrives via the 'error' event, not the throw above).
const earlyError = await new Promise<Error | null>((resolve) => {
const onError = (err: Error) => {
child.removeListener('spawn', onSpawn);
resolve(err);
};
const onSpawn = () => {
child.removeListener('error', onError);
resolve(null);
};
child.once('error', onError);
child.once('spawn', onSpawn);
});
if (earlyError) {
const e = earlyError as Error & { code?: string | number };
if (e.code === 'ENOENT') {
throw new Error(
'xterm binary not found on PATH. Install with ' +
'`dnf install xterm` / `apt install xterm`.',
);
}
throw earlyError;
}
const pid = child.pid;
if (typeof pid !== 'number') {
// Shouldn't happen after a successful 'spawn' event, but
// the type system doesn't know that.
throw new Error('spawnMarkerWindow: child.pid was undefined after spawn');
}
let killed = false;
const kill = async (): Promise<void> => {
if (killed) return;
killed = true;
if (child.exitCode !== null || child.signalCode !== null) {
return; // already exited
}
// SIGTERM with a short grace period before SIGKILL. xterm
// honors SIGTERM cleanly; the SIGKILL fallback is for the
// pathological "child wedged in a syscall" case.
const exited = new Promise<void>((resolve) => {
child.once('exit', () => resolve());
});
try {
child.kill('SIGTERM');
} catch {
// Process may have died between the check and the kill.
}
const graceMs = 500;
const timedOut = await Promise.race([
exited.then(() => false),
new Promise<boolean>((resolve) =>
setTimeout(() => resolve(true), graceMs),
),
]);
if (timedOut) {
try {
child.kill('SIGKILL');
} catch {
// Already dead.
}
await exited;
}
};
return { pid, title, kill };
}
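
`normalizeWid` relies on `parseInt`, which stops at the first character it cannot parse, so an unprefixed hex string such as `5a00002` comes back as `0x5` rather than null. A stricter sketch that rejects anything other than plain decimal or `0x`-prefixed hex is shown below; `normalizeWidStrict` is a hypothetical name, not part of this change.

```typescript
// Hypothetical stricter variant of normalizeWid; not part of the change.
function normalizeWidStrict(raw: string): string | null {
  const s = raw.trim();
  let n: number;
  if (/^0x[0-9a-f]+$/i.test(s)) {
    // Hex form, as printed by xprop.
    n = parseInt(s.slice(2), 16);
  } else if (/^[0-9]+$/.test(s)) {
    // Decimal form, as printed by xdotool.
    n = parseInt(s, 10);
  } else {
    return null;
  }
  if (!Number.isSafeInteger(n) || n <= 0) return null;
  // toString(16) emits lowercase digits with no leading zeros.
  return '0x' + n.toString(16);
}

console.log(normalizeWidStrict('94371842')); // '0x5a00002'
console.log(normalizeWidStrict('0x05a00002')); // '0x5a00002'
console.log(normalizeWidStrict('5a00002')); // null
```

Both callers in this file only pass xdotool or xprop output, so the lenient version is unlikely to misfire in practice; the strict form mainly guards against future callers.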


@@ -0,0 +1,328 @@
// Node-inspector client for Electron's main process.
//
// Why this exists: the shipped Electron has an authenticated-CDP gate
// (see lib/electron.ts) that exits the app whenever
// --remote-debugging-port is on argv. The gate doesn't check --inspect /
// SIGUSR1, so we can attach the Node inspector at runtime — same code
// path as the in-app "Developer → Enable Main Process Debugger" menu.
//
// From the inspector we can evaluate arbitrary JS in the main process,
// which gives us:
// - Electron API access (app, webContents, dialog, BrowserView)
// - Renderer access via webContents.executeJavaScript()
// - Main-process mocks (e.g. dialog.showOpenDialog for T17)
//
// Caveat: `BrowserWindow.getAllWindows()` returns an empty array
// because frame-fix-wrapper substitutes the BrowserWindow class and
// the substitution breaks the static registry. Use
// `webContents.getAllWebContents()` instead; that registry stays intact.
interface PendingCall {
resolve: (value: unknown) => void;
reject: (err: Error) => void;
timer: ReturnType<typeof setTimeout>;
}
// CDP accessibility-tree node shape (subset). The full AX tree is a flat
// array of these with parent/child links carried by id refs. We surface
// the value-bearing fields the v7 walker + claudeai.ts page-objects
// actually consume; remaining CDP fields (ignoredReasons,
// frameId, …) are accessible via the string-keyed bag.
export interface AxValue {
type: string;
value?: unknown;
}
export interface AxProperty {
name: string;
value: AxValue;
}
export interface AxNode {
nodeId: string;
parentId?: string;
childIds?: string[];
backendDOMNodeId?: number;
role?: { type: string; value: string };
name?: { type: string; value: string };
// AX state/relation properties (`haspopup`, `expanded`, `modal`,
// `checked`, `disabled`, …). claudeai.ts reads `haspopup` to
// discriminate menu-trigger buttons from action buttons that
// happen to share an accessible name.
properties?: AxProperty[];
ignored?: boolean;
[k: string]: unknown;
}
export class InspectorClient {
// why: 30s default for send() timeouts. "Slow but not stuck."
// Lower defaults break legitimately-slow operations like initial
// page-load on a cold app or a chunky DOM snapshot; higher defaults
// turn renderer-side hangs (blocked event loop, modal trapping focus,
// network-bound script stalled) into invisible silent freezes.
// Consumers can override per-call (timeoutMs arg) or process-wide
// (mutate the static InspectorClient.defaultTimeoutMs; send() reads
// it on every call).
static defaultTimeoutMs = 30000;
private ws: WebSocket;
private nextId = 0;
private pending = new Map<number, PendingCall>();
// Idempotency flag for close(). Runners + electron.ts close() may
// both call this on the same instance (intentionally — see
// electron.ts launchClaude tracking comment); the flag guarantees
// a second call is a true no-op rather than a redundant ws.close().
private closed = false;
private constructor(ws: WebSocket) {
this.ws = ws;
this.ws.addEventListener('message', (ev) => this.handleMessage(ev));
}
static async connect(port: number): Promise<InspectorClient> {
const meta = await fetch(`http://127.0.0.1:${port}/json/list`).then((r) =>
r.json(),
) as Array<{ webSocketDebuggerUrl: string }>;
if (!meta.length) {
throw new Error(`Inspector at ${port} has no debuggee`);
}
const url = meta[0]!.webSocketDebuggerUrl;
const ws = new WebSocket(url);
await new Promise<void>((resolve, reject) => {
ws.addEventListener('open', () => resolve(), { once: true });
ws.addEventListener(
'error',
(e) => reject(new Error(`inspector ws error: ${e.type}`)),
{ once: true },
);
});
const client = new InspectorClient(ws);
await client.send('Runtime.enable');
await client.send('Runtime.runIfWaitingForDebugger');
return client;
}
private handleMessage(ev: MessageEvent): void {
const msg = JSON.parse(typeof ev.data === 'string' ? ev.data : '{}') as {
id?: number;
error?: unknown;
result?: unknown;
};
if (msg.id !== undefined && this.pending.has(msg.id)) {
const { resolve, reject, timer } = this.pending.get(msg.id)!;
this.pending.delete(msg.id);
clearTimeout(timer);
if (msg.error) {
reject(new Error(JSON.stringify(msg.error)));
} else {
resolve(msg.result);
}
}
}
// why: every pending call gets a timer. When the renderer event loop
// is blocked (modal focus trap, network-bound script stalled, DOM
// snapshot too large) the CDP reply never arrives and the promise
// would hang forever. We reject with a clear "method=X" error and
// drop the pending entry (no leak), but we deliberately do NOT
// close the websocket — a single hung eval shouldn't tear down the
// connection; the next call may succeed.
send(
method: string,
params: Record<string, unknown> = {},
timeoutMs?: number,
): Promise<unknown> {
const id = ++this.nextId;
const ms = timeoutMs ?? InspectorClient.defaultTimeoutMs;
return new Promise((resolve, reject) => {
const timer = setTimeout(() => {
if (this.pending.delete(id)) {
reject(
new Error(
`inspector.send timed out after ${ms}ms (method=${method})`,
),
);
}
}, ms);
this.pending.set(id, { resolve, reject, timer });
this.ws.send(JSON.stringify({ id, method, params }));
});
}
// Evaluate an async function body in the main process. The body is
// wrapped in an async IIFE, so it must `return X` to produce a value;
// a body that returns nothing yields null. Returns the JSON-parsed
// value. JSON-stringification inside the IIFE dodges the inspector's
// Promise-result deep-marshaling quirks (returnByValue produces empty
// objects for awaited Promise resolutions on this build).
//
// Bare `require` is NOT a global in the CDP eval scope — go through
// `process.mainModule.require('electron'|'node:fs'|…)` instead.
async evalInMain<T = unknown>(body: string, timeoutMs?: number): Promise<T> {
const expression =
'globalThis.__r = (async () => { ' +
'const __v = await (async () => { ' +
body +
' })(); ' +
'return JSON.stringify(__v === undefined ? null : __v); ' +
'})(); globalThis.__r;';
const result = (await this.send(
'Runtime.evaluate',
{
expression,
awaitPromise: true,
returnByValue: true,
},
timeoutMs,
)) as { result?: { value?: unknown }; exceptionDetails?: unknown };
if (result.exceptionDetails) {
throw new Error(
`evalInMain threw: ${JSON.stringify(result.exceptionDetails)}`,
);
}
const v = result.result?.value;
if (typeof v !== 'string') {
throw new Error(
`evalInMain expected JSON string, got ${JSON.stringify(result.result)}`,
);
}
return JSON.parse(v) as T;
}
// Convenience: evaluate JS in a specific webContents (renderer).
// `urlFilter` selects which webContents (substring match on getURL()).
async evalInRenderer<T = unknown>(
urlFilter: string,
js: string,
timeoutMs?: number,
): Promise<T> {
const escaped = JSON.stringify(js);
const result = await this.evalInMain<T>(
`
const { webContents } = process.mainModule.require('electron');
const all = webContents.getAllWebContents();
const target = all.find(w => w.getURL().includes(${JSON.stringify(urlFilter)}));
if (!target) {
throw new Error('no webContents matching: ' + ${JSON.stringify(urlFilter)});
}
return await target.executeJavaScript(${escaped});
`,
timeoutMs,
);
return result;
}
// Query the renderer's full accessibility tree via Chrome DevTools
// Protocol's `Accessibility.getFullAXTree`. Reachable from main
// process JS (this client connects to Node's debugger, not Chromium's
// — but webContents.debugger gives us full CDP access from there).
//
// `urlFilter` selects which webContents to attach to (substring match
// on getURL()). Idempotent attach: re-using the same webContents
// across calls won't double-attach. Caller is responsible for AX
// cost — at large surfaces full-tree latency may be ≥100ms (see
// fingerprint-v7-plan.md "Open questions"); for those, use a
// scoped subtree query instead.
async getAccessibleTree(
urlFilter: string,
timeoutMs?: number,
): Promise<AxNode[]> {
const result = await this.evalInMain<{ nodes: AxNode[] }>(
`
const { webContents } = process.mainModule.require('electron');
const all = webContents.getAllWebContents();
const target = all.find(w => w.getURL().includes(${JSON.stringify(urlFilter)}));
if (!target) {
throw new Error('no webContents matching: ' + ${JSON.stringify(urlFilter)});
}
if (!target.debugger.isAttached()) {
target.debugger.attach('1.3');
}
try {
await target.debugger.sendCommand('Accessibility.enable');
} catch (err) {
// Already-enabled is benign; surface anything else.
if (!String(err && err.message).includes('already enabled')) {
throw err;
}
}
const r = await target.debugger.sendCommand(
'Accessibility.getFullAXTree',
);
return r;
`,
timeoutMs,
);
return result.nodes;
}
// Resolve the AX-tree-supplied backendNodeId to a renderer-side
// JS object handle, then invoke `.click()` on it. This is the
// click-path counterpart to `getAccessibleTree`: capture identifies
// nodes by backendDOMNodeId, click consumes the same id without any
// selector reconstruction. `DOM.resolveNode` handles cross-frame
// nodes natively, and `Runtime.callFunctionOn` runs in the node's
// own execution context — so the click dispatches against the right
// document even when the target sits in an iframe.
async clickByBackendNodeId(
urlFilter: string,
backendNodeId: number,
timeoutMs?: number,
): Promise<void> {
await this.evalInMain<null>(
`
const { webContents } = process.mainModule.require('electron');
const all = webContents.getAllWebContents();
const target = all.find(w => w.getURL().includes(${JSON.stringify(urlFilter)}));
if (!target) {
throw new Error('no webContents matching: ' + ${JSON.stringify(urlFilter)});
}
if (!target.debugger.isAttached()) {
target.debugger.attach('1.3');
}
const resolved = await target.debugger.sendCommand(
'DOM.resolveNode',
{ backendNodeId: ${backendNodeId} },
);
const objectId = resolved && resolved.object && resolved.object.objectId;
if (!objectId) {
throw new Error(
'clickByBackendNodeId: DOM.resolveNode returned no objectId for ' +
${backendNodeId},
);
}
try {
await target.debugger.sendCommand('Runtime.callFunctionOn', {
objectId,
functionDeclaration: 'function() { this.click(); }',
});
} finally {
try {
await target.debugger.sendCommand('Runtime.releaseObject', {
objectId,
});
} catch (_) {
// Releasing a stale handle is benign.
}
}
return null;
`,
timeoutMs,
);
}
close(): void {
if (this.closed) return;
this.closed = true;
// Drain pending timers + reject in-flight promises so callers
// don't hang on close. Without this an outstanding send() keeps
// the event loop alive past close().
for (const [, pending] of this.pending) {
clearTimeout(pending.timer);
pending.reject(new Error('inspector closed'));
}
this.pending.clear();
try {
this.ws.close();
} catch {
// already closed
}
}
}
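
The wrapping that `evalInMain` applies can be tried outside the inspector. The sketch below rebuilds the same expression string and runs it through an indirect `eval`, used here only as a local stand-in for `Runtime.evaluate` with `awaitPromise: true`. `wrapBody` and `evalLocally` are hypothetical names for illustration, not part of the harness.

```typescript
// Hypothetical helper: builds the same expression string evalInMain sends.
function wrapBody(body: string): string {
  return (
    'globalThis.__r = (async () => { ' +
    'const __v = await (async () => { ' +
    body +
    ' })(); ' +
    'return JSON.stringify(__v === undefined ? null : __v); ' +
    '})(); globalThis.__r;'
  );
}

// Assumption: an indirect eval stands in for CDP's Runtime.evaluate.
// The script's completion value is the promise stored in globalThis.__r,
// which resolves to a JSON string (never a raw object).
async function evalLocally<T = unknown>(body: string): Promise<T> {
  const indirectEval = eval;
  const json: unknown = await indirectEval(wrapBody(body));
  if (typeof json !== 'string') {
    throw new Error('expected JSON string from wrapped body');
  }
  return JSON.parse(json) as T;
}
```

A body that never returns resolves to `null` rather than `undefined`, because `undefined` has no JSON representation and the wrapper substitutes `null` before stringifying.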

@@ -0,0 +1,158 @@
// Per-test config isolation.
//
// Decision 1 in docs/testing/automation.md calls for hermetic
// XDG_CONFIG_HOME / CLAUDE_CONFIG_DIR per test (S19 is the underlying
// primitive). Without it, persisted state leaks between tests:
// SingletonLock from one run blocks the next; S35's saved
// quickWindowPosition contaminates S29's closed-to-tray sanity; etc.
//
// Shape: each call to `createIsolation()` builds a fresh config root
// under $TMPDIR/claude-test-<random>/ and returns the env vars to merge
// into the spawned app, plus a teardown that removes the dir. Pass the
// same handle to multiple `launchClaude({ isolation })` calls when a
// test needs to launch the same app twice with shared state (e.g. S35
// position-memory across restart).
//
// `seedFromHost: true` extends this for tests that need the host's
// signed-in auth state (U01). The host directory itself stays
// untouched after the kill+copy: the test runs hermetically against
// a copy of just the auth-relevant files, and the tmpdir is rm -rf'd
// on cleanup so secrets never persist past the test process.
import { cp, mkdir, mkdtemp, rm, stat } from 'node:fs/promises';
import { homedir, tmpdir } from 'node:os';
import { join } from 'node:path';
import { killHostClaude } from './host-claude.js';
export interface Isolation {
configHome: string;
configDir: string;
cacheHome: string;
dataHome: string;
env: Record<string, string>;
cleanup(): Promise<void>;
}
export interface CreateIsolationOptions {
// When true: kill any running host Claude (LevelDB / SQLite hold
// writer locks while it runs), then copy the auth-relevant subset
// of $XDG_CONFIG_HOME/Claude into the new configDir. The host
// config never gets mutated by the test; secrets never leave the
// per-launch tmpdir.
seedFromHost?: boolean;
}
// Allowlist of relative paths under ~/.config/Claude/ that carry auth
// or first-launch UI state. Everything else is deliberately
// regenerated fresh in the tmpdir:
// - Cache/, Code Cache/, GPUCache/, Dawn*Cache/ — cheap to rebuild
// - blob_storage/, Crashpad/, logs/ — irrelevant to auth
// - SingletonLock, SingletonCookie, SingletonSocket — block startup
// - .org.chromium.Chromium.* — host-specific lock turds
// - claude-code-sessions/, claude-code-vm/, local-agent-mode-sessions/
// — large, account-specific, not needed for renderer auth
//
// Cookies + Local State are the auth-cookie pair (the latter holds
// the os_crypt key wrapper on platforms that need it). IndexedDB +
// Local Storage hold the renderer-side auth context that claude.ai's
// route guards check before redirecting to /login — cookies alone
// leave you bouncing back to login.
const SEED_PATHS = [
'Cookies',
'Cookies-journal',
'Local State',
'Local Storage',
'IndexedDB',
'Session Storage',
'WebStorage',
'SharedStorage',
'Network Persistent State',
'config.json',
'claude_desktop_config.json',
'developer_settings.json',
];
async function exists(path: string): Promise<boolean> {
try {
await stat(path);
return true;
} catch {
return false;
}
}
async function seedAuthFromHost(targetConfigDir: string): Promise<void> {
const hostConfigHome =
process.env.XDG_CONFIG_HOME ?? join(homedir(), '.config');
const hostClaudeDir = join(hostConfigHome, 'Claude');
if (!(await exists(hostClaudeDir))) {
throw new Error(
`seedFromHost: host config dir not found at ${hostClaudeDir}. ` +
'Sign into Claude Desktop on this machine first, then re-run.',
);
}
await mkdir(targetConfigDir, { recursive: true });
let copied = 0;
for (const rel of SEED_PATHS) {
const src = join(hostClaudeDir, rel);
if (!(await exists(src))) continue;
const dst = join(targetConfigDir, rel);
await cp(src, dst, {
recursive: true,
preserveTimestamps: true,
errorOnExist: false,
});
copied++;
}
if (copied === 0) {
throw new Error(
`seedFromHost: ${hostClaudeDir} exists but contains none of the ` +
'expected auth files. Open Claude Desktop, sign in, fully close, ' +
'and re-run.',
);
}
}
export async function createIsolation(
opts: CreateIsolationOptions = {},
): Promise<Isolation> {
const root = await mkdtemp(join(tmpdir(), 'claude-test-'));
const configHome = join(root, 'config');
const configDir = join(configHome, 'Claude');
const cacheHome = join(root, 'cache');
const dataHome = join(root, 'data');
if (opts.seedFromHost) {
// Order matters: kill before copy. While the host app runs,
// LevelDB holds a LOCK file in IndexedDB/Local Storage that
// stops a second process from opening those stores, and
// SQLite Cookies has WAL pages that may not be checkpointed.
await killHostClaude();
await seedAuthFromHost(configDir);
}
const env: Record<string, string> = {
XDG_CONFIG_HOME: configHome,
XDG_CACHE_HOME: cacheHome,
XDG_DATA_HOME: dataHome,
// CLAUDE_CONFIG_DIR is honored by launcher-common.sh and by
// the app itself for picking the persisted-settings location.
CLAUDE_CONFIG_DIR: configDir,
};
return {
configHome,
configDir,
cacheHome,
dataHome,
env,
async cleanup() {
await rm(root, { recursive: true, force: true });
},
};
}
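
One way to guarantee the teardown runs is to wrap the handle in a try/finally helper. The sketch below is an assumption about how a caller might do that, not code from the harness: `createScratchIsolation` is a reduced stand-in for `createIsolation()` (no host seeding, fewer fields), and `withIsolation` and `pathExists` are hypothetical names.

```typescript
import { mkdtemp, rm, stat } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Reduced stand-in for createIsolation(): same layout for the config
// root, no seedFromHost support.
async function createScratchIsolation() {
  const root = await mkdtemp(join(tmpdir(), 'claude-test-'));
  const configHome = join(root, 'config');
  const env: Record<string, string> = {
    XDG_CONFIG_HOME: configHome,
    CLAUDE_CONFIG_DIR: join(configHome, 'Claude'),
  };
  return {
    root,
    env,
    cleanup: () => rm(root, { recursive: true, force: true }),
  };
}

// Hypothetical wrapper: hands the isolation env to `fn` and removes the
// temp root afterwards, even when `fn` throws.
async function withIsolation<T>(
  fn: (env: Record<string, string>, root: string) => Promise<T>,
): Promise<T> {
  const iso = await createScratchIsolation();
  try {
    return await fn(iso.env, iso.root);
  } finally {
    await iso.cleanup();
  }
}

// Demo helper: true when `path` exists on disk.
async function pathExists(path: string): Promise<boolean> {
  try {
    await stat(path);
    return true;
  } catch {
    return false;
  }
}
```

A test that relaunches the app with shared state would keep a single handle alive across both launches and call `cleanup()` only after the second one exits.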

@@ -0,0 +1,150 @@
// Name-classifier vocabulary + instance-shape registry. The v7 walker
// (Phase 2) consumes this to decide whether a captured accessible-name
// is stable copy ("Search", "Send"), instance-shaped ("AWAaddrick·Max",
// "Today+12"), or unknown copy that needs human triage. The vocabulary
// `stable` / `suspect` arrays are derived from a prior inventory walk
// by `explore/derive-vocabulary.ts` and re-derived on each major
// upstream release.
//
// First-match-wins ordering: more specific shapes go before general
// ones so e.g. a model-version pattern hits before a generic
// title-case-words pattern.
export interface InstanceShape {
id: string;
regex: RegExp;
// Canonical pattern recorded into the v7 fingerprint's NameMatcher
// when this shape matches. Null on shapes that should *not*
// contribute a regex matcher — those entries fall through to
// `kind: instance` ancestor-presence checks at resolve time.
pattern: string | null;
}
export const INSTANCE_SHAPES: readonly InstanceShape[] = [
// Plan badge — `<handle>·<tier>` with optional trailing PUA glyph
// (Claude Desktop ships private-area font icons as the badge
// ornament; e.g. AWAaddrick·Max).
{
id: 'plan-badge',
regex: /^.+·(Free|Pro|Max|Team|Enterprise)[\uE000-\uF8FF\s]*$/u,
pattern: '\\w+·(Free|Pro|Max|Team|Enterprise)',
},
// Model-version names. Stable across users, versioned across
// releases — recording as a pattern lets a re-walked inventory
// keep resolving when upstream bumps "Opus 4.7" → "Opus 4.8".
{ id: 'opus-version', regex: /^Opus \d/, pattern: '^Opus \\d' },
{ id: 'sonnet-version', regex: /^Sonnet \d/, pattern: '^Sonnet \\d' },
{ id: 'haiku-version', regex: /^Haiku \d/, pattern: '^Haiku \\d' },
// Usage / quota percentage suffix ("Usage: plan 11%").
{ id: 'percentage', regex: /\d{1,3}%$/, pattern: '\\d{1,3}%' },
// Relative date a list row often appends to a title ("Untitled
// conversationToday+12"). The shape includes an optional `+N`
// counter for collapsed-instance groupings.
{
id: 'relative-date',
regex:
/(Today|Yesterday|\d+\s(day|hour|minute|second|week|month|year)s?\sago)/,
pattern:
'(Today|Yesterday|\\d+\\s(day|hour|minute|second|week|month|year)s?\\sago)(\\+\\d+)?',
},
// File / quota size at the start of the name ("1.5 GB").
{
id: 'size-with-unit',
regex: /^\d+\.\d+\s\w+/,
pattern: '^\\d+\\.\\d+\\s\\w+',
},
// User handle anywhere in the name ("@aaddrick").
{ id: 'user-handle', regex: /@\w+/, pattern: '@\\w+' },
// Cowork session row in the sidebar. Names are status-prefixed
// session titles ("Idle Review PR 555…", "Awaiting input Plan
// automated testing strategy…", "Pull request merged Review issue
// 373"). The status enum is bounded; the title varies per session.
// Recording as a pattern lets the v7 instance-collapse fold the
// whole sidebar list into one representative entry — without this
// shape the title classifies as `suspect` (or `stable` if literal-
// matching once) and each session is captured + drilled
// individually. Placed before `long-title` so the more specific
// shape wins (long-title returns `pattern: null`, which loses
// account-portability for these rows).
{
id: 'cowork-session',
regex:
/^(Idle|Ready|Working|Awaiting input|Pull request merged|Done|Failed|Cancelled)\s/,
pattern:
'^(Idle|Ready|Working|Awaiting input|Pull request merged|Done|Failed|Cancelled)\\s',
},
// Per-row action triggers in list-row contexts. Claude.ai exposes a
// "⋮" menu next to each cowork session / conversation row with an
// aria-label `More options for <row title>` — one button per row.
// Without this shape the per-row title makes each button literally
// unique, so each gets its own stable entry and the BFS drills
// every one. With the shape they collapse to a single representative
// per surface, mirroring the cowork-session row collapse above.
{
id: 'row-more-options',
regex: /^More options for /,
pattern: '^More options for ',
},
// 3+ word title-case prose. No pattern recorded — the title is
// per-conversation, not a recurring shape, so the resolver should
// fall back to ancestor-presence rather than try to match the
// literal text.
{
id: 'long-title',
regex: /^[A-Z][a-z]+ [A-Z][a-z]+ [a-z]/,
pattern: null,
},
] as const;
export type NameClass = 'stable' | 'instance' | 'positional' | 'suspect';
export interface NameClassification {
kind: NameClass;
// Present iff `kind === 'instance'`.
shapeId?: string;
// Present iff `kind === 'instance'`. Null when the matched shape
// has no canonical regex (e.g. long-title) — caller should drop the
// name from the fingerprint and rely on ariaPath + ancestor
// presence.
pattern?: string | null;
}
export interface Vocabulary {
stable: ReadonlySet<string>;
suspect: ReadonlySet<string>;
}
// classifyName decides how a captured accessible-name should be
// matched at resolve time. Priority order tracks the v7 plan's "Name
// classifier" §:
// 1. Empty / whitespace → 'positional' (no usable name)
// 2. Matches an instance-shape regex → 'instance' + shapeId
// 3. Present in vocabulary.stable → 'stable'
// 4. Default → 'suspect' (treated as stable by the walker but
// surfaced for reconciliation review)
//
// The list-row-child rule from the plan ('option/listitem inside
// listbox/list' → 'instance') depends on ariaPath context the
// classifier doesn't have access to here. The walker checks that
// condition before calling classifyName.
export function classifyName(
name: string | null,
vocabulary: Vocabulary,
): NameClassification {
if (name === null || name.trim() === '') {
return { kind: 'positional' };
}
for (const shape of INSTANCE_SHAPES) {
if (shape.regex.test(name)) {
return {
kind: 'instance',
shapeId: shape.id,
pattern: shape.pattern,
};
}
}
if (vocabulary.stable.has(name)) {
return { kind: 'stable' };
}
return { kind: 'suspect' };
}
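
The first-match-wins ordering can be seen with a name that satisfies two shapes at once. This is a minimal sketch under stated assumptions: `DEMO_SHAPES` copies two entries from the registry with a shortened status list, and `classifyDemo` and `stableNames` are hypothetical stand-ins for `classifyName` and a derived vocabulary.

```typescript
interface Shape {
  id: string;
  regex: RegExp;
  pattern: string | null;
}

// Reduced copy of two registry entries, kept in the same relative order.
// The status list is shortened for the demo.
const DEMO_SHAPES: readonly Shape[] = [
  {
    id: 'cowork-session',
    regex: /^(Idle|Ready|Working)\s/,
    pattern: '^(Idle|Ready|Working)\\s',
  },
  { id: 'long-title', regex: /^[A-Z][a-z]+ [A-Z][a-z]+ [a-z]/, pattern: null },
];

type DemoClass =
  | { kind: 'positional' | 'stable' | 'suspect' }
  | { kind: 'instance'; shapeId: string; pattern: string | null };

// Same priority order as classifyName: empty, then shapes, then vocabulary.
function classifyDemo(
  name: string | null,
  stable: ReadonlySet<string>,
): DemoClass {
  if (name === null || name.trim() === '') return { kind: 'positional' };
  for (const shape of DEMO_SHAPES) {
    if (shape.regex.test(name)) {
      return { kind: 'instance', shapeId: shape.id, pattern: shape.pattern };
    }
  }
  return stable.has(name) ? { kind: 'stable' } : { kind: 'suspect' };
}

// Hypothetical vocabulary for the demo.
const stableNames: ReadonlySet<string> = new Set(['Search', 'Send']);
const sessionRow = classifyDemo('Idle Review the parser', stableNames);
console.log(sessionRow);
```

"Idle Review the parser" matches both regexes. Because `cowork-session` comes first, the result carries its pattern instead of the `null` that `long-title` would record.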

@@ -0,0 +1,656 @@
// Quick Entry domain wrapper — single point of coupling to upstream's
// main-process structure for QE-* tests.
//
// Why centralize: upstream symbol names (Ko for popup, ut for main, h1
// for the visibility check) drift between releases per CLAUDE.md's
// "Working with Minified JavaScript" notes. If this lookup logic lives
// in 12 separate spec files, every release becomes a 12-file fix. If
// it lives here, it's one fix.
//
// Discovery strategy: don't rely on minified symbol names. Use shape:
// - Popup webContents = the file:// entry whose URL contains
// "quick_window/" (see isPopupUrl).
// - Popup BrowserWindow = the one whose loadFile target is
// quick-window.html, captured by installInterceptor().
// - Main BrowserWindow = the one whose webContents URL contains
// "claude.ai".
//
// Shortcut injection: ydotool through /dev/uinput. Works on X11,
// XWayland, and native Wayland with portal-grabbed shortcuts (KDE-W,
// Ubu-W, KDE-X). Does NOT work where the OS-level grab itself is broken
// (#404 GNOME-W); that's the test, not a tool gap. `installInterceptor()`
// captures BrowserWindow refs so tests can inspect the popup, but it
// is not a trigger: opening the popup still needs the shortcut. For
// the closeout sweep the assumption is ydotool is present and the OS
// grab works on the row under test. S11/S12 explicitly test the grab
// path; everything else assumes it.
import { execFile } from 'node:child_process';
import { readFile } from 'node:fs/promises';
import { homedir } from 'node:os';
import { join } from 'node:path';
import { promisify } from 'node:util';
import type { InspectorClient } from './inspector.js';
import { retryUntil, sleep } from './retry.js';
const exec = promisify(execFile);
export interface WebContentsInfo {
id: number;
url: string;
}
export interface BrowserWindowState {
visible: boolean;
minimized: boolean;
fullScreen: boolean;
focused: boolean;
bounds: { x: number; y: number; width: number; height: number };
}
// Linux key codes for the upstream default Ctrl+Alt+Space accelerator.
// Override via constructor option for tests that exercise a remapped
// shortcut.
const DEFAULT_KEY_SEQUENCE = [
'29:1', // LEFTCTRL down
'56:1', // LEFTALT down
'57:1', // SPACE down
'57:0', // SPACE up
'56:0', // LEFTALT up
'29:0', // LEFTCTRL up
];
export class QuickEntry {
constructor(
private readonly inspector: InspectorClient,
private readonly keySeq: string[] = DEFAULT_KEY_SEQUENCE,
) {}
// Capture BrowserWindow refs by hooking prototype methods, not the
// constructor.
//
// Why prototype-level: scripts/frame-fix-wrapper.js returns the
// electron module wrapped in a Proxy whose `get` trap returns a
// closure-captured PatchedBrowserWindow. A constructor-level wrap
// (`electron.BrowserWindow = Wrapped`) writes to the underlying
// module but the Proxy keeps returning PatchedBrowserWindow on
// reads, so the wrap is bypassed entirely. Hooking
// `BrowserWindow.prototype.loadFile` instead captures every
// instance regardless of which subclass it was constructed
// through — Patched, frame-fix-wrapped, or plain.
//
// The popup is identified by its loadFile target:
// `.vite/renderer/quick_window/quick-window.html`
// (build-reference index.js:515443).
async installInterceptor(): Promise<void> {
await this.inspector.evalInMain<null>(`
if (globalThis.__qeInterceptorInstalled) return null;
const electron = process.mainModule.require('electron');
const proto = electron.BrowserWindow.prototype;
globalThis.__qeWindows = [];
const origLoadFile = proto.loadFile;
proto.loadFile = function(filePath, ...rest) {
try {
const url = String(filePath || '');
globalThis.__qeWindows.push({
ref: this,
loadedFile: url,
});
} catch (e) { /* recording must never throw */ }
return origLoadFile.call(this, filePath, ...rest);
};
const origLoadURL = proto.loadURL;
proto.loadURL = function(url, ...rest) {
try {
globalThis.__qeWindows.push({
ref: this,
loadedFile: String(url || ''),
});
} catch (e) {}
return origLoadURL.call(this, url, ...rest);
};
globalThis.__qeInterceptorInstalled = true;
return null;
`);
}
// The popup is the BrowserWindow whose recorded load target contains
// `quick-window.html` (or, as a fallback, `quick_window/`). Stable
// path: upstream uses it verbatim (build-reference index.js:515443).
private popupSelector(): string {
return `(w => {
if (!w || !w.ref || w.ref.isDestroyed()) return false;
const f = String(w.loadedFile || '');
return f.indexOf('quick-window.html') !== -1
|| f.indexOf('quick_window/') !== -1;
})`;
}
async listWebContents(): Promise<WebContentsInfo[]> {
return await this.inspector.evalInMain<WebContentsInfo[]>(`
const { webContents } = process.mainModule.require('electron');
return webContents.getAllWebContents().map(w => ({
id: w.id, url: w.getURL(),
}));
`);
}
// Find the popup by URL: a file:// webContents whose path contains
// `quick_window/` (see isPopupUrl), never a claude.ai URL.
async getPopupWebContents(): Promise<WebContentsInfo | null> {
const all = await this.listWebContents();
const popup = all.find((w) => isPopupUrl(w.url));
return popup ?? null;
}
// Send the configured accelerator via ydotool. Errors out (caller
// can catch + skip) if ydotool isn't on PATH.
//
// YDOTOOL_SOCKET is honored from the parent env; defaults to
// /tmp/.ydotool_socket (the path the shipped systemd unit uses
// after the override drop-in). Without YDOTOOL_SOCKET, the client
// probes /run/user/$UID/.ydotool_socket — a location the daemon
// doesn't bind to, so the call fails confusingly.
async openViaShortcut(): Promise<void> {
await ensureYdotool();
await exec('ydotool', ['key', ...this.keySeq], {
env: {
...process.env,
YDOTOOL_SOCKET:
process.env.YDOTOOL_SOCKET ?? '/tmp/.ydotool_socket',
} as Record<string, string>,
});
}
// openViaShortcut + waitForPopupReady, with retry for the
// upstream-only-shows-when-logged-in race (build-reference
// index.js:515604: `function lHn() { return !user.isLoggedOut; }`).
// On a fresh launch, the renderer URL flips past /login before
// the main-process user object is populated; the first shortcut
// constructs the popup but skips show(). A second shortcut after
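// a brief settle hits the populated-user path. Worst-case budget is
// roughly `attempts * 2 * perAttemptMs + (attempts - 1) * retryDelayMs`:
// waitForPopupReady applies its timeout to the appearance wait and
// again to the readiness wait, and the delay only runs between attempts.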
async openAndWaitReady(opts: {
attempts?: number;
perAttemptMs?: number;
retryDelayMs?: number;
} = {}): Promise<void> {
const attempts = opts.attempts ?? 3;
const perAttemptMs = opts.perAttemptMs ?? 8_000;
const retryDelayMs = opts.retryDelayMs ?? 1_500;
let lastErr: unknown = null;
for (let i = 0; i < attempts; i++) {
await this.openViaShortcut();
try {
await this.waitForPopupReady(perAttemptMs);
return;
} catch (err) {
lastErr = err;
if (i < attempts - 1) await sleep(retryDelayMs);
}
}
throw new Error(
`openAndWaitReady: popup never became ready after ${attempts} ` +
`shortcut presses. Last error: ` +
(lastErr instanceof Error ? lastErr.message : String(lastErr)),
);
}
// Wait for the popup webContents to appear after openViaShortcut().
async waitForPopup(timeoutMs = 5000): Promise<WebContentsInfo> {
const wc = await retryUntil(
async () => this.getPopupWebContents(),
{ timeout: timeoutMs, interval: 100 },
);
if (!wc) {
throw new Error(
`Quick Entry popup webContents did not appear within ${timeoutMs}ms`,
);
}
return wc;
}
// Wait for the popup to become hidden (the upstream "submit
// accepted" signal). Upstream reuses the popup BrowserWindow
// across invocations — Ko stays alive, only the visibility
// toggles — so checking webContents existence would never
// resolve. Read isVisible() on the captured BrowserWindow ref
// instead.
async waitForPopupClosed(timeoutMs = 5000): Promise<void> {
const closed = await retryUntil(
async () => {
const state = await this.getPopupState();
if (!state) return true; // destroyed → closed
return state.visible ? null : true;
},
{ timeout: timeoutMs, interval: 100 },
);
if (!closed) {
throw new Error(
`Quick Entry popup did not become hidden within ${timeoutMs}ms`,
);
}
}
// Read live properties of the popup BrowserWindow. Replaces the
// previous getPopupConstructionArgs — construction-time options
// aren't observable through the prototype-method hook, but every
// upstream-relevant signal has a runtime equivalent. Frame state
// uses `getContentBounds() vs getBounds()` (frameless windows
// have equal content + frame bounds). Transparent uses the
// background color (popup is `#00000000`).
async getPopupRuntimeProps(): Promise<{
frameless: boolean;
transparent: boolean;
alwaysOnTop: boolean;
backgroundColor: string;
} | null> {
// `skipTaskbar` was previously reported here but BrowserWindow
// has no isSkipTaskbar() getter; the field hardcoded `false`
// regardless of how the popup was constructed, which is
// misleading. Dropped — no current spec consumes it. If a
// future test needs it, capture via a setSkipTaskbar wrap in
// installInterceptor() rather than faking a getter.
return await this.inspector.evalInMain(`
const wins = globalThis.__qeWindows || [];
const isPopup = ${this.popupSelector()};
const popup = wins.find(isPopup);
if (!popup || !popup.ref || popup.ref.isDestroyed()) return null;
const w = popup.ref;
const bounds = w.getBounds();
const content = w.getContentBounds();
const bg = (w.getBackgroundColor && w.getBackgroundColor()) || '';
return {
frameless: bounds.width === content.width && bounds.height === content.height,
transparent: bg === '#00000000' || bg === '#0000',
alwaysOnTop: w.isAlwaysOnTop(),
backgroundColor: bg,
};
`);
}
// Read the popup BrowserWindow's runtime visibility / bounds /
// focus / fullscreen state. Used by waitForPopupReady and
// waitForPopupClosed; the popup is reused across invocations
// (Ko stays alive, only visibility toggles), so isVisible() is
// the right "open vs closed" signal — not webContents existence.
async getPopupState(): Promise<(BrowserWindowState & { alwaysOnTop: boolean }) | null> {
return await this.inspector.evalInMain(`
const wins = globalThis.__qeWindows || [];
const isPopup = ${this.popupSelector()};
const popup = wins.find(isPopup);
if (!popup || !popup.ref || popup.ref.isDestroyed()) return null;
const w = popup.ref;
return {
visible: w.isVisible(),
minimized: w.isMinimized(),
fullScreen: w.isFullScreen(),
focused: w.isFocused(),
bounds: w.getBounds(),
alwaysOnTop: w.isAlwaysOnTop(),
};
`);
}
// Wait for the popup to be fully ready for input — meaning:
// (a) BrowserWindow has been show()n (isVisible === true),
// which only fires after upstream's `ready-to-show` event,
// which is after React's mount + first-pass effects, which
// is when document.addEventListener('keydown', ...) gets
// attached;
// (b) the textarea exists in the DOM.
// Without (a), first-time-mount typing fires keydown into a
// document with no listener and the submit silently drops.
async waitForPopupReady(timeoutMs = 5000): Promise<void> {
const popup = await this.waitForPopup(timeoutMs);
let lastState: unknown = null;
const ready = await retryUntil(
async () => {
const state = await this.getPopupState();
const dom = await this.inspector
.evalInMain<{
readyState: string;
hasTextarea: boolean;
} | null>(
`
const { webContents } = process.mainModule.require('electron');
const wc = webContents.fromId(${popup.id});
if (!wc || wc.isDestroyed()) return null;
return await wc.executeJavaScript(\`(() => ({
readyState: document.readyState,
hasTextarea: !!(document.querySelector('textarea')
|| document.querySelector('[contenteditable="true"]')),
}))()\`);
`,
)
.catch(() => null);
lastState = { state, dom };
if (!state || !state.visible) return null;
return dom && dom.hasTextarea ? dom : null;
},
{ timeout: timeoutMs, interval: 100 },
);
if (!ready) {
throw new Error(
`Popup did not become visible with a textarea within ${timeoutMs}ms. ` +
`Last observed: ${JSON.stringify(lastState)}`,
);
}
}
// Type a prompt into the popup's textarea and submit. The popup is
// a React app with a textarea + send button. React shadows the
// element's `value` property to track the last value it saw, so after
// a plain `el.value = ...` the input event looks like a no-op and is
// ignored. Calling the native prototype setter below bypasses that
// tracking and is the standard React-friendly path.
//
// Waits for the textarea to exist before dispatching — first-time
// lazy popup creation needs the React mount to complete, otherwise
// the input event lands before any state listener and upstream
// drops the submit as empty.
async typeAndSubmit(text: string): Promise<void> {
await this.waitForPopupReady();
const popup = await this.getPopupWebContents();
if (!popup) throw new Error('popup vanished after waitForPopupReady');
const popupId = popup.id;
await this.inspector.evalInMain<null>(`
const { webContents } = process.mainModule.require('electron');
const wc = webContents.fromId(${popupId});
if (!wc) throw new Error('popup webContents ${popupId} gone');
await wc.executeJavaScript(${JSON.stringify(typeAndSubmitJs(text))});
return null;
`);
}
// Read the persisted popup position (S35) directly from the
// on-disk store. electron-store defaults to `config.json` under the
// app's userData dir; for claude-desktop that's
// `${configDir}/config.json` (or `~/.config/Claude/config.json`
// when no isolation is in play). Reading the file beats the
// previous globalThis-walk: that probe matched any object with
// .get/.set returning a `quickWindowPosition` value, which is
// fragile against unrelated minified objects coincidentally
// matching the shape.
//
// Optional `configDir` keeps the call backward-compatible — pass
// `app.isolation?.configDir` from runners under per-test isolation,
// omit it to fall back to the host's `~/.config/Claude`.
async getStoredPosition(configDir?: string): Promise<unknown | null> {
const storePath = configDir
? join(configDir, 'config.json')
: join(homedir(), '.config/Claude/config.json');
try {
const raw = await readFile(storePath, 'utf8');
const parsed = JSON.parse(raw) as { quickWindowPosition?: unknown };
return parsed.quickWindowPosition ?? null;
} catch {
// File missing (never saved) or unreadable — both null.
return null;
}
}
}
// Upstream loads the popup via
// loadFile('.vite/renderer/quick_window/quick-window.html')
// (build-reference index.js:515443). Anchor on that exact path. Fall
// back to a broader 'quick_window/' substring if upstream renames just
// the HTML file.
export function isPopupUrl(url: string): boolean {
if (!url.startsWith('file://')) return false;
if (url.includes('claude.ai')) return false;
if (url.includes('quick_window/quick-window.html')) return true;
if (url.includes('/quick_window/')) return true;
return false;
}
// React-friendly value setter. document.activeElement isn't reliable
// because the popup may not have focus on construction; we walk the
// DOM for the only textarea (or contenteditable).
function typeAndSubmitJs(text: string): string {
const escaped = JSON.stringify(text);
return `
(async () => {
const input = document.querySelector('textarea')
|| document.querySelector('[contenteditable="true"]');
if (!input) throw new Error('no textarea/contenteditable in popup DOM');
input.focus();
if (input.tagName === 'TEXTAREA') {
const setter = Object.getOwnPropertyDescriptor(
HTMLTextAreaElement.prototype, 'value'
).set;
setter.call(input, ${escaped});
input.dispatchEvent(new Event('input', { bubbles: true }));
} else {
input.textContent = ${escaped};
input.dispatchEvent(new InputEvent('input', { bubbles: true, data: ${escaped} }));
}
// Submit via Enter keydown — popup binds its own keyhandler
// (renderer-side per the closeout doc).
input.dispatchEvent(new KeyboardEvent('keydown', {
key: 'Enter', code: 'Enter', keyCode: 13, which: 13,
bubbles: true, cancelable: true,
}));
input.dispatchEvent(new KeyboardEvent('keyup', {
key: 'Enter', code: 'Enter', keyCode: 13, which: 13,
bubbles: true,
}));
})()
`;
}
// Main-window state manipulation. Used by QE-7/8/9/10/11 to set the
// precondition (minimized, hidden-to-tray, fullscreen, etc.) before
// triggering Quick Entry.
//
// All methods walk webContents to find the claude.ai-hosting
// BrowserWindow via BrowserWindow.fromWebContents(). The
// `BrowserWindow.getAllWindows()` registry is broken by frame-fix-
// wrapper (see lib/inspector.ts gotchas) but `fromWebContents` uses a
// different code path and remains reliable.
export class MainWindow {
constructor(private readonly inspector: InspectorClient) {}
async setState(action: 'minimize' | 'hide' | 'show' | 'restore' | 'fullScreen' | 'unFullScreen' | 'focus' | 'close'): Promise<void> {
await this.inspector.evalInMain<null>(`
const { webContents, BrowserWindow } = process.mainModule.require('electron');
const main = webContents.getAllWebContents().find(w => w.getURL().includes('claude.ai'));
if (!main) throw new Error('no claude.ai webContents — main not yet loaded');
const win = BrowserWindow.fromWebContents(main);
if (!win) throw new Error('no BrowserWindow for claude.ai webContents');
switch (${JSON.stringify(action)}) {
case 'minimize': win.minimize(); break;
case 'hide': win.hide(); break;
case 'show': win.show(); break;
case 'restore': win.restore(); break;
case 'fullScreen': win.setFullScreen(true); break;
case 'unFullScreen':win.setFullScreen(false); break;
case 'focus': win.focus(); break;
// 'close' fires the BrowserWindow 'close' event so
// frame-fix-wrapper.js:178-185 (the close-to-tray
// interceptor) and the upstream before-quit flow
// run as they would on a real X-button click. NOT
// the same as 'hide' — that bypasses the wrapper.
// T08 asserts on this distinction.
case 'close': win.close(); break;
}
return null;
`);
// Compositor-side state changes are async — small settle.
await sleep(150);
}
async getState(): Promise<BrowserWindowState | null> {
return await this.inspector.evalInMain(`
const { webContents, BrowserWindow } = process.mainModule.require('electron');
const main = webContents.getAllWebContents().find(w => w.getURL().includes('claude.ai'));
if (!main) return null;
const win = BrowserWindow.fromWebContents(main);
if (!win || win.isDestroyed()) return null;
return {
visible: win.isVisible(),
minimized: win.isMinimized(),
fullScreen: win.isFullScreen(),
focused: win.isFocused(),
bounds: win.getBounds(),
};
`);
}
}
// Wait for the claude.ai user object to be loaded — the precondition
// for upstream's lHn() (`!user.isLoggedOut`) returning true. The
// shortcut handler calls Ko.show() only when lHn() is true; if the
// renderer hasn't finished loading the user yet, the popup gets
// constructed and ready-to-show fires, but show() is silently
// skipped (build-reference index.js:515604). The user object is
// available once the renderer has navigated past the login page —
// e.g. /new, /chat/<uuid>, /code, /projects.
//
// Returns the post-login URL on success. Returns null on timeout —
// caller can decide to skip vs fail.
//
// Anchored at the host root and bounded with a path-terminator class so
// only `/login`, `/auth`, `/sign-in` etc. as the *first* path segment
// match. The previous unanchored `/\/(login|auth|sign[-_]?in)/i` also
// caught substrings like `/oauth/callback` (auth) and any URL containing
// `/login` further down the path.
const LOGIN_URL_RE =
/^https?:\/\/[^/]+\/(login|auth|sign[-_]?in)(?:[/?#]|$)/i;
export async function waitForUserLoaded(
inspector: InspectorClient,
timeoutMs = 30_000,
): Promise<string | null> {
return await retryUntil(
async () => {
const urls = await inspector.evalInMain<string[]>(`
const { webContents } = process.mainModule.require('electron');
return webContents.getAllWebContents()
.filter(w => w.getURL().includes('claude.ai'))
.map(w => w.getURL());
`);
const postLogin = urls.find(
(u) => !LOGIN_URL_RE.test(u) && u.includes('claude.ai'),
);
return postLogin ?? null;
},
{ timeout: timeoutMs, interval: 250 },
);
}
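The anchoring described for LOGIN_URL_RE can be checked in isolation. The sketch below copies the pattern under a different name and runs it against sample URLs; the sample URLs and the `isLoginUrl` helper are illustrative assumptions, not part of the harness.

```typescript
// Same pattern as LOGIN_URL_RE, copied so the sketch is self-contained.
const LOGIN_RE_SKETCH =
  /^https?:\/\/[^/]+\/(login|auth|sign[-_]?in)(?:[/?#]|$)/i;

// Hypothetical helper name; the harness inlines the test() call instead.
function isLoginUrl(url: string): boolean {
  return LOGIN_RE_SKETCH.test(url);
}

// Hypothetical sample URLs chosen to exercise each branch of the pattern.
const loginSamples: string[] = [
  'https://claude.ai/login',                 // first segment, end of string
  'https://claude.ai/login?returnTo=%2Fnew', // first segment, then '?'
  'https://claude.ai/sign-in/',              // first segment, then '/'
  'https://claude.ai/oauth/callback',        // 'auth' only as a substring
  'https://claude.ai/chat/1234/login',       // 'login' is not the first segment
  'https://claude.ai/new',                   // ordinary post-login path
];

for (const url of loginSamples) {
  console.log(isLoginUrl(url) ? 'login     ' : 'post-login', url);
}
```

Only the first three samples are classified as login pages; the other three are treated as post-login URLs.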
// Wait for a new chat session to load in the claude.ai webContents.
// Returns the URL once a /chat/<uuid> path is reached. This is the
// network-coupled half of the layered submit assertion (S31): a slow
// claude.ai or a network blip can fail this independently of any QE
// regression. Callers should treat its failure as severity Should, not Critical.
const CHAT_URL_RE = /\/chat\/[0-9a-f-]{8,}/i;
export async function waitForNewChat(
inspector: InspectorClient,
timeoutMs = 15_000,
): Promise<string | null> {
return await retryUntil(
async () => {
const all = await inspector.evalInMain<{ url: string }[]>(`
const { webContents } = process.mainModule.require('electron');
return webContents.getAllWebContents()
.filter(w => w.getURL().includes('claude.ai'))
.map(w => ({ url: w.getURL() }));
`);
const match = all.find((w) => CHAT_URL_RE.test(w.url));
return match ? match.url : null;
},
{ timeout: timeoutMs, interval: 250 },
);
}
// Local-only assertion half: did the popup-side IPC fire with the
// right payload? Records the popup's `requestDismissWithPayload` IPC
// channel by listening for it on the main side. Start this before
// typeAndSubmit and await the returned promise afterwards; it
// resolves with the captured payload (or null on timeout).
export async function captureSubmitIpc(
inspector: InspectorClient,
timeoutMs = 5000,
): Promise<{ text: string } | null> {
await inspector.evalInMain<null>(`
if (!globalThis.__qeIpcInstalled) {
const { ipcMain } = process.mainModule.require('electron');
globalThis.__qeIpcCalls = [];
// Record invocations of the 'requestDismiss'-shaped channels.
// Channel names are minified-stable: requestDismiss /
// requestDismissWithPayload (closeout doc index.js:515409).
const channels = ['requestDismissWithPayload', 'requestDismiss'];
for (const ch of channels) {
// Best-effort: register a parallel listener that records
// invocations without disturbing the original handler.
ipcMain.on(ch, (_event, payload) => {
globalThis.__qeIpcCalls.push({ channel: ch, payload, ts: Date.now() });
});
}
globalThis.__qeIpcInstalled = true;
}
return null;
`);
return await retryUntil(
async () => {
const calls = await inspector.evalInMain<
{ channel: string; payload: unknown; ts: number }[]
>(`return globalThis.__qeIpcCalls || []`);
const submit = calls.find(
(c) =>
c.channel === 'requestDismissWithPayload' &&
c.payload != null &&
typeof c.payload === 'object',
);
if (!submit) return null;
const p = submit.payload as Record<string, unknown>;
const text =
typeof p.text === 'string'
? p.text
: typeof p.prompt === 'string'
? p.prompt
: typeof p.value === 'string'
? p.value
: '';
return { text };
},
{ timeout: timeoutMs, interval: 100 },
);
}
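The payload handling at the end of captureSubmitIpc is a three-key fallback (`text`, then `prompt`, then `value`). A minimal sketch of that chain as a pure function follows; the name `extractSubmitText` is hypothetical, and returning null for non-object payloads is an assumption standing in for the `calls.find` filter above.

```typescript
// Hypothetical helper mirroring the text/prompt/value fallback chain.
// Returns null when the payload is not an object (the harness filters
// those calls out), and '' when none of the three keys holds a string.
function extractSubmitText(payload: unknown): string | null {
  if (payload == null || typeof payload !== 'object') return null;
  const p = payload as Record<string, unknown>;
  for (const key of ['text', 'prompt', 'value']) {
    const v = p[key];
    if (typeof v === 'string') return v;
  }
  return '';
}

console.log(extractSubmitText({ text: 'hello' }));
console.log(extractSubmitText({ prompt: 'from prompt', value: 'ignored' }));
console.log(extractSubmitText({ unrelated: 1 }));
console.log(extractSubmitText('not an object'));
```

Because an empty string is a valid result, a caller asserting on the submitted text should compare against the expected string rather than test for truthiness.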
async function ensureYdotool(): Promise<void> {
try {
// `ydotool` with no args exits 1 and prints the help text — that
// confirms the binary works without sending input. Avoid
// `ydotool --help` which is rejected as an unknown command.
await exec('ydotool', [], {
env: {
...process.env,
YDOTOOL_SOCKET:
process.env.YDOTOOL_SOCKET ?? '/tmp/.ydotool_socket',
} as Record<string, string>,
});
} catch (err) {
const e = err as { code?: string | number; stderr?: string };
// exit 1 with usage help is normal — only fail on ENOENT (no
// binary) or stderr socket errors.
const stderr = (e.stderr ?? '').toString();
if (e.code === 'ENOENT') {
throw new Error(
'ydotool binary not found on PATH. Install with ' +
'`dnf install ydotool` / `apt install ydotool`.',
);
}
if (stderr.includes('failed to connect socket')) {
throw new Error(
'ydotoold socket not reachable. Start the daemon ' +
'(`sudo systemctl start ydotool.service`) and ensure ' +
'YDOTOOL_SOCKET points at its bind path. Underlying: ' +
stderr.trim(),
);
}
// Any other non-zero exit (notably exit 1 with usage) is fine.
}
}

@@ -0,0 +1,27 @@
export interface RetryOptions {
timeout?: number;
interval?: number;
message?: string;
}
export async function retryUntil<T>(
fn: () => Promise<T | null | undefined>,
options: RetryOptions = {},
): Promise<T | null> {
const timeout = options.timeout ?? 10_000;
const interval = options.interval ?? 250;
const start = Date.now();
while (Date.now() - start < timeout) {
const result = await fn();
if (result !== null && result !== undefined) {
return result;
}
await sleep(interval);
}
return null;
}
export function sleep(ms: number): Promise<void> {
return new Promise((resolve) => setTimeout(resolve, ms));
}
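A short usage sketch for the polling helper follows. It re-declares the loop under different names (`retryUntilSketch`, `sleepSketch`) so it runs standalone; `makeProbe` is a hypothetical stand-in for a real probe such as an inspector call.

```typescript
function sleepSketch(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Same loop shape as retryUntil, with positional options for brevity.
async function retryUntilSketch<T>(
  fn: () => Promise<T | null | undefined>,
  timeout: number,
  interval: number,
): Promise<T | null> {
  const start = Date.now();
  while (Date.now() - start < timeout) {
    const result = await fn();
    if (result !== null && result !== undefined) return result;
    await sleepSketch(interval);
  }
  return null;
}

// Hypothetical probe: yields null until it has been called `readyAfter` times.
function makeProbe(readyAfter: number): () => Promise<string | null> {
  let calls = 0;
  return async () => (++calls >= readyAfter ? 'ready' : null);
}

async function retryDemo(): Promise<void> {
  // Succeeds on the third poll, well inside the 1s budget.
  const ok = await retryUntilSketch(makeProbe(3), 1_000, 10);
  // Never becomes ready, so the 50ms budget expires and null comes back.
  const timedOut = await retryUntilSketch(makeProbe(1_000_000), 50, 10);
  console.log({ ok, timedOut });
}

void retryDemo();
```

Only `null` and `undefined` count as "not yet": a probe that returns `0`, `''`, or `false` ends the loop immediately with that value.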

@@ -0,0 +1,48 @@
// Row-aware skip primitive.
//
// Spec files declare which matrix rows they apply to. Anything else is
// skipped (not failed) so the JUnit run carries `<skipped>` →
// `matrix.md` cell `-`. See Decision 1 in docs/testing/automation.md
// for the JUnit-to-cell mapping.
//
// Usage in a runner:
// skipUnlessRow(testInfo, ['KDE-W', 'GNOME-W', 'Ubu-W']);
//
// The reason is auto-formatted from the row list so the dashboard
// caller doesn't have to write it.
import type { TestInfo } from '@playwright/test';
import { getEnv } from './env.js';
export type Row =
| 'KDE-W'
| 'KDE-X'
| 'GNOME-W'
| 'GNOME-X'
| 'Ubu-W'
| 'Ubu-X'
| 'COSMIC'
| 'Sway'
| 'Niri'
| 'Hypr-O'
| 'Hypr-N'
| 'i3';
export function currentRow(): string {
return getEnv().row;
}
export function skipUnlessRow(testInfo: TestInfo, allowed: Row[]): void {
const row = currentRow();
if (allowed.includes(row as Row)) return;
testInfo.skip(
true,
`row ${row} not in [${allowed.join(', ')}] — applies-to mismatch`,
);
}
export function skipOnRow(testInfo: TestInfo, blocked: Row[]): void {
const row = currentRow();
if (!blocked.includes(row as Row)) return;
testInfo.skip(true, `row ${row} excluded`);
}
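The skip decision itself is a pure membership check, which makes it easy to exercise without a Playwright `TestInfo`. A minimal sketch, assuming a trimmed-down row union and a simplified reason string (`RowSketch` and `skipReasonFor` are hypothetical names):

```typescript
// Trimmed-down row union for the sketch; the real Row type lists more rows.
type RowSketch = 'KDE-W' | 'KDE-X' | 'GNOME-W' | 'GNOME-X' | 'Sway';

// Returns null when the spec applies to the row, otherwise a skip reason
// built from the allow-list (simplified relative to the harness message).
function skipReasonFor(row: string, allowed: RowSketch[]): string | null {
  if ((allowed as string[]).includes(row)) return null;
  return `row ${row} not in [${allowed.join(', ')}]`;
}

console.log(skipReasonFor('KDE-W', ['KDE-W', 'GNOME-W']));
console.log(skipReasonFor('Sway', ['KDE-W', 'GNOME-W']));
```

Taking `row` as a plain string mirrors `currentRow()`, so an unrecognized row name from the environment falls through to a skip instead of a type error.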

@@ -0,0 +1,53 @@
import { getSessionBus, getConnectionPid, method } from './dbus.js';
import type { Variant } from 'dbus-next';
const WATCHER_DEST = 'org.kde.StatusNotifierWatcher';
const WATCHER_PATH = '/StatusNotifierWatcher';
const ITEM_IFACE = 'org.kde.StatusNotifierItem';
export interface SniItem {
service: string;
objectPath: string;
}
export async function listRegisteredItems(): Promise<SniItem[]> {
const bus = getSessionBus();
const proxy = await bus.getProxyObject(WATCHER_DEST, WATCHER_PATH);
const props = proxy.getInterface('org.freedesktop.DBus.Properties');
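// Properties.Get takes (interface name, property name). The watcher's
// interface name is the same string as its bus name, so WATCHER_DEST
// serves as both here.
const result = await method(props, 'Get')(
WATCHER_DEST,
'RegisteredStatusNotifierItems',
);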
const variant = result as Variant<string[]>;
return variant.value.map(parseItemAddress);
}
export async function findItemByPid(pid: number): Promise<SniItem | null> {
const items = await listRegisteredItems();
for (const item of items) {
try {
const itemPid = await getConnectionPid(item.service);
if (itemPid === pid) {
return item;
}
} catch {
// connection may have gone away mid-iteration; skip
}
}
return null;
}
export async function activateItem(item: SniItem): Promise<void> {
const bus = getSessionBus();
const proxy = await bus.getProxyObject(item.service, item.objectPath);
const iface = proxy.getInterface(ITEM_IFACE);
await method(iface, 'Activate')(0, 0);
}
function parseItemAddress(raw: string): SniItem {
const slash = raw.indexOf('/');
if (slash === -1) {
return { service: raw, objectPath: '/StatusNotifierItem' };
}
return { service: raw.slice(0, slash), objectPath: raw.slice(slash) };
}
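The address split in parseItemAddress can be exercised on its own. The sketch below copies the logic under a different name; the two sample addresses are illustrative assumptions about what a watcher might report, not captured output.

```typescript
interface SniItemSketch {
  service: string;
  objectPath: string;
}

// Same split as parseItemAddress: everything before the first '/' is the
// bus name, the rest is the object path; a bare name gets the default path.
function parseItemAddressSketch(raw: string): SniItemSketch {
  const slash = raw.indexOf('/');
  if (slash === -1) {
    return { service: raw, objectPath: '/StatusNotifierItem' };
  }
  return { service: raw.slice(0, slash), objectPath: raw.slice(slash) };
}

// Hypothetical inputs: a unique name with a path, and a bare well-known name.
console.log(parseItemAddressSketch(':1.42/StatusNotifierItem'));
console.log(parseItemAddressSketch('org.kde.StatusNotifierItem-1234-1'));
```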

@@ -0,0 +1,71 @@
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';
const exec = promisify(execFile);
export interface FrameExtents {
left: number;
right: number;
top: number;
bottom: number;
}
export async function findX11WindowByPid(pid: number): Promise<string | null> {
// Walk _NET_CLIENT_LIST and match on _NET_WM_PID. Pure xprop, no
// xdotool dependency — Electron's main window will surface here once
// the WM has accepted it.
const ids = await listClientWindows();
let firstMatch: string | null = null;
for (const id of ids) {
const wmPid = await getWindowPid(id);
if (wmPid !== pid) continue;
const title = await getWindowProperty(id, '_NET_WM_NAME');
if (title) return id;
if (!firstMatch) firstMatch = id;
}
return firstMatch;
}
async function listClientWindows(): Promise<string[]> {
try {
const { stdout } = await exec('xprop', ['-root', '_NET_CLIENT_LIST']);
// _NET_CLIENT_LIST(WINDOW): window id # 0x1234, 0x5678, ...
const m = stdout.match(/#\s*(.+)$/m);
if (!m) return [];
return m[1]!.split(',').map((s) => s.trim()).filter(Boolean);
} catch {
return [];
}
}
async function getWindowPid(windowId: string): Promise<number | null> {
const raw = await getWindowProperty(windowId, '_NET_WM_PID');
if (!raw) return null;
const n = parseInt(raw, 10);
return Number.isNaN(n) ? null : n;
}
export async function getFrameExtents(windowId: string): Promise<FrameExtents | null> {
const raw = await getWindowProperty(windowId, '_NET_FRAME_EXTENTS');
if (!raw) return null;
const nums = raw.split(',').map((s) => parseInt(s.trim(), 10));
if (nums.length !== 4 || nums.some(Number.isNaN)) return null;
return { left: nums[0]!, right: nums[1]!, top: nums[2]!, bottom: nums[3]! };
}
export async function getWindowTitle(windowId: string): Promise<string | null> {
const raw = await getWindowProperty(windowId, '_NET_WM_NAME');
if (!raw) return null;
const m = raw.match(/^"(.*)"$/s);
return m ? m[1]! : raw;
}
async function getWindowProperty(windowId: string, prop: string): Promise<string | null> {
try {
const { stdout } = await exec('xprop', ['-id', windowId, prop]);
const m = stdout.match(/=\s*(.+)$/m);
return m ? m[1]!.trim() : null;
} catch {
return null;
}
}
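The xprop parsing above can be checked without an X server by feeding the same regexes canned output. A minimal sketch follows; the function names are hypothetical, and the sample strings in the usage lines are hand-written in xprop's usual output shape, not captured from a real session.

```typescript
interface ExtentsSketch {
  left: number;
  right: number;
  top: number;
  bottom: number;
}

// Value after '=' on the first matching line, or null (e.g. "not found.").
function xpropValue(stdout: string): string | null {
  const m = stdout.match(/=\s*(.+)$/m);
  return m && m[1] !== undefined ? m[1].trim() : null;
}

// Window ids after '#' in a _NET_CLIENT_LIST line.
function xpropClientList(stdout: string): string[] {
  const m = stdout.match(/#\s*(.+)$/m);
  if (!m || m[1] === undefined) return [];
  return m[1].split(',').map((s) => s.trim()).filter(Boolean);
}

// Four comma-separated integers in left, right, top, bottom order.
function xpropExtents(stdout: string): ExtentsSketch | null {
  const raw = xpropValue(stdout);
  if (raw === null) return null;
  const nums = raw.split(',').map((s) => parseInt(s.trim(), 10));
  if (nums.length !== 4 || nums.some((n) => Number.isNaN(n))) return null;
  const [left, right, top, bottom] = nums as [number, number, number, number];
  return { left, right, top, bottom };
}

console.log(xpropClientList('_NET_CLIENT_LIST(WINDOW): window id # 0x1200003, 0x2400007\n'));
console.log(xpropExtents('_NET_FRAME_EXTENTS(CARDINAL) = 1, 1, 30, 1\n'));
console.log(xpropValue('_NET_WM_NAME:  not found.\n'));
```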

@@ -0,0 +1,184 @@
import { test, expect } from '@playwright/test';
import { spawn } from 'node:child_process';
import { existsSync } from 'node:fs';
import { dirname } from 'node:path';
import { createIsolation } from '../lib/isolation.js';
// H-prefix runners are HARNESS self-tests — they validate the test
// harness's preconditions and the build pipeline's invariants, distinct
// from T-tests (upstream test cases) and S-tests (doc-spec entries).
// They tend to be cheap (file probes, exit-code assertions) and exist
// to catch silent drift in the things our other tests assume.
//
// H01 — CDP auth gate canary.
//
// The whole L1 strategy (lib/electron.ts:96-110) hinges on the fact
// that the shipped Electron exits the app whenever
// `--remote-debugging-port` / `--remote-debugging-pipe` is on argv
// without a valid CLAUDE_CDP_AUTH token. If upstream removes that
// gate, every L1 test silently weakens — Playwright's
// `_electron.launch()` (which always injects --remote-debugging-port=0)
// would start working again, but our SIGUSR1-attach pathway would
// keep "passing" without exercising the contract it was built for.
//
// This canary spawns the bundled Electron directly with
// --remote-debugging-port=0 and NO auth token, then asserts the
// process exits with code 1 (the gate's `process.exit(1)` per
// lib/electron.ts:96-97) and was not killed by signal. Timeout
// without exit means the gate is gone.
//
// Spawn-only — no app stays running, no inspector attach, no
// X11 window probe. Pure exit-code observation under isolation
// so the host config never sees the failed launch.
//
// Row-independent: the gate's Linux behavior is the same on every
// row we ship to. Don't `skipUnlessRow`.
// DEFAULT_INSTALL_PATHS mirror lib/electron.ts:123-132 — kept inline
// rather than importing resolveInstall() so this canary can run even
// if a future change to electron.ts breaks the import surface (the
// canary should be the LEAST coupled spec to any moving part).
const DEFAULT_INSTALL_PATHS: { electron: string; asar: string }[] = [
{
electron: '/usr/lib/claude-desktop/node_modules/electron/dist/electron',
asar: '/usr/lib/claude-desktop/node_modules/electron/dist/resources/app.asar',
},
{
electron: '/opt/Claude/node_modules/electron/dist/electron',
asar: '/opt/Claude/node_modules/electron/dist/resources/app.asar',
},
];
function resolveInstallInline(): { electron: string; asar: string } {
const envBin = process.env.CLAUDE_DESKTOP_ELECTRON;
const envAsar = process.env.CLAUDE_DESKTOP_APP_ASAR;
if (envBin && envAsar) return { electron: envBin, asar: envAsar };
for (const candidate of DEFAULT_INSTALL_PATHS) {
if (existsSync(candidate.electron) && existsSync(candidate.asar)) {
return candidate;
}
}
throw new Error(
'Could not locate claude-desktop install. Set CLAUDE_DESKTOP_ELECTRON ' +
'and CLAUDE_DESKTOP_APP_ASAR, or install the deb/rpm package.',
);
}
test.setTimeout(30_000);
test('H01 — CDP auth gate fires on --remote-debugging-port without token', async ({}, testInfo) => {
testInfo.annotations.push({ type: 'severity', description: 'Critical' });
testInfo.annotations.push({ type: 'surface', description: 'CDP auth gate' });
const { electron: electronBin, asar } = resolveInstallInline();
const appDir = dirname(dirname(dirname(dirname(electronBin))));
// Fresh isolation — the gate trips before any persisted state is
// touched, but if anything sneaks past `process.exit(1)` we'd
// rather it write to /tmp than ~/.config/Claude.
const isolation = await createIsolation();
const start = Date.now();
// Raw spawn — no LAUNCHER_INJECTED_FLAGS, no isolation env beyond
// what we set explicitly. The OPPOSITE of launchClaude(): we WANT
// the debug-port flag on argv so the gate fires.
const argv = [
'--remote-debugging-port=0',
asar,
];
// Build env: scrub CLAUDE_CDP_AUTH so a developer who set it
// locally doesn't accidentally pass the gate. Keep the rest of
// the parent env so Electron's normal load path (DISPLAY,
// XDG_RUNTIME_DIR, etc.) still works up to the gate check.
const env: Record<string, string> = {};
for (const [k, v] of Object.entries(process.env)) {
if (v !== undefined) env[k] = v;
}
delete env.CLAUDE_CDP_AUTH;
for (const [k, v] of Object.entries(isolation.env)) {
env[k] = v;
}
const proc = spawn(electronBin, argv, {
cwd: appDir,
env,
stdio: 'ignore',
detached: false,
});
let exitCode: number | null = null;
let signalCode: NodeJS.Signals | null = null;
let timedOut = false;
try {
await Promise.race([
new Promise<void>((resolve) => {
proc.once('exit', (code, signal) => {
exitCode = code;
signalCode = signal;
resolve();
});
}),
new Promise<void>((resolve) => {
setTimeout(() => {
timedOut = true;
resolve();
}, 10_000);
}),
]);
} finally {
// If the gate didn't fire we have a live Electron — kill it
// hard so the test environment isn't polluted by a running
// app pointed at the host's display.
if (proc.exitCode === null && proc.signalCode === null) {
proc.kill('SIGKILL');
await new Promise<void>((resolve) => {
proc.once('exit', () => resolve());
setTimeout(() => resolve(), 2_000);
});
}
await isolation.cleanup();
}
const elapsedMs = Date.now() - start;
await testInfo.attach('spawn-argv', {
body: JSON.stringify([electronBin, ...argv], null, 2),
contentType: 'application/json',
});
await testInfo.attach('exit-info', {
body: JSON.stringify(
{
exitCode,
signalCode,
timedOut,
elapsedMs,
note:
'Gate fires via process.exit(1) (lib/electron.ts:96-107). ' +
'exitCode=1, signalCode=null is the canonical signature.',
},
null,
2,
),
contentType: 'application/json',
});
if (timedOut) {
throw new Error(
'CDP gate did not fire — app stayed running with ' +
'--remote-debugging-port flag and no auth token, gate may ' +
'have been removed (lib/electron.ts:96-107). The L1 test ' +
'strategy depends on this gate being present.',
);
}
expect(
exitCode,
'gate exits with code 1 (process.exit(1) in index.pre.js)',
).toBe(1);
expect(
signalCode,
'process exited via gate, not killed by signal',
).toBe(null);
});
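The three outcomes H01 distinguishes (gate fired, gate missing, some other exit) can be summarized as a small classifier. This is a sketch only; `GateObservation`, `GateVerdict`, and `classifyGate` are hypothetical names, and the spec itself expresses the same logic through a thrown error plus two `expect` calls.

```typescript
interface GateObservation {
  exitCode: number | null;
  signalCode: string | null;
  timedOut: boolean;
}

type GateVerdict = 'gate-fired' | 'gate-missing' | 'unexpected-exit';

function classifyGate(o: GateObservation): GateVerdict {
  // No exit inside the budget: the app kept running with the debug port open.
  if (o.timedOut) return 'gate-missing';
  // Canonical signature per the spec: exit code 1, not killed by a signal.
  if (o.exitCode === 1 && o.signalCode === null) return 'gate-fired';
  // Anything else (crash, other exit code, signal) is not the gate.
  return 'unexpected-exit';
}

console.log(classifyGate({ exitCode: 1, signalCode: null, timedOut: false }));
console.log(classifyGate({ exitCode: null, signalCode: null, timedOut: true }));
console.log(classifyGate({ exitCode: null, signalCode: 'SIGSEGV', timedOut: false }));
```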

@@ -0,0 +1,145 @@
import { test, expect } from '@playwright/test';
import { listAsar, readAsarFile, resolveAsarPath } from '../lib/asar.js';
// H02 — frame-fix-wrapper presence (file probe).
//
// The wrapper at scripts/frame-fix-wrapper.js is the linchpin of every
// Linux frame fix (close-to-tray, autostart shim, KWin child-bounds
// jiggle, AZERTY Ctrl+Q). It's injected by patch_app_asar in
// scripts/patches/app-asar.sh:18-49: the script copies the wrapper
// into the asar root, writes a frame-fix-entry.js shim that requires
// it, then rewrites package.json's `main` to point at the shim.
//
// If any of those steps silently breaks (missing source file, asar
// pack failure, package.json rewrite skipped), the app reverts to
// upstream's frameless-window behavior on every Linux row and our
// test harness's hook patterns (CLAUDE.md "Hooking Electron")
// stop matching what's loaded. S09 only covers the quick-window
// patch; nothing else asserts the wrapper landed at all.
//
// Four checks, ordered cheapest-first:
// 1. Both files exist in the asar manifest.
// 2. frame-fix-wrapper.js contains `Proxy(` (the Proxy pattern is
// the entire reason the wrapper works — see CLAUDE.md and
// lib/quickentry.ts:75-81).
// 3. frame-fix-entry.js requires the wrapper.
// 4. package.json's `main` references frame-fix-entry (substring,
// not exact, since patches don't always preserve `.js`).
//
// Pure file probe — no app launch. Fast (<1s). Row-independent.
test('H02 — frame-fix-wrapper.js + frame-fix-entry.js injected into app.asar', async ({}, testInfo) => {
testInfo.annotations.push({ type: 'severity', description: 'Critical' });
testInfo.annotations.push({
type: 'surface',
description: 'Frame fix wrapper injection',
});
const asarPath = resolveAsarPath();
await testInfo.attach('asar-path', {
body: asarPath,
contentType: 'text/plain',
});
// 1. Manifest probe. listAsar returns full paths inside the
// archive (e.g. '/frame-fix-wrapper.js' or 'frame-fix-wrapper.js'
// depending on @electron/asar's normalization). Use endsWith
// so either form matches.
const manifest = listAsar(asarPath);
const frameFixFiles = manifest.filter(
(p) =>
p.endsWith('frame-fix-wrapper.js') ||
p.endsWith('frame-fix-entry.js'),
);
const wrapperPresent = frameFixFiles.some((p) =>
p.endsWith('frame-fix-wrapper.js'),
);
const entryPresent = frameFixFiles.some((p) =>
p.endsWith('frame-fix-entry.js'),
);
await testInfo.attach('frame-fix-files', {
body: JSON.stringify(
{
found: frameFixFiles,
wrapperPresent,
entryPresent,
},
null,
2,
),
contentType: 'application/json',
});
expect(
wrapperPresent,
'frame-fix-wrapper.js is present in app.asar manifest',
).toBe(true);
expect(
entryPresent,
'frame-fix-entry.js is present in app.asar manifest',
).toBe(true);
// 2. Wrapper contents — the Proxy pattern is the load-bearing
// structure (see scripts/frame-fix-wrapper.js:491-506 and
// CLAUDE.md "Frame Fix Wrapper" section). A wrapper without
// a Proxy is a stub that doesn't intercept anything.
const wrapper = readAsarFile('frame-fix-wrapper.js', asarPath);
const proxyPresent = wrapper.includes('Proxy(');
expect(
proxyPresent,
'frame-fix-wrapper.js uses the Proxy() pattern (CLAUDE.md "Frame Fix Wrapper")',
).toBe(true);
// 3. Entry shim — it must require the wrapper, otherwise it's
// not actually loading any of the patches.
const entry = readAsarFile('frame-fix-entry.js', asarPath);
const entryRequiresWrapper =
entry.includes("require('./frame-fix-wrapper") ||
entry.includes('require("./frame-fix-wrapper');
expect(
entryRequiresWrapper,
'frame-fix-entry.js requires ./frame-fix-wrapper',
).toBe(true);
// 4. package.json `main` — patch_app_asar in app-asar.sh:40-49
// rewrites pkg.main to 'frame-fix-entry.js'. Substring match
// on 'frame-fix-entry' tolerates patches that re-extension
// or rename the shim.
const pkgJsonRaw = readAsarFile('package.json', asarPath);
let mainEntry = '';
try {
const parsed = JSON.parse(pkgJsonRaw) as { main?: unknown };
if (typeof parsed.main === 'string') mainEntry = parsed.main;
} catch (err) {
throw new Error(
'package.json in app.asar is not valid JSON: ' +
(err instanceof Error ? err.message : String(err)),
);
}
await testInfo.attach('package-main', {
body: JSON.stringify({ main: mainEntry }, null, 2),
contentType: 'application/json',
});
expect(
mainEntry.includes('frame-fix-entry'),
'package.json `main` references frame-fix-entry (app-asar.sh:40-49)',
).toBe(true);
await testInfo.attach('evidence', {
body: JSON.stringify(
{
wrapperPresent,
entryPresent,
proxyPresent,
entryRequiresWrapper,
mainEntry,
},
null,
2,
),
contentType: 'application/json',
});
});
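Checks 3 and 4 are plain string probes, so they can be tried against literal inputs. A minimal sketch, with hypothetical function names and hand-written sample file contents:

```typescript
// Check 3: the shim requires the wrapper, with either quote style.
function entryRequiresWrapperSketch(entrySource: string): boolean {
  return (
    entrySource.includes("require('./frame-fix-wrapper") ||
    entrySource.includes('require("./frame-fix-wrapper')
  );
}

// Check 4: package.json `main` mentions the shim (substring, not exact).
// Throws if the input is not valid JSON, as the spec does.
function mainPointsAtShim(pkgJsonRaw: string): boolean {
  const parsed = JSON.parse(pkgJsonRaw) as { main?: unknown };
  return typeof parsed.main === 'string' && parsed.main.includes('frame-fix-entry');
}

console.log(entryRequiresWrapperSketch("require('./frame-fix-wrapper.js');"));
console.log(mainPointsAtShim('{"main":"frame-fix-entry.js"}'));
console.log(mainPointsAtShim('{"main":".vite/build/index.js"}'));
```

The substring probe in check 3 is deliberately narrow: a shim that used template-literal quotes or inserted whitespace after `require(` would not match, so a format change in the shim shows up as a failure here.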

@@ -0,0 +1,161 @@
import { test, expect } from '@playwright/test';
import { readAsarFile, resolveAsarPath } from '../lib/asar.js';
// H03 — build pipeline patch fingerprints (file probe).
//
// scripts/patches/*.sh layers a stack of regex-based mutations onto
// the bundled JS at build time. Each patch lands a distinctive
// string somewhere in the asar; if a patch silently skips (anchor
// regex misses, idempotency guard short-circuits the wrong way,
// build orchestrator drops the call), that string is absent and
// the patch's behavior is gone.
//
// S09 already covers quick-window.sh. This test consolidates the
// rest into one manifest so future drift is observable in a single
// JSON dump. Fingerprints are pinned to STRINGS THE PATCH INJECTS
// (not strings the patch matches against), so an upstream rename
// of the matched site doesn't false-positive a passing patch.
//
// Pure file probe — no app launch. Fast (<1s). Row-independent.
interface PatchEntry {
patch: string;
fingerprint: string;
file: string;
// One-line note explaining where the fingerprint comes from
// in the patch script — surfaced in the attached manifest so
// future maintainers can tie a failure back to the right
// scripts/patches/*.sh:LINE.
source: string;
}
const MANIFEST: PatchEntry[] = [
{
patch: 'quick-window.sh',
fingerprint: 'XDG_CURRENT_DESKTOP',
file: '.vite/build/index.js',
source:
'patches/quick-window.sh injects an XDG_CURRENT_DESKTOP env-var ' +
'gate; same fingerprint S09 asserts.',
},
{
patch: 'app-asar.sh (frame-fix injection)',
fingerprint: 'frame-fix-entry',
file: 'package.json',
source:
'patches/app-asar.sh:40-49 rewrites package.json main to ' +
"'frame-fix-entry.js'.",
},
{
patch: 'tray.sh (startup-delay nativeTheme guard)',
fingerprint: '_trayStartTime',
file: '.vite/build/index.js',
source:
'patches/tray.sh:67-69 injects `let _trayStartTime=Date.now();` ' +
"into the nativeTheme `on('updated')` handler. Variable name " +
'is unique to our patch — upstream never declares it.',
},
{
patch: 'cowork.sh (Linux daemon quit handler)',
fingerprint: 'cowork-linux-daemon-shutdown',
file: '.vite/build/index.js',
source:
'patches/cowork.sh:602-605 registers a Linux-only quit handler ' +
"with name:'cowork-linux-daemon-shutdown'. Distinctive string " +
'unique to the patch.',
},
{
patch: 'claude-code.sh (Linux platform branch)',
fingerprint: 'linux-arm64',
file: '.vite/build/index.js',
source:
'patches/claude-code.sh:20-24 injects `linux-arm64` / `linux-x64` ' +
'platform-bundle branches into getHostPlatform. Upstream throws ' +
'on Linux; the string is absent without the patch.',
},
];
// TODOs intentionally left where a stable fingerprint isn't easy:
// - tray.sh has multiple sub-patches (icon selection, in-place
// update, menu-bar default). _trayStartTime above covers the
// menu-handler patch reliably; the in-place update patch
// anchors on a generated name like `${TRAY_VAR}.setImage(...)`
// where TRAY_VAR is minifier-renamed every release, so no
// fingerprint there is stable enough to assert without a
// second extraction step. Acceptable: the menu-handler
// fingerprint is upstream of the in-place patch in the same
// subsystem, so a missing _trayStartTime implies a much
// bigger build problem anyway.
test('H03 — build pipeline patch fingerprints present in app.asar', async ({}, testInfo) => {
testInfo.annotations.push({ type: 'severity', description: 'Critical' });
testInfo.annotations.push({
type: 'surface',
description: 'Build pipeline patch fingerprints',
});
const asarPath = resolveAsarPath();
await testInfo.attach('asar-path', {
body: asarPath,
contentType: 'text/plain',
});
// Read each unique file once, then check fingerprints against
// the cached contents. Saves repeated asar extraction for
// patches that share a target file.
const fileCache = new Map<string, string>();
const results: {
patch: string;
fingerprint: string;
file: string;
source: string;
found: boolean;
}[] = [];
for (const entry of MANIFEST) {
let contents = fileCache.get(entry.file);
if (contents === undefined) {
try {
contents = readAsarFile(entry.file, asarPath);
fileCache.set(entry.file, contents);
} catch (err) {
// File missing — record as a "not found" result so
// the manifest dump shows the failure shape rather
// than aborting on the first hiccup.
results.push({
patch: entry.patch,
fingerprint: entry.fingerprint,
file: entry.file,
source:
entry.source +
' [READ ERROR: ' +
(err instanceof Error ? err.message : String(err)) +
']',
found: false,
});
continue;
}
}
results.push({
patch: entry.patch,
fingerprint: entry.fingerprint,
file: entry.file,
source: entry.source,
found: contents.includes(entry.fingerprint),
});
}
// Always attach the manifest — passing tests should still
// surface the verified fingerprints so future drift is visible
// without re-running with -v.
await testInfo.attach('patch-manifest', {
body: JSON.stringify(results, null, 2),
contentType: 'application/json',
});
const missing = results.filter((r) => !r.found);
expect(
missing,
'every expected patch fingerprint is present in the bundled app.asar',
).toEqual([]);
});
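The cache-then-scan loop can be sketched against an in-memory stand-in for the asar. Everything here is illustrative: `scanFingerprints`, the `fakeAsar` map, and the `no-such-file.js` entry are hypothetical, and the reader callback replaces `readAsarFile`.

```typescript
interface FingerprintSketch {
  patch: string;
  fingerprint: string;
  file: string;
}

interface ScanResultSketch {
  patch: string;
  found: boolean;
  error?: string;
}

// Reads each file at most once; a read failure becomes found:false with
// the error recorded, so one missing file does not abort the whole scan.
function scanFingerprints(
  manifest: FingerprintSketch[],
  readFile: (path: string) => string,
): ScanResultSketch[] {
  const cache = new Map<string, string>();
  return manifest.map((entry) => {
    let contents = cache.get(entry.file);
    if (contents === undefined) {
      try {
        contents = readFile(entry.file);
        cache.set(entry.file, contents);
      } catch (err) {
        const error = err instanceof Error ? err.message : String(err);
        return { patch: entry.patch, found: false, error };
      }
    }
    return { patch: entry.patch, found: contents.includes(entry.fingerprint) };
  });
}

// Hypothetical archive contents: path -> file body.
const fakeAsar = new Map<string, string>([
  ['package.json', '{"main":"frame-fix-entry.js"}'],
  ['.vite/build/index.js', 'let _trayStartTime=Date.now();'],
]);
const readFake = (path: string): string => {
  const body = fakeAsar.get(path);
  if (body === undefined) throw new Error(`missing ${path}`);
  return body;
};

const scanResults = scanFingerprints(
  [
    { patch: 'app-asar.sh', fingerprint: 'frame-fix-entry', file: 'package.json' },
    { patch: 'tray.sh', fingerprint: '_trayStartTime', file: '.vite/build/index.js' },
    { patch: 'claude-code.sh', fingerprint: 'linux-arm64', file: '.vite/build/index.js' },
    { patch: 'hypothetical.sh', fingerprint: 'x', file: 'no-such-file.js' },
  ],
  readFake,
);
console.log(scanResults.filter((r) => !r.found));
```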

@@ -0,0 +1,205 @@
import { test, expect } from '@playwright/test';
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';
import { launchClaude } from '../lib/electron.js';
import { skipUnlessRow } from '../lib/row.js';
import { sleep } from '../lib/retry.js';
import { captureSessionEnv } from '../lib/diagnostics.js';
const exec = promisify(execFile);
// H04 — cowork daemon spawn / cleanup contract.
//
// docs/learnings/cowork-vm-daemon.md describes the contract that
// patches/cowork.sh implements: the app's auto-launch path
// (cowork.sh:262-362) forks cowork-vm-service.js as a detached
// child on first VM-service connection attempt, and the Linux
// quit handler registered at cowork.sh:584-633 SIGTERMs that
// daemon on app exit. No existing test asserts that contract
// end-to-end. If the auto-launch regresses, the app falls back
// to "VM service not running" errors silently; if the quit
// handler regresses, daemons leak across app sessions and
// pollute the next launch's socket binding.
//
// Shape: pgrep baseline (must be empty after launchClaude's
// cleanupPreLaunch — see lib/electron.ts:160-191), launch with
// isolation, wait for mainVisible, poll for a daemon pid, then
// close + verify cleanup.
//
// The daemon spawn is conditional — cowork.sh:265 anchors on
// 'VM service not running. The service failed to start.' which
// only fires when something in the renderer triggers a VM
// connection. On a freshly-launched app that never hits the
// Cowork tab, the daemon may legitimately not appear within
// the budget. Treat that as `testInfo.skip` rather than a fail.
//
// Row-gated to the same set as the QE tests — daemon is a Linux
// thing, gating mirrors S30.
const PGREP_PATTERN = 'cowork-vm-service\\.js';
async function pgrepPids(pattern: string): Promise<Set<number>> {
try {
const { stdout } = await exec('pgrep', ['-f', pattern], {
timeout: 5_000,
});
return new Set(
stdout
.split('\n')
.map((l) => parseInt(l.trim(), 10))
.filter((n) => !Number.isNaN(n)),
);
} catch (err) {
// pgrep exits 1 with empty stdout when no matches. Treat as
// the empty set; for any other failure, fall back to whatever
// pids were captured on stdout (possibly none).
const e = err as { code?: number; stdout?: string };
if (e.code === 1) return new Set();
const out = e.stdout ?? '';
return new Set(
out
.split('\n')
.map((l) => parseInt(l.trim(), 10))
.filter((n) => !Number.isNaN(n)),
);
}
}
test.setTimeout(60_000);
test('H04 — cowork daemon spawns under app, exits with app', async ({}, testInfo) => {
testInfo.annotations.push({ type: 'severity', description: 'Should' });
testInfo.annotations.push({
type: 'surface',
description: 'Cowork daemon lifecycle',
});
skipUnlessRow(testInfo, ['KDE-W', 'GNOME-W', 'Ubu-W', 'KDE-X', 'GNOME-X']);
await testInfo.attach('session-env', {
body: JSON.stringify(captureSessionEnv(), null, 2),
contentType: 'application/json',
});
// Baseline — launchClaude's cleanupPreLaunch (lib/electron.ts:160-191)
// pkills any leftover cowork daemon before spawning, so a stray
// pid here would mean the cleanup itself is broken.
const baselinePids = await pgrepPids(PGREP_PATTERN);
await testInfo.attach('baseline-pids', {
body: JSON.stringify(
{
pids: Array.from(baselinePids),
note:
'cleanupPreLaunch should leave this empty before launch. ' +
'Non-empty here is a bug in lib/electron.ts:160-191.',
},
null,
2,
),
contentType: 'application/json',
});
const useHostConfig = process.env.CLAUDE_TEST_USE_HOST_CONFIG === '1';
const app = await launchClaude({
isolation: useHostConfig ? null : undefined,
});
let daemonPid: number | null = null;
let lingeringPids: number[] = [];
try {
// mainVisible — main shell up; the daemon spawn is gated on
// renderer activity (cowork.sh:262-362) which can begin
// asynchronously after the shell paints. Lower readiness
// levels race the spawn window.
await app.waitForReady('mainVisible');
// Poll up to 15s for a new daemon pid. cowork.sh's auto-
// launch only fires when the renderer attempts a VM service
// connection; on a passive launch (no Cowork tab interaction)
// the daemon may legitimately not appear in this window.
const start = Date.now();
while (Date.now() - start < 15_000) {
const pids = await pgrepPids(PGREP_PATTERN);
const newPids = Array.from(pids).filter(
(p) => !baselinePids.has(p),
);
if (newPids.length > 0) {
daemonPid = newPids[0]!;
break;
}
await sleep(500);
}
if (daemonPid === null) {
await testInfo.attach('skip-reason', {
body: JSON.stringify(
{
reason:
'cowork daemon not spawned within 15s of mainVisible',
note:
'Auto-launch in cowork.sh:262-362 is gated on a VM ' +
'service connection attempt from the renderer; on a ' +
'passive launch with no Cowork-tab interaction it may ' +
'legitimately not fire. Not a regression on its own.',
},
null,
2,
),
contentType: 'application/json',
});
testInfo.skip(
true,
'cowork daemon not spawned by this build — gating in ' +
'cowork.sh:262-362 may have suppressed it on a passive launch',
);
return;
}
await testInfo.attach('daemon-spawned', {
body: JSON.stringify(
{
pid: daemonPid,
elapsedMs: Date.now() - start,
},
null,
2,
),
contentType: 'application/json',
});
} finally {
await app.close();
}
// Quit handler (cowork.sh:584-633) waits up to 10s for the
// daemon to exit after SIGTERM. Give it a 5s settle window —
// graceful exit is the common case, but on a slow runner the
// kill loop's poll cadence (200ms × 50) can stretch. Re-pgrep
// after the wait.
await sleep(5_000);
const postExitPids = await pgrepPids(PGREP_PATTERN);
lingeringPids = Array.from(postExitPids).filter(
(p) => p === daemonPid || !baselinePids.has(p),
);
await testInfo.attach('post-exit-pgrep', {
body: JSON.stringify(
{
baseline: Array.from(baselinePids),
postExit: Array.from(postExitPids),
lingering: lingeringPids,
note:
'Lingering daemon pids after app.close() indicate the ' +
'Linux quit handler in cowork.sh:584-633 did not run, ' +
'or its 10s SIGTERM-then-noop loop completed without ' +
'the daemon actually exiting (escalate to SIGKILL upstream).',
},
null,
2,
),
contentType: 'application/json',
});
expect(
lingeringPids,
'no cowork-vm-service daemon lingers 5s after app.close()',
).toEqual([]);
});
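The pid bookkeeping in this spec reduces to two set filters. A minimal sketch, assuming pids arrive as plain numbers; `newPidsSince` and `lingeringAfterExit` are hypothetical names (the spec inlines both filters), and the pid values are made up for illustration:

```typescript
// Hypothetical helpers; the spec above inlines these filters.

// Pids present now that were not in the pre-launch baseline.
function newPidsSince(
  baseline: ReadonlySet<number>,
  current: Iterable<number>,
): number[] {
  return Array.from(current).filter((p) => !baseline.has(p));
}

// Pids still alive after close: the daemon we tracked, or anything
// else that was not in the baseline.
function lingeringAfterExit(
  baseline: ReadonlySet<number>,
  postExit: Iterable<number>,
  daemonPid: number | null,
): number[] {
  return Array.from(postExit).filter(
    (p) => p === daemonPid || !baseline.has(p),
  );
}

// Made-up pids for illustration only.
const baselineSet = new Set<number>([100]);
console.log(newPidsSince(baselineSet, [100, 4242])); // [ 4242 ]
console.log(lingeringAfterExit(baselineSet, [100], 4242)); // []
console.log(lingeringAfterExit(baselineSet, [100, 4242], 4242)); // [ 4242 ]
```

The `p === daemonPid` clause looks redundant, since the tracked daemon pid is never in the baseline, but it keeps the intent explicit: the daemon this run observed must be gone.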


@@ -0,0 +1,399 @@
import { test, expect } from '@playwright/test';
import { readdirSync, readFileSync, existsSync } from 'node:fs';
import { dirname, resolve } from 'node:path';
import { fileURLToPath } from 'node:url';
import { launchClaude } from '../lib/electron.js';
import { sleep } from '../lib/retry.js';
import { captureSessionEnv } from '../lib/diagnostics.js';
import { capture, type Snapshot } from '../../explore/snapshot.js';
import { diff } from '../../explore/diff.js';
import type { InspectorClient } from '../lib/inspector.js';
// H05 — claude.ai UI drift detection.
//
// docs/testing/claudeai-ui-mapping-plan.md Phase 5: catch upstream
// renderer changes that would break the page-object selectors in
// lib/claudeai.ts BEFORE they fail a real spec mid-sweep.
//
// For each baseline JSON under docs/testing/ui-snapshots/:
// 1. Navigate the renderer to the captured claudeAiUrl (if any).
// 2. Capture a fresh Snapshot via the same `capture()` the explore
// CLI uses — no forked logic.
// 3. Compare against the baseline via the same `diff()` the explore
// CLI uses. Attach the per-snapshot diff if non-empty.
// 4. A snapshot is "clean" if `diff(...).entries.length === 0`.
//
// Pass criterion: ≥80% of snapshots clean (per the plan). The
// threshold is forgiving on purpose — a single rendered surface
// shifting class names shouldn't block CI; we want a signal, not a
// blast radius.
//
// Per-snapshot timing target ≤200ms (snapshot capture only — the
// 30s navigation settle is excluded). Exceedance is a soft warning
// surfaced via attachment, never a hard fail.
//
// Skip behaviours:
// - Zero baselines: skip with the "capture some first" message
// (the directory is gitignored beyond .gitkeep + README, so a
// fresh checkout legitimately has none).
// - Not signed in (no claude.ai webContents at the claudeAi
// readiness level): skip — most baselines target post-login
// surfaces and would fail spuriously on /login.
//
// Row-gated to the same set as the QE-driven specs since the host
// must be capable of reaching claude.ai under launchClaude.
const SNAPSHOT_DIR = resolve(
dirname(fileURLToPath(import.meta.url)),
'..',
'..',
'..',
'..',
'docs',
'testing',
'ui-snapshots',
);
// 200ms is the per-snapshot capture target from the plan. Surface
// (not enforce) when a single capture exceeds this.
const CAPTURE_BUDGET_MS = 200;
// 80% from the plan: pass if at least this fraction of snapshots
// have zero diffs. Compared as cleanCount / totalCount >= 0.8, so
// 5/5 passes, 4/5 passes, 3/5 fails, etc.
const CLEAN_FRACTION_REQUIRED = 0.8;
// Navigation settle: after setting location.href, we poll for the
// URL to land + readyState to reach 'complete' before snapshotting.
// Coupled to the renderer route load + auth-gate redirect time;
// 30s is the same upper bound used by waitForReady('claudeAi').
const NAV_SETTLE_TIMEOUT_MS = 30_000;
const NAV_SETTLE_INTERVAL_MS = 500;
interface SnapshotFile {
name: string;
path: string;
baseline: Snapshot;
}
interface PerSnapshotResult {
name: string;
url: string | null;
clean: boolean;
captureMs: number;
summary: { removed: number; changed: number; added: number };
skipped?: string;
error?: string;
}
function loadBaselines(): SnapshotFile[] {
if (!existsSync(SNAPSHOT_DIR)) return [];
const files = readdirSync(SNAPSHOT_DIR).filter((f) => f.endsWith('.json'));
const out: SnapshotFile[] = [];
for (const file of files) {
const path = resolve(SNAPSHOT_DIR, file);
const raw = readFileSync(path, 'utf8');
try {
out.push({
name: file.replace(/\.json$/, ''),
path,
baseline: JSON.parse(raw) as Snapshot,
});
} catch (err) {
// Surface the bad file as a skipped result rather than
// aborting the whole run — one corrupt baseline shouldn't
// hide drift in the rest.
out.push({
name: file.replace(/\.json$/, ''),
path,
baseline: {
capturedAt: '',
claudeAiUrl: '',
appVersion: null,
pageState: { url: '', title: '', readyState: '' },
dfPills: [],
compactPills: [],
ariaLabeledButtons: [],
openMenu: null,
modals: [],
},
});
// Stash the parse error on the file object via a side
// channel: the spec body checks `_loadError` (not the empty
// placeholder baseline) as the "load failed" signal.
(out[out.length - 1] as { _loadError?: string })._loadError =
err instanceof Error ? err.message : String(err);
}
}
return out;
}
// Drive the active claude.ai webContents to the target URL. We set
// location.href in the renderer rather than calling webContents.loadURL
// from main: setting from the renderer keeps the React app's history
// stack intact (it's the same pathway a user-initiated navigation
// takes), avoiding the "blank window then re-mount" flicker loadURL
// triggers. Then poll for the URL to land and readyState=='complete'.
async function navigateRendererTo(
inspector: InspectorClient,
targetUrl: string,
): Promise<void> {
await inspector.evalInRenderer<null>(
'claude.ai',
`(() => { window.location.href = ${JSON.stringify(targetUrl)}; return null; })()`,
);
const start = Date.now();
while (Date.now() - start < NAV_SETTLE_TIMEOUT_MS) {
try {
const state = await inspector.evalInRenderer<{
url: string;
readyState: string;
}>(
'claude.ai',
`(() => ({ url: location.href, readyState: document.readyState }))()`,
);
if (
state.readyState === 'complete' &&
sameOrigin(state.url, targetUrl)
) {
// One extra tick to let claude.ai's React render finish
// — readyState='complete' fires before the SPA mounts.
await sleep(500);
return;
}
} catch {
// During navigation the webContents URL changes and the
// 'claude.ai' filter may transiently miss; just retry.
}
await sleep(NAV_SETTLE_INTERVAL_MS);
}
throw new Error(
`renderer did not settle on ${targetUrl} within ${NAV_SETTLE_TIMEOUT_MS}ms`,
);
}
// Compare URLs by origin + pathname (so, despite the name, this is a
// same-route check rather than a same-origin check). claude.ai tacks
// on tracking params, modal state, etc. after route resolution, so an
// exact match is too strict; the route is what we care about.
function sameOrigin(a: string, b: string): boolean {
try {
const ua = new URL(a);
const ub = new URL(b);
return ua.origin === ub.origin && ua.pathname === ub.pathname;
} catch {
return a === b;
}
}
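A quick way to see what this comparison accepts, using only the WHATWG `URL` global. The function is re-declared as `sameRoute` (a hypothetical name) so the sketch stands alone, and the URLs are illustrative, not captured from a real session:

```typescript
// Hypothetical stand-alone copy of the origin + pathname comparison.
function sameRoute(a: string, b: string): boolean {
  try {
    const ua = new URL(a);
    const ub = new URL(b);
    return ua.origin === ub.origin && ua.pathname === ub.pathname;
  } catch {
    // Unparseable input: fall back to exact string equality.
    return a === b;
  }
}

// Query string and fragment are ignored.
console.log(sameRoute('https://claude.ai/new?utm=x#top', 'https://claude.ai/new')); // true
// Different pathname on the same origin.
console.log(sameRoute('https://claude.ai/new', 'https://claude.ai/settings')); // false
// A trailing slash is a different pathname.
console.log(sameRoute('https://claude.ai/new/', 'https://claude.ai/new')); // false
// Neither string parses, so the exact-match fallback applies.
console.log(sameRoute('not a url', 'not a url')); // true
```

One consequence worth keeping in mind: if a baseline URL differs from the landed URL only by a trailing slash, the settle loop would presumably run until its timeout, because the pathnames never compare equal.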
test.setTimeout(180_000);
test('H05 — claude.ai UI drift detection', async ({}, testInfo) => {
testInfo.annotations.push({ type: 'severity', description: 'Should' });
testInfo.annotations.push({
type: 'surface',
description: 'claude.ai UI drift detection',
});
await testInfo.attach('session-env', {
body: JSON.stringify(captureSessionEnv(), null, 2),
contentType: 'application/json',
});
const baselines = loadBaselines();
await testInfo.attach('baselines-found', {
body: JSON.stringify(
{
dir: SNAPSHOT_DIR,
count: baselines.length,
names: baselines.map((b) => b.name),
},
null,
2,
),
contentType: 'application/json',
});
if (baselines.length === 0) {
testInfo.skip(
true,
'no baselines under docs/testing/ui-snapshots/ — capture some ' +
'with `npm run explore:snapshot <name>` first',
);
return;
}
const useHostConfig = process.env.CLAUDE_TEST_USE_HOST_CONFIG === '1';
const app = await launchClaude({
isolation: useHostConfig ? null : undefined,
});
const results: PerSnapshotResult[] = [];
try {
// claudeAi level: a claude.ai webContents exists. We don't
// require userLoaded here because some baselines might
// legitimately be of /login surfaces; per-snapshot navigation
// will land us on whatever the baseline captured.
const { inspector, claudeAiUrl } = await app.waitForReady('claudeAi');
if (!claudeAiUrl) {
testInfo.skip(
true,
'claude.ai webContents never loaded — likely not signed in. ' +
'Set CLAUDE_TEST_USE_HOST_CONFIG=1 to share host config.',
);
return;
}
await testInfo.attach('initial-claude-ai-url', {
body: claudeAiUrl,
contentType: 'text/plain',
});
for (const file of baselines) {
const loadError = (file as { _loadError?: string })._loadError;
if (loadError) {
results.push({
name: file.name,
url: null,
clean: false,
captureMs: 0,
summary: { removed: 0, changed: 0, added: 0 },
error: `failed to parse baseline: ${loadError}`,
});
await testInfo.attach(`drift-${file.name}.json`, {
body: JSON.stringify({ error: loadError }, null, 2),
contentType: 'application/json',
});
continue;
}
const targetUrl = file.baseline.claudeAiUrl;
// Navigate (best-effort). If a baseline has no URL,
// snapshot the current renderer state in place — it
// matches the explore CLI's bare `snapshot <name>`
// pathway, which captures wherever the app is sitting.
if (targetUrl) {
try {
await navigateRendererTo(inspector, targetUrl);
} catch (err) {
results.push({
name: file.name,
url: targetUrl,
clean: false,
captureMs: 0,
summary: { removed: 0, changed: 0, added: 0 },
error: `navigation failed: ${err instanceof Error ? err.message : String(err)}`,
});
continue;
}
}
const captureStart = Date.now();
let fresh: Snapshot;
try {
fresh = await capture(inspector);
} catch (err) {
results.push({
name: file.name,
url: targetUrl || null,
clean: false,
captureMs: Date.now() - captureStart,
summary: { removed: 0, changed: 0, added: 0 },
error: `capture failed: ${err instanceof Error ? err.message : String(err)}`,
});
continue;
}
const captureMs = Date.now() - captureStart;
const result = diff(file.baseline, fresh);
const clean = result.entries.length === 0;
results.push({
name: file.name,
url: targetUrl || null,
clean,
captureMs,
summary: result.summary,
});
// Always attach the diff payload — clean diffs are the
// "no entries" case and confirm the snapshot was actually
// compared (vs. silently skipped). Naming per-snapshot so
// the report shows them side-by-side.
await testInfo.attach(`drift-${file.name}.json`, {
body: JSON.stringify(result, null, 2),
contentType: 'application/json',
});
}
inspector.close();
} finally {
await app.close();
}
const cleanCount = results.filter((r) => r.clean).length;
const totalCount = results.length;
const cleanFraction = totalCount === 0 ? 0 : cleanCount / totalCount;
const slowSnapshots = results.filter(
(r) => r.captureMs > CAPTURE_BUDGET_MS,
);
const errored = results.filter((r) => r.error);
await testInfo.attach('drift-summary', {
body: JSON.stringify(
{
totalCount,
cleanCount,
cleanFraction,
thresholdRequired: CLEAN_FRACTION_REQUIRED,
results,
slowSnapshots: slowSnapshots.map((r) => ({
name: r.name,
captureMs: r.captureMs,
budgetMs: CAPTURE_BUDGET_MS,
})),
erroredSnapshots: errored.map((r) => ({
name: r.name,
error: r.error,
})),
},
null,
2,
),
contentType: 'application/json',
});
if (slowSnapshots.length > 0) {
// Soft warning only — surface as an attachment, don't fail.
// Capture latency is bounded by the renderer's main-thread
// availability, which is noisy. The plan's 200ms is a
// "looking-good" target, not a contract.
await testInfo.attach('slow-capture-warning', {
body: JSON.stringify(
{
note:
`${slowSnapshots.length} snapshot(s) exceeded the ` +
`${CAPTURE_BUDGET_MS}ms capture target. Soft warning — ` +
'not a fail. Investigate if this trends upward.',
snapshots: slowSnapshots.map((r) => ({
name: r.name,
captureMs: r.captureMs,
})),
},
null,
2,
),
contentType: 'application/json',
});
}
expect(
cleanFraction,
`at least ${Math.round(CLEAN_FRACTION_REQUIRED * 100)}% of snapshots ` +
`must have zero diffs (got ${cleanCount}/${totalCount} clean — see ` +
'drift-*.json attachments for per-snapshot diffs)',
).toBeGreaterThanOrEqual(CLEAN_FRACTION_REQUIRED);
});
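The final assertion is a plain fraction comparison against the threshold. A minimal sketch of that gate, with `passesDriftGate` as a hypothetical name (the spec computes `cleanFraction` inline and hands it to `expect`):

```typescript
// Hypothetical stand-in for the spec's final assertion.
function passesDriftGate(
  cleanCount: number,
  totalCount: number,
  required = 0.8,
): boolean {
  // Mirrors the spec: an empty result set counts as fraction 0.
  const fraction = totalCount === 0 ? 0 : cleanCount / totalCount;
  return fraction >= required;
}

const cases: Array<[number, number]> = [
  [5, 5], // true
  [4, 5], // true (exactly 0.8)
  [3, 5], // false
  [7, 9], // false (about 0.78)
  [0, 0], // false
];
for (const [clean, total] of cases) {
  console.log(`${clean}/${total} clean ->`, passesDriftGate(clean, total));
}
```

Because errored snapshots are recorded with `clean: false`, a parse, navigation, or capture failure counts against the fraction in the same way as a real diff.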


@@ -0,0 +1,356 @@
import { test, expect } from '@playwright/test';
import { spawn, execFile } from 'node:child_process';
import { existsSync, statSync } from 'node:fs';
import { mkdtemp, open, rm } from 'node:fs/promises';
import { promisify } from 'node:util';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
const exec = promisify(execFile);
// S01 — AppImage launches without manual `libfuse2t64` install.
//
// Per docs/testing/cases/distribution.md S01: on Ubuntu 24.04+ the
// project AppImage currently fails with `dlopen(): error loading
// libfuse.so.2` unless the user manually installs `libfuse2t64`.
// The case-doc anchor (scripts/packaging/appimage.sh:226) notes the
// upstream `appimagetool` runtime is bundled as-is — no FUSE shim,
// no postinst dep declaration, no clear error message. CI papers
// over the gap by `apt install libfuse2`-ing before exec
// (.github/workflows/test-artifacts.yml:47).
//
// Assertion shape:
// 1. Locate an AppImage. Skip cleanly if not running from one.
// 2. Spawn the AppImage with a brief grace window. Capture stderr.
// 3. Assert stderr does NOT contain `libfuse.so.2` (or the broader
// `dlopen` failure pattern that the AppImage runtime emits when
// FUSE is missing).
// 4. Kill the proc — we don't need a full launch, just the FUSE
// load attempt which happens before any squashfs mount.
//
// Why a runtime spawn rather than a static probe: the failure mode
// is `dlopen()` of libfuse.so.2 inside the AppImage runtime ELF
// itself, not anything our scripts produce. Only a real spawn on
// the target host exercises that dynamic loader path.
//
// Approach choice: we do NOT use `--appimage-version`. That flag is
// handled by the AppImage runtime BEFORE any FUSE mount, so it
// would exit 0 even on a host missing libfuse2 and silently pass
// the test. Instead we let the runtime reach its mount step, watch
// stderr for the dlopen error (which fires within ~100ms when the
// lib is absent), then kill before the Electron child has a chance
// to persist anything.
//
// Isolation: we spawn with `HOME` and the `XDG_CONFIG_HOME` /
// `XDG_DATA_HOME` / `XDG_CACHE_HOME` directories pointed into a temp
// sandbox, so even if Electron does come up briefly before we kill
// it, nothing lands in `~/.config/Claude`.
//
// Row gating: this isn't matrix-row-driven — it's install-method-
// driven. The harness's `ROW` env doesn't carry "is this row's
// install an AppImage?", so we detect at runtime via launcher path
// + magic-byte sniff. Skip when the local install isn't AppImage.
interface AppImageProbeResult {
path: string | null;
reason: string;
}
// AppImages are ELF executables with a filesystem image appended.
// The ELF header carries the AppImage magic at offset 8: `AI\x02`
// for type 2 (squashfs payload, the format our build emits) or
// `AI\x01` for type 1. The magic is also visible to `file`, but
// ELF + extension + magic is cheap enough to inline.
async function probeAppImagePath(): Promise<AppImageProbeResult> {
const explicit = process.env.CLAUDE_DESKTOP_LAUNCHER;
const candidates: string[] = [];
if (explicit) candidates.push(explicit);
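// Fallback search: project test-build dir holds AppImages from
// `./build.sh --build appimage`. Resolve relative to this spec
// so the search works regardless of CWD or checkout location.
// Assumes this spec sits four directories below the repo root;
// adjust the '../' count if the file moves.
const { fileURLToPath } = await import('node:url');
const projectRoot = fileURLToPath(new URL('../../../../', import.meta.url));
const testBuildDir = join(projectRoot, 'test-build');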
if (existsSync(testBuildDir)) {
try {
const fs = await import('node:fs/promises');
const entries = await fs.readdir(testBuildDir);
for (const entry of entries) {
if (entry.endsWith('.AppImage')) {
candidates.push(`${testBuildDir}/${entry}`);
}
}
} catch {
// best-effort
}
}
for (const candidate of candidates) {
if (!existsSync(candidate)) continue;
try {
const st = statSync(candidate);
if (!st.isFile()) continue;
// Quick filename hint: skip the magic-byte read entirely
// for unambiguous .AppImage suffixes.
if (candidate.endsWith('.AppImage')) {
return { path: candidate, reason: 'matched .AppImage suffix' };
}
// Magic-byte sniff: ELF (`\x7fELF`) at offset 0, AppImage
// type marker `AI\x02` at offset 8.
const fh = await open(candidate, 'r');
try {
const buf = Buffer.alloc(12);
await fh.read(buf, 0, 12, 0);
const elf = buf.subarray(0, 4).toString('hex') === '7f454c46';
const aiMagic = buf.subarray(8, 11);
const isAppImage =
elf &&
aiMagic[0] === 0x41 &&
aiMagic[1] === 0x49 &&
(aiMagic[2] === 0x01 || aiMagic[2] === 0x02);
if (isAppImage) {
return {
path: candidate,
reason: 'matched AppImage magic bytes',
};
}
} finally {
await fh.close();
}
} catch {
// fall through to next candidate
}
}
return {
path: null,
reason:
'no AppImage found via CLAUDE_DESKTOP_LAUNCHER or ' +
`${testBuildDir}/*.AppImage`,
};
}
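The magic-byte sniff can be pulled out of the file I/O and exercised on in-memory headers. A minimal sketch, assuming the caller has already read at least the first 11 bytes; `appImageType` is a hypothetical helper name, and the headers below are synthetic byte arrays, not read from a real AppImage:

```typescript
// Hypothetical pure helper: classify the leading bytes of a file.
// Returns the AppImage type (1 or 2), or null when the header does
// not look like an AppImage.
function appImageType(header: Uint8Array): 1 | 2 | null {
  if (header.length < 11) return null;
  // ELF magic `\x7fELF` at offset 0.
  const isElf =
    header[0] === 0x7f &&
    header[1] === 0x45 &&
    header[2] === 0x4c &&
    header[3] === 0x46;
  if (!isElf) return null;
  // 'A' 'I' at offsets 8 and 9, type byte at offset 10.
  if (header[8] !== 0x41 || header[9] !== 0x49) return null;
  const typeByte = header[10];
  return typeByte === 0x01 ? 1 : typeByte === 0x02 ? 2 : null;
}

// Synthetic headers built in memory for illustration.
const type2Header = new Uint8Array([
  0x7f, 0x45, 0x4c, 0x46, 2, 1, 1, 0, 0x41, 0x49, 0x02, 0,
]);
const plainElfHeader = new Uint8Array([
  0x7f, 0x45, 0x4c, 0x46, 2, 1, 1, 0, 0, 0, 0, 0,
]);

console.log(appImageType(type2Header)); // 2
console.log(appImageType(plainElfHeader)); // null
console.log(appImageType(new Uint8Array(4))); // null
```

Keeping the check pure makes it testable without an AppImage on disk; the spec's version additionally short-circuits on the `.AppImage` suffix before reading any bytes.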
async function captureFuseDpkg(): Promise<string> {
// Best-effort context capture for the case-doc's listed
// "Diagnostics on failure". `dpkg -l` is Debian-only — we still
// run it and let it fail cleanly on RPM hosts (the empty/error
// output is itself diagnostic).
try {
const { stdout, stderr } = await exec(
'sh',
['-c', 'dpkg -l 2>&1 | grep -i fuse || true'],
{ timeout: 5_000 },
);
return `${stdout}${stderr}`.trim() || '(no fuse-related dpkg entries)';
} catch (err) {
const e = err as { stdout?: string; stderr?: string; code?: number };
return (
`dpkg query failed (exit ${e.code ?? '?'})\n` +
`${(e.stdout ?? '').trim()}\n` +
`${(e.stderr ?? '').trim()}`
).trim();
}
}
// Matches the dlopen failure pattern the AppImage runtime prints
// when libfuse2 is missing. The case-doc lists `libfuse.so.2` as the
// canonical token; we also flag the broader `dlopen` + `fuse`
// combination so a future runtime that changes the wording without
// fixing the underlying bug still trips the test.
function fuseFailureFound(stderr: string): { found: boolean; match?: string } {
const lower = stderr.toLowerCase();
if (lower.includes('libfuse.so.2')) {
return { found: true, match: 'libfuse.so.2' };
}
// Both 'dlopen' and 'fuse' on the same line of stderr — wider net
// for future-proofing.
for (const line of stderr.split('\n')) {
const ll = line.toLowerCase();
if (ll.includes('dlopen') && ll.includes('fuse')) {
return { found: true, match: line.trim() };
}
}
return { found: false };
}
test.setTimeout(30_000);
test('S01 — AppImage launches without manual libfuse2t64', async ({}, testInfo) => {
testInfo.annotations.push({ type: 'severity', description: 'Critical' });
testInfo.annotations.push({
type: 'surface',
description: 'Distribution / AppImage',
});
const probe = await probeAppImagePath();
await testInfo.attach('appimage-probe', {
body: JSON.stringify(probe, null, 2),
contentType: 'application/json',
});
if (!probe.path) {
test.skip(true, `S01 only applies to AppImage installs: ${probe.reason}`);
return;
}
const appimagePath = probe.path;
// Always-on context: dpkg fuse state. Cheap, useful for triage
// regardless of pass/fail.
const dpkgFuse = await captureFuseDpkg();
await testInfo.attach('dpkg-fuse', {
body: dpkgFuse,
contentType: 'text/plain',
});
// Per-test sandbox so a brief Electron child doesn't pollute the
// host's ~/.config/Claude. We don't use launchClaude()'s isolation
// because it spawns the bundled Electron directly (bypassing the
// AppImage runtime's FUSE mount, which is exactly what we're
// trying to exercise here).
const sandboxRoot = await mkdtemp(join(tmpdir(), 'claude-s01-'));
const sandboxConfig = join(sandboxRoot, 'config');
const sandboxHome = join(sandboxRoot, 'home');
let exitCode: number | null = null;
let signalCode: NodeJS.Signals | null = null;
let timedOutBeforeFuseSignal = false;
const stderrChunks: Buffer[] = [];
const stdoutChunks: Buffer[] = [];
const start = Date.now();
try {
const proc = spawn(appimagePath, [], {
cwd: sandboxRoot,
env: {
...process.env,
HOME: sandboxHome,
XDG_CONFIG_HOME: sandboxConfig,
XDG_DATA_HOME: join(sandboxRoot, 'data'),
XDG_CACHE_HOME: join(sandboxRoot, 'cache'),
// Surface FUSE mount errors loudly; the AppImage runtime
// honours this for its diagnostic output.
APPIMAGE_DEBUG: '1',
},
stdio: ['ignore', 'pipe', 'pipe'],
detached: false,
});
proc.stderr?.on('data', (chunk: Buffer) => stderrChunks.push(chunk));
proc.stdout?.on('data', (chunk: Buffer) => stdoutChunks.push(chunk));
// Race three outcomes:
// (a) process exits on its own (FUSE failure exits ~100-300ms)
// (b) we observed a FUSE error in stderr — kill early
// (c) timeout: app probably mounted fine and is starting up,
// in which case absence of FUSE error in stderr is a PASS
const fuseSignal = new Promise<'fuse-error'>((resolve) => {
const checkInterval = setInterval(() => {
const so_far = Buffer.concat(stderrChunks).toString('utf8');
if (fuseFailureFound(so_far).found) {
clearInterval(checkInterval);
resolve('fuse-error');
}
}, 100);
proc.once('exit', () => clearInterval(checkInterval));
});
const exitSignal = new Promise<'exit'>((resolve) => {
proc.once('exit', (code, signal) => {
exitCode = code;
signalCode = signal;
resolve('exit');
});
});
const timeoutSignal = new Promise<'timeout'>((resolve) => {
setTimeout(() => {
timedOutBeforeFuseSignal = true;
resolve('timeout');
}, 8_000);
});
const winner = await Promise.race([
fuseSignal,
exitSignal,
timeoutSignal,
]);
// Whatever happened, kill the process so we don't leave
// Electron running. SIGTERM first, SIGKILL backstop.
if (proc.exitCode === null && proc.signalCode === null) {
proc.kill('SIGTERM');
await Promise.race([
new Promise<void>((resolve) =>
proc.once('exit', (code, signal) => {
exitCode = code;
signalCode = signal;
resolve();
}),
),
new Promise<void>((resolve) => setTimeout(resolve, 3_000)),
]);
if (proc.exitCode === null && proc.signalCode === null) {
proc.kill('SIGKILL');
await new Promise<void>((resolve) => {
proc.once('exit', (code, signal) => {
exitCode = code;
signalCode = signal;
resolve();
});
setTimeout(() => resolve(), 2_000);
});
}
}
await testInfo.attach('race-winner', {
body: winner,
contentType: 'text/plain',
});
} finally {
await rm(sandboxRoot, { recursive: true, force: true }).catch(() => {});
}
const elapsedMs = Date.now() - start;
const stderrFull = Buffer.concat(stderrChunks).toString('utf8');
const stdoutFull = Buffer.concat(stdoutChunks).toString('utf8');
const stderrTail =
stderrFull.length > 4096 ? stderrFull.slice(-4096) : stderrFull;
const stdoutTail =
stdoutFull.length > 4096 ? stdoutFull.slice(-4096) : stdoutFull;
const fuseCheck = fuseFailureFound(stderrFull);
await testInfo.attach('appimage-path', {
body: appimagePath,
contentType: 'text/plain',
});
await testInfo.attach('exit-info', {
body: JSON.stringify(
{
exitCode,
signalCode,
timedOutBeforeFuseSignal,
elapsedMs,
fuseFailureMatch: fuseCheck.match ?? null,
},
null,
2,
),
contentType: 'application/json',
});
await testInfo.attach('stderr-tail-4k', {
body: stderrTail || '(empty)',
contentType: 'text/plain',
});
await testInfo.attach('stdout-tail-4k', {
body: stdoutTail || '(empty)',
contentType: 'text/plain',
});
expect(
fuseCheck.found,
`AppImage stderr should not report a libfuse.so.2 dlopen failure ` +
`(matched: ${fuseCheck.match ?? 'n/a'}). The case-doc S01 ` +
`scenario fails on Ubuntu 24.04 unless libfuse2t64 is manually ` +
`installed; see scripts/packaging/appimage.sh:226 for the ` +
`upstream-runtime-as-is build choice.`,
).toBe(false);
});


@@ -0,0 +1,184 @@
import { test, expect } from '@playwright/test';
import { existsSync, readFileSync } from 'node:fs';
import { join, resolve } from 'node:path';
// S02 — XDG_CURRENT_DESKTOP detection uses substring match.
//
// Backs S02 in docs/testing/cases/distribution.md.
//
// Ubuntu sets XDG_CURRENT_DESKTOP=ubuntu:GNOME (colon-separated,
// distro-prefixed). A naive `== "GNOME"` (or POSIX `= "GNOME"`)
// equality check misses Ubuntu and silently disables every DE-gated
// branch on those rows. The expected pattern is a substring/glob
// match (case-insensitive) over the colon-separated value:
//
// launcher-common.sh:38-44 → desktop="${XDG_CURRENT_DESKTOP,,}"
// [[ "$desktop" == *niri* ]]
// quick-window.sh:34-35 → (process.env.XDG_CURRENT_DESKTOP||"")
// .toLowerCase().includes("kde")
// quick-window.sh:117-118 → same shape, injected into index.js
//
// This is a source-tree regression detector: if a future change
// rewrites either gate to a strict-equality form, the runner trips.
// It does NOT assert the presence of any specific good pattern (the
// case doc anchors describe several different shapes — niri glob,
// KDE includes(), runtime JS gate); it asserts the *absence* of the
// bad ones.
//
// Pure file probe — no app launch. Fast (<1s). Row-independent.
//
// Path resolution probes, in order:
// 1. $CLAUDE_DESKTOP_REPO_ROOT/scripts (override)
// 2. ../../scripts relative to cwd (dev worktree, where the harness
// runs from tools/test-harness/)
// 3. /usr/lib/claude-desktop/scripts (deb/rpm install layout)
// If none resolve, the test skips with a reason.
interface BadHit {
file: string;
line: number;
text: string;
}
function resolveScriptsDir(): string | null {
const env = process.env.CLAUDE_DESKTOP_REPO_ROOT;
if (env) {
const p = join(env, 'scripts');
if (
existsSync(join(p, 'launcher-common.sh')) &&
existsSync(join(p, 'patches', 'quick-window.sh'))
) {
return p;
}
}
// Dev worktree probe — tools/test-harness lives two dirs deep,
// so cwd/../../scripts is the repo's scripts/ when tests are run
// from tools/test-harness/.
const devProbe = resolve(process.cwd(), '..', '..', 'scripts');
if (
existsSync(join(devProbe, 'launcher-common.sh')) &&
existsSync(join(devProbe, 'patches', 'quick-window.sh'))
) {
return devProbe;
}
// Installed path (deb/rpm).
const installedProbe = '/usr/lib/claude-desktop/scripts';
if (
existsSync(join(installedProbe, 'launcher-common.sh')) &&
existsSync(join(installedProbe, 'patches', 'quick-window.sh'))
) {
return installedProbe;
}
return null;
}
// Bad patterns: shell + JS strict-equality forms against
// XDG_CURRENT_DESKTOP. Each regex is intentionally narrow so the
// expected substring/glob shapes don't false-positive:
//
// - Shell `[[ "$XDG_CURRENT_DESKTOP" == "GNOME" ]]` — bash strict
// equality with a *literal* RHS (no glob `*`). The `*niri*`
// glob form is fine and must NOT match.
// - Shell `[ "$XDG_CURRENT_DESKTOP" = "GNOME" ]` — POSIX strict
// equality.
// - JS `process.env.XDG_CURRENT_DESKTOP === "GNOME"` (and `==`).
//
// The bash and JS patterns cover the variable on either side of the
// operator, so `"GNOME" == "$XDG_CURRENT_DESKTOP"` is also caught;
// the POSIX `[ = ]` pattern only covers the variable on the left.
//
// `lowered` form (`"${XDG_CURRENT_DESKTOP,,}" == *niri*`) uses a
// glob and is allowed; the bad-RHS regexes require the literal to
// have no `*` wildcards inside the quotes.
const BAD_PATTERNS: { name: string; re: RegExp }[] = [
{
// bash [[ ... == "literal" ]] with XDG_CURRENT_DESKTOP on
// either side. RHS literal contains no `*` (glob-free).
name: 'bash [[ == ]] strict equality (no glob)',
re: /\[\[[^\]]*\$\{?XDG_CURRENT_DESKTOP[^\]]*==\s*"[^"*]*"[^\]]*\]\]/,
},
{
name: 'bash [[ == ]] strict equality, var on right (no glob)',
re: /\[\[[^\]]*==\s*"\$\{?XDG_CURRENT_DESKTOP[^\]]*\]\]/,
},
{
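// POSIX [ ... = "literal" ] with XDG_CURRENT_DESKTOP. Note the
// escaped `\]` inside the class: in JS a bare `[^]` matches any
// character, so `[^]]*` would not mean "anything but `]`".
name: 'POSIX [ = ] strict equality',
re: /\[\s+[^\]]*\$\{?XDG_CURRENT_DESKTOP[^\]]*=\s*"[^"]*"[^\]]*\]/,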
},
{
// JS strict equality (=== or ==) against a string literal.
// Either single or double quotes; either side of the operator.
name: 'JS === / == strict equality',
re: /process\.env\.XDG_CURRENT_DESKTOP\s*===?\s*['"][^'"]*['"]|['"][^'"]*['"]\s*===?\s*process\.env\.XDG_CURRENT_DESKTOP/,
},
];
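To sanity-check what these patterns flag, the bash `[[ == ]]` and JS regexes can be run against the shapes quoted in the header comment. A minimal sketch: the two regexes are copied from the list above under the hypothetical names `bashStrictEq` and `jsStrictEq`, and the sample lines are illustrative rather than taken from the scanned scripts:

```typescript
// Copies of two BAD_PATTERNS entries, renamed for this sketch.
const bashStrictEq =
  /\[\[[^\]]*\$\{?XDG_CURRENT_DESKTOP[^\]]*==\s*"[^"*]*"[^\]]*\]\]/;
const jsStrictEq =
  /process\.env\.XDG_CURRENT_DESKTOP\s*===?\s*['"][^'"]*['"]|['"][^'"]*['"]\s*===?\s*process\.env\.XDG_CURRENT_DESKTOP/;

// Why strict equality is the bug: Ubuntu's value is distro-prefixed.
const ubuntuValue = 'ubuntu:GNOME';
console.log(ubuntuValue === 'GNOME'); // false
console.log(ubuntuValue.toLowerCase().includes('gnome')); // true

// Strict bash equality against a literal is flagged.
console.log(bashStrictEq.test('[[ "$XDG_CURRENT_DESKTOP" == "GNOME" ]]')); // true
// The lowered glob form is allowed.
console.log(bashStrictEq.test('[[ "${XDG_CURRENT_DESKTOP,,}" == *niri* ]]')); // false

// Strict JS equality is flagged.
console.log(jsStrictEq.test('process.env.XDG_CURRENT_DESKTOP === "GNOME"')); // true
// The includes() form is allowed.
console.log(
  jsStrictEq.test(
    '(process.env.XDG_CURRENT_DESKTOP||"").toLowerCase().includes("kde")',
  ),
); // false
```

None of the regexes carries the `g` flag, so repeated `.test()` calls are stateless and safe to reuse across lines, which is what the per-line scan relies on.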
function scanFile(absPath: string): BadHit[] {
const text = readFileSync(absPath, 'utf8');
const lines = text.split('\n');
const hits: BadHit[] = [];
for (let i = 0; i < lines.length; i++) {
const line = lines[i] ?? '';
// Cheap pre-filter: only check lines mentioning the env var.
if (!line.includes('XDG_CURRENT_DESKTOP')) continue;
for (const { re } of BAD_PATTERNS) {
if (re.test(line)) {
hits.push({
file: absPath,
line: i + 1,
text: line.length > 200 ? line.slice(0, 200) + '…' : line,
});
break;
}
}
}
return hits;
}
test('S02 — XDG_CURRENT_DESKTOP detection uses substring match, not strict ==', async ({}, testInfo) => {
testInfo.annotations.push({ type: 'severity', description: 'Should' });
testInfo.annotations.push({
type: 'surface',
description: 'Distribution / desktop detection',
});
const scriptsDir = resolveScriptsDir();
if (!scriptsDir) {
test.skip(
true,
'No accessible scripts/ dir (set CLAUDE_DESKTOP_REPO_ROOT or install deb/rpm)',
);
return;
}
await testInfo.attach('scripts-dir', {
body: scriptsDir,
contentType: 'text/plain',
});
const targets = [
join(scriptsDir, 'launcher-common.sh'),
join(scriptsDir, 'patches', 'quick-window.sh'),
];
await testInfo.attach('files-checked', {
body: JSON.stringify(targets, null, 2),
contentType: 'application/json',
});
const allHits: BadHit[] = [];
for (const t of targets) {
allHits.push(...scanFile(t));
}
await testInfo.attach('bad-pattern-hits', {
body: JSON.stringify(allHits, null, 2),
contentType: 'application/json',
});
expect(
allHits,
// eslint-disable-next-line max-len
'No strict-equality checks against XDG_CURRENT_DESKTOP — ubuntu:GNOME would miss them. Use substring/glob match (case-insensitive) instead.',
).toEqual([]);
});


@@ -0,0 +1,157 @@
import { test, expect } from '@playwright/test';
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';
import { captureSessionEnv } from '../lib/diagnostics.js';
const exec = promisify(execFile);
// S03 — DEB control file declares runtime dependencies.
//
// Per docs/testing/cases/distribution.md S03:
// Expected: All transitive runtime deps are declared in the package
// and pulled by APT. First launch succeeds without manual `apt
// install` of any extra package.
//
// Code anchor: scripts/packaging/deb.sh:185-197 — the DEBIAN/control
// file emits Package/Version/Section/Priority/Architecture/Maintainer/
// Description fields and **no `Depends:` line**, with the inline
// comment at :181-183 ("No external dependencies are required at
// runtime"). The case-doc treats this as a regression: Critical
// surface, expected contract is "deps declared", current state is
// "deps absent". So this runner is a regression detector — on a
// deb-installed host today it will FAIL until upstream emits a
// Depends line. Don't invert the assertion to make it green; failing
// is the signal.
//
// Layer: pure spawn probe. `dpkg-query -W -f='${Depends}'
// claude-desktop` reads the field straight out of dpkg's status db,
// so we don't need to know where the .deb lives in apt's cache or
// how the package was originally fetched.
//
// Skip behaviour: if dpkg-query exits non-zero (no dpkg installed,
// or claude-desktop not in dpkg's db), the package isn't deb-managed
// on this host and S03 has nothing to assert against.
//
// Subtlety on mixed-tooling hosts: a Fedora/RPM box that also has
// `dpkg` installed for cross-distro dev can wind up with a stale
// `claude-desktop` entry in dpkg's status db (matching the field
// shape from a previous deb install). dpkg-query exits 0 in that
// case and we still run the assertion — the field shape we read is
// authoritative for what a current deb install would look like, so
// it's a valid signal even if the binary on PATH is the rpm one.
test('S03 — DEB control file declares runtime dependencies', async (
{},
testInfo,
) => {
testInfo.annotations.push({
type: 'severity',
description: 'Critical',
});
testInfo.annotations.push({
type: 'surface',
description: 'Distribution / DEB packaging',
});
await testInfo.attach('session-env', {
body: JSON.stringify(captureSessionEnv(), null, 2),
contentType: 'application/json',
});
// Read the Depends field from dpkg's status db. If dpkg-query
// itself isn't installed (ENOENT) or the package isn't in the db
// (exit 1), skip — S03 only applies to deb-managed installs.
let dependsField: string;
let pkgVersion = '';
try {
const { stdout } = await exec(
'dpkg-query',
['-W', '-f=${Depends}', 'claude-desktop'],
{ timeout: 5_000 },
);
dependsField = stdout.trim();
} catch (err) {
const e = err as { stderr?: string; code?: number | string };
await testInfo.attach('dpkg-query-error', {
body: JSON.stringify(
{
code: e.code ?? null,
stderr: (e.stderr ?? '').trim(),
},
null,
2,
),
contentType: 'application/json',
});
test.skip(
true,
'S03 only applies to deb-installed claude-desktop ' +
'(dpkg-query missing or package not in dpkg db)',
);
return;
}
// Capture the full Depends payload, version, and resolved binary
// path as evidence regardless of pass/fail. Per Decision 7 these
// are always-on attachments.
try {
const { stdout } = await exec(
'dpkg-query',
['-W', '-f=${Version}', 'claude-desktop'],
{ timeout: 5_000 },
);
pkgVersion = stdout.trim();
} catch {
// Version probe is best-effort — Depends-field result above
// already proves the package is in the db.
}
let installPath = '';
try {
const { stdout } = await exec('which', ['claude-desktop'], {
timeout: 5_000,
});
installPath = stdout.trim();
} catch {
// `which` fails when the launcher isn't on PATH (e.g. dpkg
// has a stale record but the binary's been removed). Capture
// the empty string and let the Depends assertion run.
}
await testInfo.attach('depends-field', {
body: dependsField,
contentType: 'text/plain',
});
await testInfo.attach('package-version', {
body: pkgVersion,
contentType: 'text/plain',
});
await testInfo.attach('install-path', {
body: installPath,
contentType: 'text/plain',
});
await testInfo.attach('evidence', {
body: JSON.stringify(
{
dependsField,
dependsLength: dependsField.length,
packageVersion: pkgVersion,
installPath,
},
null,
2,
),
contentType: 'application/json',
});
// Core S03 assertion. Upstream contract: a Critical-severity
// runtime install pulls all transitive deps via APT, which
// requires the control file to declare them. Empty Depends ==
// regression against scripts/packaging/deb.sh:185-197.
expect(
dependsField,
'DEBIAN/control Depends: field is non-empty per upstream ' +
'contract (case-doc S03 — currently fails until ' +
'scripts/packaging/deb.sh:185-197 emits a Depends line)',
).not.toBe('');
});
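Once upstream emits a `Depends:` line, the non-empty check could be tightened by inspecting the individual relations. The following is a minimal sketch, assuming the standard Debian relationship syntax (comma-separated relations, `|` alternatives, optional parenthesised version constraints, bracketed architecture restrictions, `:arch` qualifiers). `parseDependsField` is a hypothetical helper that is not part of the harness, and the sample field is invented for illustration.

```typescript
// Hypothetical helper: split a dpkg Depends field into groups of
// alternative package names. Version constraints, architecture
// restrictions, and arch qualifiers are dropped.
function parseDependsField(field: string): string[][] {
  return field
    .split(',')
    .map((relation) =>
      relation
        .split('|')
        .map((alt) =>
          alt
            .replace(/\(.*?\)/g, '') // version constraint, e.g. "(>= 3.22)"
            .replace(/\[.*?\]/g, '') // arch restriction, e.g. "[amd64]"
            .trim()
            .replace(/:[a-z0-9-]+$/, ''), // arch qualifier, e.g. ":any"
        )
        .filter((name) => name.length > 0),
    )
    .filter((group) => group.length > 0);
}

// Illustrative input only; per the comments above, the installed
// package currently reports an empty field.
const sampleDepends =
  'libgtk-3-0 (>= 3.22), libnss3, libasound2 | libasound2t64';
console.log(parseDependsField(sampleDepends));
console.log(parseDependsField('').length); // 0
```

An empty field yields an empty array, so `parseDependsField(dependsField).length > 0` would be equivalent to the current `.not.toBe('')` assertion while also giving a structured value to attach as evidence.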


@@ -0,0 +1,228 @@
import { test, expect } from '@playwright/test';
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';
const exec = promisify(execFile);
// S04 — RPM install via DNF pulls all required runtime deps.
//
// Mirror of S03 for the RPM/DNF branch. Case-doc:
// docs/testing/cases/distribution.md#s04--rpm-install-via-dnf-pulls-all-required-runtime-deps
//
// Severity: Critical. Surface: DNF repository / dependency
// declarations. Applies to KDE-W, KDE-X, GNOME, Sway, i3, Niri (any
// RPM-based distro).
//
// Case-doc anchors `scripts/packaging/rpm.sh:188` (`AutoReqProv: no`
// disables RPM's auto-dep generation; the spec declares no
// `Requires:`) and `:194-198` (strip + build-id disabled because
// Electron binaries don't tolerate them — bundled approach).
//
// **Regression-detector shape.** The assertion direction is "Requires
// has at least one declared runtime dep" — i.e. at least one line in
// `rpm -qR claude-desktop` that isn't an `rpmlib(...)` capability and
// isn't a `%post`/`%postun` interpreter path (`/bin/sh` etc). Today
// that filter empties out and the test FAILS, which is the documented
// state per the case-doc until upstream `rpm.sh` flips
// `AutoReqProv: on` (or declares an explicit `Requires:` block).
//
// `rpm -qR` always emits `rpmlib(CompressedFileNames)`,
// `rpmlib(FileDigests)`, `rpmlib(PayloadFilesHavePrefix)`, and
// `rpmlib(PayloadIsZstd)` regardless of spec content — those are
// satisfied by the rpm runtime itself, not by declared deps. Bare
// interpreter paths like `/bin/sh` come from scriptlet detection on
// the spec's `%post` / `%postun`, not from declared library deps.
// Both get filtered out so the assertion is strictly "did anyone
// declare a runtime dep, by hand or via AutoReqProv".
//
// Skip cleanly when:
// - `rpm` isn't on PATH (Debian/Ubuntu host, AppImage-only host).
// - `rpm -q claude-desktop` says the package isn't rpm-installed
// (deb host with rpm tooling for cross-distro dev, AppImage extract).
//
// Layer: spawn probe + stdout parse. No app launch. Row-independent
// in shape, but only meaningful on RPM-based rows.
interface ProbeResult {
cmd: string;
exitCode: number | null;
stdout: string;
stderr: string;
}
async function probe(
bin: string,
args: string[],
): Promise<ProbeResult> {
const cmd = `${bin} ${args.join(' ')}`;
try {
const { stdout, stderr } = await exec(bin, args, {
timeout: 5_000,
});
return {
cmd,
exitCode: 0,
stdout: stdout.trim(),
stderr: stderr.trim(),
};
} catch (err) {
const e = err as {
stdout?: string;
stderr?: string;
code?: number | string;
};
// String codes (e.g. 'ENOENT' when the binary can't be spawned)
// carry no exit status, so they are reported as null.
const code = typeof e.code === 'number' ? e.code : null;
return {
cmd,
exitCode: code,
stdout: (e.stdout ?? '').trim(),
stderr: (e.stderr ?? '').trim(),
};
}
}
function formatProbe(p: ProbeResult): string {
const tail = [
p.stdout && `stdout: ${p.stdout}`,
p.stderr && `stderr: ${p.stderr}`,
]
.filter(Boolean)
.join('\n');
return `$ ${p.cmd} (exit ${p.exitCode ?? '?'})\n${tail}`.trim();
}
// `rpm -qR` lines we don't count as "declared runtime deps":
// - `rpmlib(...)` capabilities — auto-emitted by rpm regardless of
// the spec, satisfied by the rpm runtime itself.
// - Bare interpreter paths (`/bin/sh`, `/bin/bash`, `/usr/bin/env`)
// — picked up from the spec's scriptlets (`%post` / `%postun`),
// not from declared library deps.
function isAutoEmittedRequire(line: string): boolean {
const trimmed = line.trim();
if (!trimmed) return true;
if (trimmed.startsWith('rpmlib(')) return true;
// Strip a trailing version constraint ("/bin/sh >= 1.0") before
// matching so the shape is just the capability/path.
const head = trimmed.split(/\s+/)[0] ?? '';
if (
head === '/bin/sh' ||
head === '/bin/bash' ||
head === '/usr/bin/env' ||
head === '/usr/bin/sh' ||
head === '/usr/bin/bash'
) {
return true;
}
return false;
}
test('S04 — RPM package declares runtime requirements', async (
{},
testInfo,
) => {
testInfo.annotations.push({
type: 'severity',
description: 'Critical',
});
testInfo.annotations.push({
type: 'surface',
description: 'DNF repository / dependency declarations',
});
// Skip cleanly on hosts without rpm tooling.
const rpmWhich = await probe('which', ['rpm']);
await testInfo.attach('which-rpm', {
body: formatProbe(rpmWhich),
contentType: 'text/plain',
});
if (rpmWhich.exitCode !== 0 || !rpmWhich.stdout) {
test.skip(
true,
'S04 only applies to rpm-installed claude-desktop ' +
'(rpm not on PATH)',
);
return;
}
// Resolve installed package version. `rpm -q` returns non-zero if
// the package isn't installed via rpm (Debian/AppImage host with
// rpm tooling, etc) — that's the second skip path.
const rpmQ = await probe('rpm', ['-q', 'claude-desktop']);
await testInfo.attach('rpm-q', {
body: formatProbe(rpmQ),
contentType: 'text/plain',
});
if (rpmQ.exitCode !== 0) {
test.skip(
true,
'S04 only applies to rpm-installed claude-desktop ' +
'(rpm -q claude-desktop returned non-zero)',
);
return;
}
// Capture install path for the diagnostics bundle. Failure here
// isn't a skip — `which` not finding `claude-desktop` on a host
// where `rpm -q claude-desktop` succeeds is unusual but harmless
// for the assertion shape.
const whichClaude = await probe('which', ['claude-desktop']);
await testInfo.attach('which-claude-desktop', {
body: formatProbe(whichClaude),
contentType: 'text/plain',
});
const rpmRequires = await probe('rpm', ['-qR', 'claude-desktop']);
await testInfo.attach('rpm-qR', {
body: formatProbe(rpmRequires),
contentType: 'text/plain',
});
expect(
rpmRequires.exitCode,
`rpm -qR claude-desktop must succeed on an rpm-installed host`,
).toBe(0);
const allLines = rpmRequires.stdout
.split('\n')
.map((l) => l.trim())
.filter((l) => l.length > 0);
const declaredRequires = allLines.filter(
(l) => !isAutoEmittedRequire(l),
);
await testInfo.attach('requires-classified', {
body: JSON.stringify(
{
all: allLines,
declared: declaredRequires,
declaredCount: declaredRequires.length,
},
null,
2,
),
contentType: 'application/json',
});
// Core S04 assertion. Per case-doc "Expected": "All transitive
// runtime deps are declared in the RPM and pulled by DNF." A
// non-empty `declaredRequires` is the minimum signal — it doesn't
// prove the *full* set is declared, but it proves the spec moved
// off `AutoReqProv: no` with no manual `Requires:` (the current
// state per scripts/packaging/rpm.sh:188).
//
// Today this fails by design — the failure IS the regression-
// detector state. The assertion flips green once
// scripts/packaging/rpm.sh starts declaring runtime deps (manual
// Requires lines, AutoReqProv flip, or both).
expect(
declaredRequires.length,
`rpm -qR claude-desktop should report at least one declared ` +
`runtime requirement (non-rpmlib(...), non-interpreter). ` +
`Currently empty per scripts/packaging/rpm.sh:188 ` +
`(\`AutoReqProv: no\`, no \`Requires:\`).`,
).toBeGreaterThan(0);
});
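To make the filter concrete, here is how the classification plays out on a made-up `rpm -qR` listing. This is a sketch: `AUTO_EMITTED_PATHS` and `splitRequires` are hypothetical names that restate the rules of `isAutoEmittedRequire` in self-contained form, and the sample output is illustrative, not captured from a real host.

```typescript
// Interpreter paths that scriptlet detection adds on its own.
const AUTO_EMITTED_PATHS = new Set([
  '/bin/sh',
  '/bin/bash',
  '/usr/bin/env',
  '/usr/bin/sh',
  '/usr/bin/bash',
]);

// Partition `rpm -qR` output into auto-emitted lines and lines that
// count as declared runtime requirements.
function splitRequires(stdout: string): {
  auto: string[];
  declared: string[];
} {
  const auto: string[] = [];
  const declared: string[] = [];
  for (const raw of stdout.split('\n')) {
    const line = raw.trim();
    if (!line) continue;
    // Drop a trailing version constraint before matching the path.
    const head = line.split(/\s+/)[0] ?? '';
    if (line.startsWith('rpmlib(') || AUTO_EMITTED_PATHS.has(head)) {
      auto.push(line);
    } else {
      declared.push(line);
    }
  }
  return { auto, declared };
}

// Illustrative listing for a spec with `AutoReqProv: no`.
const sampleQR = [
  '/bin/sh',
  'rpmlib(CompressedFileNames) <= 3.0.4-1',
  'rpmlib(PayloadIsZstd) <= 5.4.18-1',
].join('\n');

console.log(splitRequires(sampleQR).declared.length); // 0
console.log(splitRequires(sampleQR + '\nlibnss3.so()(64bit)').declared);
```

The first call shows the shape on which S04 fails today; a single extra capability line is enough to flip the assertion green.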


@@ -0,0 +1,201 @@
import { test, expect } from '@playwright/test';
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';
import {
runDoctor,
captureSessionEnv,
} from '../lib/diagnostics.js';
const exec = promisify(execFile);
// S05 — Doctor recognises rpm-installed claude-desktop, doesn't
// false-flag as AppImage.
//
// Per docs/testing/cases/distribution.md S05 (sibling of T13 in
// launch.md — same surface, intentional matrix overlap):
//
// * Steps: on a Fedora/Nobara/RPM-based distro with claude-desktop
// installed via dnf, run `claude-desktop --doctor` and look for the
// install-method line.
// * Expected: doctor detects rpm install (e.g. via `rpm -qf` against
// the binary path) and reports it cleanly. No `not found via dpkg
// (AppImage?)` warning.
// * Currently: scripts/doctor.sh's install-method probe is gated on
// `command -v dpkg-query` and has no `rpm -qf` branch. Case-doc
// anchors the block as :290-299; the actual lines in the file as of
// runner-write time are :353-362 (drift noted, see report). On
// RPM-only hosts (no dpkg-query) the entire block is skipped — no
// install-method line is printed at all. On hosts with both
// dpkg-query installed AND an rpm-installed claude-desktop, the
// `_warn 'claude-desktop not found via dpkg (AppImage?)'` branch
// fires only if dpkg-query comes up empty. (Anecdotally on some
// Fedora hosts dpkg-query returns a stale Version string against
// `claude-desktop` — in that case the PASS path runs and the
// warning is suppressed for the wrong reason, but S05 still
// passes by the letter of the assertion.)
//
// Scope split vs T13:
//
// * T13 (launch.md) covers all rows: detect rpm OR deb, assert no
// false-flag for whichever owns the binary. Skips on AppImage /
// hand-built / undetectable installs.
// * S05 (this file) is RPM-only: skips when `rpm -qf` doesn't claim
// the binary, regardless of whether dpkg owns it. The matrix wants
// both cells filled; the overlap is intentional: S05 gives Fedora
// rows their own result even when T13's broader gating skips (e.g.
// if `rpm` is missing from PATH, T13 falls through to the `unknown`
// branch and skips, while S05 records its own skip with the
// `rpm -qf` exit details).
//
// Layer: spawn probe + stdout grep. Doesn't touch the running app
// instance; doctor is `--doctor`-gated and exits without launching
// Electron.
//
// Diagnostics on failure (per case-doc): full --doctor output,
// `rpm -qf $(which claude-desktop)`, the doctor source line that
// decides the format. Captured unconditionally as attachments so
// post-hoc triage from a JUnit-only run is possible.
const FALSE_FLAG_FRAGMENT = 'not found via dpkg (AppImage?)';
interface ProbeResult {
cmd: string;
exitCode: number | null;
stdout: string;
stderr: string;
}
async function probe(
bin: string,
args: string[],
): Promise<ProbeResult> {
const cmd = `${bin} ${args.join(' ')}`;
try {
const { stdout, stderr } = await exec(bin, args, {
timeout: 5_000,
});
return {
cmd,
exitCode: 0,
stdout: stdout.trim(),
stderr: stderr.trim(),
};
} catch (err) {
const e = err as {
stdout?: string;
stderr?: string;
code?: number;
};
return {
cmd,
exitCode: typeof e.code === 'number' ? e.code : null,
stdout: (e.stdout ?? '').trim(),
stderr: (e.stderr ?? '').trim(),
};
}
}
function formatProbe(p: ProbeResult): string {
const tail = [
p.stdout && `stdout: ${p.stdout}`,
p.stderr && `stderr: ${p.stderr}`,
]
.filter(Boolean)
.join('\n');
return `$ ${p.cmd} (exit ${p.exitCode ?? '?'})\n${tail}`.trim();
}
test('S05 — Doctor recognises rpm install, no dpkg false-flag', async (
{},
testInfo,
) => {
testInfo.annotations.push({
type: 'severity',
description: 'Should',
});
testInfo.annotations.push({
type: 'surface',
description: 'CLI / --doctor',
});
// Applies to RPM-based rows per case-doc (KDE-W, KDE-X, GNOME,
// Sway, i3, Niri). Rather than gating on the ROW env var, gate on
// the actual install method — the assertion has no signal on
// non-rpm hosts regardless of how the matrix labels them.
await testInfo.attach('session-env', {
body: JSON.stringify(captureSessionEnv(), null, 2),
contentType: 'application/json',
});
const launcher =
process.env.CLAUDE_DESKTOP_LAUNCHER ?? 'claude-desktop';
const whichProbe = await probe('which', [launcher]);
await testInfo.attach('which-claude-desktop', {
body: formatProbe(whichProbe),
contentType: 'text/plain',
});
const installPath =
whichProbe.stdout.split('\n')[0]?.trim() ?? '';
if (whichProbe.exitCode !== 0 || !installPath) {
test.skip(
true,
`claude-desktop not reachable on PATH ` +
`(launcher='${launcher}'); rpm-install probe needs ` +
`a resolvable binary`,
);
return;
}
// Detect rpm install. `rpm -qf` returns 0 + the owning package's
// NEVRA when the file is rpm-managed, non-zero otherwise. We also
// run `rpm -q claude-desktop` to surface the package metadata
// independent of which file `which` resolved (helpful when the
// launcher is a wrapper script that shadows the real binary).
const rpmFile = await probe('rpm', ['-qf', installPath]);
const rpmPkg = await probe('rpm', ['-q', 'claude-desktop']);
await testInfo.attach('rpm-qf', {
body: formatProbe(rpmFile),
contentType: 'text/plain',
});
await testInfo.attach('rpm-q-claude-desktop', {
body: formatProbe(rpmPkg),
contentType: 'text/plain',
});
if (rpmFile.exitCode !== 0) {
// Not rpm-installed. S05's assertion only has signal on RPM
// rows; on deb / AppImage / hand-built / undetectable installs
// this is a clean skip (T13 covers the deb-side mirror).
test.skip(
true,
`S05 only applies to rpm-installed claude-desktop; ` +
`rpm -qf ${installPath} returned ` +
`exit ${rpmFile.exitCode ?? '?'} ` +
`(stderr: ${rpmFile.stderr || '<empty>'})`,
);
return;
}
const result = await runDoctor(launcher);
await testInfo.attach('doctor-output', {
body: result.output,
contentType: 'text/plain',
});
await testInfo.attach('doctor-exit-code', {
body: String(result.exitCode),
contentType: 'text/plain',
});
// Core S05 assertion: doctor must NOT print the dpkg false-flag
// warning for an rpm-installed copy. T02 already asserts the
// exit-code contract (`doctor exits 0`) — don't duplicate that
// here; S05 is purely about the install-method line.
expect(
result.output,
`doctor must not false-flag rpm install ` +
`(${rpmFile.stdout || 'rpm-owned'} at ${installPath}) ` +
`as missing-dpkg AppImage`,
).not.toContain(FALSE_FLAG_FRAGMENT);
});
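The T13/S05 scope split can be read as a pure decision over two ownership probes. This is a sketch under stated assumptions: `classifyInstall`, `s05Applies`, and the `InstallMethod` labels are hypothetical, the deb-side probe is assumed to be something like `dpkg -S <path>` (T13's actual probe is not shown in this diff), and a null exit code (tool missing from PATH) is treated the same as "does not own the file".

```typescript
type InstallMethod = 'rpm' | 'deb' | 'unknown';

// Exit codes as reported by a spawn probe: 0 means the package
// manager owns the binary, null means the tool could not be spawned.
function classifyInstall(
  rpmQfExit: number | null,
  dpkgOwnerExit: number | null,
): InstallMethod {
  if (rpmQfExit === 0) return 'rpm';
  if (dpkgOwnerExit === 0) return 'deb';
  return 'unknown';
}

// S05 has signal only on rpm-owned installs; a T13-style runner
// would also run on 'deb' and skip on 'unknown'.
function s05Applies(method: InstallMethod): boolean {
  return method === 'rpm';
}

console.log(classifyInstall(0, null)); // rpm
console.log(classifyInstall(1, 0)); // deb
console.log(classifyInstall(null, null)); // unknown
```

Checking rpm ownership first matches the S05 gate above, which skips whenever `rpm -qf` does not claim the binary, regardless of what dpkg reports.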


@@ -0,0 +1,167 @@
import { test, expect } from '@playwright/test';
import { launchClaude } from '../lib/electron.js';
import { skipUnlessRow } from '../lib/row.js';
import { readPidArgv, argvHasFlag } from '../lib/argv.js';
import { readLauncherLog, captureSessionEnv } from '../lib/diagnostics.js';
import { retryUntil } from '../lib/retry.js';
// S07 — `CLAUDE_USE_WAYLAND=1` opt-in path works.
//
// Backs S07 in docs/testing/cases/shortcuts-and-input.md.
//
// Case-doc anchors:
// scripts/launcher-common.sh:28-29 — `CLAUDE_USE_WAYLAND=1` opt-in
// (sets `use_x11_on_wayland=false`, taking the native-Wayland
// branch in build_electron_args).
// scripts/launcher-common.sh:100-111 — native-Wayland Electron flags:
// `--enable-features=UseOzonePlatform,WaylandWindowDecorations`,
// `--ozone-platform=wayland`, `--enable-wayland-ime`,
// `--wayland-text-input-version=3`, plus `GDK_BACKEND=wayland`.
//
// What this asserts: when the harness's Wayland mode is engaged
// (`CLAUDE_HARNESS_USE_WAYLAND=1`), the spawned Electron's argv
// contains `--ozone-platform=wayland` and `CLAUDE_USE_WAYLAND=1` is
// exported into the spawn env. That mirrors the launcher's
// CLAUDE_USE_WAYLAND=1 branch — same flag set is emitted (see
// LAUNCHER_INJECTED_FLAGS_WAYLAND in src/lib/electron.ts:134-141).
//
// Gating choice — harness-mode vs launcher-script:
//
// The harness deliberately bypasses the launcher script (CDP-gate
// reasons — see lib/electron.ts:102-117), so it constructs its own
// flag set. Setting `extraEnv: { CLAUDE_USE_WAYLAND: '1' }` would
// only affect the child env, not the harness's flag selector. To
// exercise the Wayland branch end-to-end the harness exposes
// `CLAUDE_HARNESS_USE_WAYLAND=1`, which:
// 1. swaps to LAUNCHER_INJECTED_FLAGS_WAYLAND (the same flag
// set the launcher's Wayland branch emits), and
// 2. exports `CLAUDE_USE_WAYLAND=1` + `GDK_BACKEND=wayland` into
// the child env.
//
// This test asserts that contract. When CLAUDE_HARNESS_USE_WAYLAND
// is unset we skip — the harness's X11 default doesn't model the
// CLAUDE_USE_WAYLAND opt-in path. Run the suite with
// `CLAUDE_HARNESS_USE_WAYLAND=1 npx playwright test ...` to
// activate the assertion.
//
// Row gate: native-Wayland-capable rows only. KDE-W is intentionally
// included even though the case-doc Applies-to lists wlroots rows
// (Sway/Niri/Hypr) — KDE Plasma Wayland can also run native Wayland
// when CLAUDE_USE_WAYLAND=1 is set, and KDE-W is the harness's CI
// row, so we want this to be exercisable there.
test.setTimeout(45_000);
test('S07 — CLAUDE_USE_WAYLAND opt-in surfaces in Electron argv', async ({}, testInfo) => {
testInfo.annotations.push({ type: 'severity', description: 'Should' });
testInfo.annotations.push({
type: 'surface',
description: 'Display backend / Wayland opt-in',
});
skipUnlessRow(testInfo, [
'Sway',
'Niri',
'Hypr-O',
'Hypr-N',
'GNOME-W',
'KDE-W',
]);
if (process.env.CLAUDE_HARNESS_USE_WAYLAND !== '1') {
test.skip(
true,
'S07 requires CLAUDE_HARNESS_USE_WAYLAND=1 (the harness ' +
'Wayland-mode that mirrors the launcher CLAUDE_USE_WAYLAND ' +
'branch). Re-run with the env set.',
);
return;
}
await testInfo.attach('session-env', {
body: JSON.stringify(captureSessionEnv(), null, 2),
contentType: 'application/json',
});
await testInfo.attach('harness-env', {
body: JSON.stringify(
{
CLAUDE_HARNESS_USE_WAYLAND:
process.env.CLAUDE_HARNESS_USE_WAYLAND ?? null,
CLAUDE_USE_WAYLAND: process.env.CLAUDE_USE_WAYLAND ?? null,
},
null,
2,
),
contentType: 'application/json',
});
const useHostConfig = process.env.CLAUDE_TEST_USE_HOST_CONFIG === '1';
const app = await launchClaude({
isolation: useHostConfig ? null : undefined,
});
try {
// Don't waitForX11Window — under native Wayland the app is
// going through Ozone-Wayland directly, no XWayland window
// appears. /proc/$pid/cmdline is populated by exec(), so we
// just need the spawned Electron to stay alive long enough
// to read it. Poll for non-null + non-empty argv.
const argv = await retryUntil(
async () => {
const a = await readPidArgv(app.pid);
return a && a.length > 0 ? a : null;
},
{ timeout: 15_000, interval: 250 },
);
await testInfo.attach('electron-argv', {
body: JSON.stringify(argv, null, 2),
contentType: 'application/json',
});
expect(argv, 'could read /proc/$pid/cmdline').not.toBeNull();
// Launcher log is only populated when the launcher script
// runs; the harness spawns Electron directly. Capture the
// log if it happens to exist (host-leftover from an earlier
// real-launcher run) for diagnostic context only.
const log = await readLauncherLog();
if (log) {
const tail = log.split('\n').slice(-50).join('\n');
await testInfo.attach('launcher-log-tail', {
body: tail,
contentType: 'text/plain',
});
}
const ozoneWayland = argvHasFlag(argv ?? [], '--ozone-platform=wayland');
const useOzone = argvHasFlag(
argv ?? [],
'--enable-features=UseOzonePlatform',
);
await testInfo.attach('flag-presence', {
body: JSON.stringify(
{
'--ozone-platform=wayland': ozoneWayland,
'--enable-features=UseOzonePlatform': useOzone,
note:
'When CLAUDE_HARNESS_USE_WAYLAND=1 the harness ' +
'must emit the same Electron flag set as the ' +
'launcher script\'s CLAUDE_USE_WAYLAND=1 branch.',
},
null,
2,
),
contentType: 'application/json',
});
expect(
ozoneWayland,
'spawned Electron has --ozone-platform=wayland on argv',
).toBe(true);
expect(
useOzone,
'spawned Electron has --enable-features=UseOzonePlatform ' +
'(co-emitted with the wayland ozone flag)',
).toBe(true);
} finally {
await app.close();
}
});
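`/proc/<pid>/cmdline` is a NUL-separated string, and `--enable-features` carries a comma-separated list, so a list-aware match is one way to check for `UseOzonePlatform` when it is emitted together with `WaylandWindowDecorations`. The sketch below uses hypothetical helpers `parseCmdline` and `hasEnabledFeature`; how `readPidArgv` and `argvHasFlag` in `../lib/argv.js` actually parse and match is not visible in this diff and may differ.

```typescript
// Split the raw contents of /proc/<pid>/cmdline into arguments.
// The kernel terminates each argument with NUL, so the trailing
// empty piece is dropped. (This also drops genuinely empty
// arguments, which is acceptable for flag detection.)
function parseCmdline(raw: string): string[] {
  return raw.split('\0').filter((arg) => arg.length > 0);
}

// True when any --enable-features=... switch lists `feature` as one
// of its comma-separated entries.
function hasEnabledFeature(argv: string[], feature: string): boolean {
  const prefix = '--enable-features=';
  return argv
    .filter((arg) => arg.startsWith(prefix))
    .some((arg) => arg.slice(prefix.length).split(',').includes(feature));
}

// Illustrative cmdline using the flag set quoted in the header
// comment of this file.
const sampleArgv = parseCmdline(
  'electron\0' +
    '--enable-features=UseOzonePlatform,WaylandWindowDecorations\0' +
    '--ozone-platform=wayland\0',
);

console.log(sampleArgv.includes('--ozone-platform=wayland')); // true
console.log(hasEnabledFeature(sampleArgv, 'UseOzonePlatform')); // true
console.log(hasEnabledFeature(sampleArgv, 'UseOzone')); // false
```

Matching on list entries avoids both the false negative of an exact string compare and the false positive of a bare prefix match on a longer feature name.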


@@ -0,0 +1,129 @@
import { test, expect } from '@playwright/test';
import { readAsarFile, resolveAsarPath } from '../lib/asar.js';
import { skipUnlessRow } from '../lib/row.js';
// S08 — Tray rebuild-race fast-path injected (file probe).
//
// Backs the static side of S08 in
// docs/testing/cases/tray-and-window-chrome.md. T03 already covers the
// runtime SNI-count assertion (post-`nativeTheme.themeSource` toggle:
// exactly one StatusNotifierItem stays registered). This spec is the
// complementary build-time fingerprint — verifies that
// `patch_tray_inplace_update` in scripts/patches/tray.sh actually
// landed in the bundled `index.js`, so a silent regex miss in the
// patch script (idempotency guard short-circuits, anchor regex drifts
// against minifier churn, etc.) is observable without having to wait
// for a runtime tray-duplication failure on KDE.
//
// Fingerprint: literal `.setImage(` substring in
// `.vite/build/index.js`.
//
// Why this is load-bearing and stable:
//
// - Pristine upstream (`build-reference/app-extracted/.vite/build/
// index.js`) contains zero `.setImage(` occurrences. The tray
// constructs exclusively via `new <EL>.Tray(<EL>.nativeImage
// .createFromPath(...))` and never re-images in place. (Verified
// by `grep -cE '\.setImage\s*\(' index.js` → 0.)
// - The injected fast-path emitted by `patch_tray_inplace_update`
// (scripts/patches/tray.sh:212-217) calls
// `<TRAY_VAR>.setImage(<EL_VAR>.nativeImage.createFromPath(
// <PATH_VAR>))` — that is the entire point of the fast-path
// (skip destroy + recreate, update the existing Tray's image in
// place so the SNI registration stays put on KDE Plasma).
// - The Electron API name `setImage` is not a minified local —
// it's a method on `Tray.prototype` and stays literal across
// upstream version bumps regardless of the bundler's variable
// renaming. So the fingerprint is robust to the same minifier
// churn that forces tray.sh to extract `tray_var` / `electron_var`
// / `path_var` dynamically.
// - Idempotency marker in tray.sh:174-180 keys on the same literal
// post-rename `setImage(<EL>.nativeImage.createFromPath(<PATH>))`
// sequence; presence of `.setImage(` therefore tracks 1:1 with
// the patch's own self-detection.
//
// Why not the other candidates considered:
//
// - `_trayStartTime`: already covered by H03 for the prior tray.sh
// sub-patch (`patch_tray_menu_handler`). H03's note explicitly
// calls out that the in-place update sub-patch needs its own
// fingerprint, which is what S08 supplies here.
// - `process.platform!=="darwin"`: appears 50+ times in the
// minified bundle (every Electron-on-Linux / -on-Windows
// branch). Not distinctive.
// - `setContextMenu` count >= 2: works (upstream has exactly one
// occurrence; patched bundle has two — fast-path + slow-path),
// but is brittle to any future upstream code that calls
// `setContextMenu` for an unrelated reason. `.setImage(`
// presence-only is stricter and simpler.
//
// Pure file probe — no app launch. Fast (<1s). Row-gated to KDE
// (case-doc Applies-to: KDE-W, KDE-X) since the underlying SNI
// rebuild race only manifests on KDE Plasma's `systemtray` widget;
// other DEs handle UnregisterItem/Register sequencing without the
// duplicate-icon visual artifact, so the fast-path is a should-have
// there but the assertion isn't load-bearing for the row.
test('S08 — Tray rebuild-race fast-path injected (file probe)', async ({}, testInfo) => {
skipUnlessRow(testInfo, ['KDE-W', 'KDE-X']);
testInfo.annotations.push({ type: 'severity', description: 'Should' });
testInfo.annotations.push({
type: 'surface',
description: 'Tray icon / KDE rebuild race',
});
const asarPath = resolveAsarPath();
await testInfo.attach('asar-path', {
body: asarPath,
contentType: 'text/plain',
});
const indexJs = readAsarFile('.vite/build/index.js', asarPath);
// `.setImage(` is the patch-injected literal. Match-count is
// surfaced for diagnostics: 0 = patch missed, 1+ = patch landed.
// (We don't pin to exactly 1 — if upstream ever ships a
// legitimate second `.setImage(` site, the patch's fast-path is
// still present and S08 should still pass.)
const setImageCount = (indexJs.match(/\.setImage\s*\(/g) ?? []).length;
const fastPathPresent = setImageCount > 0;
// Bonus diagnostic signal: the slow-path destroy+recreate block
// is preserved by the patch (it stays in place for initial-
// creation and tray-disable cases — see tray.sh:182-188 and
// docs/learnings/tray-rebuild-race.md "The fix"). So a healthy
// patched bundle has >= 1 `setContextMenu` call (slow path) and
// >= 1 `.setImage(` call (fast path). Pristine upstream has
// exactly 1 `setContextMenu` and 0 `.setImage(`.
const setContextMenuCount = (
indexJs.match(/\.setContextMenu\s*\(/g) ?? []
).length;
await testInfo.attach('fingerprint-evidence', {
body: JSON.stringify(
{
file: '.vite/build/index.js',
fingerprint: '.setImage(',
setImageCount,
setContextMenuCount,
fastPathPresent,
source:
'scripts/patches/tray.sh:212-217 (patch_tray_inplace_update) ' +
'injects `<TRAY>.setImage(<EL>.nativeImage.' +
'createFromPath(<PATH>))` before the destroy+recreate ' +
'block. Upstream never calls .setImage on the tray, ' +
'so non-zero count == patch landed.',
},
null,
2,
),
contentType: 'application/json',
});
expect(
fastPathPresent,
'app.asar contains the in-place `.setImage(` call injected by ' +
'patch_tray_inplace_update (scripts/patches/tray.sh)',
).toBe(true);
});
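The probe reduces to counting regex matches in the bundle text. Here is a sketch on two invented snippets: the minified identifiers `t`, `e`, `p`, and `m` are placeholders, not taken from the real bundle, and `countMatches` is a hypothetical helper.

```typescript
// Count non-overlapping matches of `pattern` in `source`, forcing
// the global flag so String.prototype.match returns every hit.
function countMatches(source: string, pattern: RegExp): number {
  const flags = pattern.flags.includes('g')
    ? pattern.flags
    : pattern.flags + 'g';
  return (source.match(new RegExp(pattern.source, flags)) ?? []).length;
}

// Slow path only: construct the tray, then attach the menu.
const pristine =
  't=new e.Tray(e.nativeImage.createFromPath(p));t.setContextMenu(m);';
// Fast path injected ahead of the preserved slow path.
const patched =
  'if(t){t.setImage(e.nativeImage.createFromPath(p));' +
  't.setContextMenu(m)}' +
  pristine;

console.log(countMatches(pristine, /\.setImage\s*\(/)); // 0
console.log(countMatches(patched, /\.setImage\s*\(/)); // 1
console.log(countMatches(patched, /\.setContextMenu\s*\(/)); // 2
```

The counts follow the pattern described in the header comment: zero `.setImage(` sites before the patch, at least one after, with `setContextMenu` going from one occurrence to two.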


@@ -0,0 +1,47 @@
import { test, expect } from '@playwright/test';
import { readAsarFile, resolveAsarPath } from '../lib/asar.js';
// S09 — Quick window patch runs only on KDE (post-#406 gate).
// Backs QE-19 in docs/testing/quick-entry-closeout.md.
//
// The patch in scripts/patches/quick-window.sh injects an
// `(process.env.XDG_CURRENT_DESKTOP||"").toLowerCase().includes("kde")`
// gate into the bundled JS. The string `XDG_CURRENT_DESKTOP` shows up
// in app.asar's index.js if and only if the patch ran at build time.
// The patch ships in every build; the KDE-vs-non-KDE branch is
// decided at runtime by the env-var check.
//
// Pure file probe — no app launch. Fast (<1s).
//
// Runtime gate effectiveness is verified implicitly by S31 passing
// on KDE (popup-show works through the patched code path) and the
// upstream-equivalent path running on non-KDE rows.
test('S09 — Quick window patch runs only on KDE (post-#406 gate)', async ({}, testInfo) => {
testInfo.annotations.push({ type: 'severity', description: 'Critical' });
testInfo.annotations.push({ type: 'surface', description: 'Patch gate' });
const asarPath = resolveAsarPath();
await testInfo.attach('asar-path', {
body: asarPath,
contentType: 'text/plain',
});
const indexJs = readAsarFile('.vite/build/index.js', asarPath);
// The gate string is the build-time fingerprint of the patch. If
// the patch didn't run, the bundled JS won't contain it.
const gatePresent = indexJs.includes('XDG_CURRENT_DESKTOP');
// Bonus signal: the lowercase `kde` literal the gate compares
// against. A bare `kde` substring can match unrelated code, so
// this is diagnostic only and is not asserted.
const patchedComment = indexJs.includes('kde');
// Attach evidence before asserting so it is captured on failure
// as well as on pass.
await testInfo.attach('gate-evidence', {
body: JSON.stringify({ gatePresent, patchedComment }, null, 2),
contentType: 'application/json',
});
expect(
gatePresent,
'app.asar contains the XDG_CURRENT_DESKTOP gate string injected by quick-window.sh',
).toBe(true);
});
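The injected gate is a plain substring test on the desktop name, so its runtime behaviour can be tabulated without launching the app. In this sketch, `isKdeDesktop` is a hypothetical wrapper around the expression quoted in the header comment, and the sample values are plausible `XDG_CURRENT_DESKTOP` settings chosen for illustration.

```typescript
// Same expression the patch injects, with the env value passed in
// so it can be exercised without touching process.env.
function isKdeDesktop(xdgCurrentDesktop: string | undefined): boolean {
  return (xdgCurrentDesktop || '').toLowerCase().includes('kde');
}

console.log(isKdeDesktop('KDE')); // true
console.log(isKdeDesktop('ubuntu:GNOME')); // false
console.log(isKdeDesktop(undefined)); // false
```

Because the check is a substring match, any colon-separated value that contains `kde` in any letter case takes the patched branch.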


@@ -0,0 +1,122 @@
import { test, expect } from '@playwright/test';
import { launchClaude } from '../lib/electron.js';
import { skipUnlessRow } from '../lib/row.js';
import { QuickEntry } from '../lib/quickentry.js';
import { captureSessionEnv } from '../lib/diagnostics.js';
// S10 — Quick Entry popup is transparent (no opaque square frame).
// Backs the KDE-W row of S10 in
// docs/testing/cases/shortcuts-and-input.md.
//
// Upstream constructs the popup BrowserWindow with
// transparent: true, backgroundColor: "#00000000", frame: false
// at build-reference index.js:515380, 515383, 515381. On KDE Plasma
// Wayland the compositor honours the alpha channel and the popup
// renders with a transparent background; on broken-Electron versions
// (electron/electron#50213, the 41.0.4-41.x.y bisect window per
// @noctuum on #370) the alpha is dropped and an opaque square frame
// shows behind the rounded prompt UI.
//
// Construction-time options aren't observable through the prototype-
// method hook in lib/quickentry.ts (the Proxy from frame-fix-wrapper
// returns the closure-captured PatchedBrowserWindow on `electron.
// BrowserWindow` reads — see the doc-comment on
// QuickEntry.installInterceptor and CLAUDE.md "Test harness Electron
// hooks" learning). Runtime-side, `getBackgroundColor()` reflects
// what the BrowserWindow was actually constructed with — so we read
// it via getPopupRuntimeProps() and assert
// transparent === true && backgroundColor in {'#00000000','#0000'}
// matching the predicate in lib/quickentry.ts:266.
//
// Gated to KDE-W: other KDE rows (KDE-X) don't have the same
// compositor / Electron-Wayland concern that the case-doc S10
// surfaces. If S10 fails on a host whose bundled Electron is in the
// 41.0.4-41.x.y window, that's the upstream regression — see S33 for
// the version-capture half. Don't wrap in skip on failure; surface
// it as a regression-detector signal.
test.setTimeout(60_000);
test('S10 — Quick Entry popup is transparent (no opaque square frame)', async ({}, testInfo) => {
testInfo.annotations.push({ type: 'severity', description: 'Should' });
testInfo.annotations.push({
type: 'surface',
description: 'Quick Entry window (KDE Wayland)',
});
skipUnlessRow(testInfo, ['KDE-W']);
await testInfo.attach('session-env', {
body: JSON.stringify(captureSessionEnv(), null, 2),
contentType: 'application/json',
});
const useHostConfig = process.env.CLAUDE_TEST_USE_HOST_CONFIG === '1';
const app = await launchClaude({
isolation: useHostConfig ? null : undefined,
});
await testInfo.attach('isolation', {
body: JSON.stringify(
{
useHostConfig,
configDir: app.isolation?.configDir ?? null,
},
null,
2,
),
contentType: 'application/json',
});
try {
// Main needs to be up before the shortcut can lazily construct
// the popup — the popup-show path reads renderer state via
// upstream's lHn() user-loaded check (see openAndWaitReady's
// retry-loop comment in lib/quickentry.ts).
const { inspector } = await app.waitForReady('mainVisible');
const qe = new QuickEntry(inspector);
await qe.installInterceptor();
// Fire the OS shortcut and wait for the popup BrowserWindow to
// be visible with its textarea mounted — same handshake S29
// uses. If ydotool isn't reachable, openAndWaitReady throws
// the install-instructions error from ensureYdotool — that
// surfaces as a clear test failure (acceptable per the
// case-doc; not wrapped in a skip).
await qe.openAndWaitReady();
const props = await qe.getPopupRuntimeProps();
await testInfo.attach('popup-runtime-props', {
body: JSON.stringify(props, null, 2),
contentType: 'application/json',
});
expect(
props,
'getPopupRuntimeProps returned null — interceptor did not ' +
'capture the popup BrowserWindow ref',
).not.toBeNull();
// Predicate matches lib/quickentry.ts:266 — '#00000000' is the
// canonical 8-digit form Electron returns for the upstream
// construction value, '#0000' is the short form some Electron
// builds normalise to. Either is acceptable.
expect(
props!.backgroundColor === '#00000000'
|| props!.backgroundColor === '#0000',
`popup backgroundColor must be transparent (#00000000 or ` +
`#0000), got ${JSON.stringify(props!.backgroundColor)}. ` +
`If the bundled Electron is in the 41.0.4-41.x.y window ` +
`(see S33), this is the electron#50213 regression ` +
`tracked under issue #370.`,
).toBe(true);
expect(
props!.transparent,
'popup transparent flag (derived from backgroundColor) is ' +
'false — opaque square frame would render behind the ' +
'rounded prompt UI',
).toBe(true);
inspector.close();
} finally {
await app.close();
}
});
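
The transparency check in S10 reduces to a small string predicate. A minimal sketch, assuming the two hex forms named in the spec comments are the only accepted values. `isTransparentHex` is a hypothetical name used for illustration; it is not the predicate at lib/quickentry.ts:266, whose exact shape is not shown in this diff.

```typescript
// Hypothetical helper for illustration; not the predicate in lib/quickentry.ts.
// Accepts the two fully transparent forms the S10 comments name:
// '#00000000' (8-digit form) and '#0000' (4-digit short form).
const TRANSPARENT_FORMS: ReadonlySet<string> = new Set(['#00000000', '#0000']);

function isTransparentHex(color: string | null | undefined): boolean {
  if (typeof color !== 'string') return false;
  // Normalise case and surrounding whitespace only; no colour parsing is done,
  // so forms such as 'rgba(0,0,0,0)' are deliberately rejected.
  return TRANSPARENT_FORMS.has(color.trim().toLowerCase());
}
```

Keeping the accepted forms in one set means a future Electron normalisation change would need a single edit rather than a second `||` branch in each assertion.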

@@ -0,0 +1,262 @@
import { test, expect } from '@playwright/test';
import { launchClaude } from '../lib/electron.js';
import { skipUnlessRow } from '../lib/row.js';
import { QuickEntry } from '../lib/quickentry.js';
import {
focusOtherWindow,
getFocusedWindowId,
spawnMarkerWindow,
WaylandFocusUnavailable,
XdotoolUnavailable,
type MarkerWindow,
} from '../lib/input.js';
import { captureSessionEnv, readLauncherLog } from '../lib/diagnostics.js';
import { sleep } from '../lib/retry.js';
// S11 — Quick Entry shortcut fires from any focus on Wayland
// (mutter XWayland key-grab). Backs the S11 row in
// docs/testing/cases/shortcuts-and-input.md (severity: Critical).
//
// What this catches vs what it doesn't
// ------------------------------------
// The case-doc's load-bearing concern is the GNOME-W mutter
// XWayland key-grab regression — issue #404 — where mutter under
// native Wayland refuses to honour the XWayland-side global key
// grab, so the shortcut becomes focus-bound. This spec CANNOT
// detect that regression: there is no portable focus-injection
// path on native Wayland (each compositor exposes its own IPC
// and the libei input-emulation portal isn't universally
// honored). The lib/input.ts focus-shifter primitive throws
// `WaylandFocusUnavailable` on native Wayland rows by design —
// see its leading comment for the full reasoning. The Wayland-
// side regression detector is a primitive-gap; it stays manual
// until libei adoption broadens.
//
// What this spec DOES catch is a regression in the X11-side of
// the global-shortcut path (the side that currently works on
// GNOME-X / Ubu-X — `🔧` and `✅` respectively in the matrix).
// If the X11 grab broke on those rows, S11 would catch it. So
// this is a regression detector on a CURRENTLY-PASSING path,
// unlike S12 which is a currently-failing detector for the
// `--enable-features=GlobalShortcutsPortal` wiring.
//
// Row gate
// --------
// Case-doc applies-to is "GNOME, Ubu" (both W and X variants),
// but the focus-shifter primitive is X11-only, gated strictly on
// `XDG_SESSION_TYPE === 'x11'`. Wayland rows can't be exercised
// here — they would either skip via the row gate or trip
// `WaylandFocusUnavailable` from the primitive. So the runner's
// row gate is the X11 subset only: GNOME-X, Ubu-X. The Wayland
// rows for S11 stay manual / matrix-cell-from-doc until a
// libei-based primitive lands.
test.setTimeout(60_000);
test('S11 — Quick Entry shortcut fires from any focus (X11 path)', async ({}, testInfo) => {
skipUnlessRow(testInfo, ['GNOME-X', 'Ubu-X']);
testInfo.annotations.push({ type: 'severity', description: 'Critical' });
testInfo.annotations.push({
type: 'surface',
description: 'Quick Entry / global shortcut',
});
// Single-shot diagnostic record. We attach this once at the
// end (or on early throw) rather than spreading five separate
// attachments — mirrors S31's results shape so matrix-regen
// has one well-known JSON to scrape per spec.
const diag: {
sessionEnv: Record<string, string>;
markerTitle: string | null;
activeWidBeforeFocus: string | null;
activeWidAfterFocus: string | null;
popupState: unknown;
openError: string | null;
focusError: string | null;
launcherLogTail: string | null;
} = {
sessionEnv: captureSessionEnv(),
markerTitle: null,
activeWidBeforeFocus: null,
activeWidAfterFocus: null,
popupState: null,
openError: null,
focusError: null,
launcherLogTail: null,
};
const attachDiag = async () => {
await testInfo.attach('s11-diagnostics', {
body: JSON.stringify(diag, null, 2),
contentType: 'application/json',
});
};
const useHostConfig = process.env.CLAUDE_TEST_USE_HOST_CONFIG === '1';
const app = await launchClaude({
isolation: useHostConfig ? null : undefined,
});
let marker: MarkerWindow | null = null;
try {
// `mainVisible` is the cheapest level that gives us a
// registered global shortcut. Upstream registers via
// globalShortcut.register early in main-process startup
// (build-reference index.js:499416), but we still want
// the main window mapped so the popup-construction path
// has something to anchor to.
const { inspector } = await app.waitForReady('mainVisible');
const qe = new QuickEntry(inspector);
await qe.installInterceptor();
// Capture pre-focus active WID for the diagnostic record.
// On a healthy X11 session this is the Claude main window
// (we just `mainVisible`-readied it). If null, xprop is
// missing or _NET_ACTIVE_WINDOW is unset — neither is a
// blocker for the test, just less useful diagnostics.
diag.activeWidBeforeFocus = await getFocusedWindowId();
// Marker title is unique-per-test to avoid colliding with
// any leftover xterm from a previous run (xterm exits its
// `sleep 600` after 10min so leaks are bounded, but a
// re-run inside that window would otherwise match the
// stale window).
const markerTitle =
`claude-test-s11-marker-${testInfo.testId}-${Date.now()}`;
diag.markerTitle = markerTitle;
try {
marker = await spawnMarkerWindow(markerTitle);
} catch (err) {
// Most likely cause: xterm not on PATH. The primitive
// throws a plain Error with the install hint. Skip
// rather than fail — this is an environment gap.
const msg = err instanceof Error ? err.message : String(err);
diag.focusError = `spawnMarkerWindow: ${msg}`;
await attachDiag();
testInfo.skip(
true,
'xterm not installed; required for the focus-shift target. ' +
`Underlying: ${msg}`,
);
return;
}
// `focusOtherWindow` calls `xdotool search --name <title>`
// once and throws if there are zero matches; only the
// post-focus _NET_ACTIVE_WINDOW verification has its own
// retry. So we need a brief readiness poll for the marker
// window to actually map into the X tree before we attempt
// the focus shift — and the focus shift itself must
// eventually succeed within the budget.
//
// We capture the LAST error (rather than rethrowing on the
// first) so the diagnostic carries the real cause if every
// attempt fails. WaylandFocusUnavailable / XdotoolUnavailable
// are sticky — they won't change between retries — so we
// short-circuit out on the first occurrence and skip.
let focusOk = false;
let lastFocusErr: unknown = null;
let earlySkipReason: string | null = null;
const focusBudgetMs = 5_000;
const focusStart = Date.now();
while (Date.now() - focusStart < focusBudgetMs) {
try {
await focusOtherWindow(markerTitle);
focusOk = true;
break;
} catch (err) {
lastFocusErr = err;
if (err instanceof WaylandFocusUnavailable) {
earlySkipReason =
'WaylandFocusUnavailable on a row that was ' +
'supposed to be X11-gated. Check XDG_SESSION_TYPE.';
break;
}
if (err instanceof XdotoolUnavailable) {
earlySkipReason =
'xdotool not installed; required for the ' +
'focus-shift step. ' +
(err instanceof Error ? err.message : String(err));
break;
}
// "no X11 window matches" (marker not mapped yet) or
// "compositor refused activation" — both can resolve on
// retry. Brief pause then loop.
await sleep(100);
}
}
if (earlySkipReason) {
diag.focusError =
lastFocusErr instanceof Error
? lastFocusErr.message
: String(lastFocusErr);
await attachDiag();
testInfo.skip(true, earlySkipReason);
return;
}
if (!focusOk) {
const msg =
lastFocusErr instanceof Error
? lastFocusErr.message
: String(lastFocusErr);
diag.focusError = msg;
diag.launcherLogTail = await readLauncherLog();
await attachDiag();
throw new Error(
`focusOtherWindow failed within ${focusBudgetMs}ms: ${msg}`,
);
}
// At this point focus is on the marker xterm. Capture the
// post-focus active WID — should equal the marker's WID,
// not Claude's. (We don't have a clean way to fetch the
// marker's WID independently here without re-running
// xdotool; the value-vs-pre comparison in the diagnostic
// is sufficient evidence of the shift.)
diag.activeWidAfterFocus = await getFocusedWindowId();
// Now press the global shortcut. The whole point of S11:
// even though the marker xterm holds focus (and Claude
// does not), the OS-level grab should fire the popup.
try {
await qe.openAndWaitReady();
} catch (err) {
diag.openError = err instanceof Error ? err.message : String(err);
diag.popupState = await qe.getPopupState();
diag.launcherLogTail = await readLauncherLog();
await attachDiag();
throw err;
}
const popupState = await qe.getPopupState();
diag.popupState = popupState;
diag.launcherLogTail = await readLauncherLog();
await attachDiag();
// Single critical assertion: popup exists AND is visible
// after the shortcut press from non-Claude focus. A null
// state means the BrowserWindow was never constructed —
// the X11 grab didn't fire. visible === false means it
// constructed but show() was suppressed (the upstream
// lHn() short-circuit, or a regression in the visibility
// flow). Either is a fail for S11's contract.
expect(
popupState && popupState.visible,
'Quick Entry popup is visible after shortcut press from ' +
'non-Claude focus (X11 path)',
).toBe(true);
} finally {
// Marker xterm cleanup is idempotent. Always run before
// app.close() so the kill happens even if the spec
// throws between the two.
if (marker) {
await marker.kill().catch(() => {
// best-effort — process may already be dead
});
}
await app.close();
}
});
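
The focus-shift loop in S11 follows a general shape: retry inside a time budget, remember the last error, and stop at once on errors that cannot change between attempts. A minimal sketch of that shape, assuming a caller-supplied `isSticky` predicate; `retryWithinBudget` and `RetryOutcome` are hypothetical names, not part of lib/input.ts or lib/retry.ts.

```typescript
// Hypothetical helper for illustration; not the lib/retry.ts API.
type RetryOutcome =
  | { kind: 'ok'; attempts: number }
  | { kind: 'sticky'; attempts: number; lastError: unknown }
  | { kind: 'exhausted'; attempts: number; lastError: unknown };

async function retryWithinBudget(
  attempt: () => Promise<void>,
  budgetMs: number,
  pauseMs: number,
  isSticky: (err: unknown) => boolean,
): Promise<RetryOutcome> {
  const start = Date.now();
  let attempts = 0;
  let lastError: unknown = null;
  while (Date.now() - start < budgetMs) {
    attempts += 1;
    try {
      await attempt();
      return { kind: 'ok', attempts };
    } catch (err) {
      // Keep the most recent error so the diagnostic carries the real cause.
      lastError = err;
      // Sticky errors (missing tool, wrong session type) cannot resolve on
      // retry, so return immediately and let the caller decide to skip.
      if (isSticky(err)) return { kind: 'sticky', attempts, lastError };
      await new Promise<void>((resolve) => setTimeout(resolve, pauseMs));
    }
  }
  return { kind: 'exhausted', attempts, lastError };
}
```

In the spec's terms, `sticky` maps to `testInfo.skip(...)` and `exhausted` maps to the thrown `focusOtherWindow failed within ...` error; the predicate would be something like `(e) => e instanceof XdotoolUnavailable || e instanceof WaylandFocusUnavailable`.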

@@ -0,0 +1,95 @@
import { test, expect } from '@playwright/test';
import { launchClaude } from '../lib/electron.js';
import { skipUnlessRow } from '../lib/row.js';
import { readPidArgv, argvHasFlag } from '../lib/argv.js';
import { readLauncherLog, captureSessionEnv } from '../lib/diagnostics.js';
// S12 — `--enable-features=GlobalShortcutsPortal` launcher flag
// wired up for GNOME Wayland. Backs QE-6 in
// docs/testing/quick-entry-closeout.md.
//
// On GNOME Wayland, mutter no longer honors XWayland-side key grabs,
// so the Quick Entry global shortcut fails from unfocused state
// (#404). The fix is to route global shortcuts through XDG Desktop
// Portal: pass `--enable-features=GlobalShortcutsPortal` to Electron
// from the launcher when XDG_CURRENT_DESKTOP includes GNOME and
// XDG_SESSION_TYPE is wayland.
//
// As of writing, this fix is NOT implemented. The test asserts the
// fix's signature (the flag is in the spawned Electron's argv) and
// will therefore FAIL on GNOME-W until the launcher patch lands.
// That's intentional — it's the regression detector, not a smoke
// test. Once the patch is in, this becomes a Critical green cell.
//
// Row gate: GNOME Wayland rows only (GNOME-W, Ubu-W). KDE rows skip with `-`.
test.setTimeout(45_000);
test('S12 — --enable-features=GlobalShortcutsPortal launcher flag wired up for GNOME Wayland', async ({}, testInfo) => {
testInfo.annotations.push({ type: 'severity', description: 'Critical' });
testInfo.annotations.push({
type: 'surface',
description: 'Launcher flag wiring',
});
skipUnlessRow(testInfo, ['GNOME-W', 'Ubu-W']);
await testInfo.attach('session-env', {
body: JSON.stringify(captureSessionEnv(), null, 2),
contentType: 'application/json',
});
const useHostConfig = process.env.CLAUDE_TEST_USE_HOST_CONFIG === '1';
const app = await launchClaude({
isolation: useHostConfig ? null : undefined,
});
try {
await app.waitForX11Window(15_000);
const argv = await readPidArgv(app.pid);
await testInfo.attach('electron-argv', {
body: JSON.stringify(argv, null, 2),
contentType: 'application/json',
});
expect(argv, 'could read /proc/$pid/cmdline').not.toBeNull();
// Launcher log carries a stable line — see
// scripts/launcher-common.sh:98, 102 — that says which backend
// was selected. Capture it for diagnostic context.
const log = await readLauncherLog();
if (log) {
const tail = log.split('\n').slice(-50).join('\n');
await testInfo.attach('launcher-log-tail', {
body: tail,
contentType: 'text/plain',
});
}
const present = argvHasFlag(
argv ?? [],
'--enable-features=GlobalShortcutsPortal',
);
await testInfo.attach('flag-presence', {
body: JSON.stringify(
{
flag: '--enable-features=GlobalShortcutsPortal',
present,
note:
'On GNOME Wayland this flag must be present for ' +
'#404 to be closeable. Until the launcher patch ' +
'lands, this test fails as a regression detector.',
},
null,
2,
),
contentType: 'application/json',
});
expect(
present,
'--enable-features=GlobalShortcutsPortal is in Electron argv on GNOME Wayland',
).toBe(true);
} finally {
await app.close();
}
});
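
S12 depends on two steps: turning `/proc/<pid>/cmdline` into an argv array, and checking whether a Chromium feature is enabled. A minimal sketch of both, under the assumption that the feature may appear inside a comma-separated `--enable-features=` list. `splitCmdline` and `hasEnabledFeature` are hypothetical names; the real `readPidArgv` / `argvHasFlag` in lib/argv.js are not shown in this diff and may match differently (for example, by exact string).

```typescript
// Illustration only; not the helpers from lib/argv.js.

// /proc/<pid>/cmdline stores arguments separated (and terminated) by NUL bytes.
function splitCmdline(raw: string): string[] {
  const parts = raw.split('\0');
  // Drop the empty element produced by the trailing NUL terminator.
  if (parts.length > 0 && parts[parts.length - 1] === '') parts.pop();
  return parts;
}

// Chromium accepts a comma-separated list in --enable-features, so the flag
// can show up as `--enable-features=A,GlobalShortcutsPortal` rather than alone.
function hasEnabledFeature(argv: readonly string[], feature: string): boolean {
  const prefix = '--enable-features=';
  return argv.some(
    (arg) =>
      arg.startsWith(prefix) &&
      arg.slice(prefix.length).split(',').includes(feature),
  );
}
```

A list-aware check avoids a false negative if the launcher later merges several features into one `--enable-features=` argument.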

@@ -0,0 +1,266 @@
import { test, expect } from '@playwright/test';
import { launchClaude } from '../lib/electron.js';
import { skipUnlessRow } from '../lib/row.js';
import { QuickEntry } from '../lib/quickentry.js';
import {
focusOtherWindow,
getFocusedWindowId,
spawnMarkerWindow,
NiriIpcUnavailable,
FootUnavailable,
type MarkerWindow,
} from '../lib/input-niri.js';
import { captureSessionEnv, readLauncherLog } from '../lib/diagnostics.js';
import { sleep } from '../lib/retry.js';
// S14 — Quick Entry shortcut fires from any focus on Niri
// (XDG portal BindShortcuts path). Backs the S14 row in
// docs/testing/cases/shortcuts-and-input.md (severity: Critical
// for Niri users).
//
// What this catches vs what it doesn't
// ------------------------------------
// On Niri the launcher special-cases the app to native Wayland
// (`scripts/launcher-common.sh:41-44`), so upstream's
// `globalShortcut.register` (`index.js:499416`) routes through
// Electron's `xdg-desktop-portal` `BindShortcuts` path inside
// Chromium rather than an X11 grab. The case-doc records this
// path as currently failing on Niri:
// `Failed to call BindShortcuts (error code 5)`. So this spec
// is a known-failing detector — the shape mirrors S12's
// `--enable-features=GlobalShortcutsPortal` GNOME-W detector:
// the assertion encodes the contract, and the test will start
// passing automatically once the upstream / portal-side issue
// is resolved on Niri without any spec edit.
//
// The user-visible symptom (Quick Entry shortcut doesn't fire
// on Niri) is the same as #404 (mutter XWayland key-grab on
// GNOME-W) but the root cause is different: Niri is a Smithay-based
// Wayland compositor with no XWayland by default, so the X11-side
// `lib/input.ts` focus-shifter cannot exercise this path.
// `lib/input-niri.ts` is the substrate — `niri msg --json`
// for the focus-injection + readback chain, `foot --title` for
// the Wayland-native marker window. The mutter / GNOME-W
// regression detector remains a separate primitive gap (libei
// when broadly available, or a per-compositor mutter-IPC
// primitive — neither shipped).
//
// Row gate
// --------
// Niri only. Other Wayland rows (KDE-W, GNOME-W, Ubu-W) each
// need their own compositor IPC and stay manual / matrix-cell-
// from-doc until a libei-based primitive lands.
test.setTimeout(60_000);
test('S14 — Quick Entry shortcut fires from any focus (Niri Wayland path)', async ({}, testInfo) => {
skipUnlessRow(testInfo, ['Niri']);
testInfo.annotations.push({ type: 'severity', description: 'Critical' });
testInfo.annotations.push({
type: 'surface',
description: 'XDG Desktop Portal BindShortcuts',
});
// Single-shot diagnostic record. We attach this once at the
// end (or on early throw) rather than spreading five separate
// attachments — mirrors S31's results shape so matrix-regen
// has one well-known JSON to scrape per spec.
const diag: {
sessionEnv: Record<string, string>;
markerTitle: string | null;
activeWidBeforeFocus: number | null;
activeWidAfterFocus: number | null;
popupState: unknown;
openError: string | null;
focusError: string | null;
launcherLogTail: string | null;
} = {
sessionEnv: captureSessionEnv(),
markerTitle: null,
activeWidBeforeFocus: null,
activeWidAfterFocus: null,
popupState: null,
openError: null,
focusError: null,
launcherLogTail: null,
};
const attachDiag = async () => {
await testInfo.attach('s14-diagnostics', {
body: JSON.stringify(diag, null, 2),
contentType: 'application/json',
});
};
const useHostConfig = process.env.CLAUDE_TEST_USE_HOST_CONFIG === '1';
const app = await launchClaude({
isolation: useHostConfig ? null : undefined,
});
let marker: MarkerWindow | null = null;
try {
// `mainVisible` is the cheapest level that gives us a
// registered global shortcut. Upstream registers via
// globalShortcut.register early in main-process startup
// (build-reference index.js:499416), but we still want
// the main window mapped so the popup-construction path
// has something to anchor to.
const { inspector } = await app.waitForReady('mainVisible');
const qe = new QuickEntry(inspector);
await qe.installInterceptor();
// Capture pre-focus active window id for the diagnostic
// record. On a healthy Niri session this is the Claude
// main window (we just `mainVisible`-readied it). If
// null, `niri msg` is unavailable or there is no focused
// window — neither blocks the test, just less useful
// diagnostics.
diag.activeWidBeforeFocus = await getFocusedWindowId();
// Marker title is unique-per-test to avoid colliding with
// any leftover foot from a previous run (foot exits its
// `sleep 600` after 10min so leaks are bounded, but a
// re-run inside that window would otherwise match the
// stale window).
const markerTitle =
`claude-test-s14-marker-${testInfo.testId}-${Date.now()}`;
diag.markerTitle = markerTitle;
try {
marker = await spawnMarkerWindow(markerTitle);
} catch (err) {
// Most likely cause: foot not on PATH. The primitive
// throws `FootUnavailable` with the install hint. Skip
// rather than fail — this is an environment gap.
const msg = err instanceof Error ? err.message : String(err);
diag.focusError = `spawnMarkerWindow: ${msg}`;
await attachDiag();
testInfo.skip(
true,
'foot not installed; required for the focus-shift target. ' +
`Underlying: ${msg}`,
);
return;
}
// `focusOtherWindow` queries `niri msg --json windows`
// once and throws if there are zero matches; only the
// post-focus focused-window verification has its own
// retry. So we need a brief readiness poll for the
// marker window to actually appear in the niri window
// list before we attempt the focus shift — and the focus
// shift itself must eventually succeed within the budget.
//
// We capture the LAST error (rather than rethrowing on
// the first) so the diagnostic carries the real cause if
// every attempt fails. NiriIpcUnavailable / FootUnavailable
// are sticky — they won't change between retries — so we
// short-circuit out on the first occurrence and skip.
let focusOk = false;
let lastFocusErr: unknown = null;
let earlySkipReason: string | null = null;
const focusBudgetMs = 5_000;
const focusStart = Date.now();
while (Date.now() - focusStart < focusBudgetMs) {
try {
await focusOtherWindow(markerTitle);
focusOk = true;
break;
} catch (err) {
lastFocusErr = err;
if (err instanceof NiriIpcUnavailable) {
earlySkipReason =
'NiriIpcUnavailable on a row that was ' +
'supposed to be Niri-gated. Check NIRI_SOCKET / ' +
'`niri msg` availability.';
break;
}
if (err instanceof FootUnavailable) {
earlySkipReason =
'foot not installed; required for the ' +
'focus-shift step. ' +
(err instanceof Error ? err.message : String(err));
break;
}
// "no window matches" (marker not yet listed by
// niri) or "focus-window action did not stick" —
// both can resolve on retry. Brief pause then loop.
await sleep(100);
}
}
if (earlySkipReason) {
diag.focusError =
lastFocusErr instanceof Error
? lastFocusErr.message
: String(lastFocusErr);
await attachDiag();
testInfo.skip(true, earlySkipReason);
return;
}
if (!focusOk) {
const msg =
lastFocusErr instanceof Error
? lastFocusErr.message
: String(lastFocusErr);
diag.focusError = msg;
diag.launcherLogTail = await readLauncherLog();
await attachDiag();
throw new Error(
`focusOtherWindow failed within ${focusBudgetMs}ms: ${msg}`,
);
}
// At this point focus is on the marker foot. Capture the
// post-focus focused-window id — should equal the
// marker's id, not Claude's. (We don't have a clean way
// to fetch the marker's id independently here without
// re-running `niri msg`; the value-vs-pre comparison in
// the diagnostic is sufficient evidence of the shift.)
diag.activeWidAfterFocus = await getFocusedWindowId();
// Now press the global shortcut. The whole point of S14:
// even though the marker foot holds focus (and Claude
// does not), the portal-routed BindShortcuts grab should
// fire the popup. Currently known-failing per case-doc
// S14 (`Failed to call BindShortcuts (error code 5)`).
try {
await qe.openAndWaitReady();
} catch (err) {
diag.openError = err instanceof Error ? err.message : String(err);
diag.popupState = await qe.getPopupState();
diag.launcherLogTail = await readLauncherLog();
await attachDiag();
throw err;
}
const popupState = await qe.getPopupState();
diag.popupState = popupState;
diag.launcherLogTail = await readLauncherLog();
await attachDiag();
// Single critical assertion: popup exists AND is visible
// after the shortcut press from non-Claude focus. A null
// state means the BrowserWindow was never constructed —
// the portal grab didn't fire. visible === false means
// it constructed but show() was suppressed (the upstream
// lHn() short-circuit, or a regression in the visibility
// flow). Either is a fail for S14's contract.
expect(
popupState && popupState.visible,
'Quick Entry popup is visible after shortcut press from ' +
'non-Claude focus (Niri Wayland path)',
).toBe(true);
} finally {
// Marker foot cleanup is idempotent. Always run before
// app.close() so the kill happens even if the spec
// throws between the two.
if (marker) {
await marker.kill().catch(() => {
// best-effort — process may already be dead
});
}
await app.close();
}
});
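
The S14 comments describe a lookup-then-readback chain over `niri msg --json windows`. A minimal sketch of the parsing half, assuming the command prints a JSON array whose entries carry `id`, `title`, and `is_focused` fields; that field set is an assumption to verify against the Niri version under test. `parseNiriWindows`, `findWindowIdByTitle`, and `findFocusedWindowId` are hypothetical names, not the lib/input-niri.ts API.

```typescript
// Assumed subset of one entry in the `niri msg --json windows` output.
interface NiriWindowLike {
  id: number;
  title: string | null;
  is_focused: boolean;
}

// Parse defensively: anything that is not an array of objects with a numeric
// id is treated as "no windows" instead of throwing mid-test.
function parseNiriWindows(stdout: string): NiriWindowLike[] {
  const parsed: unknown = JSON.parse(stdout);
  if (!Array.isArray(parsed)) return [];
  return parsed.filter(
    (w): w is NiriWindowLike =>
      typeof w === 'object' &&
      w !== null &&
      typeof (w as { id?: unknown }).id === 'number',
  );
}

// Exact title match, which is why the spec uses a unique-per-test marker title.
function findWindowIdByTitle(stdout: string, title: string): number | null {
  const hit = parseNiriWindows(stdout).find((w) => w.title === title);
  return hit ? hit.id : null;
}

// Readback step: which window does the compositor report as focused?
function findFocusedWindowId(stdout: string): number | null {
  const hit = parseNiriWindows(stdout).find((w) => w.is_focused === true);
  return hit ? hit.id : null;
}
```

With these two lookups, the "value-vs-pre comparison" mentioned in the spec could be tightened to an equality check between the marker's id and the focused id.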

@@ -0,0 +1,367 @@
import { test, expect } from '@playwright/test';
import { spawn } from 'node:child_process';
import { existsSync, statSync } from 'node:fs';
import { mkdtemp, open, readdir, rm } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
// S15 — AppImage `--appimage-extract` fallback works as documented.
//
// Per docs/testing/cases/distribution.md S15: on FUSE-less hosts the
// AppImage runtime ships an extract fallback. Running the AppImage
// with `--appimage-extract` should drop a `squashfs-root/` next to
// CWD with a working `AppRun` inside, runnable without FUSE. The
// case-doc anchors point at scripts/packaging/appimage.sh:282/:312
// (built with stock `appimagetool`, which always supports
// `--appimage-extract`) and the AppRun script at
// scripts/packaging/appimage.sh:70-118; CI exercises the same path
// (tests/test-artifact-appimage.sh:36-44).
//
// Assertion shape:
// 1. Locate an AppImage. Skip cleanly if not running from one.
// 2. mkdtemp a work dir, spawn `<AppImage> --appimage-extract` with
// that dir as CWD. Assert exit 0.
// 3. Assert `squashfs-root/AppRun` exists.
// 4. Spawn `squashfs-root/AppRun --version` with a 5s timeout. The
// case-doc accepts "exit 0 or doesn't immediately fail" — we
// treat anything that didn't crash with a FUSE/dlopen error
// within the window as a pass; clean exit 0 is the strongest
// signal.
// 5. rm the extracted tree in `finally`.
//
// AppImage detection mirrors S01's inline probe: try
// CLAUDE_DESKTOP_LAUNCHER, fall back to <repo>/test-build/*.AppImage,
// accept a `.AppImage` suffix as-is, and otherwise verify the ELF
// magic plus the AppImage type marker. Inline rather than extracted
// to a shared lib: only two callers today, and the canary-style
// runners benefit from being decoupled from moving helper surfaces.
interface AppImageProbeResult {
path: string | null;
reason: string;
}
// AppImages are ELF executables with an embedded squashfs image. The
// ELF header carries a magic marker at byte offset 8: `AI\x02` for
// type 2 (the format our build emits) or `AI\x01` for type 1.
async function probeAppImagePath(): Promise<AppImageProbeResult> {
const explicit = process.env.CLAUDE_DESKTOP_LAUNCHER;
const candidates: string[] = [];
if (explicit) candidates.push(explicit);
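// Absolute path to the author's checkout. On hosts with a different
// layout this directory won't exist, so CLAUDE_DESKTOP_LAUNCHER must
// be set for the probe to find an AppImage.
const projectRoot = '/home/aaddrick/source/claude-desktop-debian';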
const testBuildDir = `${projectRoot}/test-build`;
if (existsSync(testBuildDir)) {
try {
const entries = await readdir(testBuildDir);
for (const entry of entries) {
if (entry.endsWith('.AppImage')) {
candidates.push(`${testBuildDir}/${entry}`);
}
}
} catch {
// best-effort
}
}
for (const candidate of candidates) {
if (!existsSync(candidate)) continue;
try {
const st = statSync(candidate);
if (!st.isFile()) continue;
if (candidate.endsWith('.AppImage')) {
return { path: candidate, reason: 'matched .AppImage suffix' };
}
const fh = await open(candidate, 'r');
try {
const buf = Buffer.alloc(12);
await fh.read(buf, 0, 12, 0);
const elf = buf.subarray(0, 4).toString('hex') === '7f454c46';
const aiMagic = buf.subarray(8, 11);
const isAppImage =
elf &&
aiMagic[0] === 0x41 &&
aiMagic[1] === 0x49 &&
(aiMagic[2] === 0x01 || aiMagic[2] === 0x02);
if (isAppImage) {
return {
path: candidate,
reason: 'matched AppImage magic bytes',
};
}
} finally {
await fh.close();
}
} catch {
// fall through to next candidate
}
}
return {
path: null,
reason:
'no AppImage found via CLAUDE_DESKTOP_LAUNCHER or ' +
`${testBuildDir}/*.AppImage`,
};
}
interface SpawnResult {
exitCode: number | null;
signalCode: NodeJS.Signals | null;
stdout: string;
stderr: string;
timedOut: boolean;
elapsedMs: number;
}
async function runWithTimeout(
cmd: string,
args: string[],
cwd: string,
timeoutMs: number,
): Promise<SpawnResult> {
const start = Date.now();
const proc = spawn(cmd, args, {
cwd,
env: process.env,
stdio: ['ignore', 'pipe', 'pipe'],
detached: false,
});
const stdoutChunks: Buffer[] = [];
const stderrChunks: Buffer[] = [];
proc.stdout?.on('data', (c: Buffer) => stdoutChunks.push(c));
proc.stderr?.on('data', (c: Buffer) => stderrChunks.push(c));
let exitCode: number | null = null;
let signalCode: NodeJS.Signals | null = null;
let timedOut = false;
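// Keep a handle on the timer so it can be cleared once the race
// settles; otherwise a fast exit leaves a pending timeout behind.
let timer: ReturnType<typeof setTimeout> | undefined;
await Promise.race([
new Promise<void>((resolve) => {
proc.once('exit', (code, signal) => {
exitCode = code;
signalCode = signal;
resolve();
});
}),
new Promise<void>((resolve) => {
timer = setTimeout(() => {
timedOut = true;
resolve();
}, timeoutMs);
}),
]);
clearTimeout(timer);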
if (proc.exitCode === null && proc.signalCode === null) {
proc.kill('SIGTERM');
await Promise.race([
new Promise<void>((resolve) =>
proc.once('exit', (code, signal) => {
exitCode = code;
signalCode = signal;
resolve();
}),
),
new Promise<void>((resolve) => setTimeout(resolve, 2_000)),
]);
if (proc.exitCode === null && proc.signalCode === null) {
proc.kill('SIGKILL');
await new Promise<void>((resolve) => {
proc.once('exit', (code, signal) => {
exitCode = code;
signalCode = signal;
resolve();
});
setTimeout(() => resolve(), 1_000);
});
}
}
return {
exitCode,
signalCode,
stdout: Buffer.concat(stdoutChunks).toString('utf8'),
stderr: Buffer.concat(stderrChunks).toString('utf8'),
timedOut,
elapsedMs: Date.now() - start,
};
}
function tail(s: string, n: number): string {
if (s.length <= n) return s;
return s.slice(-n);
}
test.setTimeout(60_000);
test('S15 — AppImage --appimage-extract fallback works', async ({}, testInfo) => {
// Case-doc S15 lists Severity: Could. Surface label is the harness
// taxonomy ("Distribution / AppImage extract") rather than the
// case-doc's free-text "AppImage runtime / FUSE-less fallback".
testInfo.annotations.push({ type: 'severity', description: 'Could' });
testInfo.annotations.push({
type: 'surface',
description: 'Distribution / AppImage extract',
});
const probe = await probeAppImagePath();
await testInfo.attach('appimage-probe', {
body: JSON.stringify(probe, null, 2),
contentType: 'application/json',
});
if (!probe.path) {
test.skip(true, `S15 only applies to AppImage installs: ${probe.reason}`);
return;
}
const appImagePath = probe.path;
await testInfo.attach('appimage-path', {
body: appImagePath,
contentType: 'text/plain',
});
// mkdtemp so the extract tree lands in $TMPDIR, not the harness
// CWD. `--appimage-extract` writes `squashfs-root/` relative to
// CWD, so we just spawn with cwd = the temp dir.
const extractDir = await mkdtemp(join(tmpdir(), 'claude-s15-'));
const squashRoot = join(extractDir, 'squashfs-root');
const appRun = join(squashRoot, 'AppRun');
await testInfo.attach('extract-dir', {
body: extractDir,
contentType: 'text/plain',
});
try {
// Step 1: extraction. 30s budget — extracting ~200MB of
// squashfs to disk is well under that on any modern host.
const extract = await runWithTimeout(
appImagePath,
['--appimage-extract'],
extractDir,
30_000,
);
await testInfo.attach('extract-exit', {
body: JSON.stringify(
{
exitCode: extract.exitCode,
signalCode: extract.signalCode,
timedOut: extract.timedOut,
elapsedMs: extract.elapsedMs,
},
null,
2,
),
contentType: 'application/json',
});
await testInfo.attach('extract-stderr-tail-4k', {
body: tail(extract.stderr, 4096) || '(empty)',
contentType: 'text/plain',
});
await testInfo.attach('extract-stdout-tail-4k', {
body: tail(extract.stdout, 4096) || '(empty)',
contentType: 'text/plain',
});
expect(
extract.exitCode,
`AppImage --appimage-extract should exit 0 ` +
`(stderr tail: ${tail(extract.stderr, 256)})`,
).toBe(0);
expect(
extract.signalCode,
'extraction process should not be killed by signal',
).toBe(null);
// Step 2: assert squashfs-root/AppRun exists.
const appRunExists = existsSync(appRun);
await testInfo.attach('apprun-exists', {
body: JSON.stringify(
{
path: appRun,
exists: appRunExists,
squashfsRootExists: existsSync(squashRoot),
},
null,
2,
),
contentType: 'application/json',
});
expect(
appRunExists,
`squashfs-root/AppRun should exist after extract at ${appRun}`,
).toBe(true);
// Step 3: spawn `AppRun --version` with a 5s timeout. AppRun
// is a wrapper script (scripts/packaging/appimage.sh:70-118)
// that hands off to the real Electron entry — `--version`
// is the cheapest probe that exercises the full launch path
// without bringing up a window. The case-doc accepts "exit 0
// or doesn't immediately fail"; a clean exit 0 is best, but
// we also flag obvious FUSE / dlopen errors as failures.
const apprun = await runWithTimeout(
appRun,
['--version'],
squashRoot,
5_000,
);
await testInfo.attach('apprun-exit', {
body: JSON.stringify(
{
exitCode: apprun.exitCode,
signalCode: apprun.signalCode,
timedOut: apprun.timedOut,
elapsedMs: apprun.elapsedMs,
},
null,
2,
),
contentType: 'application/json',
});
await testInfo.attach('apprun-stderr-tail-4k', {
body: tail(apprun.stderr, 4096) || '(empty)',
contentType: 'text/plain',
});
await testInfo.attach('apprun-stdout-tail-4k', {
body: tail(apprun.stdout, 4096) || '(empty)',
contentType: 'text/plain',
});
// Hard fail on the cardinal "didn't run at all" patterns: a
// FUSE / dlopen complaint here would mean the extract path
// ALSO depends on FUSE (which would defeat its purpose).
const stderrLower = apprun.stderr.toLowerCase();
const fuseFailure =
stderrLower.includes('libfuse.so.2') ||
(stderrLower.includes('dlopen') && stderrLower.includes('fuse'));
expect(
fuseFailure,
`AppRun --version stderr should not show a FUSE/dlopen ` +
`failure (the extract fallback exists precisely to avoid ` +
`FUSE). stderr tail: ${tail(apprun.stderr, 256)}`,
).toBe(false);
// Soft acceptance: exit 0 is canonical, but Electron's
// `--version` printer can occasionally exit non-zero on Linux
// when accessory subsystems (sandbox, dbus) are missing while
// still printing the version. This check accepts exit 0 OR
// (timed out while still alive AND stdout or stderr shows a
// version string); a non-zero exit before the timeout still
// fails.
const versionLooksOk =
/\d+\.\d+\.\d+/.test(apprun.stdout) ||
/\d+\.\d+\.\d+/.test(apprun.stderr);
const acceptableTimeout = apprun.timedOut && versionLooksOk;
expect(
apprun.exitCode === 0 || acceptableTimeout,
`AppRun --version should exit 0 or print a version before ` +
`timeout. exit=${apprun.exitCode} signal=${apprun.signalCode} ` +
`timedOut=${apprun.timedOut} ` +
`stdoutHasVersion=${versionLooksOk}`,
).toBe(true);
} finally {
await rm(extractDir, { recursive: true, force: true }).catch(() => {});
}
});
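
The byte checks inside `probeAppImagePath` can be isolated into a pure function over the first bytes of a file, which makes them testable without touching disk. A minimal sketch, assuming the layout described in the spec comments (ELF magic at offset 0, `AI` plus a type byte at offset 8); `classifyAppImageHeader` is a hypothetical name and not part of the harness.

```typescript
// Hypothetical helper mirroring the inline byte checks in probeAppImagePath.
type AppImageType = 1 | 2;

function classifyAppImageHeader(header: Uint8Array): AppImageType | null {
  // Offsets 0..10 are inspected, so at least 11 bytes are required.
  if (header.length < 11) return null;
  // ELF magic: 0x7f 'E' 'L' 'F' at offset 0.
  const isElf =
    header[0] === 0x7f &&
    header[1] === 0x45 &&
    header[2] === 0x4c &&
    header[3] === 0x46;
  if (!isElf) return null;
  // AppImage marker: 'A' 'I' at offsets 8 and 9, type byte at offset 10.
  if (header[8] !== 0x41 || header[9] !== 0x49) return null;
  if (header[10] === 0x01) return 1;
  if (header[10] === 0x02) return 2;
  return null;
}
```

Returning the type instead of a boolean lets a caller attach which AppImage format was detected to the `appimage-probe` diagnostic.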

Some files were not shown because too many files have changed in this diff