claude-desktop-debian/tools/test-harness/README.md
Aaddrick 3506c14918 test(harness): add Linux compatibility test harness (#579)
Build out a Playwright-based regression-detection harness covering
the compat-matrix surfaces (KDE-W, KDE-X, GNOME, Sway, i3, Niri,
packaging formats). Adds:

- Planning + decision docs under docs/testing/ — README, matrix,
  runbook, automation, cases/ (11 case files), quick-entry-closeout
- Playwright scaffolding (config, tsconfig)
- 78 spec runners under tools/test-harness/src/runners/ — T## case-
  doc runners and S## distribution/smoke runners
- Substrate primitives in tools/test-harness/src/lib/: AX-tree
  loader (snapshotAx + waitForAxNode + axTreeToSnapshot), focus-
  shifter, eipc-registry, niri-native bridge, drag-drop bridge,
  electron-mocks, claudeai page-objects, inspector client

S03 (DEB Depends declared) and S04 (RPM Requires declared) ship
marked test.fail() — they're regression detectors for the case-doc
gap (deb.sh emits no Depends:, rpm.sh sets AutoReqProv: no), and
the expected-failure shape lets them report green on every host
until upstream packaging starts declaring runtime deps.

127 files, no runtime changes; harness is opt-in via
'cd tools/test-harness && npx playwright test'.

Co-authored-by: Claude <claude@anthropic.com>
2026-05-04 23:17:37 -04:00


Linux Compatibility Test Harness

In-VM (or on-host) Playwright + DBus runner for the test cases under docs/testing/cases/. See docs/testing/automation.md for the architecture, decisions, and rationale.

Status

Seventy-four specs wired (36 cross-env T-tests, 33 env-specific S-tests, 5 H-prefix harness self-tests).

Test What it checks Layer
T01 X11 window with our pid appears within 15s; title matches /claude/i L2 (xprop)
T02 claude-desktop --doctor exits 0 spawn probe
T03 A StatusNotifierItem is registered by the claude-desktop pid AND exactly one (no rebuild-race duplicates) L2 (DBus)
T04 Window has _NET_FRAME_EXTENTS (sum > 0) and a "Claude" title L2 (xprop)
T05 xdg-open 'claude://...' delivers via app.on('second-instance') to the running app spawn + L1 hook
T06 globalShortcut.isRegistered('Ctrl+Alt+Space') returns true after mainVisible L1
T07 Five topbar buttons render with non-zero rects (uses seedFromHost for hermetic auth) L1 + DOM
T08 win.close() fires the wrapper interceptor; window hidden, proc alive L1
T09 setLoginItemSettings({ openAtLogin }) writes/removes $XDG_CONFIG_HOME/autostart/claude-desktop.desktop L1 + filesystem
T10 After H04-style spawn detection, kill -9 the daemon and confirm a different pid respawns within ~20s (Patch 6 cooldown + retry) pgrep delta + spawn delta
T11 Plugin-install code path fingerprints present in bundled index.js file probe
T11_runtime After seedFromHost + userLoaded, the install-flow eipc surface (installPlugin, uninstallPlugin, updatePlugin, listInstalledPlugins, LocalPlugins/getPlugins — five-suffix presence probe) is registered on the claude.ai webContents AND BOTH read-side handlers across the two impl objects are callable through the renderer-side wrapper: CustomPlugins/listInstalledPlugins([]) returns array shape (drives Manage plugins panel), LocalPlugins/getPlugins() returns array shape (reads ~/.claude/plugins/installed_plugins.json per case-doc :465822) — Tier 2 reframe of T11 (case-doc anchor :507181) L1 (eipc registry + invoke)
T12 app.getGPUFeatureStatus() returns a populated object; renderer reached visible L1
T13 --doctor does not false-flag rpm/deb installs as missing-dpkg AppImage spawn + stdout grep
T14a requestSingleInstanceLock + 'second-instance' strings in bundled index.js (file probe) file probe
T14b Second invocation under same isolation exits cleanly; primary pid stays alive (runtime probe) spawn delta + pgrep
T16 After seedFromHost + userLoaded, CodeTab.activate() resolves and ≥1 compact pill renders (env pill = Code-body mounted) L1 + AX-tree
T17 After seedFromHost + userLoaded, Code df-pill → env pill → Local → Select folder → Open folder triggers dialog.showOpenDialog (mock installed via installOpenDialogMock); skips cleanly when host has no signed-in Claude config L1 + AX-tree
T18 Bundled mainView.js preload contains the path-resolution bridge fingerprints: getPathForFile (2× — property key + the webUtils.getPathForFile( call, both at case-doc :9267), webUtils, filePickers, and the claudeAppSettings contextBridge.exposeInMainWorld namespace (case-doc :9552) — pins the load-bearing wiring without faking OS-level XDND drag (xdotool can't put file URIs on the X11 selection; Wayland needs per-compositor IPC + libei) file probe
T19 After seedFromHost + userLoaded, the integrated-terminal eipc surface (startShellPty, writeShellPty, stopShellPty, resizeShellPty, getShellPtyBuffer — five-suffix presence probe) is registered on the claude.ai webContents AND the foundational LocalSessions/getAll returns array shape (Tier 2 reframe of the case-doc T19 case; case-doc anchors are write-side startShellPty etc. so reframe asserts the FULL terminal IPC surface registers + a stateless read-side surrogate is invocable) L1 (eipc registry + invoke)
T20 After seedFromHost + userLoaded, the file-pane eipc surface (readSessionFile, writeSessionFile, pickSessionFile — three-suffix presence probe) is registered on the claude.ai webContents AND the foundational LocalSessions/getAll returns array shape (Tier 2 reframe of the case-doc T20 case; the case-doc's readSessionFile anchor is read-side but needs (sessionId, path) args not constructible from a fresh isolation, so the registration probe + foundational getAll invocation is the strongest non-destructive Tier 2 layer) L1 (eipc registry + invoke)
T21 After seedFromHost + userLoaded, the preview-pane eipc surface (getConfiguredServices, startFromConfig, stopServer, getAutoVerify, capturePreviewScreenshot — five-suffix presence probe) is registered on the claude.ai webContents AND BOTH case-doc-anchored read-side handlers are callable through the renderer-side wrapper: getConfiguredServices(cwd) returns array shape, getAutoVerify(cwd) returns boolean shape (Tier 2 reframe of the case-doc T21 case; cwd validator is typeof cwd === 'string' only, smoke-tested session 11) L1 (eipc registry + invoke)
T22 Bundled index.js contains LocalSessions_$_getPrChecks eipc channel name and gh CLI not found in PATH Linux-fallthrough throw site (Tier 1 fingerprint) file probe
T22b After seedFromHost + userLoaded, the LocalSessions_$_getPrChecks eipc handler is registered on the claude.ai webContents (webContents.ipc._invokeHandlers — Tier 2 runtime probe sibling of T22, strictly stronger than the bundle-string fingerprint) L1 (eipc registry)
T23 Firing new Notification({title}) from main reaches the session bus's org.freedesktop.Notifications.Notify (observed via dbus-monitor) L1 + DBus subprocess
T24 After installOpenExternalMock mirroring T25's pattern, evalInMain calls shell.openExternal('vscode://file/...'); mock records the URL verbatim, no real editor launch L1 (mocked egress)
T25 After installShowItemInFolderMock mirroring T17's dialog-mock pattern, evalInMain calls shell.showItemInFolder(<synthetic path>); mock records the call verbatim, no throw — no host side effect L1 (mocked egress)
T26 After seedFromHost + userLoaded, click "Routines" sidebar AX button; assert "New routine" / "All" / "Calendar" anchor renders L1 + AX-tree
T27 After seedFromHost + userLoaded, both Cowork and CCD getAllScheduledTasks eipc handlers are registered AND callable through the renderer-side wrapper, returning array shape — Tier 2 reframe of the case-doc T27 case L1 (eipc invoke)
T30 Bundled index.js colocates the auto-archive sweep cadence (300*1e3 / 3600*1e3 / AutoArchiveEngine) with the ccAutoArchiveOnPrClose gate key (single-regex multi-string fingerprint) file probe
T31 Bundled index.js contains all three side-chat eipc channel names (startSideChat, sendSideChatMessage, stopSideChat) — load-bearing trio file probe
T31b After seedFromHost + userLoaded, all three side-chat eipc handlers (startSideChat, sendSideChatMessage, stopSideChat) are registered on the claude.ai webContents — load-bearing trio (Tier 2 runtime sibling of T31) L1 (eipc registry)
T32 Bundled index.js contains LocalSessions_$_getSupportedCommands eipc channel + slashCommands schema field file probe
T33 Bundled index.js contains CustomPlugins_$_listMarketplaces and CustomPlugins_$_listAvailablePlugins eipc channel names (browser populate flow) file probe
T33b After seedFromHost + userLoaded, both plugin-browser eipc handlers (listMarketplaces, listAvailablePlugins) are registered on the claude.ai webContents — load-bearing pair (Tier 2 runtime sibling of T33) L1 (eipc registry)
T33c After seedFromHost + userLoaded, both plugin-browser eipc handlers (listMarketplaces, listAvailablePlugins) are callable through the renderer-side wrapper with args = [[]] (empty egressAllowedDomains), each returning array shape — Tier 2 invocation upgrade of T33b, strictly stronger than registration alone L1 (eipc invoke)
T35 Bundled index.js contains the four-needle MCP-config separation fingerprint: claude_desktop_config.json (chat-tab path), .claude.json + .mcp.json (Code-tab loaders), "user","project","local" (settingSources triple Code-session passes to the agent SDK) — pins per-tab separation without launch file probe
T35b After seedFromHost + userLoaded, the claude.settings/MCP/getMcpServersConfig eipc handler is registered AND callable through the renderer-side wrapper, returning a non-array object (Tier 2 runtime sibling of T35, strictly stronger than the bundle-string fingerprint) L1 (eipc invoke)
T36 Bundled index.js contains the hooks runtime fingerprint: hook_started / hook_progress / hook_response (single-occurrence Verbose-transcript runtime emits) plus PreToolUse / UserPromptSubmit registry tokens — pins the runtime hook-fire path the case-doc Verbose-transcript claim hangs on file probe
T37 Bundled index.js contains [GlobalMemory] Copied CLAUDE.md log line + CLAUDE.md filename literal + CLAUDE_CONFIG_DIR env-var token (memory-loading wiring) file probe
T37b After seedFromHost + userLoaded, the claude.web/CoworkMemory/readGlobalMemory eipc handler is registered AND callable through the renderer-side wrapper, returning the documented string | null shape (Tier 2 runtime sibling of T37) L1 (eipc invoke)
T38 Bundled index.js contains LocalSessions_$_openInEditor eipc channel name (Tier 1 fingerprint) file probe
T38b After seedFromHost + userLoaded, the LocalSessions_$_openInEditor eipc handler is registered on the claude.ai webContents (Tier 2 runtime sibling of T38) L1 (eipc registry)
H01 CDP auth gate exits with code 1 when spawned with --remote-debugging-port and no CLAUDE_CDP_AUTH token spawn probe
H02 frame-fix-wrapper.js + frame-fix-entry.js injected into app.asar (Proxy + main-field reference) file probe
H03 Build-pipeline patch fingerprints all present in app.asar (KDE gate, frame-fix inject, tray, cowork, claude-code) file probe
H04 cowork daemon spawns under app and exits with app — soft-skips on rows where it isn't gated to spawn pgrep delta
H05 UI-drift canary against the AX-tree fingerprint walker (requires CLAUDE_TEST_USE_HOST_CONFIG=1) L1 (AX)
S01 AppImage launches without libfuse.so.2 complaint (skips on non-AppImage rows) spawn + stderr grep
S02 No strict == equality against XDG_CURRENT_DESKTOP in launcher / patches (regression detector) source-tree probe
S03 dpkg-query Depends: field non-empty (currently fails as upstream-contract regression detector) dpkg-query
S04 rpm -qR has at least one non-rpmlib(...) requirement (currently fails per #autoreqprov off) rpm -qR
S05 Doctor does not false-flag rpm-installed package (skips when rpm -qf doesn't claim the binary) spawn + stdout grep
S07 Under CLAUDE_HARNESS_USE_WAYLAND=1, spawned Electron has --ozone-platform=wayland on argv argv probe
S08 setImage-based in-place fast-path injected by tray.sh (KDE-only, file probe) file probe
S09 KDE-gate string present in bundled index.js (patch ran at build) file probe
S10 KDE-W only — popup runtime getBackgroundColor() === '#00000000' after Quick Entry opens (regression-detector against electron#50213 if bundled Electron in 41.0.4-bisect-window) L1 + ydotool
S11 GNOME-X / Ubu-X only (X11-side regression detector) — spawn xterm marker, xdotool windowfocus to it, verify _NET_ACTIVE_WINDOW shifted, fire Ctrl+Alt+Space via ydotool, assert popup visible. Wayland-side mutter regression (#404) is a primitive gap — needs Wayland-native focus injection (libei) L1 + xdotool focus + ydotool shortcut
S12 --enable-features=GlobalShortcutsPortal in Electron argv (GNOME-W only — currently a known-failing regression detector) argv probe
S14 Niri only — spawn foot marker, niri msg action focus-window to it, verify niri msg --json focused-window shifted, fire Ctrl+Alt+Space via ydotool, assert popup visible. Currently known-failing detector for the Niri portal BindShortcuts path (parallels S12's GNOME-W detector) L1 + niri msg focus + ydotool shortcut
S15 --appimage-extract exits 0; squashfs-root/AppRun --version runs without FUSE error spawn + filesystem
S16 mount(8) shows new .mount_claude while app is up; gone within 10s of close mount delta
S17 Shell-path-worker overlays user's login-shell PATH onto a deliberately-scrubbed env L1 + utilityProcess
S19 extraEnv: { CLAUDE_CONFIG_DIR } reaches main-process process.env; cE()-equivalent resolves under the override path L1 + extraEnv
S21 No handle-lid-switch / HandleLidSwitch strings in bundle (lid policy deferred to OS) asar absence probe
S22 new Set(["darwin","win32"]) platform gate present; no 2-element Set pairing linux (file-probe form) asar regex
S25 safeStorage.encryptString → file → app restart → file → safeStorage.decryptString round-trips the same plaintext (skips when isEncryptionAvailable === false) L1 + shared isolation handle
S26 setFeedURL present + project suppression marker present (currently fails — gated on #567) asar fingerprint
S27 installed_plugins.json + homedir resolver present; no */plugins system paths in bundle asar fingerprint
S28 Bundled index.js contains the worktree permission classifier expression ("Permission denied" || "Access is denied" || "could not lock config file" → "permission-denied") plus the Failed to create git worktree: log line asar fingerprint
S29 Popup opens when main is hidden-to-tray (lazy-create sanity) L1
S30 No new claude-desktop pid spawns after post-exit shortcut press pgrep delta + ydotool
S31 Submit reaches new chat from visible / minimized / hidden-to-tray (QE-7/8/9) L1 + ydotool
S32 GNOME mutter stale-isFocused() regression (GNOME-W/Ubu-W only — known-failing today) L1 + ydotool
S33 Captures bundled Electron version against the #370 / electron#50213 bisect threshold file read
S34 Popup does not appear when main is fullscreen (upstream contract) L1 + ydotool
S35 Popup position persists across invocations and across app restart (two-launch test) L1 + shared isolation handle + ydotool
S36 Multi-monitor fallback — skip-on-single-monitor with documented fixme for the disconnect orchestration display probe
S37 Main-window destroy unreachable on Linux per close-to-tray override — documented skip

These specs exercise the substrate primitives in lib/:

  • xprop shell-outs (T01, T04)
  • dbus-next (T03)
  • dbus-monitor subprocess eavesdrop (T23)
  • Node-inspector runtime-attach (T07/T16/T17/T26/S10/S29-S35/T05-T14b L1 specs)
  • app.asar content reads (S08/S09/S21/S22/S26/S27/S28/T11/T14a/T18/T22/T30/T31/T32/T33/T35/T36/T37/T38/H02/H03/S33; mostly index.js, T18 reads mainView.js)
  • /proc/$pid/cmdline reads (S07/S12)
  • pgrep-based pid deltas (T10/T14b/H04/S16/S30)
  • mount(8) parsing (S16)
  • source-tree probes against scripts/launcher-common.sh (S02)
  • dpkg-query / rpm -qR / rpm -qf calls (S03/S04/S05/T13)
  • safeStorage.encryptString round-trip across two launches (S25)
  • extraEnv precedence over isolation env (S19)
  • the lib/electron-mocks.ts mock-then-call helpers: installOpenDialogMock (T17), installShowItemInFolderMock (T25), installOpenExternalMock (T24)
  • the lib/input.ts focus-shifter (focusOtherWindow + spawnMarkerWindow for S11; X11 only, WaylandFocusUnavailable thrown on native Wayland) and its Niri-native sibling lib/input-niri.ts (niri msg --json for the focus-injection + readback chain, foot --title for the marker window; NiriIpcUnavailable thrown off-Niri; consumed by S14)
  • the lib/eipc.ts registry walker (getEipcChannels / waitForEipcChannel / waitForEipcChannels against webContents.ipc._invokeHandlers; opaque on the UUID, suffix-matched against case-doc anchors; consumed by T19 / T20 / T22b / T31b / T33b / T38b) plus its session 8 invoke surface (invokeEipcChannel, which calls a registered handler through the renderer-side wrapper at window['claude.<scope>'].<Iface>.<method>; consumed by T19 / T20 / T27 / T33c / T35b / T37b)
  • the lib/ax.ts AX-tree substrate (snapshotAx for one-shot reads + waitForAxNode / waitForAxNodes for predicate-based polling, plus re-exports of RawElement / AxNode / axTreeToSnapshot / waitForAxTreeStable from explore/walker.ts so consumers stay inside lib/; consumed by claudeai.ts page-objects + T26). The extraction was threshold-driven: it landed in session 13 once T26 had to duplicate the formerly-private snapshotAx from claudeai.ts. Session 14 migrated activateTab from a one-shot snapshot to waitForAxNode polling, which fixes the T16 failure mode 'no AX-tree button with accessibleName="Code" found' (the Code button hadn't rendered yet at click time), and converted CodeTab.activate's post-click findCompactPills retry loop to waitForAxNodes.
  • the createIsolation({ seedFromHost: true }) primitive that lets login-required tests run hermetically against a copy of the host's signed-in auth state (T07, T11_runtime, T16, T17, T19, T20, T21, T22b, T26, T27, T31b, T33b, T33c, T35b, T37b, T38b). Session 15 migrated T17 from the legacy CLAUDE_TEST_USE_HOST_CONFIG=1 / isolation: null shape to seedFromHost, fixing a pre-existing 60s spec-timeout flake where the unauth'd default isolation polled userLoaded past Playwright's spec budget. Session 16 verified the migration end-to-end: seedFromHost clones the host's signed-in config, waitForReady('userLoaded') resolves to a post-login URL, and the session-14 CodeTab.activate({ timeout: 15_000 }) succeeds. T17 now reaches a new failure mode at the next chain step: in openFolderPicker after selectLocal, the Select folder… pill doesn't render on the /epitaxy workspace route. It likely needs /new context; deferred for a future session.
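As one concrete instance of the primitives listed above, the /proc/$pid/cmdline reads behind S07 and S12 come down to splitting a NUL-separated buffer and checking for a flag. The sketch below is illustrative only: parseCmdline, hasFlag and hasEnabledFeature are hypothetical names, not the exports of lib/argv.ts, and the binary path in the usage note is made up.

```typescript
// /proc/<pid>/cmdline stores argv as NUL-terminated strings.
// Splitting on "\0" leaves a trailing empty entry, which is dropped here
// (along with any genuinely empty arguments; acceptable for a flag check).
function parseCmdline(raw: string): string[] {
  return raw.split("\0").filter((arg) => arg.length > 0);
}

// True when the flag appears verbatim or in --flag=value form.
function hasFlag(argv: string[], flag: string): boolean {
  return argv.some((arg) => arg === flag || arg.startsWith(`${flag}=`));
}

// --enable-features carries a comma-separated list, so test membership
// instead of comparing the whole argument.
function hasEnabledFeature(argv: string[], feature: string): boolean {
  const prefix = "--enable-features=";
  return argv
    .filter((arg) => arg.startsWith(prefix))
    .some((arg) => arg.slice(prefix.length).split(",").includes(feature));
}
```

For a buffer such as "/opt/example/electron\0--ozone-platform=wayland\0", hasFlag(parseCmdline(raw), "--ozone-platform=wayland") is true, which is the shape of assertion S07 makes.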

Note on eipc channels: the LocalSessions_$_* and CustomPlugins_$_* channel names referenced in the case-doc Code anchors don't register through Electron's global ipcMain.handle() registry (which only carries 3 chat-tab MCP-bridge handlers). They DO register through Electron's stdlib IpcMainImpl, just on the per-webContents IPC scope (webContents.ipc._invokeHandlers, Electron 17+) rather than the global one. The framing is $eipc_message$_<UUID>_$_<scope>_$_<iface>_$_<method> (UUID stable across builds at c0eed8c9-…); 117 LocalSessions_* + 16 CustomPlugins_* + 50+ other interfaces register on the claude.ai webContents.

T22 / T31 / T33 / T38 ship as Tier 1 fingerprints against the bundled channel-name strings; T22b / T31b / T33b / T38b are the runtime registry-presence siblings (strictly stronger, require seedFromHost). T27 / T33c / T35b / T37b go one step further: they invoke the resolved handlers through the renderer-side wrapper at window['claude.<scope>'].<Iface>.<method>. T19 / T20 are first-runtime-probe siblings of case-doc tests whose anchors are write-side handlers (startShellPty / writeSessionFile); they ship a five-suffix / three-suffix registration probe over the case-doc-anchored write-side surface plus a single foundational read-side LocalSessions/getAll invocation as the read-side surrogate (case-doc connection: integrated terminal and file pane both bind to LocalSessions; getAll proves the LocalSessions impl object is reachable through the renderer wrapper). T21 and T11_runtime extend the dual-invocation pattern: when a case-doc has read-side anchors with resolvable arg shapes, invoke the case-doc-anchored handlers directly rather than through a foundational surrogate (T21: getConfiguredServices array + getAutoVerify boolean on a single Launch impl object; T11_runtime: cross-impl-object dual invocation, CustomPlugins/listInstalledPlugins array + LocalPlugins/getPlugins array, which proves the install plumbing crosses both interfaces intact, strictly stronger than single-interface coverage).

All wrapper invocations use the wrapper exposed by mainView.js via contextBridge.exposeInMainWorld after a top-frame + origin gate (Qc(): claude.ai / claude.com / preview.* / localhost). Calling through the wrapper carries an honest senderFrame for the inlined le() / Vi() per-handler origin gate, so the test surface matches real attack surface. T33c also demonstrates the schema-rev path: when invocation rejects with Argument "<name>" at position N ... failed to pass validation, the verbatim rejection string is the cheapest grep target back to the inline hand-rolled validator block (bundle bytes 5013601 / 5018821 for the two CustomPlugins methods). See lib/eipc.ts for both surfaces.
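To make the channel framing concrete, here is a minimal sketch of splitting a channel name into its parts and of the UUID-opaque suffix match. It is not the code in lib/eipc.ts: parseEipcChannel and matchesSuffix are hypothetical names, the UUID is an all-zero placeholder, and treating "claude.web" as the literal scope segment is an assumption.

```typescript
interface EipcChannel {
  uuid: string;
  scope: string;
  iface: string;
  method: string;
}

// $eipc_message$_<UUID>_$_<scope>_$_<iface>_$_<method>
const EIPC_PATTERN = /^\$eipc_message\$_(.+?)_\$_(.+?)_\$_(.+?)_\$_(.+)$/;

// Returns null for anything that does not follow the framing.
function parseEipcChannel(channel: string): EipcChannel | null {
  const m = EIPC_PATTERN.exec(channel);
  if (m === null) return null;
  const [, uuid = "", scope = "", iface = "", method = ""] = m;
  return { uuid, scope, iface, method };
}

// UUID-opaque match: only the trailing <iface>_$_<method> part is compared,
// so the probe does not care which UUID a given build uses.
function matchesSuffix(channel: string, suffix: string): boolean {
  return channel.endsWith(`_$_${suffix}`);
}

// Hypothetical channel name (placeholder UUID, assumed scope segment).
const sampleChannel =
  "$eipc_message$_00000000-0000-0000-0000-000000000000_$_claude.web_$_CoworkMemory_$_readGlobalMemory";
```

A registration probe then reduces to checking that some registered channel satisfies matchesSuffix(channel, "CoworkMemory_$_readGlobalMemory").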

Per-row pass/skip counts depend on which sweep runs against the row. The Quick Entry runners (S29-S35) all share the same primitive set (installInterceptor() + openAndWaitReady() + scenario-specific state setup).

Prerequisites

On the host or VM running the sweep:

  • Node.js ≥ 20
  • claude-desktop installed (deb / rpm / AppImage), reachable via claude-desktop on PATH or CLAUDE_DESKTOP_LAUNCHER env var
  • xprop (for L2 window queries — dnf install xorg-x11-utils on Fedora; apt install x11-utils on Debian/Ubuntu)
  • zstd (optional — used to bundle results)

Quick Entry runners (S29-S37, future QE-*)

Quick Entry tests inject the OS-level shortcut via ydotool / /dev/uinput. One-time setup per host or VM:

# Install the binary + daemon
sudo dnf install -y ydotool   # or: sudo apt install ydotool

# Make ydotoold's socket world-writable so the test runner reaches it
sudo mkdir -p /etc/systemd/system/ydotool.service.d
sudo tee /etc/systemd/system/ydotool.service.d/override.conf <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/ydotoold --socket-perm=0666
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now ydotool.service

After this, ydotool key 29:1 29:0 (Ctrl tap) should exit 0. The runner sets YDOTOOL_SOCKET=/tmp/.ydotool_socket automatically; override the env var if your daemon binds elsewhere.
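The same keycode:state syntax extends to the Ctrl+Alt+Space chord the Quick Entry runners fire. The sketch below only builds the argument list; chordArgs is a hypothetical helper, not the one in lib/quickentry.ts. The keycodes are the Linux input-event-codes values for left Ctrl, left Alt and Space.

```typescript
// Linux input-event-codes for the Quick Entry chord.
const KEY_LEFTCTRL = 29;
const KEY_LEFTALT = 56;
const KEY_SPACE = 57;

// Press every key in order, then release in reverse order, using
// ydotool's <keycode>:<1|0> notation (1 = down, 0 = up).
function chordArgs(keycodes: number[]): string[] {
  const presses = keycodes.map((code) => `${code}:1`);
  const releases = [...keycodes].reverse().map((code) => `${code}:0`);
  return ["key", ...presses, ...releases];
}

const quickEntryArgs = chordArgs([KEY_LEFTCTRL, KEY_LEFTALT, KEY_SPACE]);
// ["key", "29:1", "56:1", "57:1", "57:0", "56:0", "29:0"]
```

The resulting array can be handed to a process spawner (for example Node's execFile) with YDOTOOL_SOCKET set in the child environment.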

ydotool cannot drive portal-grabbed shortcuts (kernel uinput events vs compositor portal grabs) — those tests stay manual until libei adoption broadens. See docs/testing/automation.md.

Install

cd tools/test-harness
npm install

package-lock.json is gitignored for now; commit it once the dep set is settled.

Run

# Full sweep against the locally installed claude-desktop
ROW=KDE-W ./orchestrator/sweep.sh

# Single test
npx playwright test src/runners/T01_app_launch.spec.ts

# Headed (watch the app launch in front of you)
npx playwright test --headed

# Run the full suite under native Wayland instead of X11/XWayland
CLAUDE_HARNESS_USE_WAYLAND=1 npm test

# Grounding probe — dump runtime state for the case-doc grounding sweep
npm run grounding-probe -- --launch --include-synthetic \
  --out ../../docs/testing/cases-grounding-runtime.json

Results land at results/results-${ROW}-${DATE}/:

results/results-KDE-W-20260430T143000Z/
├── junit.xml             # JUnit summary (matrix-regen input)
├── html/                 # Playwright HTML report
└── test-output/          # Per-test attachments (screenshots, logs, etc.)

A bundled results-${ROW}-${DATE}.tar.zst sits next to the dir if zstd is installed.
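For scripts that need to predict or parse the bundle name, the results-${ROW}-${DATE} pattern can be sketched as follows. This assumes DATE is a compact UTC timestamp of the form shown in the example directory above; how sweep.sh actually derives it is not shown here, and both function names are hypothetical.

```typescript
// "2026-04-30T14:30:00.000Z" -> "20260430T143000Z"
function compactUtcStamp(date: Date): string {
  return date
    .toISOString()
    .replace(/\.\d{3}Z$/, "Z") // drop milliseconds
    .replace(/[-:]/g, ""); // drop date and time separators
}

// Only the timestamp is compacted; the row label keeps its hyphen.
function resultsDirName(row: string, date: Date): string {
  return `results-${row}-${compactUtcStamp(date)}`;
}

const exampleDir = resultsDirName("KDE-W", new Date(Date.UTC(2026, 3, 30, 14, 30, 0)));
// "results-KDE-W-20260430T143000Z"
```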

Environment variables

| Var | Default | Purpose |
| --- | --- | --- |
| ROW | KDE-W | Matrix row label, propagated into the bundle name and per-test annotations. Drives skipUnlessRow() in spec files |
| CLAUDE_DESKTOP_LAUNCHER | claude-desktop (PATH lookup) | Path to the launcher / Electron binary Playwright spawns |
| CLAUDE_DESKTOP_ELECTRON | probed | Override the resolved Electron binary path (skips deb/rpm install probing) |
| CLAUDE_DESKTOP_APP_ASAR | probed | Override the resolved app.asar path |
| CLAUDE_TEST_USE_HOST_CONFIG | unset | When 1, opt out of per-test isolation and use the host's real ~/.config/Claude. Required for tests that need a signed-in claude.ai (S31, future submit-side QE runners). Side effect: these tests write to your real account, so chats / settings persist |
| CLAUDE_HARNESS_USE_WAYLAND | unset | When 1, every runner spawns Electron with the native-Wayland backend (--ozone-platform=wayland + sibling flags from launcher-common.sh) instead of the default X11-via-XWayland. CLAUDE_USE_WAYLAND=1 is also exported into the spawn env for in-app paths that read it. Per-launch overrides via launchClaude({ extraEnv }) still win |
| YDOTOOL_SOCKET | /tmp/.ydotool_socket | Path to the ydotoold socket. Override only if the daemon binds elsewhere |
| OUTPUT_DIR | ./results | Where bundles land |
| RESULTS_DIR | per-run derived | Single-run output dir (set by sweep.sh; usually you don't set this manually) |
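The defaults above can be read as a single resolution step. The following is a sketch under the assumption that every variable is read once from a plain string map; resolveHarnessEnv and the HarnessEnv shape are hypothetical and do not mirror any particular file in lib/. The probed and per-run-derived variables are left out because their resolution logic is not described here.

```typescript
interface HarnessEnv {
  row: string;
  launcher: string;
  useHostConfig: boolean;
  useWayland: boolean;
  ydotoolSocket: string;
  outputDir: string;
}

// Apply the documented defaults to a (possibly sparse) environment map.
function resolveHarnessEnv(env: Record<string, string | undefined>): HarnessEnv {
  return {
    row: env["ROW"] ?? "KDE-W",
    launcher: env["CLAUDE_DESKTOP_LAUNCHER"] ?? "claude-desktop",
    // Both switches are opt-in and only the literal "1" enables them.
    useHostConfig: env["CLAUDE_TEST_USE_HOST_CONFIG"] === "1",
    useWayland: env["CLAUDE_HARNESS_USE_WAYLAND"] === "1",
    ydotoolSocket: env["YDOTOOL_SOCKET"] ?? "/tmp/.ydotool_socket",
    outputDir: env["OUTPUT_DIR"] ?? "./results",
  };
}
```

Passing the map in explicitly (rather than reading a global) keeps the function easy to exercise with different rows. Note that ?? treats an empty string as set, which may or may not match what the shell-side sweep.sh does.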

Per-test isolation default

launchClaude() creates a fresh XDG_CONFIG_HOME / CLAUDE_CONFIG_DIR under $TMPDIR/claude-test-* for every launch and removes it on close(). This is the default to prevent state leaks between tests (SingletonLock collisions, persisted Quick Entry positions, etc.; see Decision 1 in docs/testing/automation.md). Three modes are available, the default plus two escape hatches:

  • launchClaude() — default, fresh per-launch isolation.
  • launchClaude({ isolation }) — pass a shared Isolation handle to launch the same app twice with persistent state (e.g. S35 position-memory across restart).
  • launchClaude({ isolation: null }) — opt out entirely; share the host's ~/.config/Claude. Used by tests gated on CLAUDE_TEST_USE_HOST_CONFIG for signed-in claude.ai access.
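The distinction between omitting isolation and passing null is easy to miss, so here is a small model of the three modes. It is a sketch of the option semantics only: the Isolation shape, planIsolation, and the plan type are hypothetical and say nothing about how lib/electron.ts or lib/isolation.ts are implemented.

```typescript
// Hypothetical minimal handle: just the sandbox directory.
interface Isolation {
  configHome: string;
}

type IsolationPlan =
  | { mode: "fresh" } // new temp dir, removed on close()
  | { mode: "shared"; configHome: string } // caller-owned handle, state persists
  | { mode: "host" }; // no sandbox, host ~/.config/Claude

// undefined (option omitted) and null are deliberately different:
// omitting means "use the default", null means "opt out".
function planIsolation(option: Isolation | null | undefined): IsolationPlan {
  if (option === undefined) return { mode: "fresh" };
  if (option === null) return { mode: "host" };
  return { mode: "shared", configHome: option.configHome };
}
```

A two-launch test in the style of S35 would create one handle, launch with it twice, and dispose of it after the second close.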

Layout

tools/test-harness/
├── package.json
├── tsconfig.json
├── playwright.config.ts
├── src/
│   ├── lib/                       # shared helpers
│   │   ├── electron.ts            # spawn + isolation + inspector attach
│   │   ├── inspector.ts           # Node-inspector RPC client (SIGUSR1 path)
│   │   ├── dbus.ts                # dbus-next session-bus + helpers
│   │   ├── sni.ts                 # StatusNotifierWatcher / Item
│   │   ├── wm.ts                  # xprop wrappers (X11 + XWayland)
│   │   ├── env.ts                 # XDG_CURRENT_DESKTOP / SESSION_TYPE branching
│   │   ├── row.ts                 # skipUnlessRow / skipOnRow primitives
│   │   ├── isolation.ts           # per-test XDG_CONFIG_HOME sandbox
│   │   ├── argv.ts                # /proc/$pid/cmdline reader + flag check
│   │   ├── asar.ts                # in-place app.asar reads (no temp extract)
│   │   ├── quickentry.ts          # Quick Entry domain wrapper (popup, MainWindow, ydotool)
│   │   ├── claudeai.ts            # claude.ai renderer UI domain (CodeTab, dialog mock, atoms)
│   │   ├── electron-mocks.ts      # mock-then-call helpers (dialog/showItemInFolder/openExternal)
│   │   ├── input.ts               # focus-shifter primitive (X11 only — xdotool + xprop verify; spawnMarkerWindow xterm)
│   │   ├── input-niri.ts          # focus-shifter primitive (Niri only — niri msg --json verify; spawnMarkerWindow foot)
│   │   ├── eipc.ts                # eipc-channel registry walker (per-webContents IPC scope; suffix-matched, UUID-opaque)
│   │   ├── retry.ts               # poll-until-true with timeout
│   │   └── diagnostics.ts         # launcher log, --doctor, session env
│   └── runners/                   # one .spec.ts per test ID
│       ├── T01_app_launch.spec.ts
│       ├── T03_tray_icon_present.spec.ts
│       ├── T04_window_decorations.spec.ts
│       ├── T17_folder_picker.spec.ts
│       ├── S09_quick_window_patch_only_kde.spec.ts
│       ├── S12_global_shortcuts_portal_flag.spec.ts
│       ├── S29_quick_entry_lazy_create_closed_to_tray.spec.ts
│       ├── S30_quick_entry_noop_after_app_exit.spec.ts
│       ├── S31_quick_entry_submit_reaches_new_chat.spec.ts
│       ├── S32_quick_entry_submit_gnome_stale_isfocused.spec.ts
│       ├── S33_electron_version_capture.spec.ts
│       ├── S34_shortcut_focuses_fullscreen_main.spec.ts
│       ├── S35_quick_entry_position_persisted_across_restarts.spec.ts
│       ├── S36_quick_entry_fallback_to_primary_display.spec.ts
│       ├── S37_quick_entry_popup_after_main_destroy.spec.ts
│       ├── H01_cdp_gate_canary.spec.ts
│       ├── H02_frame_fix_wrapper_present.spec.ts
│       ├── H03_patch_fingerprints.spec.ts
│       └── H04_cowork_daemon_lifecycle.spec.ts
├── probe.ts                       # one-off renderer-DOM probe (debugger on :9229)
├── grounding-probe.ts             # case-grounding runtime capture (see "Grounding probe" below)
└── orchestrator/
    └── sweep.sh                   # row-aware harness invocation

H-prefix specs are harness self-tests — they validate the harness's preconditions and the build pipeline's invariants (CDP gate alive, patches landed, daemon lifecycle clean). Cheap, run in <1s each except H04 which launches the app.

How L1 testing works (the SIGUSR1 path)

The shipped Electron has a CDP auth gate that exits the app whenever --remote-debugging-port or --remote-debugging-pipe is on argv and a valid CLAUDE_CDP_AUTH token isn't in env. Both Playwright's _electron.launch() and chromium.connectOverCDP() inject the gated flag, so both are blocked.

The gate doesn't check --inspect or runtime SIGUSR1, which is the same code path as the in-app Developer → Enable Main Process Debugger menu item. So:

  1. launchClaude() spawns Electron with no debug-port flags (gate asleep) and waits for the X11 window.
  2. app.attachInspector() sends SIGUSR1 to the pid; Node's inspector opens on port 9229.
  3. lib/inspector.ts connects via WebSocket and exposes evalInMain(body) and evalInRenderer(urlFilter, js) for tests.

From the inspector you can:

  • Drive the renderer via webContents.executeJavaScript()
  • Install main-process mocks (e.g. dialog.showOpenDialog for T17)
  • Inspect any Electron API state

Two gotchas worth knowing:

  • BrowserWindow.getAllWindows() returns an empty array because frame-fix-wrapper substitutes the BrowserWindow class. Use webContents.getAllWebContents() instead; it works correctly and includes both the shell window and the embedded claude.ai BrowserView.
  • Runtime.evaluate with awaitPromise: true returns empty objects for awaited Promise resolutions. inspector.evalInMain<T>() returns JSON.stringify(value) from the IIFE and parses on the caller side to dodge this.
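The stringify-then-parse workaround from the second gotcha can be sketched as a pair of pure helpers. This is an illustration rather than the code in lib/inspector.ts: buildEvalRequest and parseEvalResult are hypothetical names, and the WebSocket transport is omitted. The Runtime.evaluate parameters used (expression, awaitPromise, returnByValue) are standard inspector protocol fields.

```typescript
interface EvaluateRequest {
  id: number;
  method: "Runtime.evaluate";
  params: { expression: string; awaitPromise: boolean; returnByValue: boolean };
}

// Wrap the caller's function body in an async IIFE that resolves to a JSON
// string, so the awaited value crosses the protocol as a plain string.
function buildEvalRequest(id: number, body: string): EvaluateRequest {
  const expression =
    `(async () => { const v = await (async () => { ${body} })(); ` +
    `return JSON.stringify(v === undefined ? null : v); })()`;
  return {
    id,
    method: "Runtime.evaluate",
    params: { expression, awaitPromise: true, returnByValue: true },
  };
}

// The string arrives in the response at result.result.value; parse it back.
function parseEvalResult<T>(resultValue: string): T {
  return JSON.parse(resultValue) as T;
}
```

Mapping undefined to null keeps JSON.parse from failing on bodies that return nothing.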

Full writeup with rationale and tradeoffs: docs/testing/automation.md "The CDP auth gate".

Grounding probe

grounding-probe.ts is a separate entry-point — not a Playwright spec — that connects to a live Claude Desktop and dumps the runtime state backing the load-bearing claims in docs/testing/cases/. It exists because static grep against the 546k-line beautified bundle has known blind spots (lazy import()s, dynamic handler tables, conditional wiring), and some claims (S26 autoUpdater gate, S20 powerSaveBlocker path) can only be verified at runtime.

# Self-contained: launchClaude() + capture + tear down
npm run grounding-probe -- --launch

# Plus the one synthetic probe (powerSaveBlocker start+stop)
npm run grounding-probe -- --launch --include-synthetic

# Attach to an already-running app (manual --inspect=9229 setup)
npm run grounding-probe -- --port 9229 --out /tmp/probe.json

Output is keyed by test ID — see the file's header comment for the full table. Diff captures across upstream version bumps to spot behavior drift the static sweep would miss. Surfaces inside modals or popups (T22 PR toolbar, T26 preset list, T31 side chat, T32 slash menu) need the surface open at probe time — the AX-tree fingerprint is a snapshot of what's currently on screen.
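Diffing two captures across a version bump can be as simple as comparing the per-test-ID entries. The sketch below assumes only what is stated above, that the output is a JSON object keyed by test ID; the shape of each entry is treated as opaque, and changedTestIds is a hypothetical helper rather than part of the harness.

```typescript
type Capture = Record<string, unknown>;

// Return the test IDs whose captured state differs between two runs,
// including IDs present in only one of them.
function changedTestIds(before: Capture, after: Capture): string[] {
  const ids = Array.from(new Set(Object.keys(before).concat(Object.keys(after))));
  return ids
    .filter((id) => JSON.stringify(before[id]) !== JSON.stringify(after[id]))
    .sort();
}

// Hypothetical entries, for illustration only.
const oldCapture: Capture = { T22: { registered: true }, S26: { feedUrl: null } };
const newCapture: Capture = { T22: { registered: false }, S26: { feedUrl: null }, T31: {} };
const drifted = changedTestIds(oldCapture, newCapture); // ["T22", "T31"]
```

JSON.stringify comparison is sensitive to key order, which is fine when both captures come from the same probe code but would need a canonical serializer otherwise.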

Known limitations

  • T04 uses xprop (no xdotool dependency — walks _NET_CLIENT_LIST + _NET_WM_PID). Works on X11 native and KDE Wayland (XWayland), not on native-Wayland sessions where the app is running through Ozone-Wayland directly. Per Decision 6, project default is X11; native-Wayland window-state queries are deferred until those tests get added.
  • T17 is shallow — it intercepts dialog.showOpenDialog at the Electron main process level. The integration question "does Claude make the right portal call?" is a v2 concern; portal-level mocking via dbus-next is sketched in docs/testing/automation.md but requires displacing the running portal service or running under dbus-run-session.
  • render-matrix.sh isn't here yet. sweep.sh prints a summary; the matrix.md regen step from JUnit is the next addition.
  • No CI wrapper. Decision 4: the harness is invocable from CI but sweeps run from the dev box for the first ~20 tests.
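For the T04 item above, the xprop walk reduces to parsing three kinds of output line: the root window's _NET_CLIENT_LIST, each window's _NET_WM_PID, and the matching window's _NET_FRAME_EXTENTS. The parsers below are a sketch with hypothetical names, not the wrappers in lib/wm.ts, and the sample lines in the comments are illustrative values.

```typescript
// `xprop -root _NET_CLIENT_LIST` prints e.g.
// "_NET_CLIENT_LIST(WINDOW): window id # 0x1e00003, 0x2a00007"
function parseClientList(output: string): string[] {
  return output.match(/0x[0-9a-fA-F]+/g) ?? [];
}

// CARDINAL properties print as "<NAME>(CARDINAL) = n[, n...]".
// A missing property has no "=" and yields an empty list.
function parseCardinals(output: string): number[] {
  const eq = output.indexOf("=");
  if (eq === -1) return [];
  return output
    .slice(eq + 1)
    .split(",")
    .map((part) => Number.parseInt(part.trim(), 10))
    .filter((n) => !Number.isNaN(n));
}

// T04-style check: decorations exist when left+right+top+bottom > 0.
function frameExtentsSum(output: string): number {
  return parseCardinals(output).reduce((sum, n) => sum + n, 0);
}
```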

Adding a test

  1. Pick the T## / S## from docs/testing/cases/.
  2. Drop src/runners/T##_short_name.spec.ts. Use the existing runners as templates, matching the layer (L1 / L2) to the test's assertion shape.
  3. First line of the test body: skipUnlessRow(testInfo, ['KDE-W', ...]). A JUnit <skipped> maps to - in the matrix, so a row that doesn't apply is never reported as a failure.
  4. Tag the test with severity and surface annotations so the JUnit output carries them.
  5. Capture diagnostics via testInfo.attach() — these become Decision 7 "always-on" captures regardless of pass/fail. For tests that need richer state on failure, wrap your scenarios in a results-collector and attach a single JSON dump (S31's pattern).
  6. No fixed sleeps. Use retryUntil or Playwright's auto-wait.
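To illustrate step 6, here is a generic poll-until-ready helper of the kind that replaces a fixed sleep. It is a stand-in written for this example: the real retryUntil in lib/retry.ts is not shown here, so the name pollUntil, the option names, and the null-means-not-ready convention are all assumptions.

```typescript
interface PollOptions {
  timeoutMs: number;
  intervalMs: number;
}

// Re-run `probe` until it yields a non-null value or the deadline passes.
// The probe always runs at least once, even with timeoutMs = 0.
async function pollUntil<T>(
  probe: () => Promise<T | null> | T | null,
  { timeoutMs, intervalMs }: PollOptions,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await probe();
    if (value !== null) return value;
    if (Date.now() >= deadline) {
      throw new Error(`pollUntil: timed out after ${timeoutMs}ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Compared with a fixed sleep, the test returns as soon as the condition holds and fails with a clear timeout message when it never does.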

Hooking Electron — read this before reaching for BrowserWindow

scripts/frame-fix-wrapper.js returns the electron module wrapped in a Proxy whose get trap returns a closure-captured PatchedBrowserWindow. Constructor-level wraps don't work — your electron.BrowserWindow = WrappedCtor write lands on the underlying module but the Proxy keeps returning PatchedBrowserWindow on read, so the wrap is bypassed. The reliable hook is at the prototype-method level:

// in inspector.evalInMain(...)
const proto = electron.BrowserWindow.prototype;
const orig = proto.loadFile;
proto.loadFile = function(filePath, ...rest) {
  // record `this` + filePath; identify popups by filePath suffix
  return orig.call(this, filePath, ...rest);
};

This captures every instance regardless of subclass identity. Construction-time options (transparent: true, frame: false, etc.) aren't observable through this hook — use runtime equivalents instead (getBackgroundColor(), getContentBounds() vs getBounds(), isAlwaysOnTop()). lib/quickentry.ts is the worked example.
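The bypass is easy to reproduce outside Electron. The following self-contained sketch models the wrapper with a plain Proxy; fakeElectron, WrappedCtor and the file name are stand-ins invented for the demo, and only the Proxy get-trap behaviour is taken from the description above.

```typescript
class PatchedBrowserWindow {
  loadFile(filePath: string): string {
    return `loaded ${filePath}`;
  }
}

// Stand-in for the wrapped electron module: the get trap always returns the
// closure-captured class, whatever is stored on the underlying target.
const fakeElectron = new Proxy<Record<string, unknown>>(
  { BrowserWindow: PatchedBrowserWindow },
  {
    get(target, prop) {
      if (prop === "BrowserWindow") return PatchedBrowserWindow;
      return Reflect.get(target, prop);
    },
  },
);

// Constructor-level wrap: the write lands on the target but reads ignore it.
class WrappedCtor extends PatchedBrowserWindow {}
fakeElectron["BrowserWindow"] = WrappedCtor;
const ctorWrapBypassed = fakeElectron["BrowserWindow"] === PatchedBrowserWindow; // true

// Prototype-level hook: every instance goes through it.
const seenPaths: string[] = [];
const bwProto = PatchedBrowserWindow.prototype;
const origLoadFile = bwProto.loadFile;
bwProto.loadFile = function (this: PatchedBrowserWindow, filePath: string): string {
  seenPaths.push(filePath);
  return origLoadFile.call(this, filePath);
};

const loadResult = new PatchedBrowserWindow().loadFile("quick-entry.html");
```

After this runs, ctorWrapBypassed is true while seenPaths contains the loaded path, which is why the prototype is the hook point.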