Aaddrick 3506c14918 test(harness): add Linux compatibility test harness (#579)
Build out a Playwright-based regression-detection harness covering
the compat-matrix surfaces (KDE-W, KDE-X, GNOME, Sway, i3, Niri,
packaging formats). Adds:

- Planning + decision docs under docs/testing/ — README, matrix,
  runbook, automation, cases/ (11 case files), quick-entry-closeout
- Playwright scaffolding (config, tsconfig)
- 78 spec runners under tools/test-harness/src/runners/ — T## case-
  doc runners and S## distribution/smoke runners
- Substrate primitives in tools/test-harness/src/lib/: AX-tree
  loader (snapshotAx + waitForAxNode + axTreeToSnapshot), focus-
  shifter, eipc-registry, niri-native bridge, drag-drop bridge,
  electron-mocks, claudeai page-objects, inspector client

S03 (DEB Depends declared) and S04 (RPM Requires declared) ship
marked test.fail() — they're regression detectors for the case-doc
gap (deb.sh emits no Depends:, rpm.sh sets AutoReqProv: no), and
the expected-failure shape lets them report green on every host
until upstream packaging starts declaring runtime deps.

127 files, no runtime changes; harness is opt-in via
'cd tools/test-harness && npx playwright test'.

Co-authored-by: Claude <claude@anthropic.com>
2026-05-04 23:17:37 -04:00


Linux Compatibility Testing

Last updated: 2026-05-03

This directory holds the manual test plan for the Linux fork of Claude Desktop. The structure is designed for human readers today and scripted runners tomorrow.

Layout

| Folder / file | Purpose |
| --- | --- |
| matrix.md | The dashboard. Cross-environment results table + per-section env-specific status snapshots. Single source of truth for test status. |
| runbook.md | How to run a sweep: VM setup, diagnostic capture, status update workflow, severity guidance. |
| cases/ | Functional test specs grouped by feature surface. Stable IDs: T### cross-env, S### env-specific. |
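The T###/S### ID scheme is simple enough to lint mechanically. A minimal sketch of such a check — the regex and function name are illustrative, not part of the shipped harness:

```typescript
// Hypothetical validator for the case-ID convention above:
// T### = cross-environment, S### = environment-specific.
// IDs are zero-padded and sequential, so a regex suffices.
const CASE_ID = /^(T|S)\d{2,3}$/;

function classifyCaseId(id: string): "cross-env" | "env-specific" | null {
  if (!CASE_ID.test(id)) return null;
  return id.startsWith("T") ? "cross-env" : "env-specific";
}

console.log(classifyCaseId("T01")); // "cross-env"
console.log(classifyCaseId("S29")); // "env-specific"
console.log(classifyCaseId("X99")); // null
```

A check like this could run in CI over cases/ filenames so that new case docs can't drift from the ID convention.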

Environment key

| Abbrev | Distro | DE | Display server |
| --- | --- | --- | --- |
| KDE-W | Fedora 43 | KDE Plasma | Wayland |
| KDE-X | Fedora 43 | KDE Plasma | X11 |
| GNOME | Fedora 43 | GNOME | Wayland |
| Ubu | Ubuntu 24.04 | GNOME | Wayland |
| Sway | Fedora 43 | Sway | Wayland (wlroots) |
| i3 | Fedora 43 | i3 | X11 |
| Niri | Fedora 43 | Niri | Wayland (wlroots) |
| Hypr-O | OmarchyOS | Hyprland | Wayland (wlroots) |
| Hypr-N | NixOS | Hyprland | Wayland (wlroots) |

Status legend: ✓ pass · ✗ fail · 🔧 mitigated · ? untested · - N/A

Cells include linked issue/PR numbers when relevant — e.g. ✗ #404 or 🔧 #406. A bare ✗ means the failure is verified but no tracking issue has been filed yet.
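A scripted runner could consume matrix cells with a few lines of parsing. A sketch under the cell grammar above (status glyph, then optional issue refs); the names are illustrative, not part of the harness:

```typescript
// Hypothetical parser for matrix.md status cells, e.g. "✗ #404" or "🔧 #406".
// The glyph-to-status mapping follows the legend above.
type CellStatus = "pass" | "fail" | "mitigated" | "untested" | "na";

const GLYPHS: Record<string, CellStatus> = {
  "✓": "pass",
  "✗": "fail",
  "🔧": "mitigated",
  "?": "untested",
  "-": "na",
};

function parseCell(cell: string): { status: CellStatus; issues: string[] } | null {
  const trimmed = cell.trim();
  const glyph = Object.keys(GLYPHS).find((g) => trimmed.startsWith(g));
  if (!glyph) return null; // not a status cell
  const issues = trimmed.match(/#\d+/g) ?? []; // linked issue/PR refs
  return { status: GLYPHS[glyph], issues };
}
```

With this shape, "a bare ✗" is simply a parsed cell whose `issues` array is empty — easy to flag as "failure verified, no tracking issue filed".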

Severity tiers

Each test is tagged with one of:

| Tier | Meaning | Sweep cadence |
| --- | --- | --- |
| Smoke | Release-gate. Must pass before any tag is cut. | Every release tag, on KDE-W + one wlroots row |
| Critical | Regression-blocker. Failure on any supported environment blocks the release. | Every release tag, on every active row |
| Should | Important but not blocking. Track as bugs, fix before next stable. | Quarterly + on demand |
| Could | Edge cases, nice-to-have. | On demand only |
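The tier-to-cadence mapping lends itself to a small selector in an orchestrator. A sketch, assuming each case carries a tier tag (the trigger names here are illustrative):

```typescript
// Hypothetical sweep selector for the severity tiers above: given what
// triggered the sweep, return which tiers must be executed.
type Tier = "smoke" | "critical" | "should" | "could";
type Trigger = "release-tag" | "quarterly" | "on-demand";

function tiersForSweep(trigger: Trigger): Tier[] {
  switch (trigger) {
    case "release-tag":
      return ["smoke", "critical"]; // gates the tag; Should/Could don't block
    case "quarterly":
      return ["smoke", "critical", "should"];
    case "on-demand":
      return ["smoke", "critical", "should", "could"];
  }
}
```

Row selection stays separate: Smoke runs only on KDE-W plus one wlroots row, while Critical fans out to every active row.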

Smoke set

The minimum set that gates a release. Run on KDE-W (daily-driver) plus Hypr-N (clean wlroots). Sweep target: ~20 minutes.

| ID | Surface | One-line check |
| --- | --- | --- |
| T01 | Launch | App opens; main window renders within ~10s |
| T03 | Tray | Tray icon appears; click toggles window |
| T04 | Window | OS-native frame draws and responds |
| T05 | Input | xdg-open https://claude.ai/... opens in-app |
| T07 | Window | Hybrid topbar renders, every button clicks |
| T08 | Window | Close button hides to tray, doesn't quit |
| T11 | Extensibility | Anthropic & Partners plugin install completes |
| T15 | Auth | Sign-in completes via xdg-open browser handoff |
| T16 | Code tab | Code tab loads (no 403, no blank screen) |
| T17 | Code tab | Folder picker opens via portal/native chooser |

Test corpus snapshot

| Bucket | Count |
| --- | --- |
| Cross-environment functional (T###) | 39 |
| Environment-specific functional (S###) | 37 |
| UI surfaces inventoried | 10 |
| Total functional tests | 76 |

For detailed status by ID, see matrix.md.

Automation status

Automation has partially landed. The harness lives at tools/test-harness/ — twenty Playwright specs wired (T01, T03, T04, T17, S09, S12, S29-S37, plus four H-prefix self-tests), with thirteen passing on KDE-W and six skipping cleanly per spec intent. See tools/test-harness/README.md for the live status table, and automation.md for architectural decisions and the SIGUSR1 / runtime-attach pattern that bypasses the app's CDP auth gate.

Grounding sweep + probe

Separate from the test sweep: runbook.md "Grounding sweep" covers the workflow for verifying case docs themselves against the live build on every upstream version bump — static anchor pass plus a runtime probe (tools/test-harness/grounding-probe.ts) that captures IPC handler registry, accelerator state, autoUpdater gate, AX-tree fingerprint, and other claims static analysis can't disambiguate. Anchor and drift conventions live in cases/README.md.

The structure remains automation-friendly for new tests:

  1. Stable test IDs. T01-T39 and S01-S28 won't move. New tests append. Sequential, not semantic.
  2. Standardized test bodies. Every functional test has Severity, Steps, Expected, Diagnostics on failure, and References sections. The Steps and Diagnostics fields are scripted-runner-shaped.
  3. Per-element UI checklists. Each UI surface file lists interactive elements in a table — every row is a candidate webContents.executeJavaScript / xprop / DBus assertion.
  4. Severity-driven sweeps. Tests with a runner: field execute via tools/test-harness/orchestrator/sweep.sh; JUnit XML lands in results/results-${ROW}-${DATE}/junit.xml. Tests without a runner: field continue to run manually.
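The standardized body in point 2 is what makes case docs machine-checkable. A minimal lint sketch, assuming the five section names appear as markdown headings in each case file (the heading regex is an assumption, not the harness's actual parser):

```typescript
// Hypothetical lint for the standardized case-doc body described above:
// every functional test is expected to carry these five sections.
const REQUIRED_SECTIONS = [
  "Severity",
  "Steps",
  "Expected",
  "Diagnostics on failure",
  "References",
];

// Returns the section names missing from a case doc's markdown source.
function missingSections(caseDoc: string): string[] {
  return REQUIRED_SECTIONS.filter(
    (name) => !new RegExp(`^#+\\s*${name}\\b`, "m").test(caseDoc),
  );
}
```

Run over cases/, this would catch a case doc that drops Diagnostics on failure before a scripted runner ever tries to consume it.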

For tests that don't have a runner yet, status updates land in matrix.md by hand after each manual sweep. For tests that do, the automation invocation is the source of truth — see runbook.md.

Conventions

  • One PR per sweep result, not per cell change. Bundle a full row update into a single commit titled test: KDE-W sweep $(date +%F). Reduces matrix-merge noise.
  • Tested-version pin. Every status update should mention the claude-desktop upstream version + the project version (v1.3.x+claude...) in the commit. Otherwise a ✓ from six months ago looks current.
  • Diagnostics on failure are mandatory. Don't file without the captures listed in the test's Diagnostics on failure block. The runbook covers how to capture each.
  • Issue links go inline. Status cells link directly to the relevant issue/PR.
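The commit conventions above can be generated rather than typed by hand. A sketch of a helper producing the sweep-commit title plus a version-pin trailer; the function name, trailer keys, and version strings are illustrative (the doc itself elides the exact project version format):

```typescript
// Hypothetical helper for the one-commit-per-sweep convention: emits the
// "test: <ROW> sweep <date>" title and pins both versions in the body.
function sweepCommit(
  row: string,
  upstreamVersion: string,
  projectVersion: string,
  date: Date,
): string {
  const day = date.toISOString().slice(0, 10); // same shape as $(date +%F)
  return [
    `test: ${row} sweep ${day}`,
    "",
    `Tested-upstream: ${upstreamVersion}`,
    `Tested-project: ${projectVersion}`,
  ].join("\n");
}

console.log(sweepCommit("KDE-W", "0.9.3", "v1.3.1+claude0.9.3", new Date("2026-05-03")));
```

Keeping the pin in structured trailer lines (rather than free prose) would also let a later tool answer "which sweeps predate upstream version X" mechanically.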

See runbook.md for the full mechanics.