Aaddrick 3506c14918 test(harness): add Linux compatibility test harness (#579)
Build out a Playwright-based regression-detection harness covering
the compat-matrix surfaces (KDE-W, KDE-X, GNOME, Sway, i3, Niri,
packaging formats). Adds:

- Planning + decision docs under docs/testing/ — README, matrix,
  runbook, automation, cases/ (11 case files), quick-entry-closeout
- Playwright scaffolding (config, tsconfig)
- 78 spec runners under tools/test-harness/src/runners/ — T## case-
  doc runners and S## distribution/smoke runners
- Substrate primitives in tools/test-harness/src/lib/: AX-tree
  loader (snapshotAx + waitForAxNode + axTreeToSnapshot), focus-
  shifter, eipc-registry, niri-native bridge, drag-drop bridge,
  electron-mocks, claudeai page-objects, inspector client

S03 (DEB Depends declared) and S04 (RPM Requires declared) ship
marked test.fail() — they're regression detectors for the case-doc
gap (deb.sh emits no Depends:, rpm.sh sets AutoReqProv: no), and
the expected-failure shape lets them report green on every host
until upstream packaging starts declaring runtime deps.

127 files, no runtime changes; harness is opt-in via
'cd tools/test-harness && npx playwright test'.

Co-authored-by: Claude <claude@anthropic.com>
2026-05-04 23:17:37 -04:00


Linux Compatibility Testing

Last updated: 2026-05-03

This directory holds the manual test plan for the Linux fork of Claude Desktop. The structure is designed for human readers today and scripted runners tomorrow.

Layout

| Folder / file | Purpose |
| --- | --- |
| matrix.md | The dashboard. Cross-environment results table + per-section env-specific status snapshots. Single source of truth for test status. |
| runbook.md | How to run a sweep: VM setup, diagnostic capture, status update workflow, severity guidance. |
| cases/ | Functional test specs grouped by feature surface. Stable IDs: T### cross-env, S### env-specific. |
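The T###/S### ID scheme is simple enough to lint mechanically. A minimal sketch of such a check — the regex and function name are illustrative, not part of the shipped harness:

```typescript
// Hypothetical validator for the case-ID convention above:
// T### = cross-environment, S### = environment-specific.
// IDs are zero-padded and sequential, so a regex suffices.
const CASE_ID = /^(T|S)\d{2,3}$/;

function classifyCaseId(id: string): "cross-env" | "env-specific" | null {
  if (!CASE_ID.test(id)) return null;
  return id.startsWith("T") ? "cross-env" : "env-specific";
}

console.log(classifyCaseId("T01")); // "cross-env"
console.log(classifyCaseId("S29")); // "env-specific"
console.log(classifyCaseId("X99")); // null
```

A check like this could run in CI over cases/ filenames so that new case docs can't drift from the ID convention.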

Environment key

| Abbrev | Distro | DE | Display server |
| --- | --- | --- | --- |
| KDE-W | Fedora 43 | KDE Plasma | Wayland |
| KDE-X | Fedora 43 | KDE Plasma | X11 |
| GNOME | Fedora 43 | GNOME | Wayland |
| Ubu | Ubuntu 24.04 | GNOME | Wayland |
| Sway | Fedora 43 | Sway | Wayland (wlroots) |
| i3 | Fedora 43 | i3 | X11 |
| Niri | Fedora 43 | Niri | Wayland (wlroots) |
| Hypr-O | OmarchyOS | Hyprland | Wayland (wlroots) |
| Hypr-N | NixOS | Hyprland | Wayland (wlroots) |

Status legend: ✓ pass · ✗ fail · 🔧 mitigated · ? untested · - N/A

Cells include linked issue/PR numbers when relevant — e.g. ✗ #404 or 🔧 #406. A bare ✗ means the failure is verified but no tracking issue has been filed yet.
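A scripted runner could consume matrix cells with a few lines of parsing. A sketch under the cell grammar above (status glyph, then optional issue refs); the names are illustrative, not part of the harness:

```typescript
// Hypothetical parser for matrix.md status cells, e.g. "✗ #404" or "🔧 #406".
// The glyph-to-status mapping follows the legend above.
type CellStatus = "pass" | "fail" | "mitigated" | "untested" | "na";

const GLYPHS: Record<string, CellStatus> = {
  "✓": "pass",
  "✗": "fail",
  "🔧": "mitigated",
  "?": "untested",
  "-": "na",
};

function parseCell(cell: string): { status: CellStatus; issues: string[] } | null {
  const trimmed = cell.trim();
  const glyph = Object.keys(GLYPHS).find((g) => trimmed.startsWith(g));
  if (!glyph) return null; // not a status cell
  const issues = trimmed.match(/#\d+/g) ?? []; // linked issue/PR refs
  return { status: GLYPHS[glyph], issues };
}
```

With this shape, "a bare ✗" is simply a parsed cell whose `issues` array is empty — easy to flag as "failure verified, no tracking issue filed".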

Severity tiers

Each test is tagged with one of:

| Tier | Meaning | Sweep cadence |
| --- | --- | --- |
| Smoke | Release-gate. Must pass before any tag is cut. | Every release tag, on KDE-W + one wlroots row |
| Critical | Regression-blocker. Failure on any supported environment blocks the release. | Every release tag, on every active row |
| Should | Important but not blocking. Track as bugs, fix before next stable. | Quarterly + on demand |
| Could | Edge cases, nice-to-have. | On demand only |
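The tier-to-cadence mapping lends itself to a small selector in an orchestrator. A sketch, assuming each case carries a tier tag (the trigger names here are illustrative):

```typescript
// Hypothetical sweep selector for the severity tiers above: given what
// triggered the sweep, return which tiers must be executed.
type Tier = "smoke" | "critical" | "should" | "could";
type Trigger = "release-tag" | "quarterly" | "on-demand";

function tiersForSweep(trigger: Trigger): Tier[] {
  switch (trigger) {
    case "release-tag":
      return ["smoke", "critical"]; // gates the tag; Should/Could don't block
    case "quarterly":
      return ["smoke", "critical", "should"];
    case "on-demand":
      return ["smoke", "critical", "should", "could"];
  }
}
```

Row selection stays separate: Smoke runs only on KDE-W plus one wlroots row, while Critical fans out to every active row.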

Smoke set

The minimum set that gates a release. Run on KDE-W (daily-driver) plus Hypr-N (clean wlroots). Sweep target: ~20 minutes.

| ID | Surface | One-line check |
| --- | --- | --- |
| T01 | Launch | App opens; main window renders within ~10s |
| T03 | Tray | Tray icon appears; click toggles window |
| T04 | Window | OS-native frame draws and responds |
| T05 | Input | xdg-open https://claude.ai/... opens in-app |
| T07 | Window | Hybrid topbar renders, every button clicks |
| T08 | Window | Close button hides to tray, doesn't quit |
| T11 | Extensibility | Anthropic & Partners plugin install completes |
| T15 | Auth | Sign-in completes via xdg-open browser handoff |
| T16 | Code tab | Code tab loads (no 403, no blank screen) |
| T17 | Code tab | Folder picker opens via portal/native chooser |

Test corpus snapshot

| Bucket | Count |
| --- | --- |
| Cross-environment functional (T###) | 39 |
| Environment-specific functional (S###) | 37 |
| UI surfaces inventoried | 10 |
| Total functional tests | 76 |

For detailed status by ID, see matrix.md.

Automation status

Automation has partially landed. The harness lives at tools/test-harness/ — twenty Playwright specs wired (T01, T03, T04, T17, S09, S12, S29-S37, plus four H-prefix self-tests), with thirteen passing on KDE-W and six skipping cleanly per spec intent. See tools/test-harness/README.md for the live status table, and automation.md for architectural decisions and the SIGUSR1 / runtime-attach pattern that bypasses the app's CDP auth gate.

Grounding sweep + probe

Separate from the test sweep: runbook.md "Grounding sweep" covers the workflow for verifying case docs themselves against the live build on every upstream version bump — static anchor pass plus a runtime probe (tools/test-harness/grounding-probe.ts) that captures IPC handler registry, accelerator state, autoUpdater gate, AX-tree fingerprint, and other claims static analysis can't disambiguate. Anchor and drift conventions live in cases/README.md.

The structure remains automation-friendly for new tests:

  1. Stable test IDs. T01-T39 and S01-S28 won't move. New tests append. Sequential, not semantic.
  2. Standardized test bodies. Every functional test has Severity, Steps, Expected, Diagnostics on failure, and References sections. The Steps and Diagnostics fields are scripted-runner-shaped.
  3. Per-element UI checklists. Each UI surface file lists interactive elements in a table — every row is a candidate webContents.executeJavaScript / xprop / DBus assertion.
  4. Severity-driven sweeps. Tests with a runner: field execute via tools/test-harness/orchestrator/sweep.sh; JUnit XML lands in results/results-${ROW}-${DATE}/junit.xml. Tests without a runner: field continue to run manually.
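The standardized body in point 2 is what makes case docs machine-checkable. A minimal lint sketch, assuming the five section names appear as markdown headings in each case file (the heading regex is an assumption, not the harness's actual parser):

```typescript
// Hypothetical lint for the standardized case-doc body described above:
// every functional test is expected to carry these five sections.
const REQUIRED_SECTIONS = [
  "Severity",
  "Steps",
  "Expected",
  "Diagnostics on failure",
  "References",
];

// Returns the section names missing from a case doc's markdown source.
function missingSections(caseDoc: string): string[] {
  return REQUIRED_SECTIONS.filter(
    (name) => !new RegExp(`^#+\\s*${name}\\b`, "m").test(caseDoc),
  );
}
```

Run over cases/, this would catch a case doc that drops Diagnostics on failure before a scripted runner ever tries to consume it.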

For tests that don't have a runner yet, status updates land in matrix.md by hand after each manual sweep. For tests that do, the automation invocation is the source of truth — see runbook.md.

Conventions

  • One PR per sweep result, not per cell change. Bundle a full row update into a single commit titled test: KDE-W sweep $(date +%F). Reduces matrix-merge noise.
  • Tested-version pin. Every status update should mention the claude-desktop upstream version + the project version (v1.3.x+claude...) in the commit. Otherwise a ✓ from six months ago looks current.
  • Diagnostics on failure are mandatory. Don't file without the captures listed in the test's Diagnostics on failure block. The runbook covers how to capture each.
  • Issue links go inline. Status cells link directly to the relevant issue/PR.
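The commit conventions above can be generated rather than typed by hand. A sketch of a helper producing the sweep-commit title plus a version-pin trailer; the function name, trailer keys, and version strings are illustrative (the doc itself elides the exact project version format):

```typescript
// Hypothetical helper for the one-commit-per-sweep convention: emits the
// "test: <ROW> sweep <date>" title and pins both versions in the body.
function sweepCommit(
  row: string,
  upstreamVersion: string,
  projectVersion: string,
  date: Date,
): string {
  const day = date.toISOString().slice(0, 10); // same shape as $(date +%F)
  return [
    `test: ${row} sweep ${day}`,
    "",
    `Tested-upstream: ${upstreamVersion}`,
    `Tested-project: ${projectVersion}`,
  ].join("\n");
}

console.log(sweepCommit("KDE-W", "0.9.3", "v1.3.1+claude0.9.3", new Date("2026-05-03")));
```

Keeping the pin in structured trailer lines (rather than free prose) would also let a later tool answer "which sweeps predate upstream version X" mechanically.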

See runbook.md for the full mechanics.