claude-desktop-debian

mirror of https://github.com/aaddrick/claude-desktop-debian.git synced 2026-05-17 00:26:21 +03:00

Author	SHA1	Message	Date
Sum Abiut	b676519c58	test: add headless launch + --doctor smoke tests for AppImage artifact (#592 ) The AppImage artifact test only validated package structure (extraction, AppDir layout, asar contents) — runtime regressions like frame-fix-wrapper syntax errors, bad asar patches, or Electron startup crashes silently passed CI. The .deb path already ran `--doctor` as a smoke check; the AppImage path now has parity plus a 10s headless launch under Xvfb. `setsid` + `kill -- -PGID` is load-bearing: xvfb-run's EXIT trap leaks Xvfb on signal kill, so running the whole stack in its own process group lets the teardown reap xvfb-run, Xvfb, dbus, AppRun, electron, and zygote children together. `procps` (for pkill), `dbus-x11`, and `xvfb` added to the CI apt line. The headless probe catches main-process startup failures only — GPU / renderer-process crashes like #583 leave the main process alive and pass this check; that scope disclaimer is inlined at test-artifact-appimage.sh lines 114-117 so future contributors don't try to claim #583 coverage by switching Xvfb off. Co-authored-by: Sum Abiut <sabiut@users.noreply.github.com>	2026-05-16 10:15:39 -04:00
Sum Abiut	4b2b1d3390	ci: add concurrency group to test-flags workflow (#606) Prevents manual workflow_dispatch invocations from stacking on the same ref. Uses cancel-in-progress: false to match ci.yml so a reusable workflow_call invocation inside an in-flight CI run isn't killed when a new push lands. Co-authored-by: Sum Abiut <sum.abiut@titanfx.com>	2026-05-16 20:31:02 +11:00
Aaddrick	9df8b88e3a	verify(cowork): static-grep shipped asar for PR #555 markers (#559 ) (#575 ) * verify(cowork): static-grep shipped asar for PR #555 markers D6 of #559's followup plan: post-build check that greps the shipped app.asar for 9 known cowork patch markers and exits non-zero if any are missing. Catches the half-patched-asar failure mode from PR #555, where two of three failed gates had no else branch and the build log showed "Applied 10 cowork patches" instead of warning. - scripts/cowork-patch-markers.tsv: single source of truth. Tab-separated name<TAB>pcre<TAB>sample. Both verify and BATS read it. - scripts/verify-cowork-patches.sh: accepts a .js, an .asar (npx @electron/asar extract), or a directory containing app.asar.contents/.vite/build/index.js. Exits 0/1/2. - tests/verify-cowork-patches.bats: regex-matches-sample integrity, positive full fixture, per-marker negative fixtures, input-shape coverage. 9 new BATS cases. - .github/workflows/build-amd64.yml: runs verify against the deb build's asar. Pinned to deb because the patched JS is identical across formats. Validated end-to-end against the pinned 1.5354.0 installer: unpatched -> 9/9 miss; cowork.sh patched -> all 9 present. Refs #559. Co-Authored-By: Claude <claude@anthropic.com> * verify(cowork): share TSV parser between verify.sh and BATS Realises the library-mode plumbing the previous commit added but didn't use: BATS now sources verify-cowork-patches.sh and calls load_markers, so a TSV format change cannot desync the two consumers. Drops the duplicate parser in tests/verify-cowork-patches.bats. Also tightens main()'s loop (for over indexed while, drop redundant missing counter) and the BATS index loops. Behaviour-preserving; bats tests/verify-cowork-patches.bats still 9/9. Co-Authored-By: Claude <claude@anthropic.com> * rename: verify-cowork-patches → verify-patches (generic) Rename the verify infra to make its generic intent explicit. 
Per sabiut's review note on #575, the script + TSV are reusable for non-cowork patch sets in principle — drop "cowork" from the script and BATS filenames to reflect that, and accept an optional second arg for the marker TSV path so other patch sets can plug their own TSV in without forking the script. The TSV itself stays cowork-specific (`cowork-patch-markers.tsv`) because its contents are cowork markers; the script defaults to it so existing CI keeps working without changes beyond the rename. Routing implication noted by sabiut: filename now lives under `/tests/` → @sabiut codeowner mapping (intentionally; the verify infra is generic). Cowork-specific marker changes still touch the TSV under `/scripts/`, which routes to @aaddrick/@RayCharlizard via the cowork-* CODEOWNERS rule. Co-Authored-By: Claude <claude@anthropic.com> --------- Co-authored-by: Claude <claude@anthropic.com>	2026-05-05 07:25:22 -04:00
Aaddrick	14d04c2dab	fix(dnf): set metadata_expire=1h on generated .repo (#551) DNF defaults to a 48h metadata cache when metadata_expire is unset, so users running `dnf install/reinstall claude-desktop` shortly after a release see stale versions until either the cache expires or they manually run `dnf clean expire-cache`. Lower the cache TTL on the generated repo file so freshly published releases propagate within an hour without user intervention. Co-authored-by: Claude <claude@anthropic.com>	2026-05-03 12:37:06 -04:00
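A minimal sketch of a generated repo file with the shorter TTL. Only `metadata_expire=1h` comes from the commit above. The function name `write_repo_file`, the section id, the `name=` value, and the exact `baseurl`/`gpgkey` paths are assumptions for illustration; the real generator in ci.yml may differ.

```shell
#!/usr/bin/env bash
# Hypothetical generator: prints a .repo file for the given base URL.
# Only the metadata_expire line is taken from the commit message.
write_repo_file() {
  local base_url="$1"
  cat <<EOF
[claude-desktop]
name=Claude Desktop
baseurl=${base_url}/rpm/x86_64
enabled=1
gpgcheck=1
gpgkey=${base_url}/KEY.gpg
metadata_expire=1h
EOF
}

write_repo_file "https://pkg.claude-desktop-debian.dev"
```

With this line present, dnf treats cached metadata as stale after one hour, so no manual `dnf clean expire-cache` is needed to see a fresh release.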
Niklas	912c04ee1d	fix(ci): force primary GPG key for repomd.xml signing (#566 ) * fix(ci): force primary GPG key for repomd.xml signing PR #217 added --default-key for the gpg invocation that signs repomd.xml, but gpg's --default-key only chooses an identity, not which key under that identity actually signs. Without a trailing '!' on the keyid, gpg silently picks the most recent signing subkey. rpm 4.20+ and zypper verify repomd.xml only against the primary key, so the published signature fails verification with "Signature verification failed for repomd.xml" / "Signing key not found" — the exact symptom reported in #213. Append '!' to the keyid argument to force the primary key. Verified locally against zypper 1.14.96 / rpm 4.20.1 / gpg 2.x by re-signing the live repomd.xml with a test primary+subkey keypair: - Without '!': sig keyid = subkey, zypper refresh fails with "Signature verification failed for repomd.xml" (reproduces the production bug 1:1). - With '!': sig keyid = primary, zypper refresh succeeds: "Die angegebenen Repositorys wurden aktualisiert." Fixes #213 (regression of PR #217) Co-Authored-By: Claude <claude@anthropic.com> * docs(ci): tighten repomd.xml signing comment Compress the rationale block from 8 to 6 lines while preserving the load-bearing facts (gpg picks subkey by default, rpm 4.20+ / zypper reject subkey-signed repomd.xml, '!' forces the primary key, #213/#217 regression history). Adds an explicit "Do not strip it" admonition to the future reader. No functional change. Co-Authored-By: Claude <claude@anthropic.com> --------- Co-authored-by: Claude <claude@anthropic.com>	2026-05-03 07:43:30 -04:00
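The `!` suffix described above can be sketched as a small helper. The name `force_exact_key` and the `KEYID` placeholder are hypothetical; the gpg invocation in the comment uses only standard options and is shown for orientation, not as the workflow's actual command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append '!' to a key ID unless it already ends with one.
# With the suffix, gpg signs with exactly that key instead of choosing a
# signing subkey on its own.
force_exact_key() {
  local keyid="$1"
  if [[ "$keyid" != *'!' ]]; then
    keyid="${keyid}!"
  fi
  printf '%s\n' "$keyid"
}

# Illustrative use (not executed here; KEYID is a placeholder):
#   gpg --default-key "$(force_exact_key "$KEYID")" --armor --detach-sign repomd.xml
```

The helper is idempotent, so passing a key ID that already carries the `!` leaves it unchanged.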
Aaddrick	4cc63bff7a	ci: pin third-party actions to commit SHAs (#535 ) Replaces mutable tag refs (e.g. @v4) with full commit SHAs across all workflows, with the version retained as a trailing comment for readability and dependabot compatibility. Motivation: the March 2026 trivy-action supply-chain attack poisoned 75 of 76 version tags in a single repo. Any consumer using @vX-style references ran the compromised code automatically. SHA pinning makes that class of attack a no-op for us — a hijacked tag cannot point at new code without the SHA also changing. Pinned actions: actions/checkout@v4, actions/upload-artifact@v4, actions/download-artifact@v4, actions/setup-python@v5, actions/setup-node@v4, actions/github-script@v7, softprops/action-gh-release@v2, crazy-max/ghaction-import-gpg@v6, codespell-project/codespell-problem-matcher@v1, codespell-project/actions-codespell@v2, cloudflare/wrangler-action@v3, DeterminateSystems/nix-installer-action@v21 Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Claude <claude@anthropic.com>	2026-04-28 07:25:28 -04:00
Aaddrick	ea9b8aa0ab	ci: remove Quad9 DNS monitor (#528) Quad9 now resolves pkg.claude-desktop-debian.dev to Cloudflare IPs; the hourly check is no longer needed. Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Claude <claude@anthropic.com>	2026-04-27 07:21:43 -04:00
Sum Abiut	0217a2c0e1	ci: run BATS test suite on push and PR (#520 ) * ci: run BATS test suite on push and PR The /tests/ directory has 186 BATS tests (launcher-common, launcher-xrdp-detection, and four cowork-.bats files) but no workflow ever invoked `bats` — the entire suite was effectively inert. A regression in launcher-common.sh or cowork-vm-service.js would not fail any check, including the BATS suite added by PR #395. Add a standalone tests.yml workflow that: - installs bats + nodejs - runs `bats tests/.bats` - executes on every PR - executes on pushes to main Push triggers are path-filtered to: - tests/ - scripts/ - .github/workflows/tests.yml PR triggers remain unfiltered so required-check behaviour stays predictable. Kept this standalone rather than extending test-artifacts.yml so unit tests run in seconds instead of waiting for full artifact builds. This can be promoted to a build gate later once it proves stable in CI. CODEOWNERS - adds /.github/workflows/tests.yml under @sabiut - keeps /tests/cowork-.bats ownership with @RayCharlizard This PR only enables CI coverage for existing tests and does not modify cowork test logic. fix(tests): unset XDG_CONFIG_HOME in cowork-bwrap-config setup The "doctor: reports custom bwrap mounts" and "doctor: warns about disabled critical mount /usr" tests failed in CI but passed locally. Root cause: - _doctor_check_bwrap_mounts in scripts/doctor.sh resolves the config dir via ${XDG_CONFIG_HOME:-$HOME/.config}/Claude - The test setup() only sandboxes HOME via TEST_TMP - GitHub Actions runners export XDG_CONFIG_HOME ambient - Function reads the runner's real config dir, not the test fixture, and silently emits no output - Assertions on /opt/tools, WARN, etc. fail Surfaced by PR #520 wiring BATS into CI for the first time; the bug existed before but was hidden by the suite never running. Fix: unset XDG_CONFIG_HOME in setup() so the function falls back to \$HOME/.config (which is sandboxed). 
Comment in the file documents why HOME alone is insufficient. Verified: 186/186 pass with XDG_CONFIG_HOME set ambient (reproduces CI env).	2026-04-27 11:48:21 +11:00
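The root cause above can be reproduced in a few lines. This is a sketch, not the code in scripts/doctor.sh: `resolve_config_dir` is a hypothetical stand-in that uses the same `${XDG_CONFIG_HOME:-$HOME/.config}/Claude` expansion the commit quotes, and the two paths are made-up values.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the lookup quoted in the commit message.
resolve_config_dir() {
  printf '%s\n' "${XDG_CONFIG_HOME:-$HOME/.config}/Claude"
}

sandbox="/tmp/bats-sandbox"   # made-up path standing in for TEST_TMP

# Sandboxing HOME alone: an ambient XDG_CONFIG_HOME still wins.
leaky="$(HOME="$sandbox" XDG_CONFIG_HOME="/runner/.config" resolve_config_dir)"

# Unsetting XDG_CONFIG_HOME first: the lookup falls back to the sandboxed HOME.
fixed="$(unset XDG_CONFIG_HOME; HOME="$sandbox" resolve_config_dir)"

printf 'leaky=%s\nfixed=%s\n' "$leaky" "$fixed"
```

Only the second form points inside the sandbox, which is why the fix unsets the variable in `setup()` rather than relying on `HOME`.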
Aaddrick	7f4cf49431	chore(monitoring): hourly Quad9 DNS check for pkg.claude-desktop-debian.dev (#525 ) * chore(monitoring): hourly Quad9 DNS check for pkg.claude-desktop-debian.dev Adds a workflow that fires hourly via cron, runs `dig +short` against Quad9 (9.9.9.9), and appends a result line to the body of issue #524. On the first successful resolution, the workflow tags @aaddrick and self-disables via `gh workflow disable`. Includes workflow_dispatch so the check can be triggered on demand without waiting for the next cron tick. Token scope is the default GITHUB_TOKEN with issues:write + actions:write. Refs #521 #524 Co-Authored-By: Claude <claude@anthropic.com> * chore(dns-monitor): pass step output through env, not bash interpolation Routing `steps.dig.outputs.line` through `env:` matches the pattern used by `apt-repo-heartbeat.yml` and avoids interpolating arbitrary text directly into the shell command. Co-Authored-By: Claude <claude@anthropic.com> --------- Co-authored-by: Claude <claude@anthropic.com>	2026-04-25 10:02:28 -04:00
aaddrick	95b65dd333	chore(issue-template): hoist apt-update callout above the privacy notice Swaps the two markdown blocks so the apt scheme-downgrade signpost is the first thing a user sees when they open the bug template — the privacy notice still renders, just below it. Co-Authored-By: Claude <claude@anthropic.com>	2026-04-25 08:44:25 -04:00
Aaddrick	6dd667cd2b	chore(issue-template): funnel apt-update legacy-URL reports to migration docs (#522) Adds a contact_link on the issue chooser that surfaces the apt scheme-downgrade symptom verbatim and links the README migration section, plus a markdown callout at the top of bug_report.yml with the inline sed one-liner. Catches reports like #516 and #519 before they're filed as bugs. Co-authored-by: Claude <claude@anthropic.com>	2026-04-25 08:41:37 -04:00
Aaddrick	8bce730056	docs: point install instructions at pkg.claude-desktop-debian.dev (#510 ) Phase 4a-APT cutover (#493, #503) moves binary distribution behind a Cloudflare Worker at pkg.claude-desktop-debian.dev. The Worker serves repo metadata directly and 302-redirects .deb/.rpm requests to GitHub Release assets, which makes the >100 MB .deb push cap irrelevant. GitHub Pages auto-301s legacy aaddrick.github.io/claude-desktop-debian URLs to pkg.claude-desktop-debian.dev, but the redirect uses http:// (Pages has no cert for pkg.<domain> — DNS points at Cloudflare, so Pages can never pass domain verification). apt refuses that scheme downgrade as a security policy, so existing users' sources.list silently breaks on the next `apt update`. DNF accepts the downgrade and keeps working. Changes: - README.md: install snippets (APT + DNF) now point at pkg.claude-desktop-debian.dev directly. New users never touch the Pages redirect chain. - README.md: add a "Migrating from the old aaddrick.github.io URL" section with sed one-liners for existing users + a short background paragraph explaining why the change was needed. - .github/workflows/ci.yml: release-notes install snippets (APT + DNF, both branches) and the generated claude-desktop.repo file's baseurl and gpgkey all point at pkg.<domain>. Smoke-test chain walkers deliberately keep starting at github.io (they test the full 3-hop Pages→Worker→Releases chain for clients that do follow the downgrade, like curl-without-L and dnf). Refs #493, #503	2026-04-23 16:12:05 -04:00
Aaddrick	de19c1bb36	fix(ci): smoke test accepts release-assets CDN hostname (#509 ) v2.0.4 rerun of update-apt-repo made it past hops 0 and 1 (the smoke test scheme fix in #506 worked — Pages' http:// redirect no longer trips the chain walker), but failed on hop 2: Hop 2: 302 .../releases/download/v2.0.4+claude1.3883.0/...deb -> https://release-assets.githubusercontent.com/... ::error::Hop 2 mismatch: expected https://objects\.githubusercontent\.com/, got https://release-assets.githubusercontent.com/... GitHub migrated the Release asset CDN from objects.githubusercontent.com to release-assets.githubusercontent.com (both have been serving in the past; release-assets is the current canonical hostname). Accept either hostname via alternation. Verified against the actual v2.0.4 Release: $ curl -Is https://github.com/aaddrick/claude-desktop-debian/releases/download/v2.0.4+claude1.3883.0/claude-desktop_1.3883.0-2.0.4_amd64.deb \ \| grep -i location location: https://release-assets.githubusercontent.com/github-production-release-asset/... Same fix in three sites: - .github/workflows/ci.yml (update-apt-repo smoke test) - .github/workflows/ci.yml (update-dnf-repo smoke test) - .github/workflows/apt-repo-heartbeat.yml (daily heartbeat) docs/worker-apt-plan.md has historical references to objects.githubusercontent.com too; those can be updated in a follow-up docs sweep — the architectural claim (binary bytes flow direct from GitHub CDN, never through Cloudflare) is unchanged. Refs #493, #503	2026-04-23 11:06:31 -04:00
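The alternation fix can be sketched as a single check on the `Location` value of hop 2. The function name `hop2_ok` is hypothetical and the regex is a reconstruction from the error message quoted above, not a copy of the workflow's pattern.

```shell
#!/usr/bin/env bash
# Hypothetical check: accept either Release-asset CDN hostname on hop 2.
hop2_ok() {
  local location="$1"
  local re='^https://(objects|release-assets)\.githubusercontent\.com/'
  [[ "$location" =~ $re ]]
}

if hop2_ok "https://release-assets.githubusercontent.com/github-production-release-asset/example"; then
  echo "hop 2 matches"
fi
```

Anchoring at `^https://` and escaping the dots keeps the check strict: a look-alike host or an `http://` URL still fails.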
Aaddrick	eb90be32e9	fix(ci): smoke test accepts http:// on Pages 301 hop (#506 ) * fix(ci): smoke test allows http:// on Pages 301 hop Phase 4a-APT's first rerun of update-apt-repo succeeded all the way through strip + push (v2.0.3 metadata is live on gh-pages now), but the smoke test failed at hop 0: Hop 0: 301 https://aaddrick.github.io/.../.deb -> http://pkg.claude-desktop-debian.dev/.../.deb Hop 0 mismatch: expected https://pkg..., got http://pkg... Pages emits http:// in the Location header because https_enforced is unsettable on the repo's Pages config: DNS for pkg.<domain> points at Cloudflare (Worker custom_domain), so Pages can never pass domain verification to provision its own cert. Cloudflare serves both schemes for pkg.<domain>, so the http vs https in Pages' redirect is cosmetic — the chain still terminates correctly. Relax hop 0's regex in both smoke tests (update-apt-repo, update-dnf-repo) and the heartbeat workflow to accept https?://. Later hops stay https-only since GitHub's Release-asset redirects are always HTTPS. Failure was the tail-end of run 24836419696's rerun: https://github.com/aaddrick/claude-desktop-debian/actions/runs/24836419696 Refs #493, #503 * chore: retrigger CI (previous trigger lost to GH flake)	2026-04-23 10:43:53 -04:00
Aaddrick	09d5f4af68	fix(worker): use raw.githubusercontent.com as origin to avoid Pages 301 loop (#504 ) Once the CNAME file is in place on gh-pages (Phase 4a-APT), GitHub Pages auto-301s all aaddrick.github.io/claude-desktop-debian/* traffic to pkg.claude-desktop-debian.dev/*. The Worker's origin fetch against aaddrick.github.io gets 301'd by Pages, the 301 passes through to the client, the client follows it back to pkg.<domain>, and the Worker runs again — infinite loop. Observed immediately after merging #503 and Pages finishing the CNAME build: $ curl -I https://pkg.claude-desktop-debian.dev/dists/stable/InRelease HTTP/2 301 location: http://pkg.claude-desktop-debian.dev/dists/stable/InRelease x-github-request-id: 3C94:286425:... x-served-by: cache-yyz4566-YYZ via: 1.1 varnish (Scheme-downgrade to http is a separate Pages quirk when https_enforced=false, which is the case here because DNS points at Cloudflare, not Pages, so Pages can't provision a cert.) raw.githubusercontent.com serves the same gh-pages branch content without Pages' routing layer. All five metadata paths verified to return 200: /dists/stable/InRelease /dists/stable/main/binary-amd64/Packages /KEY.gpg /rpm/x86_64/repodata/repomd.xml /rpm/x86_64/repodata/repomd.xml.asc Also fixes the deploy-worker.yml post-deploy probe which still hardcoded pkg-staging. That's what made #503's deploy show as failed in the Actions UI even though the wrangler deploy itself succeeded — route bound and Worker live, but the probe was resolving a hostname wrangler had just removed. Refs #493, #503 Co-authored-by: Claude <claude@anthropic.com>	2026-04-23 10:22:45 -04:00
Aaddrick	0bcf7a473f	fix(ci): resolve DNF Worker chain blockers (#500 , #501 ) (#502 ) Fix #500: rpmsign --addsign mutates RPMs in place, so the Release asset uploaded by the release job (unsigned) diverged from the signed copy in gh-pages. The Worker redirects to the Release asset, so dnf saw a sha256 that didn't match repodata. Re-upload the signed RPMs to the Release via gh release upload --clobber after signing. Fix #501: The imported GPG keyring contains two keys; reprepro signs InRelease with one and rpmsign signs repomd.xml.asc with the other, but the published KEY.gpg only contained one of them. Strict clients like rockylinux:9 rejected repo metadata with "Bad GPG signature". Export the full keyring (all public keys) to KEY.gpg so both signatures verify. Validation (per issue reproduction steps): - Re-run update-dnf-repo on a test tag - sha256 of gh-pages RPM must match the Release asset download - fedora:latest dnf install should succeed (was "All mirrors tried") - rockylinux:9 dnf makecache should succeed (was "Bad GPG signature") Co-authored-by: Claude <claude@anthropic.com>	2026-04-23 08:52:41 -04:00
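The first validation step (the gh-pages RPM must hash the same as the Release asset) can be sketched as below. It assumes both files have already been downloaded locally and that `sha256sum` from coreutils is available; the function name `same_sha256` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if both files have the same SHA-256 digest.
same_sha256() {
  local a b
  a="$(sha256sum "$1" | cut -d' ' -f1)"
  b="$(sha256sum "$2" | cut -d' ' -f1)"
  [[ "$a" == "$b" ]]
}

# Illustrative use (file names are placeholders):
#   same_sha256 gh-pages-copy.rpm release-asset.rpm || echo "digest mismatch"
```

Because `rpmsign --addsign` rewrites the file in place, a signed and an unsigned copy of the same package fail this check, which matches the mismatch described in #500.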
Aaddrick	4fb076ec12	feat: APT/DNF Worker scaffolding (#498 ) * feat: APT/DNF Worker scaffolding (#493) Adds the implementation scaffolding for the Cloudflare Worker that fronts the APT/DNF repo, per docs/worker-apt-plan.md. New files: - worker/src/worker.js: redirects /pool/.../.deb and /rpm//.rpm to GitHub Release assets via 302; passes metadata through to the gh-pages origin - worker/wrangler.toml: bound to pkg-staging.claude-desktop-debian.dev initially; Phase 4a switches to pkg.claude-desktop-debian.dev - .github/workflows/deploy-worker.yml: deploys Worker on worker/* push, post-deploy probe verifies route bound + Worker responding - .github/workflows/apt-repo-heartbeat.yml: daily cron, deb+rpm matrix, walks ordered redirect chain + size match against Releases asset, opens format-specific tracking issue on failure (auto-close on recovery), gates on Worker liveness (skips silently before Phase 4a) Modified: - .github/workflows/ci.yml: gated strip step + ordered-chain smoke test added to update-apt-repo and update-dnf-repo; the destructive strip only fires when the production Worker probe succeeds, so this PR can land before Phase 4a without affecting current behavior - docs/worker-apt-plan.md: bake in real domain values, mark Decisions table entries as concrete, fix Cloudflare API token permissions list (current names: Workers Scripts Edit, Account Settings Read, Workers Routes Edit; previous "Zone:Zone:Read" name no longer matches the dropdown) Pre-Phase-4a behavior: the strip step's liveness probe targets the production hostname which doesn't exist yet, so it always skips and .debs/.rpms are pushed to gh-pages exactly as today. Smoke tests skip on the same gate. Heartbeat workflow's gate skips before the Worker is live. Nothing destructive happens until Phase 4a explicitly cuts the Worker over to production. 
Co-Authored-By: Claude <claude@anthropic.com> * refactor: simplify worker scaffolding per cdd-code-simplifier review - worker.js: use named capture group `asset` instead of opaque `m[1]` positional reference; inline single-use `tagFor()` helper; demote unused `arch` capture to non-capturing group. - ci.yml: hoist `WORKER_DOMAIN` from per-step env to job-level env in both `update-apt-repo` and `update-dnf-repo` (matches the pattern already used in `apt-repo-heartbeat.yml`). - apt-repo-heartbeat.yml: use github-script's native `context.serverUrl` / `context.runId` instead of reconstructing from process.env; spread `...context.repo` instead of repeating owner/repo on every API call; destructure `{ data: open }` to flatten `open.data` references. All changes preserve behaviour. The contrarian-fix mechanisms (positive Worker liveness probe gating the strip step, hop-by-hop ordered chain walk in smoke tests) are unchanged. APT/DNF strip + smoke pairs remain in-place per reviewer-readability preference. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 07:31:18 -04:00
Aaddrick	35d4735b2d	fix(triage): normalize claimed_version before drift compare (#483 ) Reporter on #481 pasted the deb package version `claude-desktop 1.3561.0-2.0.0`. The classifier extracted `1.3561.0-2.0.0` verbatim, and the naive `claimed != CLAUDE_DESKTOP_VERSION` string compare flagged drift against `1.3561.0`. The issue is on the current release — no drift should fire. Fix normalizes both sides: strip a leading `v`, then strip anything from the first `-` or space onward. Handles: - `1.3561.0-2.0.0` → `1.3561.0` (deb package: upstream-REPO_VERSION) - `v1.3561.0` → `1.3561.0` (copy-paste with prefix) - `1.3561.0 stable` → `1.3561.0` (whitespace-separated qualifier) - `1.3561.0` → `1.3561.0` (bare upstream, unchanged) Same normalization applied to CURRENT_VERSION for symmetry, even though the repo variable is always the bare upstream semver — keeps the compare resilient if that ever changes. Fixes the false drift banner on #481 and prevents the same shape from tripping on any future issue where a reporter pastes their `dpkg -l \| grep claude` output or AppImage filename. Co-authored-by: Claude <claude@anthropic.com>	2026-04-21 15:53:34 -04:00
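The normalization rule (strip a leading `v`, then drop everything from the first `-` or space) can be sketched with plain parameter expansion. The function names `normalize_version` and `has_drift` are hypothetical; the classifier's actual implementation may be written differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper implementing the rule from the commit message.
normalize_version() {
  local v="${1#v}"     # strip one leading "v"
  v="${v%%[- ]*}"      # drop everything from the first "-" or space onward
  printf '%s\n' "$v"
}

# Drift fires only when the normalized values differ.
has_drift() {
  [[ "$(normalize_version "$1")" != "$(normalize_version "$2")" ]]
}

normalize_version "1.3561.0-2.0.0"    # prints 1.3561.0
normalize_version "v1.3561.0"         # prints 1.3561.0
normalize_version "1.3561.0 stable"   # prints 1.3561.0
```

Normalizing both sides means a pasted deb package version such as `1.3561.0-2.0.0` no longer registers as drift against `1.3561.0`.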
Aaddrick	6fceb39d60	docs(triage): sync README with shipped pipeline; drop plan + research (#480 ) The README was drafted as a design spec before implementation. Now that the pipeline is live and the design has been validated end-to- end, bring the doc into agreement with the code and retire the two companion files. README updates: - Intro: state the production trigger (`issues: [opened]`) and the workflow_dispatch fallback; note v1 is manual-only - Stage 7 table: reorder by actual priority (drift is no longer a top-of-gate veto); drift section rewritten to describe the banner- and-candidates-modifier behavior landed in PR #476 - Stage 8a rendered-output example: show the conditional drift banner + drift-bridge candidates block that actually render - Stage 8b reason enum: add `reference-source unavailable` that was missing from the list - Rollout posture: describe the cutover as completed, not deferred - Implementation layout: drop "during rollout" qualifier; add helper-scripts row (validate.sh / drift-bridge.sh / suspicious-input-scan.sh / extract-json.py) - Artifacts list: full set with 14-day retention, not just the original four - Reasons.json SSOT pointer: actual path `.claude/scripts/reasons.json` instead of the aspirational `lib/templates/reasons.json` - Potential future improvements: drop "Cutover to issues:[opened]" subsection (done) - Clean up "v1" usage where it means "first version of the pipeline" (confusable with legacy v1 workflow) Deleted: - docs/issue-triage/implementation-plan.md — phased build sequence is complete; commit history preserves the record - docs/issue-triage/research-trail.md — design-pass sources are cited inline in the README where needed Workflow banner updated to drop the `implementation-plan.md` pointer. Co-authored-by: Claude <claude@anthropic.com>	2026-04-21 15:46:13 -04:00
Aaddrick	6adf2bf46d	chore(triage): v2 production cutover (#478 ) Three changes bundled because they land together as the cutover: 1. v2 `issues: [opened]` trigger enabled. Workflow now fires automatically on new issues in addition to the existing workflow_dispatch path. `run-name`, `concurrency.group`, and the gate step's ISSUE_NUMBER all resolve via `github.event.issue.number \|\| inputs.issue_number` so both trigger paths work. The existing `inputs.dry_run != true` gates on label/comment application — under an issues trigger that expression is empty ≠ true, so production posts/labels land. 2. v1 `issues` trigger removed. `issue-triage.yml` keeps `workflow_dispatch` for manual fallback (maintainer can still fire it if v2 is paused or rolled back), but no longer runs automatically. v1's `run-name`/concurrency dropped the now-dead `github.event.issue.number` fallback. 3. Investigate timeout 600s → 1200s. Bumped after two consecutive timeouts on #311 during Phase 4 + drift-as-banner verification. The investigator needs more tool-call budget on complex issues. Review step stays at 600s — it runs without tool access and has never timed out. Rollback: revert this commit to restore v1's automatic trigger; v2's `issues:` block goes back to workflow_dispatch-only in the same operation. Co-authored-by: Claude <claude@anthropic.com>	2026-04-21 08:55:06 -04:00
Aaddrick	f1eed0e16f	fix(triage): drift-as-banner — demote drift from gate to modifier (#476 ) Post-Phase 4 verification showed two issues (#311, #448) where the pipeline successfully produced valuable findings against current code, but the top-of-gate drift veto routed them to 8b drift-only and the findings were discarded. The reporter cited an older version (1.1.7464 on #311), the investigation ran cleanly on current (1.3.5610), and the reviewer approved the findings — yet the comment still read "couldn't reach a confident read." This change keeps drift detection and keeps the drift-bridge sweep. What changes is Stage 7: drift is no longer at the top of the gate. When drift is detected and 8a or 8c would render cleanly, the renderer prepends a drift banner (⚠ You reported this on X; bot investigated on Y. Citations may still apply.) and appends the drift-bridge-candidates block at the bottom. The finding citations stand — they describe current code in hypothesis voice, which is what the reader can verify against their own checkout. When drift is detected and the pipeline would otherwise route to 8b for any other reason (fetch-failure, invest-failure, review-failure, no-findings, low-confidence), the reason is overridden to `version-drift`. Drift-bridge candidates give the maintainer a more actionable signal than "no findings" on its own. Reviewer prompt gains one rubric addition: downgrade-confidence when the cited surface clearly post-dates the reporter's version. Catches the case where a finding is valid on current but wouldn't reproduce on what the reporter saw. Doesn't degrade findings indiscriminately — only when the reviewer can see version-specific evidence. Confirmed-duplicate routing wins over the drift-reason override (explicit exclusion in the override clause) because `triage: duplicate` is still the more specific read. Co-authored-by: Claude <claude@anthropic.com>	2026-04-21 08:33:49 -04:00
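The reason override for the 8b path can be sketched as a small function. The reason strings are the ones listed in the commit message; the function name `resolve_reason`, the `true`/`false` encoding of the drift flag, and the pass-through handling of every other reason are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose the final 8b reason once drift status is known.
# Reasons outside the listed set (for example a confirmed duplicate) pass
# through unchanged, so the more specific read wins over the drift override.
resolve_reason() {
  local drift="$1" reason="$2"
  case "$reason" in
    fetch-failure|invest-failure|review-failure|no-findings|low-confidence)
      if [[ "$drift" == "true" ]]; then
        reason="version-drift"
      fi
      ;;
  esac
  printf '%s\n' "$reason"
}

resolve_reason true no-findings
```

Listing the overridable reasons explicitly, rather than excluding duplicates by name, keeps any reason added later from being silently rewritten to `version-drift`.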
Aaddrick	28882ea475	feat(triage): Phase 4 sub-PRs 3+4 — regression_of + edit-during-triage (#472 ) * feat(triage): Phase 4 sub-PRs 3+4 — regression_of + edit-during-triage Bundles the two remaining Phase 4 sub-phases. Both are small workflow additions that build on infrastructure already in place: the Phase 1 input snapshot (updated_at captured at Stage 1) and the Phase 1 classify.json's regression_of field. regression_of end-to-end (Stage 3b + Stage 4 + Stage 6) - New step `Validate regression_of` between drift-check and fetch. Runs only when classify set regression_of to non-null. - Validation: PR exists in this repo; PR is merged; PR's mergedAt precedes issue's createdAt. Any failure clears to null with a logged note and the issue proceeds as a regular bug. - Valid regression → `gh pr diff` fetched (capped at 4000 lines) and inlined into the investigate prompt as primary context. Tells the investigator to start the search in the PR's changed files. - Same diff inlined into the review prompt, wrapped as pipeline_data, so the reviewer can check whether findings land inside the named PR's changed files. - Handles the spec's "cleared to null with logged note" requirement for upstream Electron PRs that aren't in this repo. Edit-during-triage detection (Stage 8 post-processor) - New step between 8a/8c post-processors and Apply labels. Runs for every variant. - Re-fetches issue.updated_at live and compares against the Stage 1 input_snapshot.updated_at. - On mismatch: appends a `⚠ This issue was edited after triage began. ...` disclaimer to the rendered comment, pointing at input_snapshot.json as the audit trail. - Catches inject-then-delete attacks (inject instructions, wait for bot, delete before a human reads) and honest mid-triage edits that would make the comment stale. Step summary gains `regression_of validated` row. With this PR, Phase 4 is complete: 8c enhancement-design, suspicious- input tells, regression_of, edit-during-triage detection are all live. 
All terminal paths (bug / enhancement / question / duplicate / needs-info / not-actionable / suspicious) flow through the pipeline end-to-end per spec. Co-Authored-By: Claude <claude@anthropic.com> * docs(triage): correct stale sort -u reference in date-compare comment The comment above the ISO 8601 date check referenced `sort -u`, which isn't used in the code. Rewrite to describe what the code actually does: `[[ > ]]` on the raw timestamp strings, which is valid because ISO 8601 sorts lexicographically as chronologically. Also re-orient the prose around the invalid case (mergedAt AFTER createdAt), matching the branch that the following `if` takes. Co-Authored-By: Claude <claude@anthropic.com> --------- Co-authored-by: Claude <claude@anthropic.com>	2026-04-20 23:41:48 -04:00
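The date check described in the docs fix can be sketched as follows. It assumes both timestamps are ISO 8601 strings in the same format and timezone (for example UTC with a trailing `Z`), which is what makes string order equal chronological order. The function name `regression_of_is_valid` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: a regression_of PR is plausible only if it was merged
# before the issue was created. [[ a > b ]] compares the raw strings.
regression_of_is_valid() {
  local merged_at="$1" created_at="$2"
  # Invalid case: mergedAt AFTER createdAt.
  if [[ "$merged_at" > "$created_at" ]]; then
    return 1
  fi
  return 0
}

if regression_of_is_valid "2026-04-20T10:00:00Z" "2026-04-21T08:00:00Z"; then
  echo "regression_of kept"
else
  echo "regression_of cleared to null"
fi
```

Mixed formats or UTC offsets would break the string comparison, so a real implementation would need both values normalized first.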
Aaddrick	9fc49bd260	feat(triage): Phase 4 sub-PR 2 — suspicious-input tells (#471 ) * feat(triage): Phase 4 sub-PR 2 — suspicious-input tells Adds a conservative Stage 2a tripwire that scans the raw issue body and title for prompt-injection tells before any LLM call. A match short-circuits routing to 8b with reason `suspicious-input — manual review`, no Sonnet invocation. The scan is the front-line filter; the actual injection mitigations (wrap-as-data, fresh-context reviewer, schema-constrained output) remain in place for everything that doesn't trip. The two layers are complementary: the scan catches the obvious attempts cheaply, the downstream defenses protect against the clever ones. Taxonomy - taxonomies/suspicious-input-tells.json — eight tells with regex patterns and rationale: - ignore-prior-instructions: classic opener - system-prompt-leak: exfiltration attempts - role-override: "you are now a different…" - forget-instructions: variation of ignore-prior - developer-mode: named jailbreaks (DAN, etc.) - instruction-injection-sysrole: chat-template tokens - long-base64-block: 200+ contiguous base64 chars - unicode-tag-sequence: U+E0000-E007F invisibles Scanner - scripts/triage/suspicious-input-scan.sh — pure bash, PCRE via grep -Pzi, writes suspicious-input.json with matched_tells[]. Uses the same taxonomy-as-data pattern as reasons.json and label-blocklist.json. 
Workflow - Stage 2a step runs between input snapshot and classify, outputs `suspicious` boolean - Classify + doublecheck both `if:`-gated so they skip on a hit - Decide route takes suspicious first, before the doublecheck disagreement check — a tripped tell defers deterministically - Step summary shows the suspicious flag Co-Authored-By: Claude <claude@anthropic.com> * refactor(triage): drop dead null-string guards in suspicious-input scan jq -r '.body // ""' already returns an empty string for JSON null or a missing field, so the subsequent `[[ "${body}" == "null" ]]` guards only fire when a reporter's body is the literal four-character string "null" — which isn't an injection signal and matches no tell. The comment describing the guards was also wrong about jq's behavior. Remove both guards and correct the comment. Also fix a misleading comment about `|| true` (which isn't in the code) and collapse the 4-line `suspicious` boolean derivation into a single `jq 'length > 0'`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 23:34:46 -04:00
Aaddrick	b9fe8e3c14	feat(triage): Phase 4 sub-PR 1 — Stage 8c enhancement-design variant (#470 ) * feat(triage): Phase 4 sub-PR 1 — Stage 8c enhancement-design variant Adds the third Stage 8 template variant. Previously, enhancement- classified issues fell through to 8b human-deferral; now they run through the investigate pipeline with enhancement-specific prompts and render a lightweight acknowledgment + existing-surface citations + design-review questions from a fixed taxonomy. Prompts and schemas - taxonomies/enhancement-design-questions.json — six fixed IDs: config-schema-stability, backward-compat, security-surface, test-coverage, observability, packaging-format. Each carries a concrete question the renderer surfaces verbatim. - schemas/comment-enhancement.json — structured output: 1-sentence acknowledgment_line, 0-3 existing_surfaces (each with file:line), 1-3 design_question_ids (enum-matched against the taxonomy). - prompts/comment-enhancement.txt — drafter prompt, hypothesis voice, rules of thumb for picking design questions. - prompts/investigate-enhancement.txt — investigate variant. Same schema, but claim_type=absence is banned (by definition the enhancement's capability is absent; restating is redundant and tips into design-prescription). Findings must cite existing code the enhancement would touch. - prompts/review-enhancement.txt — reviewer rubric reframed from "is this defect claim correct?" to "is this an existing surface the enhancement would actually touch?" Reject leans on real-but-irrelevant surfaces, since those actively misdirect. Workflow - Route decision: enhancement now enters the investigate path alongside bug and duplicate (route renamed `investigate`). Both the investigate step and the review step pick the enhancement- variant prompt when classification == enhancement. - Decision gate: new enhancement branch slotted between invest-failure and no-findings. 
8c fires when review succeeded (any kept count, including 0) OR when findings_passed was 0 and the review step was skipped by design — the design questions carry the comment alone. - Stage 8c render: bash cross-joins design_question_ids against the taxonomy; a missing lookup errors loudly rather than silently dropping. - 8c post-processor: 350-word cap per spec; trims the last existing_surfaces bullet when over cap. - Apply labels: 8c variant → `triage: investigated` + `enhancement` class label. Deferred to later Phase 4 sub-PRs: suspicious-input tells, regression_of end-to-end diff fetch, edit-during-triage detection. Co-Authored-By: Claude <claude@anthropic.com> * refactor(triage): reuse classify step output instead of re-parsing classification.json Drops two redundant `jq -r '.classification' /tmp/triage/classification.json` calls in the investigate + review steps; both now read the value via a `CLASSIFICATION_NAME` env var sourced from `steps.classify.outputs.classification`. Matches the `Decide comment variant` step's existing pattern for reading classify state, so the three call sites converge on one idiom. No behavior change — the prompt-selection conditional reads the same value; just fewer forks of jq. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 23:26:41 -04:00
Aaddrick	7d083d9163	refactor(triage): rename `feature` classification to `enhancement` (#466 ) Aligns the v2 classifier vocabulary with the repo's GitHub label vocabulary. Previously `classification=feature` was mapped to label `enhancement` at Stage 9 — a redundant indirection that also caused miscalibration on defects framed as enhancement-shaped asks (e.g. #448 "breaks in-app schedulers and 'minimize to tray' expectation" classified as feature + ambiguous when the maintainer read is bug). Changes: - classify.json enum: feature → enhancement - classify-doublecheck-bugfeature.{json,txt} → classify-doublecheck-bug-vs-enhancement.{json,txt} - Doublecheck rubric tightened: added "breaks X" / "stopped working" as explicit bug signals and a rule that a broken expectation wins over enhancement-shaped framing when both are present. Reduces the chance of #448-shaped defects routing to the ambiguous bucket. - investigate.txt absence-claim ban: "feature X is missing" → "capability X is missing" - reasons.json: "ambiguous bug/feature classification" → "ambiguous bug/enhancement classification" - Workflow: doublecheck step renamed, classification checks updated, class_label map collapsed to direct (no more feature→enhancement remap). - docs/issue-triage/{README.md,implementation-plan.md}: vocabulary updated throughout (~47 occurrences). 8c variant renamed Feature-design → Enhancement-design. Planned Phase 4 file names (comment-enhancement.json, enhancement-design-questions.json) follow suit. Kept as-is: - `.github/ISSUE_TEMPLATE/feature_request.yml` filename — preserves the GitHub convention reporters recognize on the issue-chooser page; classifier buckets issues filed through it as `enhancement`. - v1 `issue-triage.yml` + `triage-classify.json` — untouched; v1 is slated for replacement and doesn't gain from this rename. No behavioral change at runtime beyond the rubric tightening; the rename collapses an indirection rather than adding logic. 
Co-authored-by: Claude <claude@anthropic.com>	2026-04-20 22:58:33 -04:00
Aaddrick	471c62dde0	chore(codeowners): add @sabiut for testing & release quality (#468) Gives @sabiut review ownership of /tests/, /scripts/doctor.sh, and the test-artifacts + test-flags workflows. Shared review with @aaddrick on /docs/TROUBLESHOOTING.md and /.github/workflows/shellcheck.yml. Cowork override at the bottom of the file still wins for /tests/cowork-*.bats per last-match-wins. Announcement: #467 Co-authored-by: Claude <claude@anthropic.com>	2026-04-20 22:57:28 -04:00
Aaddrick	d0544d44e8	feat(triage): Phase 3 — Stage 6 adversarial reviewer + duplicate gate (#465 ) * feat(triage): Phase 3 — Stage 6 adversarial reviewer + duplicate gate Adds a fresh-context reviewer between mechanical validation (Stage 5) and the decision gate (Stage 7). The reviewer steel-mans each surviving finding, commits to a counter-reading, runs closed-world checks on identifier claims, and emits approve / downgrade-confidence / reject with structured rationale. It also rates each cited related_issue and the duplicate_of target (exact / related / unrelated). Stage 7 now gates on reviewer verdicts. approve keeps a finding at full confidence; downgrade-confidence keeps it but subtracts 1 from its contribution to the avg-confidence threshold (floor 0.5); reject drops it. A new duplicate gate (between fetch-failure and invest-failure in the priority table) fires when classification == duplicate and the reviewer rated duplicate_of exact or related — routing the issue to 8b with 'likely-duplicate-of-#N' as reason and 'triage: duplicate' as label. An 'unrelated' rating discards the duplicate claim and the remaining gates apply to the regular investigation output. 
- schemas/review.json — reviewer verdict schema, per-finding rationale required, closed_world_check object for identifier claims, ratings for related_issues and duplicate_of - prompts/review.txt — adversarial-reviewer prompt per spec §6; input is source excerpts + claim + closed_world_options + cited-issue bodies + duplicate_of body, wrapped as untrusted data; excludes draft comment, free-form reasoning, and voice instructions - Workflow: fetch duplicate_of body (inline step), Stage 6 review call (schema-constrained, no tool access, timeout 600s, --max-budget-usd 1.50, extract-json fallback on prose), reviewer- aware filter step, expanded decision gate, triage: duplicate label path with class inheritance from the target issue (PR #459 item 8), <pipeline_data> wrappers on 8a-render inlined JSON (PR #459 item 3) - Route duplicates through investigate pipeline so Stage 5 + Stage 6 can rate the target (previously deferred straight to 8b) See docs/issue-triage/{README.md §6-§7, implementation-plan.md §Phase 3}. Co-Authored-By: Claude <claude@anthropic.com> * refactor(triage): simplify Phase 3 verdict summary step Two small cleanups in the Stage 6 / "Apply reviewer verdicts" plumbing that don't touch load-bearing behavior (errexit guards, --slurpfile cross-join, schema fallback, gate priority, prompt-injection wrappers all preserved): * Drop the unused dup_num step output — no consumer references steps.dup_fetch.outputs.dup_num; Resolve reason text reads .duplicate_of directly from classification.json. * Collapse the dup_rating jq filter to a single-line .duplicate_of_rating.rating // "none" — jq already treats null.rating as null, so the explicit if/else was just ceremony. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 22:13:45 -04:00
Aaddrick	88df8e8e7e	fix(triage): raise 8b comment word cap 150 → 300 (#464) Re-dispatch of #394 showed the full drift-routing path works end-to-end except for the post-processor word-cap: base 8b comment is ~50 words, drift-bridge-candidates block adds ~130 words for 10 bullets, privacy note another ~30 when the reporter is first-time. Actual was 189 words vs 150 cap. Spec §8b note already flagged this: "Verify length is under 150 words (account for optional drift-bridge-candidates block)." The parenthetical acknowledged the block expands the comment, but the original 150 was the base-comment budget and was never adjusted when the drift-bridge extension landed in Phase 2. 300 covers the observed worst case (~190) with headroom for edge cases (long PR titles, longer commit subjects, future drift-bridge output growth) while still bounding the comment at something scannable. Capping the drift-bridge render at N entries is a separate concern — deferred in favor of raising the limit first. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 21:45:32 -04:00
Aaddrick	caec9182c8	fix(triage): investigate timeout bypasses errexit + bump to 10m (#463 ) Re-dispatch of #394 confirmed the 300s timeout bounds the step, but also exposed a second bug: the step failed with exit 124 instead of falling through to 8b gracefully. Downstream steps (Decide / Render / Label / Post) were all skipped, and the raw/payload/stderr archives that the earlier hardening created were never written because the shell aborted at the assignment before `printf > investigate-raw.json` could run. Root cause: GHA's default shell is `bash -e {0}` (errexit). With errexit on, a failing command substitution: raw=$(timeout 300s claude -p ...) propagates the exit code and aborts the script BEFORE `claude_exit=$?` runs. My prior assumption that assignments were exempt from errexit under `bash -e` was wrong in this shell configuration. ## Fix Use the if-form, which is the only reliable way to catch a failing command substitution under `bash -e`: if raw=$(timeout 600s claude -p ... 2>log); then claude_exit=0 else claude_exit=$? fi A timeout (exit 124) or other CLI failure now sets `claude_exit`, writes the archived artifacts, and falls through to 8b with a specific warning — exactly the graceful path the earlier PR intended but errexit short-circuited. ## Also bumped timeout 300s → 600s The original 300s was chosen to be "typical investigate runtime + a bit." Observed times: #424 ran 218s, #442 ran 220s — so 300s left almost no headroom. Doubling to 600s gives room for complex issues to converge while still being short of the ~9-minute hang that motivated the timeout in the first place. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 21:30:15 -04:00
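The if-form described in the commit above can be exercised in isolation. The sketch below is illustrative only: `fake_cli` is a hypothetical stand-in for the `timeout 600s claude -p ...` call, and `set -e` stands in for the errexit behaviour of GitHub Actions' `bash -e {0}` shell.

```shell
#!/usr/bin/env bash
set -e  # approximate the errexit behaviour of `bash -e {0}`

# Hypothetical stand-in for the real CLI call: prints partial output,
# then exits 124 the way `timeout` does when the limit expires.
fake_cli() {
  printf 'partial output\n'
  return 124
}

# Because the command substitution is the condition of an `if`, errexit
# does not abort the script, and the status is readable in the `else` branch.
if raw=$(fake_cli 2>/dev/null); then
  claude_exit=0
else
  claude_exit=$?
fi

# Execution reaches this point, so archives could still be written here.
printf 'exit=%s raw=%s\n' "${claude_exit}" "${raw}"
```

Run as written, this prints `exit=124 raw=partial output`. Replacing the `if` block with a bare `raw=$(fake_cli)` makes the script stop at the assignment, which is the failure the commit describes.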
Aaddrick	ce2137f63a	fix(triage): pass investigate schema to claude CLI (#462 ) The investigate call was the only Sonnet invocation in v2 without `--json-schema`. After the parser hardening in #461, re-dispatched runs produced valid JSON — but with fields omitted and creative top-level wrappers. The prompt-described schema isn't enforced without the flag, and the model was using the freedom. ## What changed Add `--json-schema "${schema}"` where `schema=$(cat .claude/scripts/schemas/investigate.json)`, matching the classify and doublecheck pattern. Output parsing prefers the CLI-validated `.structured_output` field (populated when schema fit cleanly), falling back to the existing `.result` + `extract-json.py` + shape-check path for the case where the CLI returns prose on schema miss. The hardened extraction from #461 stays in place as the safety net. ## Why post-hoc still helps Per Claude Code CLI docs (and confirmed via the claude-code-guide research), `--json-schema` applies validation after the agent loop ends — not at generation time. That's weaker than the Agent SDK's constrained decoding, but still catches the specific failures seen in the re-dispatch of #424 and #442: - Top-level `pattern_sweep` and `proposed_anchors` omitted - Per-finding `confidence` / `line_end` returned as null (violates required enum / integer) - Extra top-level fields like `summary`, `classification`, `investigation_id` If post-hoc validation isn't enough, the next escalation is the Agent SDK (constrained decoding via grammar compilation). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 19:23:28 -04:00
Aaddrick	82908fbe64	fix(triage): harden Investigate step against hangs and parser drift (#461 ) Three failure modes surfaced in the first round of dispatches against real issues, all in the Stage 4 Investigate step: - #394 hung for 9 min (the Claude CLI wedged; no per-call timeout); user had to cancel manually. Step log was silent because `2>/dev/null` swallowed stderr. - #424 and #442 both ran to CLI completion but the payload's jq presence-check rejected the output. Raw response wasn't archived, so the specific rejection cause was unknowable post-hoc. ## Fix - `timeout 300s claude -p ...` — bounds the step at 5 min; exit 124 routes to 8b no-findings gracefully via the existing warning branch. - `2>/tmp/triage/investigate-stderr.log` instead of `2>/dev/null` — CLI diagnostics ride along in the run's uploaded artifact bundle, available for post-mortem without a re-dispatch. - Raw CLI response archived as `investigate-raw.json` before any parsing. Extracted payload archived as `investigate-payload.txt` before schema checks. Schema-reject no longer loses the evidence. - Fence-strip + jq-presence-check replaced with `.claude/scripts/triage/extract-json.py`, which uses `json.JSONDecoder.raw_decode` to handle leading OR trailing prose around the JSON body. Addresses PR #459 review item 6. - The shape check now verifies each of the four required fields is an `array`, not just present — `{"findings": "oops"}` would pass presence and explode downstream. Addresses PR #459 review item 7. ## Testing `extract-json.py` exercised locally against: bare JSON, leading prose, trailing prose, fence-wrapped JSON, pure prose (exit 1), malformed JSON (exit 2). All cases produce the expected output or exit code. `actionlint -shellcheck` clean on the workflow. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 19:08:58 -04:00
Aaddrick	1de897f56e	feat(triage): dry_run input + pre-dispatch fixes (#460 ) Adds a dry_run dispatch input so the pipeline can be validated against real issues without writing to the repo. Also folds in three items from the #459 code review that are easier to ship before the first round of dispatches than after. ## dry_run - New boolean input on `workflow_dispatch` (default false) - Guards `Apply labels` and `Post comment` steps - Step summary shows a ⚠ banner + a "Dry run" row when enabled - Artifacts still upload, so the rendered `comment.md` is inspectable ## Review fixups (from PR #459 review) 1. Decision gate priority. Spec §7 puts version drift ahead of fetch failure; implementation had them reversed. When both fire, `version-drift` is the more specific signal and is the only path that hands the maintainer drift-bridge candidates. Swapped. 2. Issue titles wrapped as untrusted. `<issue_title>` now carries `source="reporter, untrusted"` in all three prompt assemblies (classify / doublecheck / investigate). Instruction-as-data directive in each prompt updated to name both `<issue_title>` and `<issue_body>`. Reporter-controlled title injection surface closed. 5. `drift-bridge.sh` version search is literal. `--fixed-strings` added to `git log --grep` so `1.3.23` doesn't match `1x3y23`. Items 3, 4, 6-9 from the review are deferred to Phase 3 (adversarial reviewer) per the review's own scoping. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 18:48:01 -04:00
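Review item 5 above turns on the difference between regex and literal matching. The sketch below uses plain `grep` rather than `git log --grep`, so it runs without a repository; `-F` plays the same role as `--fixed-strings`. The two sample subjects are made up for illustration.

```shell
#!/usr/bin/env bash
# Two illustrative commit subjects; only the first names version 1.3.23.
subjects=$'bump to 1.3.23\nrefactor 1x3y23 helper'

# As a regex, each "." matches any character, so both subjects match.
regex_hits=$(printf '%s\n' "${subjects}" | grep -c -e '1.3.23')

# With -F the pattern is a literal string, so only the real version matches.
literal_hits=$(printf '%s\n' "${subjects}" | grep -c -F -e '1.3.23')

printf 'regex=%s literal=%s\n' "${regex_hits}" "${literal_hits}"
```

The regex form counts two matching lines and the literal form counts one, which is the false positive the fix removes.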
Aaddrick	34631068ee	feat(triage): Phase 2 — investigate, mechanical validate, 8a findings (#458 ) Extends the Phase 1 deferral-only pipeline with the bug-investigation path: Stages 3 (fetch reference), 4 (investigate), 5 (mechanical validate), 7 partial (decision gate), and 8a (findings variant). Non-bug classifications still route through 8b; adversarial reviewer is Phase 3. ## What Phase 2 adds - Stage 3 — Fetch reference. `gh release download --pattern 'reference-source.tar.gz'` with 3× exponential backoff (2s/8s/32s). Fetch failure routes to 8b with reason `reference-source unavailable` (the 7th reason added to `reasons.json`). - Stage 4 — Investigate. `schemas/investigate.json` + `prompts/investigate.txt`. Claude reads repo + reference source via tool access (`--dangerously-skip-permissions`), emits structured findings / pattern_sweep / proposed_anchors / related_issues. Prompt enforces hypothesis voice, cross-cutting-sweep obligation, hard schema bans. - Stage 5 — Mechanical validation. `.claude/scripts/triage/ validate.sh` — pure bash. Checks per finding: file exists, line range valid, evidence_quote grep-matches at cited line, closed-world options extracted for identifier claims (grep heuristic for Phase 2; ast-grep upgrade deferred to Phase 3). Per anchor: `grep -P` match count exactly equal to expected_match_count. Per related_issue: `gh issue view` fetch + body excerpt. Emits `validation.json`. - Stage 3a — Version drift check. Compares classify's `claimed_version` against `vars.CLAUDE_DESKTOP_VERSION`. Drift flag routes to 8b with `version drift` reason; investigation still runs. - Drift-bridge sweep. `.claude/scripts/triage/drift-bridge.sh` — bash, resolves claimed_version to approximate date via `git log --grep`, then date-windowed `git log` on finding files + `gh pr list` basename search. Candidates attach to 8b as a rendered bullet block. - Stage 7 partial — Decision gate. 
Priority: drift → 8b drift-bridge · fetch failure → 8b reference-source-unavailable · investigate failure or zero surviving findings → 8b no-findings · avg confidence < medium → 8b low-confidence · else → 8a. - Stage 8a — Findings variant. `schemas/comment-findings.json` + `prompts/comment-findings.txt`. Claude emits structured comment object (hypothesis_line, findings[], patch_sketch?, related_issues); bash renders markdown. No post-hoc prose stripping — the schema guarantees shape. 400-word cap truncates the `<details>` patch block only. - Stage 8b extension. Drift-bridge-candidates bullet block renders only when reason is `version drift` AND the sweep returned ≥1 candidate. Phase 1's first-issue privacy note + reason-enum post-processor are preserved. - Stage 9. Labels: 8a → `triage: investigated`; 8b routing unchanged. Artifacts extended with `investigation.json`, `validation.json`, `drift-bridge-candidates.json` (conditional). ## Risks validated locally - Mechanical validation catches fabricated identifiers and non-matching anchors — smoke tested with a two-finding / two-anchor fixture (one real, one fabricated per kind); failure_reasons fire correctly on the fabricated ones. - Closed-world extraction via grep heuristic: on a JS switch with three cases, returns all three as `closed_world_options` bounded to ±100 lines. - `grep -c` exits 1 on no-match and prints "0" — validated the `|| true` idiom doesn't double-count. ## Deferred - Stage 6 adversarial reviewer (Phase 3) - Confirmed-duplicate routing with Stage 6's exact/related rating (Phase 3) - Feature-design variant 8c (Phase 4) - Suspicious-input tells + edit-during-triage detection (Phase 4) - ast-grep upgrade for closed-world extraction (Phase 3) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 18:09:15 -04:00
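The `grep -c` note in the risks list above can be reproduced with a small sketch. The input text and variable names are illustrative, and the `|| echo 0` line is shown only as a contrast case; nothing here is copied from `validate.sh`.

```shell
#!/usr/bin/env bash
set -e  # errexit on, as in the workflow's inline bash

haystack=$'alpha\nbeta'   # illustrative input with no "gamma" line

# grep -c prints "0" AND exits 1 when nothing matches. `|| true` discards
# only the exit status, so the captured value is a single "0".
count=$(printf '%s\n' "${haystack}" | grep -c 'gamma' || true)

# Contrast: `|| echo 0` appends a second "0" to what grep already printed.
doubled=$(printf '%s\n' "${haystack}" | grep -c 'gamma' || echo 0)

printf 'count=[%s]\n' "${count}"
printf 'doubled has %s lines\n' "$(printf '%s\n' "${doubled}" | wc -l | tr -d ' ')"
```

With `|| true` the result stays usable in an integer comparison such as an expected-match-count check, whereas the two-line value from `|| echo 0` would fail it.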
Aaddrick	0f55547523	feat(triage): Phase 1 — gate, classify, 8b deferral, label/post/archive (#457 ) Turns the Phase 0 skeleton into a live triage pipeline. Every dispatched issue now gets a structured human-deferral comment and a triage label. No investigation yet — that's Phase 2. ## Stages landed (per docs/issue-triage/implementation-plan.md §Phase 1) - Stage 1 — Gate. `github-actions[bot]` author skip; manual dispatch intentionally bypasses the already-triaged / needs-human checks (those only matter on the `opened` trigger, deferred to cutover). - Stage 1 — Input snapshot. `issue.body`, `issue.updated_at`, `sha256(issue.body)` captured before any LLM call; archived as `input_snapshot.json`. Edit-during-triage comparison lands in Phase 4. - Stage 2 — Classify. `schemas/classify.json` + `prompts/classify.txt`. Fields: classification enum, confidence, claimed_version, suggested_labels[], duplicate_of, regression_of. Issue body wrapped as untrusted data. - Stage 2 — Doublecheck. `schemas/classify-doublecheck-bugfeature.json` + `prompts/classify-doublecheck-bugfeature.txt`. Runs conditionally when the first pass returns `bug` or `feature`. Fresh context — no first-pass output exposed. - Stage 7 (partial) — Reason selection. Two reasons fire in Phase 1: `ambiguous` when the doublecheck disagrees, `no-findings` otherwise. The other four reasons in `reasons.json` light up in Phases 2–4. - Stage 8b — Human-deferral render. Bash-only template reading `reasons.json`. First-issue privacy note appended when the reporter has no prior issues on the repo. Post-processor enforces: reason line in `reasons.json` enum, comment under 150 words. - Stage 9 — Label + post + archive. Cached `gh label list` at workflow start; cardinality-1 slots (triage state, class, priority) applied directly; categories filtered through the cache + blocklist. Never emits `priority: critical`. 
Artifacts uploaded with 14-day retention: `input_snapshot.json`, `classification.json`, `classification-doublecheck.json` (when ran), `comment.md`, `issue.json`, `repo-labels.json`. ## Validation - actionlint + shellcheck clean on inline bash - Schemas parse as JSON; prompts validated via jq - Matches Phase 1 exit criteria once dispatched against real issues (bug with stack trace → needs-human + no-findings; ambiguous → needs-human + ambiguous; no hallucinated labels applied) ## Deferred to Phase 2+ - Investigation (Stage 4), mechanical validation (Stage 5), adversarial review (Stage 6) - Findings variant (8a), feature-design variant (8c) - Drift-bridge sweep (extends 8b with candidate commits/PRs) - Confirmed-duplicate routing (needs Stage 5+6) - Suspicious-input tells and edit-during-triage detection (Phase 4) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 17:39:37 -04:00
Aaddrick	b354353a36	feat(triage): Phase 0 scaffold for issue triage v2 (#456 ) Directory scaffolding + skeleton workflow + issue templates. No live behavior — v2 remains workflow_dispatch-only with `permissions: {}` and a single job that echoes the issue number. v1 (`issue-triage.yml`) is untouched. Per docs/issue-triage/implementation-plan.md Phase 0: - `.github/workflows/issue-triage-v2.yml` — skeleton workflow - `.github/ISSUE_TEMPLATE/{config,bug_report,feature_request}.yml` — shapes input for the Stage 2 classifier and Stage 4 investigator; privacy disclosure in a non-editable markdown info block - `.claude/scripts/prompts/.gitkeep` — prompts land per-phase - `.claude/scripts/taxonomies/label-blocklist.json` — Stage 9 suggested- label gating (wontfix, invalid, duplicate, help wanted, good first issue); additional taxonomies land in Phase 4 - `.claude/scripts/reasons.json` — Stage 8b deferral-reason SSOT consumed by the renderer and post-processor (six entries) - README Privacy section — keeps disclosure text discoverable without filing an issue; matches the templates' info block Exit criteria: dispatch against any issue number prints correctly; no API calls, no comments, no labels; `bug_report.yml` / `feature_request .yml` render cleanly with the privacy block. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 17:29:17 -04:00
aaddrick	01f7125d6a	ci: refresh issue-triage prompts for scripts/patches/ layout Updates the inline prompt text that guides the triage investigation agent so it looks for patches in the correct location. The previous prompt told the agent "search build.sh for patch_ functions" — those functions have moved into scripts/patches/*.sh organized by subsystem (tray, cowork, claude-code, quick-window, titlebar, app-asar). Without this, the triage agent would open build.sh, find only the orchestrator's source statements, and fail to locate the actual patch logic — producing lower-quality diagnoses. Three prompt blocks updated: the "How This Project Patches" section, the "All bugs are ours to fix" checklist, and the "Patch Approach" output format. build.sh itself still appears as the orchestrator reference for context. Co-Authored-By: Claude <claude@anthropic.com>	2026-04-20 07:27:17 -04:00
aaddrick	564f465840	ci: update check-claude-version paths to scripts/setup/detect-host.sh The auto-version-bump workflow greps/seds against the Claude Desktop download URLs and SHA-256 checksums. With the build.sh split those declarations now live in scripts/setup/detect-host.sh inside detect_architecture's case statement. Without this fix, the next upstream release triggers the workflow and it silently fails to update either the URLs or the checksums (greps return empty, seds match nothing, git diff finds no changes, no commit, no tag). Updates all 17 references — grep targets, sed targets, git diff/add paths, and step labels / echo messages for consistency. The patterns themselves (x86_64) / aarch64) case matching, claude_download_url=' extraction, in-range claude_exe_sha256 replacement) are unchanged and still match the new file's content. Co-Authored-By: Claude <claude@anthropic.com>	2026-04-20 07:26:40 -04:00
aaddrick	526acbad1e	ci: enable shellcheck -x to follow sourced modules Passes -x (--external-sources) to shellcheck so it follows the '# shellcheck source=...' directives in build.sh and checks the split modules in their sourced context. Without this, every sourced module triggers SC1091 (can't follow source) plus SC2154/SC2034 noise from cross-file variable usage. Also quotes $script_dir inside $(dirname $script_dir) in scripts/packaging/rpm.sh — the heredoc-embedded command substitution tripped SC2086 once shellcheck started analyzing the subshell context. Co-Authored-By: Claude <claude@anthropic.com>	2026-04-20 07:25:21 -04:00
aaddrick	d574ac54d7	chore: add .github/CODEOWNERS for per-subsystem review ownership Groups the repo into logical roles (build orchestration, setup, electron patches, desktop integration, staging, packaging, distribution, CI, docs) with @aaddrick as default. Cowork paths route to @RayCharlizard; nix paths route to @typedrat. Overrides are listed after broad globs so last-match-wins resolves in the intended direction (e.g. docs/cowork-*.md is claimed by @RayCharlizard after the broad /docs/ assignment). Pairs with the scripts/ subdirectory layout landed in the previous commits — each logical role maps cleanly to a path prefix. Co-Authored-By: Claude <claude@anthropic.com>	2026-04-20 07:12:22 -04:00
Aaddrick	cfdfd2d483	Merge pull request #338 from sabiut/feature/integration-tests feat: add integration tests for build artifacts	2026-04-12 15:21:51 -04:00
aaddrick	0782c5a70e	ci: disable compare-release to use generic release notes Bypasses the AI-powered compare-releases step to reduce API costs. Falls back to the existing generic release notes template. Co-Authored-By: Claude <claude@anthropic.com>	2026-04-02 22:15:38 -04:00
aaddrick	140a4188d2	fix(ci): increase compare-releases timeout to 3 hours The OOM fix is working — the script survives the full pipeline now. But 498 hunks of Claude-powered analysis need more than 5 minutes. Increase timeout to 180 minutes so AI-generated release notes can complete. The fallback and if: always() hardening remain as safety net. Co-Authored-By: Claude <claude@anthropic.com>	2026-03-31 11:43:54 -04:00
aaddrick	15c703427b	fix(ci): re-enable compare-releases step OOM fix is in progress in claude-desktop-versions. Re-enabling so the next release tests the fix. The if: always() hardening on fallback and release steps ensures the release still ships if the script fails. Co-Authored-By: Claude <claude@anthropic.com>	2026-03-31 10:14:58 -04:00
aaddrick	beaf9ae2e2	fix(ci): disable compare-releases to unblock releases (#361 ) The concurrency group fix was insufficient — the runner SIGTERM occurs even with a single CI run. The compare-releases.py script itself causes the runner to die (~86s, exit 143) regardless of concurrency. Disabling the step entirely until the script is debugged in claude-desktop-versions. The fallback notes and if: always() hardening remain in place. Co-Authored-By: Claude <claude@anthropic.com>	2026-03-31 10:04:01 -04:00
aaddrick	bdcedbfea6	fix(ci): prevent runner kill from blocking release creation (#361) Add concurrency group to CI workflow so concurrent runs (triggered when check-claude-version pushes to main then pushes a tag) queue instead of killing each other. This addresses the ~86-second runner SIGTERM that has blocked 10 releases in March. Also harden release steps as defense-in-depth: - timeout-minutes: 5 on compare-releases step - if: always() on fallback notes and Create GitHub Release steps Co-Authored-By: Claude <claude@anthropic.com>	2026-03-31 09:48:05 -04:00
Sum Abiut	820b022fe0	fix: address PR #338 review feedback - Remove workflow_dispatch trigger (no artifacts on manual dispatch) - Add nodejs npm to Ubuntu test dependencies - Add explicit permissions: contents: read to workflow - Replace echo|grep with [[ ]] pattern matching (4 instances) - Drop ambiguous 2>&1 from install commands - Use (( ++ )) arithmetic style in test helpers	2026-03-30 01:41:36 +11:00
Sum Abiut	0e4a1e7cac	feat: add integration tests for build artifacts Validate deb, rpm, and appimage packages after build in CI. Tests verify package metadata, file layout, desktop entries, icons, launcher scripts, asar contents (frame-fix, cowork, native stub, tray icons), and --doctor smoke tests. Runs as a reusable workflow with matrix strategy (one job per format) between build and release jobs, gating releases on passing artifact validation.	2026-03-30 01:32:41 +11:00
Aaddrick	a3190c38b9	fix: disable VM file downloads on Linux to prevent checksum loop (#337) * fix: disable VM file downloads on Linux to prevent checksum loop (#334) Patch 4 in patch_cowork_linux() previously copied win32 VM file entries (rootfs.vhdx, vmlinuz, initrd) with Linux-specific checksums. These checksums drifted from CDN content, causing an infinite download retry loop for all Linux users, including bwrap users who don't need VM files at all. The root cause: Patch 1 opens the yukonSilver feature gate for Linux, making the VM download path reachable even on bwrap-only installs. The triage bot missed this because it analyzed unpatched code. Fix: inject empty file arrays (linux:{x64:[],arm64:[]}) instead of copying win32 entries. This is safe because: - The VM backend is non-functional on Linux (bwrap is the only backend) - Empty arrays make the download loop a no-op (for...of [] skips) - [].every() returns true (vacuous truth), reporting "Ready" status - The linux key must exist to prevent TypeError on files["linux"]["x64"] Removes ~230 lines of checksum infrastructure from build.sh and CI that maintained checksums for a non-functional feature. Fixes #334 Closes #329 Closes #332 Co-Authored-By: Claude <claude@anthropic.com> * style: clean up stray blank line and use durable issue reference Co-Authored-By: Claude <claude@anthropic.com> --------- Co-authored-by: Claude <claude@anthropic.com>	2026-03-22 22:28:22 -04:00
aaddrick	a29fc0eaa5	fix: remove upstream label and reframe triage ownership All bugs are ours to investigate and fix. This project's goal is to take a working Anthropic product and make it work on Linux. Behavioral differences between Windows/macOS and our build are gaps in our patching, not someone else's problem. - Delete 'upstream' label from repo (removed from 7 issues) - Replace "check patches before blaming upstream" with "all bugs are ours to fix" - Remove upstream from label glossary and suggested labels - Update all references in agent, workflow, and classification schema Co-Authored-By: Claude <claude@anthropic.com>	2026-03-22 09:05:38 -04:00
aaddrick	ffd4ef3d75	fix: improve triage investigation accuracy and context Lessons from #329 where triage fabricated claims about manifest entries and missed that our own patch was the cause: - Add "check our patches first" rule: for bugs in patched areas, check build.sh patches before blaming upstream - Add "verify before stating" rule: only state facts found in code, never speculate about code existence - Add "validate network assumptions" rule: use curl to check URLs before speculating about CDN failures - Include CLAUDE.md in investigation prompt for full project context - Increase investigation budget from $1 to $3 for deeper analysis Co-Authored-By: Claude <claude@anthropic.com>	2026-03-22 08:57:39 -04:00


157 Commits