docs(learnings): document MCP double-spawn upstream bug (#526) (#527)

* docs(learnings): document MCP double-spawn upstream bug (#526)

Captures the reporter's root-cause analysis for issue #526: stdio MCP
servers in claude_desktop_config.json get spawned twice when both the
chat panel and the Code/Agent (Cowork) panel are active. The
duplication happens entirely in upstream Anthropic Claude Desktop main
(LocalSessions and LocalAgentModeSessions each hold an independent
Claude Agent SDK query whose stdio transport bypasses the global hZ
MCP registry).

Includes verification that this packaging is not implicated, the
lockfile + idempotent-write workaround pattern for affected MCP
authors, and routing guidance for upstream reports.

Co-Authored-By: Claude <claude@anthropic.com>

* docs(learnings): simplifier pass on MCP double-spawn entry

Drop redundant "Anthropic" qualifier in Status section and reword
CLAUDE.md index bullet to noun-phrase form matching siblings.

Co-Authored-By: Claude <claude@anthropic.com>

* docs(learnings): apply review fixes from #527

- Fix `LocalAgentModeSessions` IPC namespace: add missing `_$_`
  separator (was `claude.web_$_LocalAgentModeSessions_*`, should be
  `claude.web_$_LocalAgentModeSessions_$_*`). Verified against the
  channel names in the actual minified source.
- Add back the `Logs prefix` column (`[CCD]` / `[LAM]`) the original
  issue body had — these are the literal grep targets in
  `~/.config/Claude/logs/` for confirming the bug hit.
- Re-route the secondary upstream venue from `anthropics/claude-code`
  to `anthropics/claude-agent-sdk-typescript`. The SDK transport
  (`spawnLocalProcess` / `Du.spawn`) lives in the SDK's own public
  repo (issues enabled); pointing at `claude-code` while saying the
  CLI isn't on the spawn path is the exact contradiction the warning
  paragraph below it tries to prevent.
- Workaround note: reclaim a stale lock via `rename()` over the path,
  not `unlink()` then re-open. Heads off the obvious-but-racy port
  for anyone copying the pattern.

Co-Authored-By: Claude <claude@anthropic.com>

---------

Co-authored-by: Claude <claude@anthropic.com>
Travis
2026-04-30 22:51:08 -05:00
committed by GitHub
parent b5339d0f0b
commit 646a658fc5
2 changed files with 134 additions and 0 deletions


@@ -13,6 +13,7 @@ The [`docs/learnings/`](docs/learnings/) directory contains hard-won technical k
- [`plugin-install.md`](docs/learnings/plugin-install.md) — Anthropic & Partners plugin install flow, gate logic, backend endpoints, and DevTools recipes
- [`apt-worker-architecture.md`](docs/learnings/apt-worker-architecture.md) — APT/DNF binary distribution via Cloudflare Worker + GitHub Releases, redirect chain, credential ownership, heartbeat runbook
- [`tray-rebuild-race.md`](docs/learnings/tray-rebuild-race.md) — why destroy + recreate on `nativeTheme` updates briefly duplicates the tray icon on KDE Plasma, and the in-place `setImage` + `setContextMenu` fast-path that avoids the SNI re-registration race
- [`mcp-double-spawn.md`](docs/learnings/mcp-double-spawn.md) — Stdio MCPs spawn 2× when chat and Code/Agent panels are both active, root cause in upstream session managers, MCP-author workaround
## Code Style


@@ -0,0 +1,133 @@
# MCP Double-Spawn (Chat + Code/Agent Panel)
## Why This Exists
When a Claude Desktop session has both the classic chat panel
and the Code/Agent (Cowork) panel active, **every stdio MCP
server declared in `~/.config/Claude/claude_desktop_config.json`
gets spawned twice** by the Electron main process. Reported and
root-caused in detail in
[#526](https://github.com/aaddrick/claude-desktop-debian/issues/526).
## Symptoms
`ps -ef` after a session opens both panels shows two batches of
MCP children of the same Electron main PID, separated by however
long it took the user to open the second panel:
```
PID PPID(electron) CMD
372628 372434 python ← batch 1 (chat panel)
372633 372434 node
372648 372434 python
...
373288 372434 python ← batch 2 (Code/Agent panel)
373296 372434 node
373327 372434 python
```
Killing one PID disconnects one panel; the other survives. Two
independent client↔server pairs, no failover.
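
The same check can be scripted. The following is a minimal sketch, not part of the original report: `findDuplicateChildren` is a hypothetical helper, and it assumes the listing comes from `ps -eo pid,ppid,args` so that full command lines can be compared. The command lines in `sample` are illustrative placeholders.

```javascript
// Hypothetical helper: lists command lines that appear more than once
// among the children of one parent PID in `ps -eo pid,ppid,args` output.
function findDuplicateChildren(psOutput, parentPid) {
  const counts = new Map();
  for (const line of psOutput.split('\n')) {
    const match = line.trim().match(/^(\d+)\s+(\d+)\s+(.+)$/);
    if (!match) continue; // skips the header row and blank lines
    const [, , ppid, args] = match;
    if (Number(ppid) !== parentPid) continue;
    counts.set(args, (counts.get(args) || 0) + 1);
  }
  // Keep only the command lines that were counted more than once.
  return [...counts].filter(([, n]) => n > 1).map(([args]) => args);
}

// Illustrative listing; the server file names are made up.
const sample = `
  PID   PPID COMMAND
  372628 372434 python server_a.py
  372633 372434 node server_b.js
  373288 372434 python server_a.py
  373296 372434 node server_b.js
  400000 1 python server_a.py
`;
console.log(findDuplicateChildren(sample, 372434));
```

A match is only a hint. A config that deliberately declares the same command line twice would also show up here.
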
Most stdio MCPs don't notice they were doubled — each instance
talks to its own client and exits cleanly. The bug only surfaces
when an MCP touches **shared external state**: a single
WebSocket, files on disk that the other instance also writes,
external services with single-connection contracts, etc.
## Root Cause (Upstream)
Two parallel session managers live inside Electron main, each
holding an independent Claude Agent SDK `query`:
| Manager class | IPC namespace | Coordinator | Logs prefix |
|--------------------------|------------------------------------------|-----------------|-------------|
| `LocalSessions` | `claude.web_$_LocalSessions_$_*` | `n2t("ccd")` | `[CCD]` |
| `LocalAgentModeSessions` | `claude.web_$_LocalAgentModeSessions_$_*`| `n2t("cowork")` | `[LAM]` |

Grep `~/.config/Claude/logs/` for these log prefixes to confirm that a
session is hitting both coordinators (and therefore this bug
specifically).
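
For anyone scripting that check, here is a minimal sketch. It assumes only that the two prefixes appear literally in the log text. The function names and the sample lines are hypothetical, and real log lines will look different.

```javascript
// Reports which of the two coordinator prefixes occur in a log text.
function coordinatorsSeen(logText) {
  return {
    ccd: logText.includes('[CCD]'),
    lam: logText.includes('[LAM]'),
  };
}

// True when both session managers logged somewhere in the text.
function hitsBothCoordinators(logText) {
  const seen = coordinatorsSeen(logText);
  return seen.ccd && seen.lam;
}

// Placeholder lines, not copied from a real log.
const sampleLog = [
  'placeholder line [CCD] something happened',
  'placeholder line [LAM] something happened',
].join('\n');
console.log(hitsBothCoordinators(sampleLog));
```

Feed it the contents of the files under `~/.config/Claude/logs/` (for example via `fs.readFileSync`). Both prefixes appearing in the same time window is the signal to look for.
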
Each `query` holds its own SDK transport. The transport's
`spawnLocalProcess` (`Du.spawn`) launches stdio MCPs **without
consulting the global registry** that *would* dedupe them
(`hZ` map, accessed via `oUt(serverName)` /
`launchMcpServer`). That registry is only used for the
"internal" cowork in-process MessageChannelMain path.
Net result: 2 coordinators × N configured MCPs = 2N processes.
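
The arithmetic can be seen in a toy model. This is a sketch for illustration only: `DedupingRegistry`, `DirectTransport`, and the server names are invented here and do not correspond to upstream code. It contrasts per-coordinator spawning with a shared name-keyed registry.

```javascript
// Toy model only: none of these names exist upstream.
class DedupingRegistry {
  constructor() {
    this.byName = new Map();
    this.spawned = 0;
  }
  // Spawns at most once per server name, then returns the same entry.
  launch(name) {
    if (!this.byName.has(name)) {
      this.spawned += 1;
      this.byName.set(name, { name, id: this.spawned });
    }
    return this.byName.get(name);
  }
}

class DirectTransport {
  constructor(counter) {
    this.counter = counter;
  }
  // Always spawns; it never looks at a registry.
  launch(name) {
    this.counter.spawned += 1;
    return { name, id: this.counter.spawned };
  }
}

const servers = ['a', 'b', 'c'];
const coordinators = ['ccd', 'cowork'];

// Behavior described above: each coordinator's transport spawns on its own.
const direct = { spawned: 0 };
for (const _coordinator of coordinators) {
  const transport = new DirectTransport(direct);
  servers.forEach((name) => transport.launch(name));
}

// Hypothetical alternative: both coordinators share one registry.
const shared = new DedupingRegistry();
for (const _coordinator of coordinators) {
  servers.forEach((name) => shared.launch(name));
}

console.log(direct.spawned, shared.spawned); // 6 3
```
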
Symbol names (`n2t`, `hZ`, `oUt`, `LocalSessions`,
`LocalAgentModeSessions`) are minified and **will rename across
upstream releases**.
## Status
**Upstream Claude Desktop bug. Not patchable in this repo.** A
fix would require either:
- Routing the SDK stdio transport through `oUt`/`hZ` (the
existing serialized-per-name registry), or
- Sharing one MCP-server registry between the `ccd` and
`cowork` coordinators.
Both live inside the closed-source SDK transport / session
manager wiring. Regex-matching the minified symbols from
`scripts/patches/` would be fragile against release-to-release
renames and exceeds this repo's "minimal Linux-compat patches
only" charter.
## What's Already Verified Clean
- All 7 patches in `scripts/patches/*.sh` — zero references to
MCP, mcpServer, LocalSessions, LocalAgentModeSessions,
transportToClient, MessageChannelMain, n2t, hZ, oUt.
- `scripts/launcher-common.sh` — no MCP or config-load logic.
- `scripts/packaging/{appimage,deb,rpm}.sh` — no MCP or
config-load logic.
- `scripts/doctor.sh:420` — only reads
`claude_desktop_config.json` to JSON-lint it for diagnostics;
not in the runtime spawn path.
The bug reproduces identically against the unmodified upstream
asar; no Linux-only init in this packaging contributes to the
double-load.
## Workaround (For MCP Authors)
Until upstream fixes it, MCPs that touch shared external state
can defend themselves:
1. **Lockfile + staleness check.** Create the lock with
`fs.openSync(path, 'wx')` and write the owner's PID into it. An
instance that finds an existing lock checks whether that PID is
still alive via `process.kill(pid, 0)`, then backs off from a live
owner or reclaims a stale lock. Reclaim atomically: write the new
lock to a temp path and `rename()` it over the stale one, never
`unlink()` then re-open (a third instance can win the gap).
2. **Idempotent state writes.** Resolve target files/keys from
the incoming message payload rather than from in-process
state, so two instances writing the same broadcast end up at
the same target instead of cross-contaminating per-process
keys.
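
The lockfile step might look roughly like the sketch below. It is an illustration under stated assumptions, not the reporter's implementation: `isAlive`, `readOwner`, `acquireLock`, and the temp-directory lock path are all hypothetical. The re-read after `rename()` is an extra guard added here, because when two instances reclaim at once the last rename wins.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// True if a process with this PID exists (signal 0 only probes).
function isAlive(pid) {
  if (!Number.isInteger(pid) || pid <= 0) return false;
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user.
    return err.code === 'EPERM';
  }
}

// PID recorded in the lock, or NaN if it is missing or unparsable.
function readOwner(lockPath) {
  try {
    return Number.parseInt(fs.readFileSync(lockPath, 'utf8'), 10);
  } catch {
    return NaN;
  }
}

// Returns true if this process owns the lock, false if it should back off.
function acquireLock(lockPath) {
  try {
    // 'wx' fails with EEXIST when the lock file already exists.
    const fd = fs.openSync(lockPath, 'wx');
    fs.writeSync(fd, String(process.pid));
    fs.closeSync(fd);
    return true;
  } catch (err) {
    if (err.code !== 'EEXIST') throw err;
  }
  if (isAlive(readOwner(lockPath))) return false; // live owner: back off
  // Stale lock: write ours to a temp path and rename it over the old one.
  const tmp = `${lockPath}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, String(process.pid));
  fs.renameSync(tmp, lockPath);
  // Confirm ownership in case another reclaimer renamed after us.
  return readOwner(lockPath) === process.pid;
}

// Hypothetical lock location, used only for this demo.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'mcp-lock-'));
console.log(acquireLock(path.join(demoDir, 'server.lock')));
```

One limitation of this sketch: a second instance that reads the lock between its creation and the PID write sees an empty file and treats it as stale. A stricter version could treat an unparsable lock as busy for a short grace period.
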
The reporter's `baro-voyager` MCP shipped both in commit
`cb7bfbb` as a worked reference.
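
The idempotent-write idea from item 2 can be sketched as follows. This is a hedged illustration, not code from `baro-voyager`: the message shape `{ id, body }`, the helper names, and the state directory are assumptions made for the example.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical message shape: { id: string, body: any }.
// The target depends only on the payload, never on process.pid
// or any per-instance counter.
function targetFor(stateDir, message) {
  // encodeURIComponent stops a '/' in the id from leaving stateDir.
  return path.join(stateDir, `${encodeURIComponent(message.id)}.json`);
}

function writeState(stateDir, message) {
  const target = targetFor(stateDir, message);
  // The temp name may be per-process; only the final target must agree.
  const tmp = `${target}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(message.body));
  fs.renameSync(tmp, target); // atomic replace of the shared target
  return target;
}

// Demo: the same broadcast handled twice, as two instances would.
const stateDir = fs.mkdtempSync(path.join(os.tmpdir(), 'mcp-state-'));
const broadcast = { id: 'session-42', body: { status: 'ready' } };
writeState(stateDir, broadcast);
writeState(stateDir, broadcast);
console.log(fs.readdirSync(stateDir));
```

Because both writers derive the same path from the same payload and replace it whole, the second write leaves the file in the state the first one produced.
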
## Routing Upstream Reports
- **Primary:** in-app feedback (Help → Send Feedback) or
`support@anthropic.com`. The duplication happens in
closed-source Desktop main.
- **Secondary:** an SDK-transport-flavored issue on
[`anthropics/claude-agent-sdk-typescript`](https://github.com/anthropics/claude-agent-sdk-typescript)
is defensible — the spawn path goes through the **Claude Agent
SDK's** `query` transport (`spawnLocalProcess` / `Du.spawn`),
which is shared surface area. Reference the missing `hZ`
consultation explicitly.
The embedded Claude Code CLI subprocess inside Claude Desktop is
**not** the cause: it receives `--mcp-config` only when the
config map is non-empty, and that map is empty in this flow. Don't
route to `anthropics/claude-code` claiming the CLI itself is
double-spawning MCPs.