mirror of
https://github.com/aaddrick/claude-desktop-debian.git
synced 2026-05-17 00:26:21 +03:00
* feat(triage): Phase 4 sub-PR 2 — suspicious-input tells
Adds a conservative Stage 2a tripwire that scans the raw issue body
and title for prompt-injection tells before any LLM call. A match
short-circuits routing to 8b with reason
`suspicious-input — manual review`, no Sonnet invocation.
The scan is the front-line filter; the actual injection mitigations
(wrap-as-data, fresh-context reviewer, schema-constrained output)
remain in place for everything that doesn't trip. The two layers are
complementary: the scan catches the obvious attempts cheaply, the
downstream defenses protect against the clever ones.
Taxonomy
- taxonomies/suspicious-input-tells.json — eight tells with regex
patterns and rationale:
- ignore-prior-instructions: classic opener
- system-prompt-leak: exfiltration attempts
- role-override: "you are now a different…"
- forget-instructions: variation of ignore-prior
- developer-mode: named jailbreaks (DAN, etc.)
- instruction-injection-sysrole: chat-template tokens
- long-base64-block: 200+ contiguous base64 chars
- unicode-tag-sequence: U+E0000-E007F invisibles
Scanner
- scripts/triage/suspicious-input-scan.sh — pure bash, PCRE via
grep -Pzi, writes suspicious-input.json with matched_tells[].
Uses the same taxonomy-as-data pattern as reasons.json and
label-blocklist.json.
Workflow
- Stage 2a step runs between input snapshot and classify, outputs
`suspicious` boolean
- Classify + doublecheck both `if:`-gated so they skip on a hit
- Decide route takes suspicious first, before the doublecheck
disagreement check — a tripped tell defers deterministically
- Step summary shows the suspicious flag
Co-Authored-By: Claude <claude@anthropic.com>
* refactor(triage): drop dead null-string guards in suspicious-input scan
jq -r '.body // ""' already returns an empty string for JSON null or a
missing field, so the subsequent `[[ "${body}" == "null" ]]` guards only
fire when a reporter's body is the literal four-character string "null"
— which isn't an injection signal and matches no tell. The comment
describing the guards was also wrong about jq's behavior. Remove both
guards and correct the comment.
Also fix a misleading comment about `|| true` (which isn't in the code)
and collapse the 4-line `suspicious` boolean derivation into a single
`jq 'length > 0'`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>