fix(triage): pass investigate schema to claude CLI (#462)

The investigate call was the only Sonnet invocation in v2 without
`--json-schema`. After the parser hardening in #461, re-dispatched
runs produced valid JSON — but with fields omitted and creative
top-level wrappers. The prompt-described schema isn't enforced
without the flag, and the model was using the freedom.

## What changed

Add `--json-schema "${schema}"` where `schema=$(cat
.claude/scripts/schemas/investigate.json)`, matching the classify
and doublecheck pattern.

Output parsing prefers the CLI-validated `.structured_output` field
(populated when schema fit cleanly), falling back to the existing
`.result` + `extract-json.py` + shape-check path for the case where
the CLI returns prose on schema miss. The hardened extraction from
#461 stays in place as the safety net.

## Why post-hoc still helps

Per Claude Code CLI docs (and confirmed via the claude-code-guide
research), `--json-schema` applies validation after the agent loop
ends — not at generation time. That's weaker than the Agent SDK's
constrained decoding, but still catches the specific failures seen
in the re-dispatch of #424 and #442:

- Top-level `pattern_sweep` and `proposed_anchors` omitted
- Per-finding `confidence` / `line_end` returned as null (violates
  required enum / integer)
- Extra top-level fields like `summary`, `classification`,
  `investigation_id`

If post-hoc validation isn't enough, the next escalation is the
Agent SDK (constrained decoding via grammar compilation).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Aaddrick
2026-04-20 19:23:28 -04:00
committed by GitHub
parent 82908fbe64
commit ce2137f63a

View File

@@ -306,6 +306,7 @@ jobs:
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
run: |
schema=$(cat .claude/scripts/schemas/investigate.json)
title=$(jq -r '.title' /tmp/triage/issue.json)
body=$(jq -r '.body // ""' /tmp/triage/issue.json)
classification=$(cat /tmp/triage/classification.json)
@@ -348,29 +349,25 @@ jobs:
} > /tmp/triage/investigate-prompt.txt
# The investigation call runs with tool access (read/grep) so
# Claude can verify claims against actual source. Output is
# the model's final message; we parse JSON out and validate
# shape.
# Claude can verify claims against actual source. `--json-schema`
# constrains the FINAL message; tool-call intermediates flow
# freely. Per CLI docs, validation is post-hoc — conforming
# output lands in `.structured_output`, prose in `.result`.
# We prefer `.structured_output` and fall back to the
# extract-json.py helper on the `.result` field in case the
# CLI hands back prose for any reason.
#
# Hardening against the two failure modes seen in the first
# dispatch round (PR #460 follow-up):
#
# * `timeout 300s` bounds the step at 5 min so a wedged CLI
# call can't run to the job-level timeout.
# * Stderr goes to an archived log file so post-mortem
# debugging has something to read — previously `2>/dev/null`
# swallowed everything, leaving a silent 8-min gap in the
# step log when the call hung.
# * Raw CLI response + extracted payload are archived before
# schema checks, so a schema-reject still leaves artifacts
# to inspect.
# * JSON extraction uses Python's `raw_decode` — robust to
# leading OR trailing prose around the JSON body, unlike
# the fence-strip + jq-presence-check it replaces.
# Step hardening from earlier PR:
# * `timeout 300s` bounds the step at 5 min
# * Stderr to an archived log file so a silent hang is
# post-mortem debuggable
# * Raw response + extracted payload archived before schema
# checks so a reject still leaves artifacts to inspect
set +o pipefail
raw=$(timeout 300s claude -p "$(cat /tmp/triage/investigate-prompt.txt)" \
--dangerously-skip-permissions \
--output-format json \
--json-schema "${schema}" \
--model claude-sonnet-4-6 \
--max-budget-usd 3.00 \
2>/tmp/triage/investigate-stderr.log)
@@ -389,28 +386,35 @@ jobs:
exit 0
fi
payload=$(printf '%s' "${raw}" | jq -r '.result // empty')
printf '%s' "${payload}" > /tmp/triage/investigate-payload.txt
# Prefer the CLI-validated structured_output when the schema
# fit cleanly.
extracted=$(printf '%s' "${raw}" | jq -c '.structured_output // empty')
if [[ -z "${payload}" ]]; then
echo "investigate_ok=false" >> "$GITHUB_OUTPUT"
echo "::warning::empty investigation result"
exit 0
if [[ -z "${extracted}" ]]; then
# Fallback: pull .result and try to rescue JSON from prose.
# Handles the case where the CLI's schema enforcement
# didn't land a structured_output (e.g. tool-use loop
# drifted and the CLI returned prose in .result).
payload=$(printf '%s' "${raw}" | jq -r '.result // empty')
printf '%s' "${payload}" > /tmp/triage/investigate-payload.txt
if [[ -z "${payload}" ]]; then
echo "investigate_ok=false" >> "$GITHUB_OUTPUT"
echo "::warning::empty investigation result"
exit 0
fi
extracted=$(python3 .claude/scripts/triage/extract-json.py \
< /tmp/triage/investigate-payload.txt) || {
echo "investigate_ok=false" >> "$GITHUB_OUTPUT"
echo "::warning::no valid JSON object in investigation payload"
exit 0
}
fi
# Extract the first complete JSON object from the payload.
# Handles leading prose ("Here is...") and trailing prose
# ("Let me know...") that the fence-strip didn't catch.
extracted=$(python3 .claude/scripts/triage/extract-json.py \
< /tmp/triage/investigate-payload.txt) || {
echo "investigate_ok=false" >> "$GITHUB_OUTPUT"
echo "::warning::no valid JSON object found in investigation payload"
exit 0
}
# Type-check the four required top-level arrays — presence
# alone would pass `{"findings": "oops"}` which then explodes
# in validate.sh (PR #459 review item 7).
# in validate.sh. When `--json-schema` bit, this is a no-op.
if ! printf '%s' "${extracted}" | jq -e '
(.findings | type == "array")
and (.pattern_sweep | type == "array")