Fixing "The model's tool call could not be parsed" in Claude Code

2026/06/03 14:23 Claude Code Anthropic debugging mitmproxy

If you keep seeing this error in Claude Code — especially after the model has been “thinking” for a long time:

● The model's tool call could not be parsed (retry also failed).
✳ Churned for 5m 47s

This post gives you workarounds you can apply right away, followed by the analysis of what actually causes it.

A few key conclusions up front:

This is fundamentally a model-side issue, not your local config or your network. The root cause is the model exhausting its output-token budget while thinking deeply in a large context, so the tool-use block never gets emitted (full analysis below).
It appears to be region-dependent. With the same version and config, the trigger rate varies widely across regions/routes, and the JP (Japan) region seems to be hit especially hard. If someone near you never sees it, that doesn’t mean your setup is broken.
The fixes here are mitigations, not a true fix. They sharply cut the reproduction rate but can’t guarantee it’s gone, because the root cause is model-side.
To avoid it entirely, temporarily switch to an older model (e.g. drop from Opus 4.8 back to a previous-generation Opus/Sonnet). Older models aren’t affected, which makes this the most reliable workaround for now. Switch back once there’s an official fix.


The fix, first

This error is usually not a network problem. It’s a model-side behavior — extended thinking exhausting the output-token budget. The options below are all mitigations that sharply cut the reproduction rate; if you need to avoid it entirely, jump to Option 3 (switch to an older model).

Option 1: Turn off always thinking (recommended)

Edit ~/.claude/settings.json:

{
  "alwaysThinkingEnabled": false
}

Or use /config inside a Claude Code session and turn Always thinking off.

Start a new session for this to take effect — settings.json is read once at session startup, so editing it mid-session has no effect on a session that’s already running.
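
If you would rather script the change than edit the file by hand, a minimal Python sketch could look like this. It assumes ~/.claude/settings.json is plain JSON without comments; the helper names `disable_always_thinking` and `update_settings_file` are made up for illustration. Only the one key is changed, and every other setting is left as it was.

```python
import json
from pathlib import Path

# Path taken from the text above; adjust if your settings live elsewhere.
SETTINGS_PATH = Path.home() / ".claude" / "settings.json"


def disable_always_thinking(settings: dict) -> dict:
    """Return a copy of the settings with alwaysThinkingEnabled set to False."""
    updated = dict(settings)
    updated["alwaysThinkingEnabled"] = False
    return updated


def update_settings_file(path: Path = SETTINGS_PATH) -> None:
    """Rewrite the settings file in place, creating it if it is missing."""
    if path.exists():
        current = json.loads(path.read_text(encoding="utf-8"))
    else:
        current = {}
    path.parent.mkdir(parents=True, exist_ok=True)
    new_text = json.dumps(disable_always_thinking(current), indent=2) + "\n"
    path.write_text(new_text, encoding="utf-8")
```

Calling `update_settings_file()` once and then starting a new session has the same effect as the manual edit.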

In my experience, turning it off does not make the model worse: it still thinks when it decides thinking is needed; it just stops spending the token budget on thinking on every turn. After disabling it, I ran the same long tasks with effortLevel: high and even xhigh, and the error did not come back in those runs.

Option 2: Lower the effort, or start a fresh context

Drop effortLevel from high to a lower tier (e.g. low) to cap the thinking budget
Or, when the current conversation context is already large, start a clean session to reduce input tokens and leave more room for output

Option 3: Temporarily switch to an older model (the most reliable workaround)

The first two options only lower the reproduction rate; because the root cause is model-side, they can’t guarantee it’s gone. If you’re on a deadline and don’t want to keep getting interrupted, the most reliable move is to temporarily switch back to a previous-generation model — e.g. from Opus 4.8 to a previous Opus or Sonnet. Older models aren’t affected by this; switch back to the newer model once there’s an official fix.

If all you wanted was the fix, you can stop here. The rest is how these conclusions were reached.

The symptom and the first suspicion

I was running a long task in Claude Code (Opus 4.8 1M context, alwaysThinkingEnabled: true + effortLevel: high) and kept hitting this error. The pattern was clear:

It triggered more often after longer thinking
The automatic retry failed too — retrying the same context produced the same error
The task was hard-blocked and couldn’t continue

My first guess was the network: SSE streaming uses a long-lived connection, and if it gets reset mid-stream, the tool-call JSON could arrive half-finished and fail to parse. But that was only a hypothesis — I needed data.

How to localize it: capture the raw SSE

To tell “network truncation” apart from a “model-side issue”, the reliable signal is the raw API response bytes Claude Code actually received. Anthropic’s /v1/messages is an SSE streaming response, and the key question is how that stream ends.

I put mitmproxy between Claude Code and the outbound path as a passive tap, recording every response without rewriting any request/response headers and preserving streaming (so as not to perturb the very thing being diagnosed):

Claude Code ──HTTPS_PROXY=8080──▶ mitmproxy(8080) ──▶ upstream ──▶ api.anthropic.com
                                      │
                                      └─ passive tee of raw bytes → logs/*.sse
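
A passive tap of this kind can be written as a small mitmproxy addon. The following is a sketch, not necessarily the exact script used for this post. It relies on mitmproxy's `responseheaders` hook and on assigning a callable to `flow.response.stream`, which keeps streaming enabled while exposing each chunk. The class name `SseTap`, the `logs` directory, and the file naming are assumptions made for illustration.

```python
import itertools
from pathlib import Path


def make_tee(path: Path):
    """Return a stream callback that appends each chunk to `path` unchanged."""
    def tee(chunk: bytes) -> bytes:
        with open(path, "ab") as fh:
            fh.write(chunk)
        return chunk  # hand the bytes on untouched, so the client sees no difference
    return tee


class SseTap:
    """Hypothetical addon: copy raw /v1/messages response bytes to disk."""

    def __init__(self, log_dir: Path = Path("logs")):
        self.log_dir = Path(log_dir)
        self.counter = itertools.count(1)

    def responseheaders(self, flow):
        # Called once the response headers are in, before the body is read.
        if not flow.request.path.startswith("/v1/messages"):
            return  # leave every other request alone
        self.log_dir.mkdir(parents=True, exist_ok=True)
        path = self.log_dir / f"{next(self.counter):04d}.sse"
        # A callable here means "stream the body and show me each chunk".
        flow.response.stream = make_tee(path)


# mitmproxy picks up addons from this module-level list.
addons = [SseTap()]
```

Saved under a name such as tap.py, it could be loaded with `mitmdump -s tap.py -p 8080`. The addon never touches headers, and the captured bytes are still compressed, exactly as they arrived.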

Two things make Claude Code route through mitmproxy:

Point https_proxy at mitmproxy (http://127.0.0.1:8080) in the env block of ~/.claude/settings.json
Add mitmproxy’s CA certificate to NODE_EXTRA_CA_CERTS (Claude Code is a Node app and uses this to trust mitmproxy’s TLS interception)

{
  "env": {
    "https_proxy": "http://127.0.0.1:8080",
    "HTTPS_PROXY": "http://127.0.0.1:8080",
    "NODE_EXTRA_CA_CERTS": "/Users/you/.mitmproxy/mitmproxy-ca-cert.pem"
  }
}

Note: the env block is read at session startup. Editing the file mid-session won’t affect a session that’s already running.

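When mitmproxy streams a response through, the bytes you capture are still compressed according to the Content-Encoding header (gzip here), so decompress them before reading, for example with Python’s zlib.decompressobj(31), where a wbits value of 31 selects gzip framing.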
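
The decompression step, plus a small reader for the resulting event stream, might look like the sketch below. It assumes the capture holds a single gzip-encoded body and that every `data:` payload is JSON. The function names `gunzip_stream` and `parse_sse` are illustrative, not part of any library.

```python
import json
import zlib


def gunzip_stream(raw: bytes) -> bytes:
    """Decompress a gzip-encoded body captured from the wire."""
    decomp = zlib.decompressobj(31)  # wbits=31: expect gzip header and trailer
    return decomp.decompress(raw) + decomp.flush()


def parse_sse(body: bytes):
    """Yield (event_name, payload) pairs from a decompressed SSE body."""
    text = body.decode("utf-8").replace("\r\n", "\n")
    for block in text.split("\n\n"):  # a blank line separates events
        event, data_lines = None, []
        for line in block.split("\n"):
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if data_lines:
            yield event, json.loads("\n".join(data_lines))
```

Feeding a captured file through `parse_sse(gunzip_stream(raw))` gives the ordered list of events, which is all the later analysis needs.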

The key evidence: the `usage` field

After reproducing the error, I decompressed the corresponding /v1/messages response. The event sequence looked like this:

message_start        model=claude-opus-4-8
content_block_start  index=0  type=thinking      ← only a thinking block
content_block_delta  type=thinking_delta   × 31  ← thinking all the way through
content_block_delta  type=signature_delta        ← the thinking block's signature
content_block_stop
message_delta        stop_reason=tool_use         ← the model signals it wants a tool
message_stop                                       ← and then it just ends

Notice: stop_reason is tool_use, but there is no content_block_start type=tool_use event anywhere in the stream. The model expressed the intent to call a tool, but the tool-use block itself was never emitted.
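
This mismatch is easy to check mechanically once the events are parsed. The sketch below takes the JSON payloads of one response in order and reports whether the stream ended with stop_reason=tool_use without ever starting a tool_use block. The function name `is_orphan_tool_use` is an assumption for illustration; the field names follow the event payloads shown in this post.

```python
def is_orphan_tool_use(payloads) -> bool:
    """True if stop_reason is tool_use but no tool_use block was ever started."""
    saw_tool_block = False
    stop_reason = None
    for payload in payloads:
        kind = payload.get("type")
        if kind == "content_block_start":
            block = payload.get("content_block") or {}
            if block.get("type") == "tool_use":
                saw_tool_block = True
        elif kind == "message_delta":
            delta = payload.get("delta") or {}
            stop_reason = delta.get("stop_reason", stop_reason)
    return stop_reason == "tool_use" and not saw_tool_block


# The failing stream from above, reduced to the fields that matter.
failing = [
    {"type": "message_start"},
    {"type": "content_block_start", "index": 0, "content_block": {"type": "thinking"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_delta", "delta": {"stop_reason": "tool_use"}},
    {"type": "message_stop"},
]
print(is_orphan_tool_use(failing))  # True
```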

Why? The answer is in the usage field of message_delta:

{
  "type": "message_delta",
  "delta": { "stop_reason": "tool_use" },
  "usage": {
    "output_tokens": 3165,
    "output_tokens_details": { "thinking_tokens": 3120 },
    "input_tokens": 37,
    "cache_creation_input_tokens": 49060,
    "cache_read_input_tokens": 0
  }
}

Doing the math:

| Item | Value |
| --- | --- |
| `output_tokens` (total output for this response) | 3165 |
| of which `thinking_tokens` (spent on thinking) | 3120 |
| left for actual content (including the tool call) | 45 |

A minimal tool-use block needs ~30–40 tokens just for the content_block_start envelope (id / name / empty input), plus the input JSON on top. 45 tokens is not enough to hold a complete tool call.
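
The same subtraction can be run against any captured usage object. This is a minimal sketch: the helper name `tokens_left_for_content` is made up, and it assumes the `output_tokens_details.thinking_tokens` field is present as in the capture above, treating it as 0 when it is missing.

```python
def tokens_left_for_content(usage: dict) -> int:
    """Output tokens that were not spent on thinking."""
    details = usage.get("output_tokens_details") or {}
    return usage["output_tokens"] - details.get("thinking_tokens", 0)


# Values copied from the message_delta event shown earlier.
usage = {
    "output_tokens": 3165,
    "output_tokens_details": {"thinking_tokens": 3120},
    "input_tokens": 37,
    "cache_creation_input_tokens": 49060,
    "cache_read_input_tokens": 0,
}
print(tokens_left_for_content(usage))  # 45
```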

So the API emitted stop_reason=tool_use (the model’s decision) but had no tokens left to emit the corresponding tool_use content block. Claude Code received a response that “says it wants a tool but contains no tool call”, and reported tool call could not be parsed.

This also clears the network of suspicion: this stream’s HTTP body ended cleanly (message_stop received in full, no connection reset), and the dozens of other captured calls were all normal. The problem is in how the token budget was allocated.

Why “longer thinking makes it worse”

The causal chain:

effortLevel sets the ceiling of the thinking budget
alwaysThinkingEnabled: true forces thinking on every turn
In a large context (here input + cache ≈ 49K tokens), the model tends to think deeply and not stop
Thinking eats most of the output-token budget, leaving too little for the tool-use block, which can’t be emitted
The retry fails too, because the identical context reproduces the same exhaustion every time — a hallmark of a deterministic failure, which is itself evidence against random network flakiness

Takeaways

tool call could not be parsed is usually not a network problem but a model-side behavior. Against the official api.anthropic.com endpoint, the cause in my captures was thinking using up almost all of the output tokens.
This is a model-side issue that only reproduces in certain regions (the JP region seems especially affected) — not your local config or network being broken.
“Retry also failed” is a key clue — a deterministic failure points to the context/model side; random failures are more likely the network.
Capturing the raw SSE is an effective way to localize it: mitmproxy passive capture + decompress + read usage, and you can quickly tell which layer the problem is in.
Large context + always thinking is a failure-prone combination: the larger the context, the more the model leans into deep thinking and the easier it is to run the output budget dry. Turning off always thinking and lowering effort are mitigations; temporarily switching back to an older model is the most reliable workaround until there’s an official fix.

If this saved you some debugging time, feel free to pass it along to anyone hitting the same error.