Fixing "The model's tool call could not be parsed" in Claude Code
If you keep seeing this error in Claude Code, especially after the model has been “thinking” for a long time:
● The model's tool call could not be parsed (retry also failed).
✳ Churned for 5m 47s
This post gives you a fix you can apply right away, followed by the analysis of what actually causes it.
Chinese version: Fixing the Claude Code error "The model’s tool call could not be parsed"
The fix, first
This error is usually not a network problem. It’s caused by extended thinking exhausting the output-token budget. The most direct fixes:
Option 1: Turn off always thinking (recommended)
Edit ~/.claude/settings.json:
{
"alwaysThinkingEnabled": false
}
Or use /config inside a Claude Code session and turn Always thinking off.
Start a new session for this to take effect: settings.json is read once at session startup, so editing it mid-session has no effect on a session that’s already running.
Turning it off does not make the model worse: it still thinks when it decides thinking is needed; it just stops spending the token budget on thinking on every turn. After disabling it, I ran the same long tasks with effortLevel: high and even xhigh, and the error never came back.
Option 2: Lower the effort, or start a fresh context
- Drop effortLevel from high to a lower tier (e.g. low) to cap the thinking budget
- Or, when the current conversation context is already large, start a clean session to reduce input tokens and leave more room for output
If all you wanted was the fix, you can stop here. The rest is how this conclusion was reached.
The symptom and the first suspicion
I was running a long task in Claude Code (Opus 4.8 1M context, alwaysThinkingEnabled: true + effortLevel: high) and kept hitting this error. The pattern was clear:
- It triggered more often after longer thinking
- The automatic retry failed too — retrying the same context produced the same error
- The task was hard-blocked and couldn’t continue
My first guess was the network: SSE streaming uses a long-lived connection, and if it gets reset mid-stream, the tool-call JSON could arrive half-finished and fail to parse. But that was only a hypothesis — I needed data.
How to localize it: capture the raw SSE
To tell “network truncation” apart from a “model-side issue”, the reliable signal is the raw API response bytes Claude Code actually received. Anthropic’s /v1/messages is an SSE streaming response, and the key question is how that stream ends.
I put mitmproxy between Claude Code and the outbound path as a passive tap, recording every response without rewriting any request/response headers and preserving streaming (so as not to perturb the very thing being diagnosed):
Claude Code ──HTTPS_PROXY=8080──▶ mitmproxy(8080) ──▶ upstream ──▶ api.anthropic.com
│
└─ passive tee of raw bytes → logs/*.sse
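The addon used for the capture is not shown in this post, so the following is only a minimal sketch of what such a passive tee could look like. It assumes mitmproxy's `responseheaders` hook and its convention that assigning a callable to `flow.response.stream` keeps the body streaming, with the callable invoked once per chunk and once more with empty bytes at the end. The class name `RawTee` and the `logs` directory are hypothetical.

```python
import os
import time


class RawTee:
    """Passive tap: copy each streamed response chunk to disk, unmodified."""

    def __init__(self, log_dir="logs"):
        self.log_dir = log_dir  # hypothetical output directory

    def responseheaders(self, flow):
        # Only tap the Messages API; all other traffic passes through untouched.
        if not flow.request.path.startswith("/v1/messages"):
            return
        os.makedirs(self.log_dir, exist_ok=True)
        path = os.path.join(self.log_dir, f"{time.time_ns()}.sse")
        out = open(path, "ab")

        def tee(chunk: bytes) -> bytes:
            # Assumption: an empty chunk marks the end of the stream.
            if chunk:
                out.write(chunk)
            else:
                out.close()
            return chunk  # returned as-is, so the client sees identical bytes

        # A callable here keeps the response streaming instead of buffering it.
        flow.response.stream = tee


addons = [RawTee()]
```

Saved as, say, `tee_addon.py`, this could be loaded with `mitmdump -s tee_addon.py -p 8080`. Headers are never modified, so the captured bytes stay in whatever content encoding the server sent.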
Two things make Claude Code route through mitmproxy:
- Point https_proxy at mitmproxy (http://127.0.0.1:8080) in the env block of ~/.claude/settings.json
- Set NODE_EXTRA_CA_CERTS to the path of mitmproxy’s CA certificate (Claude Code is a Node app and uses this to trust mitmproxy’s TLS interception)
{
"env": {
"https_proxy": "http://127.0.0.1:8080",
"HTTPS_PROXY": "http://127.0.0.1:8080",
"NODE_EXTRA_CA_CERTS": "/Users/you/.mitmproxy/mitmproxy-ca-cert.pem"
}
}
Note: the env block is read at session startup. Editing the file mid-session won’t affect a session that’s already running.
When mitmproxy streams a response through, it hands you the content-encoding-compressed raw bytes (gzip here), so decompress with zlib.decompressobj(31) before reading.
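As a rough illustration of that decompression step, here is a small sketch that inflates a captured gzip body and splits it into SSE events. The function name `read_sse_events` is made up for this example, and the parser handles only the `event:` and `data:` fields, which is all that is needed to read these captures. It also returns the decompressor's `eof` flag, which tells you whether the gzip stream itself ended cleanly.

```python
import json
import zlib


def read_sse_events(raw: bytes):
    """Decompress a gzip-encoded SSE capture.

    Returns (events, clean_end), where events is a list of (name, payload)
    pairs and clean_end is True if the gzip stream reached its trailer.
    """
    d = zlib.decompressobj(31)  # wbits=31: expect a gzip header and trailer
    # decompressobj tolerates truncated input and returns what it can decode.
    text = d.decompress(raw).decode("utf-8", errors="replace")

    events = []
    for block in text.replace("\r\n", "\n").split("\n\n"):
        name, data_lines = None, []
        for line in block.split("\n"):
            if line.startswith("event:"):
                name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if not data_lines:
            continue
        try:
            payload = json.loads("\n".join(data_lines))
        except json.JSONDecodeError:
            payload = None  # a half-written event, e.g. from a cut-off stream
        events.append((name, payload))
    return events, d.eof
```

A capture whose `clean_end` is False, or whose last event has a `None` payload, would point toward truncation in transit. A capture that ends cleanly with `message_stop` points elsewhere.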
The key evidence: the usage field
After reproducing the error, I decompressed the corresponding /v1/messages response. The event sequence looked like this:
message_start model=claude-opus-4-8
content_block_start index=0 type=thinking ← only a thinking block
content_block_delta type=thinking_delta × 31 ← thinking all the way through
content_block_delta type=signature_delta ← the thinking block's signature
content_block_stop
message_delta stop_reason=tool_use ← the model signals it wants a tool
message_stop ← and then it just ends
Notice: stop_reason is tool_use, but there is no content_block_start type=tool_use event anywhere in the stream. The model expressed the intent to call a tool, but the tool-use block itself was never emitted.
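That signature is easy to check mechanically. The sketch below takes the decoded JSON payloads of one response, in order, and reports whether the stream claimed `tool_use` without ever opening a `tool_use` block. The helper name `has_missing_tool_use` is hypothetical; the field names follow the event shapes shown in this post.

```python
def has_missing_tool_use(payloads):
    """Return True if stop_reason is tool_use but no tool_use block was started."""
    stop_reason = None
    block_types = []
    for p in payloads:
        kind = p.get("type")
        if kind == "content_block_start":
            # Each block announces its type (thinking, text, tool_use, ...).
            block_types.append(p.get("content_block", {}).get("type"))
        elif kind == "message_delta":
            stop_reason = p.get("delta", {}).get("stop_reason", stop_reason)
    return stop_reason == "tool_use" and "tool_use" not in block_types


# The shape of the failing capture: one thinking block, then stop_reason=tool_use.
failing = [
    {"type": "message_start"},
    {"type": "content_block_start", "index": 0, "content_block": {"type": "thinking"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_delta", "delta": {"stop_reason": "tool_use"}},
    {"type": "message_stop"},
]
print(has_missing_tool_use(failing))  # True
```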
Why? The answer is in the usage field of message_delta:
{
"type": "message_delta",
"delta": { "stop_reason": "tool_use" },
"usage": {
"output_tokens": 3165,
"output_tokens_details": { "thinking_tokens": 3120 },
"input_tokens": 37,
"cache_creation_input_tokens": 49060,
"cache_read_input_tokens": 0
}
}
Doing the math:
| Item | Value |
|---|---|
| output_tokens (total output this response) | 3165 |
| of which thinking_tokens (spent on thinking) | 3120 |
| left for actual content (including the tool call) | 45 |
A minimal tool-use block needs ~30–40 tokens just for the content_block_start envelope (id / name / empty input), plus the input JSON on top. 45 tokens is not enough to hold a complete tool call.
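The same subtraction can be applied to any captured `usage` object. This is a small sketch using the field names from the capture above; the function name `tokens_left_for_content` is made up for illustration, and a missing `output_tokens_details` is treated as zero thinking tokens.

```python
def tokens_left_for_content(usage: dict) -> int:
    """Output tokens that were not spent on thinking."""
    total = usage["output_tokens"]
    thinking = usage.get("output_tokens_details", {}).get("thinking_tokens", 0)
    return total - thinking


usage = {
    "output_tokens": 3165,
    "output_tokens_details": {"thinking_tokens": 3120},
    "input_tokens": 37,
    "cache_creation_input_tokens": 49060,
    "cache_read_input_tokens": 0,
}
print(tokens_left_for_content(usage))  # 45
```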
So the API emitted stop_reason=tool_use (the model’s decision) but had no tokens left to emit the corresponding tool_use content block. Claude Code received a response that “says it wants a tool but contains no tool call”, and reported tool call could not be parsed.
This also clears the network of suspicion: this stream’s HTTP body ended cleanly (message_stop received in full, no connection reset), and the dozens of other captured calls were all normal. The problem is in how the token budget was allocated.
Why “longer thinking makes it worse”
The causal chain:
- effortLevel sets the ceiling of the thinking budget
- alwaysThinkingEnabled: true forces thinking on every turn
- In a large context (here input + cache ≈ 49K tokens), the model tends to think deeply and not stop
- Thinking eats most of the output-token budget, leaving too little for the tool-use block, which can’t be emitted
- The retry fails too, because the identical context reproduces the same exhaustion every time — a hallmark of a deterministic failure, which is itself evidence against random network flakiness
Takeaways
- tool call could not be parsed is usually not a network problem. On the official api.anthropic.com, it’s more likely the output tokens being exhausted by thinking.
- “Retry also failed” is a key clue: a deterministic failure points to the context/model side; random failures are more likely the network.
- Capturing the raw SSE is an effective way to localize it: mitmproxy passive capture + decompress + read usage, and you can quickly tell which layer the problem is in.
- Large context + always thinking is a failure-prone combination: the larger the context, the more the model leans into deep thinking and the easier it is to run the output budget dry.
If this saved you some debugging time, feel free to pass it along to anyone hitting the same error.