Interpretability · Chain-of-Thought · Reasoning Models

Models make up their minds before they say so.

A reasoning model’s final answer often stabilizes in one sharp step, well before its chain-of-thought ends. What follows can look like deliberation without changing the answer.

Daniel Scalena · Sara Candussio · Luca Bortolussi · Elisabetta Fersini · Malvina Nissim · Gabriele Sarti

University of Groningen · University of Milano-Bicocca · University of Trieste · Northeastern University

[Animation: overview of the commitment boundary. Final-answer confidence rises sharply, then a probe detects when reasoning can stop.]
Reasoning moves from transient guesses to a stable final answer at the commitment boundary. A lightweight probe can detect that transition and stop generation early.

The main claim

Chain-of-thought is not uniformly causal. Across three model families and four reasoning benchmarks, we find a commitment boundary: the point where the model’s final answer suddenly reaches full-trace confidence.

Before the boundary, reasoning can genuinely change the answer. After it, models often keep hedging, checking, and explaining, even though those steps leave the elicited final answer essentially unchanged.

How we locate it

We stop a reasoning trace after each sentence and ask the model for its answer. This gives a causal, step-by-step view of when intermediate guesses are revised and when the final answer becomes stable.

1 · Truncate

Cut the chain-of-thought after every reasoning step.

2 · Elicit

Force an answer from each partial trace.

3 · Compare

Find the step where the full-trace answer sharply stabilizes.
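The three steps above can be sketched in a few lines, assuming the truncate-and-elicit pass has already produced one confidence value per truncation point (the probability assigned to the full-trace answer). The function names, the "largest upward jump" rule, and the toy numbers are illustrative assumptions, not the paper's implementation or data.

```python
from typing import List, Tuple


def confidence_jumps(conf: List[float]) -> List[float]:
    """Step-to-step change in confidence for the full-trace answer."""
    return [b - a for a, b in zip(conf, conf[1:])]


def locate_boundary(conf: List[float]) -> Tuple[int, float]:
    """Return (boundary index, jump ratio) for a list of per-step confidences.

    The boundary index i is the truncation point where conf[i] - conf[i - 1]
    is largest. The ratio divides that jump by the second-largest jump
    (infinity if the second-largest jump is not positive).
    Assumption: the boundary is taken to be the single sharpest upward jump.
    """
    jumps = confidence_jumps(conf)
    if len(jumps) < 2:
        raise ValueError("need at least three truncation points")
    order = sorted(range(len(jumps)), key=lambda i: jumps[i], reverse=True)
    best, second = order[0], order[1]
    ratio = jumps[best] / jumps[second] if jumps[second] > 0 else float("inf")
    return best + 1, ratio


# Hypothetical confidences, one per truncation point (not results from the paper).
toy = [0.05, 0.10, 0.08, 0.30, 0.95, 0.97, 0.96]
step, ratio = locate_boundary(toy)
print(step, ratio)  # the boundary index is 4 for this toy trace
```

A ratio well above 1 means one jump dominates the trace, which is the kind of sharpness the headline statistic summarizes.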

4.6× larger median confidence jump at the boundary than the second-largest jump

>90% in-distribution boundary detection rate from model activations

3 model families × 4 reasoning benchmarks evaluated

One trace, in miniature

A model searches for the smallest multiple of 30 written only with 0s and 2s.

Before

It explores and rejects genuine intermediate guesses such as 20, 200, and 2200.

Step 39

“So number is 2220?” Final-answer confidence jumps from 0.36 to 0.995.

After

It spends another 34 sentences checking the result, while the answer remains fixed at 2220.
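The arithmetic behind this trace is easy to check by brute force. The sketch below is only a sanity check on the task itself, not anything the model or the paper runs; the function name and the `max_length` cap are assumptions made for illustration.

```python
from itertools import product


def smallest_multiple_with_digits(n: int, digits: str = "02", max_length: int = 12) -> int:
    """Smallest positive multiple of n whose decimal digits all come from `digits`."""
    for length in range(1, max_length + 1):
        # Every candidate of this length with no leading zero, in increasing order.
        candidates = sorted(
            int("".join(p)) for p in product(digits, repeat=length) if p[0] != "0"
        )
        for value in candidates:
            if value % n == 0:
                return value
    raise ValueError(f"no multiple of {n} found with at most {max_length} digits")


print(smallest_multiple_with_digits(30))      # 2220
print([g % 30 for g in (20, 200, 2200)])      # [20, 20, 10]
```

The rejected guesses 20, 200, and 2200 all leave a nonzero remainder, which is why they count as genuine intermediate answers rather than restatements of the final one. A multiple of 30 must end in 0 and have a digit sum divisible by 3, so three 2s followed by a 0 is the shortest option.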

Why it matters

A chain-of-thought can continue to sound uncertain after the model has already committed. Surface language alone is therefore an unreliable guide to whether genuine answer revision is still happening.

The boundary is also visible in model activations. Lightweight causal attention probes detect it across unseen reasoning tasks and can use it as an adaptive early-exit signal, skipping substantial post-commitment reasoning with little loss in answer accuracy.
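As a rough illustration of how a boundary probe could drive early exit, the sketch below assumes the probe emits one score per reasoning step and stops once the score stays above a threshold. The `threshold` and `patience` parameters and the toy scores are hypothetical choices for this example; the paper's actual stopping rule may differ.

```python
from typing import Iterable, Optional


def early_exit_step(
    scores: Iterable[float], threshold: float = 0.9, patience: int = 1
) -> Optional[int]:
    """Return the 1-based step at which to stop, or None to run the full trace.

    Generation stops after `patience` consecutive steps whose probe score
    is at least `threshold`.
    """
    streak = 0
    for step, score in enumerate(scores, start=1):
        streak = streak + 1 if score >= threshold else 0
        if streak >= patience:
            return step
    return None


# Hypothetical per-step probe scores (not measurements from the paper).
toy_scores = [0.10, 0.20, 0.95, 0.30, 0.92, 0.97, 0.99]
print(early_exit_step(toy_scores, patience=1))  # 3
print(early_exit_step(toy_scores, patience=2))  # 6
```

Requiring more than one consecutive high score trades a few extra steps for robustness against a single spurious spike, such as the one at step 3 in the toy scores.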

Paper and citation

Beyond the Commitment Boundary: Probing Epiphenomenal Chain-of-Thought in Large Reasoning Models. 2026.

@article{scalena2026commitment,
  title  = {Beyond the Commitment Boundary: Probing Epiphenomenal
            Chain-of-Thought in Large Reasoning Models},
  author = {Scalena, Daniel and Candussio, Sara and Bortolussi, Luca
            and Fersini, Elisabetta and Nissim, Malvina and Sarti, Gabriele},
  year   = {2026}
}