Non-thinking mode doesn’t mean the model answers without reasoning, but it cannot backtrack from mistakes. These models are still very capable and have fairly low hallucination rates.
Thinking mode does NOT have a “validity checker” loop. It simply reasons in one continuous token stream, that’s all.
In many models, the model doesn’t even draft the response in the thinking block!
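As a concrete illustration of the single-stream setup: many open reasoning models (DeepSeek-R1 is one example) emit their chain of thought between `<think>` tags in the same token stream as the answer, and the client just splits the text afterwards. A minimal sketch, assuming that tag convention (the sample text is made up):

```python
import re

def split_thinking(raw: str) -> tuple[str, str]:
    """Split a single-stream model output into (chain_of_thought, final_answer).

    Assumes DeepSeek-R1-style <think>...</think> delimiters. There is no
    separate channel or checker loop -- just one stream, split after the fact.
    """
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if match is None:
        return "", raw.strip()          # no thinking block at all
    cot = match.group(1).strip()
    answer = raw[match.end():].strip()  # everything after the closing tag
    return cot, answer

raw = "<think>2+2... carry nothing... it's 4.</think>The answer is 4."
cot, answer = split_thinking(raw)
```

Note that nothing forces the answer to be drafted inside the thinking block; the final response often appears only after the closing tag, as the text above describes.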
The model can in some cases even overthink or box itself in, leading to worse outputs.
There is also no cost/quality weighing of any sort, aside from a cap on the token count.
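That cap is typically just a hard cutoff: once the reasoning budget is spent, the thinking stream is truncated (e.g. by force-inserting the closing tag) and generation moves on to the answer, whether or not the thought finished. A toy sketch of that truncation logic (the tag name and token stream are illustrative, not any particular API):

```python
def cap_thinking(tokens: list[str], budget: int, close_tag: str = "</think>") -> list[str]:
    """Truncate a reasoning-token stream at a fixed budget.

    There is no cost/quality trade-off here: the stream is simply cut
    mid-thought and the closing tag is force-inserted.
    """
    if len(tokens) <= budget:
        return tokens
    return tokens[:budget] + [close_tag]

stream = ["Let", "me", "reconsider", "step", "3", "again", "..."]
capped = cap_thinking(stream, budget=4)
# the stream is cut after 4 tokens regardless of whether the thought finished
```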
Note that thinking models suffer from the bad-token problem as well, both in their CoT and in their final response:
