Hey, a lot of developers assume that temperature = 0 means deterministic output, but here is the truth:
Setting temperature = 0 does not guarantee identical output every time. It means the model greedily selects the highest-probability token at each step, but the output can still vary between runs for several reasons:
- Floating-point non-determinism
- Parallel processing and batch effects
- Model updates and shadow deployments
- Context window position effects
- Top-p interaction
- Tokenisation sensitivity
- Long-output instability
- Instruction ambiguity at decision boundaries
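To make the first reason concrete, here is a small sketch of how summation order alone can flip a greedy choice. Floating-point addition is not associative, so the same terms added in a different order (as can happen across parallel hardware or batch sizes) may produce a different score. The numbers, `accumulate`, `greedy_pick`, and the two-token setup are all made up for illustration and deliberately exaggerated; real models differ by far smaller amounts, which only matters when two tokens are nearly tied.

```python
def accumulate(values):
    """Add values strictly left to right, like a simple sequential reduction."""
    total = 0.0
    for v in values:
        total += v
    return total


def greedy_pick(logits):
    """Return the index of the largest logit (what temperature = 0 does)."""
    return max(range(len(logits)), key=lambda i: logits[i])


# The same three terms, added in two different orders (hypothetical values).
terms = [1e16, 1.0, -1e16]
reordered = [1e16, -1e16, 1.0]

logit_a = accumulate(terms)      # the 1.0 is lost when added to 1e16
logit_b = accumulate(reordered)  # the large terms cancel first, so 1.0 survives

rival = 0.5  # hypothetical score of a competing token

pick_a = greedy_pick([logit_a, rival])
pick_b = greedy_pick([logit_b, rival])

print(logit_a, logit_b)  # 0.0 1.0
print(pick_a, pick_b)    # 1 0
```

The greedy rule is identical in both runs; only the arithmetic order changed, and that was enough to select a different token.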
The bottom line: temperature = 0 is a good start but not a complete solution. The only truly robust approach is to treat LLM output as untrusted input that must be validated, the same way you’d validate an external API response.
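As a rough sketch of what that validation could look like, assume the model was asked to reply with a JSON object containing a `category` and a `priority`. The schema, the allowed values, and the function name `validate_llm_reply` are hypothetical and only here for illustration; the point is that every field is checked before the rest of the application touches it.

```python
import json

# Hypothetical schema for illustration: the values the application accepts.
ALLOWED_CATEGORIES = {"billing", "bug", "other"}


def validate_llm_reply(raw: str) -> dict:
    """Parse and validate a model reply, raising ValueError on anything unexpected."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")

    category = data.get("category")
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        raise ValueError(f"unexpected category: {category!r}")

    priority = data.get("priority")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(priority, bool) or not isinstance(priority, int):
        raise ValueError(f"priority must be an integer, got {priority!r}")
    if not 1 <= priority <= 5:
        raise ValueError(f"priority out of range: {priority}")

    # Return only the validated fields; any extra keys are dropped.
    return {"category": category, "priority": priority}
```

On a `ValueError` the caller can retry the request, fall back to a default, or escalate to a human, exactly as it would for a malformed response from any external service.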