Ollama v0.15.5-rc4 fixes off-by-one error in token prediction limits
Off-by-One Error in Token Prediction Fixed
Ollama v0.15.5-rc4 addresses a bug in the handling of the numPredict parameter, which caps the number of tokens generated in a response.
The Problem
When numPredict was set to limit the number of generated tokens:
- Users received one fewer token than requested
- Token statistics reported the limit itself as the number of tokens returned, rather than the actual count
- This issue did not occur when numPredict was unset
Root Cause
The bug occurred because the limit was being checked during batch setup rather than during actual token prediction. This caused the current batch to terminate prematurely when the limit was reached.
The Fix
The fix moves the limit check to occur at the point of actual token prediction, ensuring accurate token counts and proper enforcement of the requested limit.