Ollama v0.15.5-rc4 fixes off-by-one error in token prediction limits
Off-by-One Error in Token Prediction Fixed
Ollama v0.15.5-rc4 addresses a bug in the handling of the numPredict parameter, which caps the number of tokens generated in a response.
The Problem
When numPredict was set to limit the number of generated tokens:
- Users received one fewer token than requested
- Token statistics reported the limit itself as the number of tokens returned, rather than the actual count
- This issue did not occur when numPredict was unset
Root Cause
The bug occurred because the limit was being checked during batch setup rather than during actual token prediction. This caused the current batch to terminate prematurely when the limit was reached.
The Fix
The fix moves the limit check to occur at the point of actual token prediction, ensuring accurate token counts and proper enforcement of the requested limit.