Qwen3 Delta Network Broadcasting Fix
Ollama v0.15.5-rc3 addresses a bug in the Qwen3 model implementation affecting the delta network layer's backward pass computation.
The Issue
The gDiffExp gradient tensor was being broadcast across the wrong axis during multiplication with the key tensor (k), causing incorrect gradient computation in the model's attention mechanism.
The Fix
The fix reshapes gDiffExp to the correct dimensions of [1, chunkSize, nChunks, ...] to ensure proper tensor alignment and broadcasting during the multiplication operation. This ensures gradients are computed correctly during backpropagation.
Impact
This bugfix is particularly important for users training or fine-tuning Qwen3 models with Ollama, as incorrect gradient computation would lead to degraded model performance and unreliable training results.