Ollama v0.15.5-rc3 fixes Qwen3 delta net tensor broadcasting issue
· bugfixrelease · github.com ↗

Qwen3 Delta Network Broadcasting Fix

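Ollama v0.15.5-rc3 fixes a bug in the delta net layer of its Qwen3 model implementation: a tensor was broadcast across the wrong axis during an element-wise multiplication. Because Ollama runs inference rather than training, the error affected the layer's forward computation, not a backward pass.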

The Issue

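The gDiffExp tensor was being broadcast across the wrong axis when multiplied with the key tensor (k). Despite the leading "g", gDiffExp appears to be an exponentiated gate-difference (decay) term of the delta net rather than a gradient. With the axes misaligned, elements of k were scaled by factors belonging to other positions, so the layer produced incorrect values.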

The Fix

The fix reshapes gDiffExp to [1, chunkSize, nChunks, ...] so that its chunkSize and nChunks axes line up with the matching axes of k, while the leading size-1 axis broadcasts across k's first dimension. The multiplication then applies each factor to the position it belongs to.
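
The effect of the reshape can be illustrated with a small NumPy sketch. This is not Ollama's code: the sizes, the name head_dim, the shape assumed for k, and the particular wrong layout shown are all assumptions made for illustration, and the axis order simply mirrors the order in which the shape is listed above. The sizes are picked so that the mis-aligned multiplication succeeds without an error, which is what makes this kind of bug easy to miss.

```python
import numpy as np

# Hypothetical sizes. head_dim == chunk_size on purpose, so that the
# mis-aligned broadcast below runs silently instead of raising.
head_dim, chunk_size, n_chunks = 4, 4, 2

rng = np.random.default_rng(0)
# Assumed layout for k: (head_dim, chunk_size, n_chunks).
k = rng.standard_normal((head_dim, chunk_size, n_chunks))
# One decay factor per (position in chunk, chunk).
g_diff_exp = np.exp(-rng.random((chunk_size, n_chunks)))

# Reference result: scale k[d, t, c] by g_diff_exp[t, c] for every feature d.
expected = np.empty_like(k)
for d in range(head_dim):
    for t in range(chunk_size):
        for c in range(n_chunks):
            expected[d, t, c] = k[d, t, c] * g_diff_exp[t, c]

# Wrong axis: the position axis of g_diff_exp lines up with the feature
# axis of k, so k[d, t, c] is scaled by g_diff_exp[d, c].
wrong = k * g_diff_exp.reshape(chunk_size, 1, n_chunks)

# Correct axis: a leading size-1 axis broadcasts over the feature axis.
right = k * g_diff_exp.reshape(1, chunk_size, n_chunks)

assert wrong.shape == right.shape == k.shape   # same shape either way
assert np.allclose(right, expected)            # the reshape matches the reference
assert not np.allclose(wrong, expected)        # the mis-aligned product does not
```

Both products have the same shape as k, so no shape check would catch the difference; only the values are wrong.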

Impact

This fix matters for anyone running Qwen3 models that use the delta net layer in Ollama. A multiplication broadcast along the wrong axis feeds incorrect values into every later step of the layer, which can degrade the quality of the model's output.