Ollama v0.18.3-rc0 adds support for MXFP4/MXFP8/NVFP4 quantization formats
release · feature · github.com

MXFP Format Support Added

Ollama v0.18.3-rc0 introduces support for three new floating-point quantization formats: MXFP4, MXFP8, and NVFP4. These low-bit microscaling formats store weights as small floating-point elements with shared per-block scales, reducing model storage and enabling more efficient computation during inference.
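The release notes don't spell out the encoding, but per the OCP Microscaling (MX) specification, MXFP4 packs FP4 (E2M1) elements in blocks of 32 that share one power-of-two (E8M0) scale. The sketch below illustrates that block scheme; the function names and structure are mine for illustration, not Ollama's internals:

```python
# Illustrative MXFP4-style block quantization (assumption: follows the OCP
# MX spec -- 32 FP4 E2M1 elements per block, one shared power-of-two scale).
import math

# The 8 non-negative magnitudes representable in FP4 E2M1.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Quantize a block of up to 32 floats to (shared_exponent, fp4_codes)."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0] * len(block)
    # Shared scale per the MX spec: floor(log2(amax)) minus the element
    # format's max exponent (2 for E2M1), so amax lands within FP4 range.
    shared_exp = math.floor(math.log2(amax)) - 2
    scale = 2.0 ** shared_exp
    codes = []
    for x in block:
        mag = abs(x) / scale
        # Round to the nearest representable E2M1 magnitude (clamps at 6.0).
        idx = min(range(len(E2M1_VALUES)),
                  key=lambda i: abs(E2M1_VALUES[i] - mag))
        sign = 1 if x < 0 else 0
        codes.append((sign << 3) | idx)  # 1 sign bit + 3-bit magnitude index
    return shared_exp, codes

def dequantize_block(shared_exp, codes):
    """Reconstruct floats from the shared exponent and 4-bit codes."""
    scale = 2.0 ** shared_exp
    out = []
    for c in codes:
        mag = E2M1_VALUES[c & 0x7] * scale
        out.append(-mag if c >> 3 else mag)
    return out
```

Because every element in a block reuses one 8-bit scale, the per-weight cost stays close to 4 bits while still covering a wide dynamic range.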

What Changed

  • BF16 conversion: models in bfloat16 format can now be imported and quantized to MXFP4 or MXFP8
  • FP8 direct conversion: models already in 8-bit floating point can be converted directly to MXFP8
  • MLX framework support: the changes are integrated into Ollama's MLX-based inference engine
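Assuming the release candidate exposes the new formats through the existing `ollama create --quantize` flow (the flag exists today for formats such as `q4_K_M`; whether `mxfp4` is the accepted value name here is an assumption), a BF16 import might look like:

```shell
# Hypothetical usage fragment -- the model path and the "mxfp4" quantize
# value are assumptions; check `ollama create --help` in v0.18.3-rc0 for
# the names this release actually accepts.
cat > Modelfile <<'EOF'
FROM ./my-bf16-model
EOF
ollama create my-model:mxfp4 -f Modelfile --quantize mxfp4
```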

For Users

This release candidate lets developers and users import and run models in these more efficient quantization formats. The formats can substantially reduce model size while maintaining reasonable inference quality, which is useful in resource-constrained environments and shortens download and load times.
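To make the size reduction concrete, here is a back-of-the-envelope comparison (my arithmetic under the MX-spec layout of 4-bit elements plus one shared 8-bit scale per 32-element block, not Ollama's published numbers):

```python
# Rough per-weight storage cost: BF16 vs an MXFP4-style block format.
def bits_per_weight_bf16():
    return 16.0

def bits_per_weight_mxfp4(block_size=32):
    # 4-bit elements plus one shared 8-bit (E8M0) scale per block.
    return 4.0 + 8.0 / block_size

params = 8e9  # example: an 8B-parameter model (hypothetical size)
bf16_gb = params * bits_per_weight_bf16() / 8 / 1e9
mxfp4_gb = params * bits_per_weight_mxfp4() / 8 / 1e9
print(f"BF16: {bf16_gb:.1f} GB, MXFP4: {mxfp4_gb:.2f} GB")
# prints: BF16: 16.0 GB, MXFP4: 4.25 GB
```

At 4.25 effective bits per weight, MXFP4 is roughly a 3.8x reduction over BF16 before any metadata overhead.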

Next Steps

Test this release candidate if you're working with quantized models or planning to use MXFP quantization formats. Report any issues on the Ollama GitHub repository to help stabilize v0.18.3 for general release.