Transformers.js v4 Preview Debuts on NPM with New WebGPU Runtime and 10x Build Speed Gains
· release · feature · sdk · performance · open-source · huggingface.co ↗

Major Runtime Rewrite

Transformers.js v4 features a completely rebuilt WebGPU runtime written in C++ and developed in collaboration with the ONNX Runtime team. This new foundation enables the same transformer inference code to run across diverse JavaScript environments—browsers, Node.js, Bun, and Deno—all with GPU acceleration support.
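To make this concrete, here is a minimal sketch of how an application might select a device across those environments. The `{ device: "webgpu" }` pipeline option is Transformers.js's documented way to request a backend; the `pickDevice` helper and its fallback logic are our own illustrative assumptions, not part of the library.

```javascript
// Hypothetical helper: prefer WebGPU when the runtime exposes it
// (browsers and recent server runtimes surface it as navigator.gpu),
// otherwise fall back to the WASM backend.
function pickDevice(globals = globalThis) {
  const hasWebGPU =
    typeof globals.navigator !== "undefined" && !!globals.navigator.gpu;
  return hasWebGPU ? "webgpu" : "wasm";
}

// Usage with the real API (after: npm i @huggingface/transformers@next):
//   import { pipeline } from "@huggingface/transformers";
//   const extractor = await pipeline(
//     "feature-extraction",
//     "Xenova/all-MiniLM-L6-v2",   // example model id
//     { device: pickDevice() },
//   );
```

Because the same `pipeline` call works in browsers, Node.js, Bun, and Deno, the device string is the only environment-sensitive piece.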

Performance Optimizations

The v4 release delivers significant performance gains through strategic use of ONNX Runtime's Contrib Operators. By leveraging specialized operators like com.microsoft.GroupQueryAttention, com.microsoft.MatMulNBits, and com.microsoft.MultiHeadAttention, the team achieved roughly 4x speedups for BERT-based embedding models. The build system migration from Webpack to esbuild reduced build times from 2 seconds to 200 milliseconds, a 10x improvement, and decreased average bundle sizes by 10%.
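For readers unfamiliar with why GroupQueryAttention helps, the sketch below shows only the head-sharing idea behind grouped-query attention, not the fused com.microsoft.GroupQueryAttention kernel itself. The function name and head counts are illustrative assumptions.

```javascript
// Conceptual sketch: in grouped-query attention, each group of query
// heads reuses a single key/value head, shrinking the KV cache (and
// memory traffic) by the group factor.
function kvHeadFor(queryHead, numQueryHeads, numKVHeads) {
  if (numQueryHeads % numKVHeads !== 0) {
    throw new Error("query heads must be a multiple of KV heads");
  }
  const groupSize = numQueryHeads / numKVHeads; // query heads per KV head
  return Math.floor(queryHead / groupSize);
}

// Example: with 32 query heads and 8 KV heads, query heads 0-3 share
// KV head 0, heads 4-7 share KV head 1, and the KV cache is 4x smaller.
```

A fused contrib operator performs this grouping, the attention math, and the KV-cache update in one kernel, which is where the speedup over generic ONNX graphs comes from.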

Expanded Model Support

v4 introduces support for new model architectures including GPT-OSS, Chatterbox, GraniteMoeHybrid, HunYuanDenseV1, Olmo3, and Youtu-LLM. The release adds first-class support for advanced architectural patterns like Mamba (state-space models), Multi-head Latent Attention (MLA), and Mixture of Experts (MoE), bringing the total number of supported architectures to roughly 200.
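As a rough illustration of the Mixture of Experts pattern now supported, here is a minimal top-k routing sketch. The function names and the renormalization choice are our assumptions; real MoE layers do this per token inside the model graph, not in user code.

```javascript
// Standard numerically-stable softmax over gate logits.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Pick the top-k experts by gate probability and renormalize their
// weights; only those experts run for this token, which is why MoE
// models activate a fraction of their parameters per step.
function routeTopK(gateLogits, k) {
  const probs = softmax(gateLogits);
  const ranked = probs
    .map((p, expert) => ({ expert, p }))
    .sort((a, b) => b.p - a.p)
    .slice(0, k);
  const total = ranked.reduce((acc, e) => acc + e.p, 0);
  return ranked.map(({ expert, p }) => ({ expert, weight: p / total }));
}
```

The same gating idea underlies the GraniteMoeHybrid and GPT-OSS architectures listed above, though their routing details differ per model.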

Code Quality and Maintainability

The development cycle included major refactoring efforts: conversion to a pnpm monorepo workspace for modular sub-packages, splitting the 8,000+ line models.js file into focused modules, moving examples to a dedicated repository, and adopting Prettier for consistent formatting. These changes reduce technical debt and lower the barrier for contributors adding new models.

Getting Started

Install the preview version with:

npm i @huggingface/transformers@next

The team will continue publishing regular v4 updates under the next tag on NPM before the full release.