Falcon-H1 Hybrid Architecture Integration
NVIDIA Megatron Core now supports the Falcon-H1 parallel hybrid architecture, developed by the Technology Innovation Institute (TII). Unlike sequential hybrid approaches, Falcon-H1 runs Transformer-based attention and Mamba-2 state-space model (SSM) components in parallel within each processing block, concatenating their outputs before projection. This design combines the memory efficiency of SSMs on long contexts with the long-range dependency modeling of attention mechanisms.
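The parallel data flow can be sketched in a few lines. This is a hypothetical pure-Python stand-in, not the Megatron Core `ParallelHybridLayer` API: both branches see the same input, their outputs are concatenated from width d to 2d, and a projection maps back to d.

```python
# Minimal sketch of a Falcon-H1-style parallel hybrid block (hypothetical,
# pure-Python stand-in; the real layer uses fused GPU kernels and tensors).

def matvec(w, x):
    """Multiply matrix w (rows x cols, nested lists) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def parallel_hybrid_block(x, attention_branch, ssm_branch, out_proj):
    """Run attention and SSM branches on the same input in parallel,
    concatenate their width-d outputs to width 2d, then project to d."""
    attn_out = attention_branch(x)   # width d
    ssm_out = ssm_branch(x)          # width d
    concat = attn_out + ssm_out      # width 2d (list concatenation)
    return matvec(out_proj, concat)  # back to width d

# Toy example with d = 2: simple scaling branches, averaging projection.
attn = lambda x: [v * 0.5 for v in x]
ssm = lambda x: [v * 2.0 for v in x]
# out_proj maps 2d -> d; here it averages the two branches per dimension.
proj = [[0.5, 0.0, 0.5, 0.0],
        [0.0, 0.5, 0.0, 0.5]]
y = parallel_hybrid_block([1.0, 2.0], attn, ssm, proj)
print(y)  # -> [1.25, 2.5]
```

The key contrast with a sequential hybrid is that neither branch sees the other's output; mixing happens only in the final projection.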
The integration spans two repositories: Megatron Core provides foundational components including the ParallelHybridLayer, updated layer allocation logic, and checkpoint conversion tools. Megatron Bridge builds complete Falcon-H1 model implementations with the FalconH1Layer, bidirectional Hugging Face weight conversion, and model providers for 0.5B, 1.5B-Deep, 7B, and 34B variants.
Architecture Flexibility
A key strength of this implementation is its configurability. The ratio of parallel hybrid layers, pure Mamba layers, attention-only layers, and MLP-only layers can be adjusted independently, enabling flexible architecture exploration. The implementation also includes non-learnable maximal update parametrization (µP) multipliers for stable and efficient training across heterogeneous components.
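The idea of freely mixing layer types can be illustrated with a small allocation helper. The function, pattern symbols, and repetition scheme below are illustrative assumptions, not the actual Megatron Core allocation logic:

```python
# Hypothetical sketch of flexible layer allocation: expand a short per-block
# pattern into an ordered list of layer types for the whole model.
# Symbols (illustrative only): 'h' = parallel hybrid, 'm' = Mamba-only,
# 'a' = attention-only, 'f' = MLP-only.

def allocate_layers(num_layers, pattern):
    """Repeat the pattern cyclically until num_layers layers are assigned."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    return [pattern[i % len(pattern)] for i in range(num_layers)]

# An 8-layer model with a 2:1:1 ratio of hybrid : Mamba : attention layers.
layout = allocate_layers(8, "hhma")
print(layout)  # -> ['h', 'h', 'm', 'a', 'h', 'h', 'm', 'a']
```

Changing the pattern string changes the layer mix without touching any other configuration, which is the kind of exploration the flexible allocation logic enables.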
BitNet Ternary Quantization Integration
Megatron Core now integrates BitNet, enabling ternary (1.58-bit) quantized weight training for Falcon Edge models. This replaces standard linear layers with BitNetColumnParallelLinear and BitNetRowParallelLinear implementations using optimized Triton kernels.
Key benefits:
- Maintains tensor and pipeline parallelism across distributed training
- Reduces memory and bandwidth usage during training
- Preserves model throughput despite extreme quantization
- Enables seamless checkpoint conversion between Hugging Face and Megatron formats
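To make "ternary (1.58-bit)" concrete, the published BitNet b1.58 absmean scheme quantizes each weight to {-1, 0, +1} with one per-tensor scale. The sketch below is a plain-Python illustration of that scheme, not the Triton-kernel implementation used in the integration:

```python
# Sketch of BitNet b1.58-style ternary weight quantization (absmean scheme):
# scale = mean(|w|); each weight becomes clip(round(w / scale), -1, 1).
# Pure Python for illustration; the integration uses optimized Triton kernels.

def ternary_quantize(weights, eps=1e-8):
    """Quantize a flat list of weights to {-1, 0, +1} plus a scale factor."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale

w = [0.9, -0.4, 0.05, -1.2]
q, s = ternary_quantize(w)
print(q)  # -> [1, -1, 0, -1]
# Dequantized approximation of the original weights: [qi * s for qi in q]
```

Because each weight needs only one of three values, storage and bandwidth per weight drop to log2(3) ≈ 1.58 bits, and the matrix multiply reduces to additions and subtractions scaled by a single factor.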
Getting Started
Developers can now extend Megatron Core with custom model architectures by following the Falcon-H1 and BitNet integration patterns. All code is available in the GitHub-first NVIDIA/Megatron-LM repository and Megatron Bridge, shaped by contributions from foundation model builders in the community.