NVIDIA
NVIDIA Megatron Core adds Falcon-H1 hybrid architecture and BitNet quantization support
· release · feature · api · sdk · platform · integration · open-source · developer.nvidia.com ↗

Falcon-H1 Hybrid Architecture Integration

NVIDIA Megatron Core now supports the Falcon-H1 parallel hybrid architecture, developed by the Technology Innovation Institute (TII). Unlike sequential hybrid approaches, which alternate attention and SSM layers through the depth of the network, Falcon-H1 runs Transformer-based attention and Mamba-2 state-space model (SSM) components in parallel within each processing block, concatenating their outputs before projection. This design combines the long-context memory efficiency of SSMs with the precise long-range dependency modeling of attention.
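
The parallel layout can be illustrated with a minimal NumPy sketch. The two branch functions below are toy stand-ins, not the actual Megatron Core `ParallelHybridLayer`; all parameter names are illustrative assumptions. The point is the data flow: both branches see the same input, and their outputs are concatenated before a shared projection.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_stub(x, w_qkv, w_attn_out):
    """Toy single-head self-attention (stand-in for the Transformer branch)."""
    d = x.shape[-1]
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return (weights @ v) @ w_attn_out

def ssm_stub(x, a, b, c):
    """Toy diagonal linear recurrence (stand-in for the Mamba-2 branch)."""
    seq, d = x.shape
    h = np.zeros(d)
    out = np.empty_like(x)
    for t in range(seq):
        h = a * h + b * x[t]   # state update, linear in sequence length
        out[t] = c * h
    return out

def parallel_hybrid_block(x, p):
    """Run both branches on the same input, concatenate, then project."""
    attn_out = attention_stub(x, p["w_qkv"], p["w_attn_out"])
    ssm_out = ssm_stub(x, p["a"], p["b"], p["c"])
    return np.concatenate([attn_out, ssm_out], axis=-1) @ p["w_proj"]

seq, d = 4, 8
params = {
    "w_qkv": rng.standard_normal((d, 3 * d)) * 0.1,
    "w_attn_out": rng.standard_normal((d, d)) * 0.1,
    "a": rng.uniform(0.5, 0.9, d),
    "b": rng.standard_normal(d) * 0.1,
    "c": rng.standard_normal(d) * 0.1,
    "w_proj": rng.standard_normal((2 * d, d)) * 0.1,
}
x = rng.standard_normal((seq, d))
y = parallel_hybrid_block(x, params)
assert y.shape == (seq, d)
```

Because the branches are independent, they can in principle execute concurrently within a block, rather than one feeding the other as in a sequential hybrid.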

The integration spans two repositories: Megatron Core provides foundational components including the ParallelHybridLayer, updated layer allocation logic, and checkpoint conversion tools. Megatron Bridge builds complete Falcon-H1 model implementations with the FalconH1Layer, bidirectional Hugging Face weight conversion, and model providers for 0.5B, 1.5B-Deep, 7B, and 34B variants.
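
Bidirectional weight conversion essentially amounts to an invertible rename of checkpoint tensors. The sketch below is a toy illustration under that assumption; the name map and `convert` helper are hypothetical and do not reflect the real Megatron Bridge conversion tables.

```python
def invert(mapping):
    """Build the reverse map so conversion works in both directions."""
    inv = {v: k for k, v in mapping.items()}
    assert len(inv) == len(mapping), "mapping must be one-to-one"
    return inv

# Toy, hypothetical name map (not the real Megatron Bridge tables).
HF_TO_MCORE = {
    "model.layers.0.self_attn.qkv.weight": "decoder.layers.0.mixer.attn.qkv.weight",
    "model.layers.0.mamba.in_proj.weight": "decoder.layers.0.mixer.ssm.in_proj.weight",
}

def convert(state_dict, name_map):
    """Rename checkpoint tensors per the map; unknown keys pass through."""
    return {name_map.get(k, k): v for k, v in state_dict.items()}

hf_ckpt = {"model.layers.0.self_attn.qkv.weight": [1, 2, 3]}
mcore_ckpt = convert(hf_ckpt, HF_TO_MCORE)
roundtrip = convert(mcore_ckpt, invert(HF_TO_MCORE))
assert roundtrip == hf_ckpt
```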

Architecture Flexibility

A key strength of this implementation is configurability. The ratio of parallel hybrid layers, pure Mamba layers, attention-only layers, and MLP-only layers can be independently adjusted, enabling flexible architecture exploration. The implementation includes non-learnable maximal update parametrization (µP) multipliers for stable and efficient training across heterogeneous components.
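
The ratio-driven allocation described above can be sketched as a small helper. `build_layer_pattern` and its ratio names are hypothetical, written only to show how independently adjustable per-type ratios could yield a concrete layer stack; the actual Megatron Core allocation logic differs.

```python
def build_layer_pattern(num_layers, hybrid_ratio=0.5, mamba_ratio=0.25,
                        attention_ratio=0.125, mlp_ratio=0.125):
    """Allocate layer types by ratio (hypothetical helper, not the Megatron API)."""
    ratios = {
        "parallel_hybrid": hybrid_ratio,
        "mamba": mamba_ratio,
        "attention": attention_ratio,
        "mlp": mlp_ratio,
    }
    if abs(sum(ratios.values()) - 1.0) > 1e-9:
        raise ValueError("layer ratios must sum to 1.0")
    counts = {kind: round(r * num_layers) for kind, r in ratios.items()}
    # Absorb any rounding drift into the dominant layer type.
    counts["parallel_hybrid"] += num_layers - sum(counts.values())
    pattern = []
    for kind, n in counts.items():
        pattern.extend([kind] * n)
    return pattern

pattern = build_layer_pattern(8)
# 8 layers -> 4 parallel_hybrid, 2 mamba, 1 attention, 1 mlp
```

Varying the ratios per experiment is what makes architecture exploration cheap: the same training loop consumes whatever pattern the allocator emits.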

BitNet Ternary Quantization Integration

Megatron Core now integrates BitNet, enabling ternary (1.58-bit) quantized weight training for Falcon Edge models: each weight takes one of the three values {-1, 0, +1}, about log2(3) ≈ 1.58 bits of information. The integration replaces standard linear layers with BitNetColumnParallelLinear and BitNetRowParallelLinear implementations backed by optimized Triton kernels.

Key benefits:

  • Maintains tensor and pipeline parallelism across distributed training
  • Reduces memory and bandwidth usage during training
  • Preserves model throughput despite extreme quantization
  • Enables seamless checkpoint conversion between Hugging Face and Megatron formats
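
The forward-pass quantization can be sketched with the absmean scheme from the published BitNet b1.58 work: scale weights by their mean absolute value, round, and clip to {-1, 0, +1}. This NumPy sketch shows only the quantized forward path; actual BitNet training also needs a straight-through estimator for gradients, and the real layers use fused Triton kernels rather than a dense matmul.

```python
import numpy as np

def absmean_ternary_quantize(w, eps=1e-8):
    """BitNet b1.58-style quantization: scale by the per-tensor mean
    absolute value, round, and clip to the ternary set {-1, 0, +1}."""
    gamma = np.abs(w).mean() + eps
    w_q = np.clip(np.round(w / gamma), -1, 1)
    return w_q, gamma

def ternary_linear(x, w, b=None):
    """Forward pass with quantized weights; gamma rescales the output so
    the layer approximates the full-precision matmul."""
    w_q, gamma = absmean_ternary_quantize(w)
    y = x @ (w_q.T * gamma)
    return y if b is None else y + b

rng = np.random.default_rng(1)
w = rng.standard_normal((16, 32)) * 0.05   # (out_features, in_features)
x = rng.standard_normal((4, 32))
w_q, gamma = absmean_ternary_quantize(w)
assert set(np.unique(w_q)) <= {-1.0, 0.0, 1.0}
y = ternary_linear(x, w)
assert y.shape == (4, 16)
```

Since ternary weights need only two bits of storage (and admit add/subtract-only matmuls), this is where the memory and bandwidth savings listed above come from.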

Getting Started

Developers can now extend Megatron Core with custom model architectures by following the Falcon-H1 and BitNet integration patterns. All code is available in the GitHub-first NVIDIA/Megatron-LM repository and Megatron Bridge, shaped by contributions from foundation model builders in the community.