Falcon-H1 Hybrid Architecture Integration
NVIDIA Megatron Core now supports the Falcon-H1 parallel hybrid architecture, developed by the Technology Innovation Institute (TII). Unlike sequential hybrid approaches, Falcon-H1 runs Transformer-based attention and Mamba-2 state-space model (SSM) components in parallel within each processing block, concatenating their outputs before projection. This design combines the memory efficiency of SSMs on long contexts with the long-range dependency modeling of attention mechanisms.
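The parallel data flow can be sketched in a few lines. This is a hypothetical pure-Python stand-in, not the Megatron Core `ParallelHybridLayer` API: both branches see the same input, their outputs are concatenated from width d to 2d, and a projection maps back to d.

```python
# Minimal sketch of a Falcon-H1-style parallel hybrid block (hypothetical,
# pure-Python stand-in; the real layer uses fused GPU kernels and tensors).

def matvec(w, x):
    """Multiply matrix w (rows x cols, nested lists) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def parallel_hybrid_block(x, attention_branch, ssm_branch, out_proj):
    """Run attention and SSM branches on the same input in parallel,
    concatenate their width-d outputs to width 2d, then project to d."""
    attn_out = attention_branch(x)   # width d
    ssm_out = ssm_branch(x)          # width d
    concat = attn_out + ssm_out      # width 2d (list concatenation)
    return matvec(out_proj, concat)  # back to width d

# Toy example with d = 2: simple scaling branches, averaging projection.
attn = lambda x: [v * 0.5 for v in x]
ssm = lambda x: [v * 2.0 for v in x]
# out_proj maps 2d -> d; here it averages the two branches per dimension.
proj = [[0.5, 0.0, 0.5, 0.0],
        [0.0, 0.5, 0.0, 0.5]]
y = parallel_hybrid_block([1.0, 2.0], attn, ssm, proj)
print(y)  # -> [1.25, 2.5]
```

The key contrast with a sequential hybrid is that neither branch sees the other's output; mixing happens only in the final projection.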
The integration spans two repositories: Megatron Core provides foundational components including the ParallelHybridLayer, updated layer allocation logic, and checkpoint conversion tools. Megatron Bridge builds complete Falcon-H1 model implementations with the FalconH1Layer, bidirectional Hugging Face weight conversion, and model providers for 0.5B, 1.5B-Deep, 7B, and 34B variants.
Architecture Flexibility
A key strength of this implementation is its configurability. The ratio of parallel hybrid layers, pure Mamba layers, attention-only layers, and MLP-only layers can be adjusted independently, enabling flexible architecture exploration. The implementation also includes non-learnable maximal update parametrization (µP) multipliers for stable and efficient training across heterogeneous components.
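The idea of freely mixing layer types can be illustrated with a small allocation helper. The function, pattern symbols, and repetition scheme below are illustrative assumptions, not the actual Megatron Core allocation logic:

```python
# Hypothetical sketch of flexible layer allocation: expand a short per-block
# pattern into an ordered list of layer types for the whole model.
# Symbols (illustrative only): 'h' = parallel hybrid, 'm' = Mamba-only,
# 'a' = attention-only, 'f' = MLP-only.

def allocate_layers(num_layers, pattern):
    """Repeat the pattern cyclically until num_layers layers are assigned."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    return [pattern[i % len(pattern)] for i in range(num_layers)]

# An 8-layer model with a 2:1:1 ratio of hybrid : Mamba : attention layers.
layout = allocate_layers(8, "hhma")
print(layout)  # -> ['h', 'h', 'm', 'a', 'h', 'h', 'm', 'a']
```

Changing the pattern string changes the layer mix without touching any other configuration, which is the kind of exploration the flexible allocation logic enables.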
BitNet Ternary Quantization Integration
Megatron Core now integrates BitNet, enabling ternary (1.58-bit) quantized weight training for Falcon Edge models. This replaces standard linear layers with BitNetColumnParallelLinear and BitNetRowParallelLinear implementations using optimized Triton kernels.
Key benefits:
- Maintains tensor and pipeline parallelism across distributed training
- Reduces memory and bandwidth usage during training
- Preserves model throughput despite extreme quantization
- Enables seamless checkpoint conversion between Hugging Face and Megatron formats
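To make "ternary (1.58-bit)" concrete, the published BitNet b1.58 absmean scheme quantizes each weight to {-1, 0, +1} with one per-tensor scale. The sketch below is a plain-Python illustration of that scheme, not the Triton-kernel implementation used in the integration:

```python
# Sketch of BitNet b1.58-style ternary weight quantization (absmean scheme):
# scale = mean(|w|); each weight becomes clip(round(w / scale), -1, 1).
# Pure Python for illustration; the integration uses optimized Triton kernels.

def ternary_quantize(weights, eps=1e-8):
    """Quantize a flat list of weights to {-1, 0, +1} plus a scale factor."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale

w = [0.9, -0.4, 0.05, -1.2]
q, s = ternary_quantize(w)
print(q)  # -> [1, -1, 0, -1]
# Dequantized approximation of the original weights: [qi * s for qi in q]
```

Because each weight needs only one of three values, storage and bandwidth per weight drop to log2(3) ≈ 1.58 bits, and the matrix multiply reduces to additions and subtractions scaled by a single factor.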
Getting Started
Developers can now extend Megatron Core with custom model architectures by following the Falcon-H1 and BitNet integration patterns. All code is available in the GitHub-first NVIDIA/Megatron-LM repository and Megatron Bridge, shaped by contributions from foundation model builders in the community.