What's New
AMD and the Miles team have announced full ROCm support for Miles, an open-source reinforcement learning framework designed for large-scale post-training of foundation models. The framework now runs natively on AMD Instinct MI GPUs, including MI300X, MI350X, and MI355X accelerators, with production-ready Docker containers and end-to-end validation.
Why This Matters
Reinforcement learning workloads differ fundamentally from pretraining: rollout generation (creating training data through parallel inference) dominates compute, accounting for 70–90% of GPU time. This makes memory capacity and bandwidth critical performance factors—areas where AMD Instinct GPUs excel with large HBM capacity, high memory bandwidth, and efficient long-context inference capabilities.
What's Supported
Miles provides a decoupled two-plane RL architecture:
- Rollout plane: generates training data using SGLang for distributed inference
- Training plane: updates model weights using Megatron-LM for distributed training
- Scheduler layer: coordinates interaction between the two planes
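The decoupled two-plane design can be illustrated with a minimal sketch. This is not Miles' actual API; the class and method names (`RolloutPlane`, `TrainingPlane`, `scheduler`) are hypothetical, standing in for SGLang-served inference, Megatron-LM updates, and the coordination layer respectively:

```python
# Illustrative sketch of a decoupled rollout/training loop (hypothetical names,
# not the Miles API): a scheduler alternates between a rollout plane that
# produces batches and a training plane that consumes them, then syncs weights.

class RolloutPlane:
    def __init__(self):
        self.weights_version = 0  # version of the weights serving inference

    def generate(self, n):
        # In Miles this would be distributed parallel inference via SGLang.
        return [{"sample": i, "weights_version": self.weights_version}
                for i in range(n)]

class TrainingPlane:
    def __init__(self):
        self.version = 0

    def step(self, batch):
        # In Miles this would be a Megatron-LM distributed training step.
        self.version += 1
        return self.version

def scheduler(rollout, trainer, steps, batch_size=4):
    # Coordinates the two planes: generate, train, then push updated weights
    # back to the rollout plane before the next step.
    for _ in range(steps):
        batch = rollout.generate(batch_size)
        rollout.weights_version = trainer.step(batch)
    return trainer.version
```

The key design point is that the two planes can be scaled and placed independently, with the scheduler handling weight synchronization between them.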
Current feature support on ROCm includes GRPO training, model and data parallelism, Ray-based orchestration, and integration with both Megatron-LM and SGLang backends.
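For readers unfamiliar with GRPO (Group Relative Policy Optimization), its core idea is to compute advantages by normalizing each rollout's reward against the statistics of its sampling group, avoiding a learned value model. The snippet below is a generic sketch of that standard formulation, not code from Miles:

```python
# Generic sketch of GRPO's group-relative advantage (standard formulation,
# not taken from the Miles codebase).
from statistics import mean, pstdev

def grpo_advantages(rewards, eps=1e-6):
    """Normalize each reward against its group's mean and std.

    `rewards` holds the scalar rewards of all rollouts sampled for one prompt;
    `eps` guards against division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Rollouts that score above their group's mean get positive advantages and are reinforced; below-average rollouts are penalized.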
Getting Started
AMD provides GPU-specific prebuilt containers to minimize setup:
- MI300X: rlsys/miles:rocm7-MI300-sglang0.5.9-latest
- MI350X/MI355X: rlsys/miles:rocm7-MI350-355-sglang0.5.9-latest
Users can pull the appropriate container, install Miles from GitHub, download model/dataset assets from Hugging Face, and launch the full RL pipeline with a single bash script.
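The setup flow above might look like the following shell sketch. The image tags come from the post; the helper function, repository path, and launch-script name are hypothetical placeholders:

```shell
#!/bin/sh
# Sketch of the setup flow (image tags from the post; repo path and launch
# script name are illustrative placeholders, not verified commands).

# Pick the prebuilt ROCm container for the target GPU.
select_image() {
    case "$1" in
        MI300X) echo "rlsys/miles:rocm7-MI300-sglang0.5.9-latest" ;;
        MI350X|MI355X) echo "rlsys/miles:rocm7-MI350-355-sglang0.5.9-latest" ;;
        *) echo "unsupported GPU: $1" >&2; return 1 ;;
    esac
}

# Example sequence (commented out; adapt paths and names to the real docs):
# docker pull "$(select_image MI300X)"
# git clone <Miles GitHub repository> && cd miles
# # download model/dataset assets from Hugging Face, then:
# bash run_rl_pipeline.sh   # hypothetical single-script launch
```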
Performance Results
Experiments on a single 8-GPU MI300X node training Qwen3-30B with GRPO achieved:
- Rollout throughput: 1.1k–1.3k tokens/GPU/second
- Training throughput: ~15–16k tokens/second
- Mean step time: 388.50 seconds (152.79 s rollout, 95.30 s training, 33.85 s weight updates; the listed components do not account for the full step time, with the remainder spent in other pipeline phases)
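A quick back-of-the-envelope check ties these figures together: at the reported rollout throughput, the rollout phase of each step implies on the order of 1.3–1.6 million generated tokens. The GPU count (8) comes from the node description above; the arithmetic is only an approximation from the reported ranges:

```python
# Back-of-the-envelope estimate of rollout tokens per step, derived from the
# reported figures (8-GPU MI300X node; throughput range is approximate).
gpus = 8
rollout_tput_low, rollout_tput_high = 1100, 1300  # tokens/GPU/second
rollout_seconds = 152.79                          # mean rollout time per step

# tokens per step = throughput x GPU count x rollout time
tokens_low = rollout_tput_low * gpus * rollout_seconds    # ~1.34M tokens
tokens_high = rollout_tput_high * gpus * rollout_seconds  # ~1.59M tokens
```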
On a multi-turn math reasoning task (AIME), accuracy improved from 66.5% to 72.9% over 139 training steps, demonstrating effective RL post-training convergence.
Next Steps
The roadmap indicates that additional features are in development for AMD GPUs. Developers can start using Miles on AMD GPUs for RL post-training workflows today by pulling the provided containers and following the documented setup instructions.