LMSYS
AMD adds ROCm support for Miles RL framework on Instinct GPUs
feature · platform · integration · release · sdk · lmsys.org

What's New

AMD and the Miles team have announced full ROCm support for Miles, an open-source reinforcement learning framework designed for large-scale post-training of foundation models. The framework now runs natively on AMD Instinct MI GPUs, including MI300X, MI350X, and MI355X accelerators, with production-ready Docker containers and end-to-end validation.

Why This Matters

Reinforcement learning workloads differ fundamentally from pretraining: rollout generation (creating training data through parallel inference) dominates compute, accounting for 70–90% of GPU time. This makes memory capacity and bandwidth critical performance factors—areas where AMD Instinct GPUs excel with large HBM capacity, high memory bandwidth, and efficient long-context inference capabilities.

What's Supported

Miles provides a decoupled two-plane RL architecture:

  • Rollout plane: generates training data using SGLang for distributed inference
  • Training plane: updates model weights using Megatron-LM for distributed training
  • Scheduler layer: coordinates interaction between the two planes
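The decoupled loop above can be sketched in a few lines of Python. This is an illustrative stand-in, not Miles's actual API: the class and method names (`RolloutPlane`, `TrainPlane`, `scheduler`, etc.) are hypothetical, and the real planes wrap SGLang and Megatron-LM across many processes.

```python
class RolloutPlane:
    """Stands in for SGLang-backed distributed inference (rollout plane)."""
    def __init__(self):
        self.weights_version = 0

    def generate(self, prompts):
        # Produce (prompt, completion, reward) tuples as training data.
        return [(p, f"completion-v{self.weights_version}", 1.0) for p in prompts]

    def sync_weights(self, version):
        # Receive updated weights pushed from the training plane.
        self.weights_version = version


class TrainPlane:
    """Stands in for Megatron-LM-backed distributed training (training plane)."""
    def __init__(self):
        self.version = 0

    def step(self, rollouts):
        # One optimizer step over the rollout batch; returns new weight version.
        self.version += 1
        return self.version


def scheduler(prompts, steps=3):
    """Coordinates the two planes: rollout -> train -> weight update."""
    rollout, train = RolloutPlane(), TrainPlane()
    for _ in range(steps):
        batch = rollout.generate(prompts)      # rollout plane makes data
        new_version = train.step(batch)        # training plane updates weights
        rollout.sync_weights(new_version)      # scheduler pushes weights back
    return rollout.weights_version


print(scheduler(["p1", "p2"]))  # → 3
```

The point of the decoupling is that each plane can use its own parallelism strategy and backend, with the scheduler owning only the data and weight handoff.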

Current feature support on ROCm includes GRPO training, model and data parallelism, Ray-based orchestration, and integration with both Megatron-LM and SGLang backends.
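The core idea behind GRPO (Group Relative Policy Optimization) is to score each sampled completion against the other completions drawn for the same prompt, rather than against a learned value function. A minimal sketch of that group-relative advantage, written here from the published algorithm rather than Miles's source:

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each rollout's reward by the
    mean and (population) std of its own sampling group."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard against uniform groups
    return [(r - mu) / sigma for r in rewards]

# One prompt, four sampled completions scored by a reward function:
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))  # → [1.0, -1.0, 1.0, -1.0]
```

Completions that beat their group's average get positive advantage and are reinforced; the rest are suppressed, with no critic network required.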

Getting Started

AMD provides GPU-specific prebuilt containers to minimize setup:

  • MI300X: rlsys/miles:rocm7-MI300-sglang0.5.9-latest
  • MI350X/MI355X: rlsys/miles:rocm7-MI350-355-sglang0.5.9-latest

Users can pull the appropriate container, install Miles from GitHub, download model/dataset assets from Hugging Face, and launch the full RL pipeline with a single bash script.

Performance Results

Experiments on a single 8-GPU MI300X node training Qwen3-30B with GRPO achieved:

  • Rollout throughput: 1.1k–1.3k tokens/GPU/second
  • Training throughput: ~15–16k tokens/second
  • Mean step time: 388.50 seconds (152.79s rollout, 95.30s training, 33.85s weight updates)
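The rollout and training numbers above are roughly self-consistent, which is a useful sanity check: tokens generated per step should be close to tokens trained per step. The midpoint values below are assumptions taken from the reported ranges, not additional measurements.

```python
# Cross-check the reported MI300X numbers (figures from the article;
# midpoints of the reported ranges are assumed for illustration).
GPUS = 8
ROLLOUT_TOK_PER_GPU_S = 1200.0   # midpoint of 1.1k–1.3k tokens/GPU/second
TRAIN_TOK_PER_S = 15500.0        # midpoint of ~15–16k tokens/second (node-wide)
ROLLOUT_S, TRAIN_S = 152.79, 95.30

rollout_tokens = ROLLOUT_TOK_PER_GPU_S * GPUS * ROLLOUT_S  # generated per step
trained_tokens = TRAIN_TOK_PER_S * TRAIN_S                 # consumed per step

print(f"rollout  ≈ {rollout_tokens / 1e6:.2f}M tokens/step")
print(f"training ≈ {trained_tokens / 1e6:.2f}M tokens/step")
```

Both come out near 1.5M tokens per step, as expected for an on-policy loop that trains on exactly the data it just generated.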

On a multi-turn math reasoning task (AIME), accuracy improved from 66.5% to 72.9% over 139 training steps, demonstrating effective RL post-training convergence.

Next Steps

The roadmap indicates additional features are in development for AMD platforms. In the meantime, developers can start using Miles on AMD Instinct GPUs for RL post-training workflows by pulling the provided containers and following the documented setup instructions.