Reinforcement learning (RL) for massive LLMs suffers from parameter-update delays; optimizations such as checkpoint engines reduce the delay to seconds.
For policy optimization: after each rollout, the 1T parameters are updated and synchronized across 100 GPUs in 15 s.
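
To put the figures above in perspective, here is a rough back-of-envelope sketch of the throughput they imply. It rests on assumptions that are not in the notes: weights are stored at 2 bytes per parameter (e.g. bf16), exactly one full copy of the weights is moved per sync, and the load is split evenly across the GPUs. The function name `sync_throughput` and its parameters are hypothetical and chosen only for illustration.

```python
def sync_throughput(num_params, bytes_per_param, num_gpus, sync_seconds):
    """Estimate the throughput implied by a weight sync.

    Assumes (not stated in the source) that one full copy of the weights
    is transferred and that the work is divided evenly across GPUs.
    Returns (total_bytes, aggregate_bytes_per_s, per_gpu_bytes_per_s).
    """
    total_bytes = num_params * bytes_per_param
    aggregate = total_bytes / sync_seconds   # bytes/s across the whole job
    per_gpu = aggregate / num_gpus           # bytes/s each GPU must sustain
    return total_bytes, aggregate, per_gpu


# Figures from the notes: 1T parameters, 100 GPUs, 15 s.
# bytes_per_param=2 is an assumption (bf16/fp16 weights).
total_bytes, aggregate, per_gpu = sync_throughput(
    num_params=10**12, bytes_per_param=2, num_gpus=100, sync_seconds=15.0
)

print(f"total weights : {total_bytes / 1e12:.1f} TB")
print(f"aggregate rate: {aggregate / 1e9:.1f} GB/s")
print(f"per-GPU rate  : {per_gpu / 1e9:.2f} GB/s")
```

Under these assumptions the sync moves about 2 TB, which works out to roughly 133 GB/s in aggregate and about 1.3 GB/s per GPU. If every GPU instead needed a full copy of the weights, or if a different precision were used, the implied rates would change accordingly.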