ReaLHF: Optimized RLHF Training for Large Language Models through Parameter Reallocation
CoRR(2024)
摘要
Reinforcement Learning from Human Feedback (RLHF) stands as a pivotal
technique in empowering large language model (LLM) applications. Since RLHF
involves diverse computational workloads and intricate dependencies among
multiple LLMs, directly adopting parallelization techniques from supervised
training can result in sub-optimal performance. To overcome this limitation, we
propose a novel approach named parameter ReaLlocation, which dynamically
redistributes LLM parameters in the cluster and adapts parallelization
strategies during training. Building upon this idea, we introduce ReaLHF, a
pioneering system capable of automatically discovering and running efficient
execution plans for RLHF training given the desired algorithmic and hardware
configurations. ReaLHF formulates the execution plan for RLHF as an augmented
dataflow graph. Based on this formulation, ReaLHF employs a tailored search
algorithm with a lightweight cost estimator to discover an efficient execution
plan. Subsequently, the runtime engine deploys the selected plan by effectively
parallelizing computations and redistributing parameters. We evaluate ReaLHF on
the LLaMA-2 models with up to 4×70 billion parameters and 128 GPUs. The
experiment results showcase ReaLHF's substantial speedups of 2.0-10.6×
compared to baselines. Furthermore, the execution plans generated by ReaLHF
exhibit an average of 26% performance improvement over heuristic approaches
based on Megatron-LM. The source code of ReaLHF is publicly available at
https://github.com/openpsi-project/ReaLHF .
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要