The Perfect Blend: Redefining RLHF with Mixture of Judges
Tengyu Xu, Eryk Helenowski, Karthik Abinav Sankararaman, Di Jin, Kaiyan Peng, Eric Han, Shaoliang Nie, Chen Zhu, Hejia Zhang, Wenxuan Zhou, Zhouhao Zeng, Yun He, Karishma Mandyam, Arya Talabzadeh, Madian Khabsa, Gabriel Cohen, Yuandong Tian, Hao Ma, Sinong Wang, Han Fang
arXiv (2024)