Preference Optimization As Probabilistic Inference
Abbas Abdolmaleki, Bilal Piot, Bobak Shahriari, Jost Tobias Springenberg, Tim Hertweck, Rishabh Joshi, Junhyuk Oh, Michael Bloesch, Thomas Lampe, Nicolas Heess, Jonas Buchli, Martin Riedmiller. arXiv (2024)