BATON: Aligning Text-to-Audio Model with Human Preference Feedback

Huan Liao, Haonan Han, Kai Yang, Tianjiao Du, Rui Yang, Qinmei Xu, Zunnan Xu, Jingquan Liu, Jiasheng Lu, Xiu Li

IJCAI 2024 (2024)

Abstract

With the development of AI-Generated Content (AIGC), text-to-audio models are gaining widespread attention. However, it is challenging for these models to generate audio aligned with human preference, because natural language is inherently information-dense and the models' understanding ability is limited. To alleviate this issue, we propose BATON, the first framework specifically designed to enhance the alignment between generated audio and text prompts using human preference feedback. BATON comprises three key stages. First, we curate a dataset of prompts and their corresponding generated audio, annotated with human feedback. Second, we train a reward model on this dataset that mimics human preference by assigning rewards to input text-audio pairs. Finally, we use the reward model to fine-tune an off-the-shelf text-to-audio model. Experimental results demonstrate that BATON significantly improves the generation quality of the original text-to-audio models in terms of audio integrity, temporal relationship, and alignment with human preference. The project page is available at https://baton2024.github.io.
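
The three stages described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not specify the reward model architecture or the fine-tuning objective, so the logistic-regression reward model, the two-dimensional pair features, and the reward-weighted loss below are all assumptions chosen only to make the flow concrete.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Stage 1 (assumed form): a tiny annotated dataset. Each row is a hypothetical
# feature vector for one text-audio pair; each label is a human judgment
# (1 = audio matches the prompt, 0 = it does not).
features = [[1.0, 0.2], [0.9, 0.1], [0.1, 0.8], [0.2, 0.9]]
labels = [1, 1, 0, 0]

def train_reward_model(xs, ys, lr=0.5, epochs=200):
    """Stage 2 (assumed form): fit a logistic-regression reward model with SGD."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            grad = p - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b

def reward(model, x):
    """Score a text-audio pair in (0, 1); higher means more preferred."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def reward_weighted_loss(per_sample_losses, rewards):
    """Stage 3 (assumed form): weight each generator loss by its reward,
    so preferred samples contribute more to fine-tuning."""
    total = sum(rewards)
    return sum(r * l for r, l in zip(rewards, per_sample_losses)) / total

model = train_reward_model(features, labels)
```

In this sketch, `reward(model, pair_features)` stands in for the learned human-preference score, and `reward_weighted_loss` shows one plausible way such scores could steer fine-tuning of a generator. A real system would replace the hand-written features with learned text and audio representations.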

Keywords

Machine Learning -> ML: Generative models, Multidisciplinary Topics and Applications -> MTA: Arts and creativity, Natural Language Processing -> NLP: Speech
