SELF-[IN]CORRECT: LLMs Struggle with Refining Self-Generated Responses
arXiv (2024)
Abstract
Can LLMs continually improve their previous outputs for better results? An
affirmative answer would require LLMs to be better at discriminating among
previously-generated alternatives than at generating initial responses. We
explore the validity of this hypothesis in practice. We first introduce a
unified framework that allows us to compare the generative and discriminative
capability of any model on any task. Then, in our resulting experimental
analysis of several LLMs, we do not observe the performance of those models on
discrimination to be reliably better than generation. We hope these findings
inform the growing literature on self-improving AI systems.
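
The abstract describes the framework only at a high level. As a rough illustration of what comparing generative and discriminative capability could look like, here is a minimal Python sketch: `generate`, `select_best`, and `is_correct` are hypothetical stand-ins for model calls and task graders, not the paper's actual framework or code.

```python
from typing import Callable, List

# Hypothetical sketch: compare a model's generative accuracy against its
# discriminative accuracy over the same tasks. All callables are assumed
# interfaces, not the paper's API.

def generation_accuracy(
    generate: Callable[[str, int], List[str]],
    is_correct: Callable[[str, str], bool],
    tasks: List[str],
) -> float:
    """Fraction of tasks solved by the model's single initial response."""
    hits = sum(is_correct(t, generate(t, 1)[0]) for t in tasks)
    return hits / len(tasks)

def discrimination_accuracy(
    generate: Callable[[str, int], List[str]],
    select_best: Callable[[str, List[str]], str],
    is_correct: Callable[[str, str], bool],
    tasks: List[str],
    n: int = 4,
) -> float:
    """Fraction of tasks solved when the model must pick a correct answer
    from n of its own previously generated candidates."""
    hits = 0
    for t in tasks:
        candidates = generate(t, n)        # model's own alternatives
        chosen = select_best(t, candidates)  # model acts as discriminator
        hits += is_correct(t, chosen)
    return hits / len(tasks)

# Self-correction pays off only if discrimination_accuracy reliably exceeds
# generation_accuracy; the paper reports this is not what it observes.
```
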