Deal, or no deal (or who knows)? Forecasting Uncertainty in Conversations using Large Language Models
arXiv (2024)
Abstract
Effective interlocutors account for the uncertain goals, beliefs, and
emotions of others. But even the best human conversationalist cannot perfectly
anticipate the trajectory of a dialogue. How well can language models represent
inherent uncertainty in conversations? We propose FortUne Dial, an expansion of
the long-standing "conversation forecasting" task: instead of just accuracy,
evaluation is conducted with uncertainty-aware metrics, effectively enabling
abstention on individual instances. We study two ways in which language models
potentially represent outcome uncertainty (internally, using scores, and
directly, using tokens) and propose fine-tuning strategies to improve
calibration of both representations. Experiments on eight difficult negotiation
corpora demonstrate that our proposed fine-tuning strategies (a traditional
supervision strategy and an off-policy reinforcement learning strategy) can
calibrate smaller open-source models to compete with pre-trained models 10x
their size.
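
To illustrate why an uncertainty-aware metric can reward abstention where accuracy cannot, here is a minimal sketch using the Brier score. The abstract does not name the specific metrics used, so the choice of Brier score is an assumption made for illustration, and the function names and toy forecasts are hypothetical.

```python
# Illustrative sketch only: the Brier score is assumed here as one example of
# an uncertainty-aware metric; the abstract does not specify which are used.

def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    if not probs:
        raise ValueError("at least one forecast is required")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def accuracy(probs, outcomes, threshold=0.5):
    """Fraction of forecasts whose thresholded label matches the outcome."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probs, outcomes))
    return hits / len(probs)


# Hypothetical toy data: 1 = a deal was reached, 0 = no deal.
outcomes = [1, 0, 1, 0]

# An overconfident forecaster that is wrong on the last conversation.
overconfident = [0.99, 0.01, 0.99, 0.99]

# A hedged forecaster that effectively abstains (0.5) on the last one.
hedged = [0.99, 0.01, 0.99, 0.5]

acc_over = accuracy(overconfident, outcomes)
acc_hedged = accuracy(hedged, outcomes)
brier_over = brier_score(overconfident, outcomes)
brier_hedged = brier_score(hedged, outcomes)
```

Both forecasters label the last conversation incorrectly at a 0.5 threshold, so their accuracy is identical. The Brier score (lower is better) penalizes the overconfident forecast far more than the hedged one, which is the sense in which such a metric makes "who knows" a meaningful answer on an individual instance.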