Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels
CVPR 2024
Abstract
This paper focuses on open-ended video question answering, which aims to find
the correct answers from a large answer set in response to a video-related
question. This is essentially a multi-label classification task, since a
question may have multiple answers. However, due to annotation costs, the
labels in existing benchmarks are extremely insufficient, typically only one
answer per question. As a result, existing works tend to treat all
unlabeled answers as negatives, which limits their ability to
generalize. In this work, we introduce a simple yet effective ranking
distillation framework (RADI) to mitigate this problem without additional
manual annotation. RADI employs a teacher model trained with incomplete labels
to generate rankings for potential answers, which contain rich knowledge about
label priority as well as label-associated visual cues, thereby enriching the
insufficient labeling information. To avoid overconfidence in the imperfect
teacher model, we further present two robust and parameter-free ranking
distillation approaches: a pairwise approach which introduces adaptive soft
margins to dynamically refine the optimization constraints on various pairwise
rankings, and a listwise approach which adopts sampling-based partial listwise
learning to resist the bias in teacher ranking. Extensive experiments on five
popular benchmarks consistently show that both our pairwise and listwise RADIs
outperform state-of-the-art methods. Further analysis demonstrates the
effectiveness of our methods on the insufficient labeling problem.
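
To make the two distillation ideas concrete, here is a minimal pure-Python sketch. The abstract does not give the loss formulas, so everything below is an assumption for illustration: the function names are hypothetical, the pairwise margin is taken to be the teacher's score gap (one plausible form of an "adaptive soft margin"), and the listwise part uses a standard Plackett-Luce (ListMLE-style) likelihood over a randomly sampled subset of answers. It should not be read as the RADI implementation.

```python
import math
import random

def pairwise_soft_margin_loss(student, teacher):
    """Hinge loss over answer pairs the teacher orders.

    Assumption: the margin for a pair is the teacher's score gap, so pairs
    the teacher is unsure about impose a weaker constraint on the student.
    """
    total, count = 0.0, 0
    n = len(student)
    for i in range(n):
        for j in range(n):
            gap = teacher[i] - teacher[j]
            if gap > 0:  # teacher ranks answer i above answer j
                total += max(0.0, gap - (student[i] - student[j]))
                count += 1
    return total / count if count else 0.0

def partial_listwise_loss(student, teacher, k, rng):
    """Plackett-Luce negative log-likelihood on k sampled answers.

    Assumption: sampling a subset and ordering it by teacher score stands in
    for "sampling-based partial listwise learning".
    """
    idx = rng.sample(range(len(student)), k)
    idx.sort(key=lambda i: teacher[i], reverse=True)
    loss = 0.0
    for pos in range(k):
        rest = [student[i] for i in idx[pos:]]
        m = max(rest)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in rest))
        loss += log_z - student[idx[pos]]
    return loss

# Toy scores for three candidate answers.
teacher_scores = [0.9, 0.5, 0.1]
aligned = [3.0, 1.0, -1.0]    # student agrees with the teacher ranking
reversed_ = [-1.0, 1.0, 3.0]  # student inverts the teacher ranking

rng = random.Random(0)
print(pairwise_soft_margin_loss(aligned, teacher_scores))
print(pairwise_soft_margin_loss(reversed_, teacher_scores))
print(partial_listwise_loss(aligned, teacher_scores, 3, rng))
print(partial_listwise_loss(reversed_, teacher_scores, 3, rng))
```

In this toy setup both losses are small when the student reproduces the teacher's ordering and grow when it inverts it. Neither function adds learnable parameters, which is in the spirit of the "parameter-free" description in the abstract.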