Iterative Learning with Extra and Inner Knowledge for Long-tail Dynamic Scene Graph Generation

MM '23: Proceedings of the 31st ACM International Conference on Multimedia (2023)

Abstract
Dynamic scene graphs have become a powerful tool for high-level visual understanding tasks, and interest in dynamic scene graph generation (dynamic SGG) has grown over time. Recently, a number of methods have achieved significant progress in dynamic SGG by capturing temporal information with transformer or recurrent network structures. However, most existing methods focus only on predicting the head predicates and ignore the long-tail phenomenon, so tail predicates are difficult to recognize. In this paper, we propose a novel method named Iterative Learning with Extra and Inner Knowledge (I2LEK) to address the long-tail problem in dynamic SGG. The extra knowledge is obtained from commonsense, while the inner knowledge is defined as the temporal evolution patterns of visual relationships. Specifically, we introduce extra knowledge to enrich the representations of predicates in the spatial dimension and adopt inner knowledge to implement knowledge sharing in the temporal dimension. With enriched representations and shared knowledge, I2LEK can accurately predict both tail and head predicates. Moreover, an iterative learning strategy is proposed to fuse the extra knowledge, inner knowledge, and spatial-temporal context contained in videos, further enhancing the model's understanding of visual relationships. Experimental results on the public Action Genome dataset demonstrate that our model achieves state-of-the-art performance.
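To make the fusion idea concrete, below is a minimal sketch of how extra (commonsense) knowledge could enrich per-frame predicate representations and how inner (temporal) knowledge could be shared across frames in an iterative loop. All names, shapes, and the simple additive fusion are hypothetical illustrations; the abstract does not specify I2LEK's actual architecture or update rules.

```python
# Hypothetical sketch of iterative knowledge fusion; not the paper's
# actual I2LEK implementation, whose details are not in the abstract.
import numpy as np

def fuse_step(pred_feats, extra_knowledge, alpha=0.5, beta=0.5):
    """One fusion iteration.

    pred_feats:      (T, N, D) predicate features for T frames, N object pairs.
    extra_knowledge: (N, D) commonsense embeddings per object pair (assumed).
    """
    # Spatial enrichment: mix commonsense embeddings (extra knowledge)
    # into every frame's predicate features.
    enriched = pred_feats + alpha * extra_knowledge[None, :, :]
    # Inner knowledge: a temporal evolution pattern, here crudely modeled
    # as the mean over frames, shared back to every frame.
    temporal_pattern = enriched.mean(axis=0, keepdims=True)
    return enriched + beta * temporal_pattern

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 4, 16))     # 8 frames, 4 pairs, 16-dim features
commonsense = rng.normal(size=(4, 16))  # hypothetical knowledge vectors

for _ in range(3):                      # iterative learning: repeat fusion
    feats = fuse_step(feats, commonsense)
```

In this toy version, each iteration re-enriches the representations and re-shares the temporal pattern, loosely mirroring the abstract's description of iteratively fusing extra knowledge, inner knowledge, and spatial-temporal context.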