Multimodal latent factor model with language constraint for predicate detection

ICIP 2019

Abstract
Visual relationship detection plays an important role in scene understanding. Predicate detection, which aims to detect the predicate between entities in an image, is a key component of visual relationship detection. In this paper, we propose the Multimodal Latent Factor Model with Language Constraint (MMLFM-LC) for predicate detection, whose novelty lies in integrating knowledge learned from multiple modalities, valid relationships, and semantic similarities. Representations of the visual and textual modalities are first fed into the model. A bilinear structure is then introduced to model relationships using valid relationships, while a language constraint is built from semantic similarities. Finally, the visual and textual representations are fused in an embedded subspace for predicate detection. Experiments on both the Visual Relationship and Visual Genome datasets show that our method outperforms other methods on predicate detection.
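The abstract's pipeline (project both modalities into a shared subspace, then score predicates with a bilinear structure) can be sketched roughly as follows. All dimensions, variable names, and the softmax readout are illustrative assumptions, not the paper's actual architecture or parameters; the language constraint (a regularizer tying semantically similar predicates together) is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- the abstract does not specify any of these.
d_vis, d_txt, d_emb, n_pred = 8, 6, 4, 5

# Representations of one (subject, object) entity pair in each modality.
v = rng.standard_normal(d_vis)  # visual feature of the entity pair
t = rng.standard_normal(d_txt)  # textual (e.g. word-embedding) feature

# Fuse the two modalities by projecting both into a shared embedded subspace.
W_v = rng.standard_normal((d_emb, d_vis))
W_t = rng.standard_normal((d_emb, d_txt))
z_v = W_v @ v
z_t = W_t @ t

# Bilinear latent factors: one matrix per candidate predicate scores
# the interaction between the projected visual and textual features.
B = rng.standard_normal((n_pred, d_emb, d_emb))
scores = np.array([z_v @ B[k] @ z_t for k in range(n_pred)])

# A softmax over predicate scores yields a distribution for detection.
probs = np.exp(scores - scores.max())
probs /= probs.sum()
pred = int(np.argmax(probs))  # index of the predicted predicate
```

In training, the bilinear factors would be fit on annotated valid relationships, with the language constraint penalizing distant parameters for semantically similar predicates.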
Keywords
Predicate representation, Multimodal fusion, Valid relationships, Semantic similarities