Dual-Path Rare Content Enhancement Network for Image and Text Matching

IEEE Transactions on Circuits and Systems for Video Technology (2023)

Abstract
Image and text matching plays a crucial role in bridging the cross-modal gap between vision and language, and has made great progress thanks to deep learning. However, existing methods still suffer from the long-tail problem, in which a small proportion of the data carries highly frequent semantics while a long tail consists of rare semantics. In this paper, we propose a novel Dual-path Rare Content Enhancement Network (DRCE) to tackle the long-tail issue. Specifically, Cross-modal Representation Enhancement (CRE) and Cross-modal Association Enhancement (CAE) form a dual-path structure that enhances the representation and association of rare content with the benefit of cross-modal prior knowledge. This structure effectively exploits complementary cross-modal relations from different aspects and fuses this information adaptively through the proposed Adaptive Fusion Strategy (AFS). Moreover, we propose an alternative re-ranking strategy (ARR) that explores reciprocal contextual information to refine image-text matching results, which further suppresses the negative impact of the long-tail effect. Extensive experiments on two large-scale datasets show significant improvements and validate the superiority of our method.
Keywords
Rare content enhancement, long-tail effect, image and text matching