Dual-Path Rare Content Enhancement Network for Image and Text Matching

Yan Wang, Yuting Su, Wenhui Li, Jun Xiao, Xuanya Li, An-An Liu

IEEE Transactions on Circuits and Systems for Video Technology (2023)

Abstract

Image and text matching plays a crucial role in bridging the cross-modal gap between vision and language, and has made great progress thanks to deep learning. However, existing methods still suffer from the long-tail problem: only a small proportion of the data contains highly frequent semantics, while the long tail consists of rare semantics. In this paper, we propose a novel Dual-path Rare Content Enhancement Network (DRCE) to tackle the long-tail issue. Specifically, Cross-modal Representation Enhancement (CRE) and Cross-modal Association Enhancement (CAE) are proposed to form a dual-path structure that enhances rare content representation and association using cross-modal prior knowledge. This structure effectively exploits complementary cross-modal relations from different aspects and fuses this information adaptively through the proposed Adaptive Fusion Strategy (AFS). Moreover, we propose an alternative re-ranking strategy (ARR) that explores reciprocal contextual information to refine image-text matching results, which further suppresses the negative impact of the long-tail effect. Extensive experiments on two large-scale datasets show significant improvements and validate the superiority of our method.

Keywords

Rare content enhancement, long-tail effect, image and text matching
