A joint framework for mining discriminative and frequent visual representation

Neurocomputing (2022)

Abstract
Discovering the visual representation of an image category is challenging because the representation should not only be discriminative but also appear frequently in the images of that category. Previous studies have proposed many solutions, but all of them optimize discrimination and frequency separately, which makes the solutions sub-optimal. To address this issue, we propose JDFR, a method that discovers jointly discriminative and frequent visual representations. To ensure discrimination, JDFR employs a classification task with cross-entropy loss. To achieve frequency, we design a novel similarity concentration (SC) loss that concentrates on samples with the same representation and pulls them closer in the feature space, and we then mine the frequent visual representations. Moreover, we utilize an attention module to locate the representative region in the image. Extensive experiments on five benchmark datasets (Place365-20, Travel, VOC2012-10, ImageNet-100, and iNaturalist-100) show that the discovered visual representations have better discrimination and frequency than those mined by the state-of-the-art (SOTA) method, with average improvements of 5.37% in accuracy and 3.06% in frequency.
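The abstract does not give the form of the SC loss, so the following is only a rough sketch of what a joint objective of this kind might look like: a cross-entropy term for discrimination plus a term that pulls same-class samples closer, weighted toward pairs that are already similar. The function names, the softmax weighting, the `temperature` value, and the trade-off weight `lam` are all assumptions made for illustration and should not be read as the authors' formulation.

```python
import math

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch of logit rows."""
    total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[y]
    return total / len(labels)

def cosine(a, b):
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def concentration_loss(features, labels, temperature=0.1):
    """Hypothetical stand-in for a similarity concentration term.

    For each anchor, same-class samples are weighted by a softmax over
    their cosine similarity to the anchor, so the most similar samples
    dominate, and the loss penalizes their remaining distance (1 - sim).
    This is an assumed form, not the SC loss defined in the paper.
    """
    total, anchors = 0.0, 0
    for i, (fi, yi) in enumerate(zip(features, labels)):
        sims = [cosine(fi, fj)
                for j, (fj, yj) in enumerate(zip(features, labels))
                if j != i and yj == yi]
        if not sims:
            continue  # no same-class partner in this batch
        m = max(sims)
        weights = [math.exp((s - m) / temperature) for s in sims]
        z = sum(weights)
        total += sum(w / z * (1.0 - s) for w, s in zip(weights, sims))
        anchors += 1
    return total / anchors if anchors else 0.0

def joint_loss(logits, features, labels, lam=1.0):
    """Joint objective: discrimination term + lam * frequency term."""
    return cross_entropy(logits, labels) + lam * concentration_loss(features, labels)
```

The point of the sketch is that both terms are summed into one objective and optimized together, in contrast to the separate optimization of discrimination and frequency that the abstract attributes to earlier methods. In practice such a loss would be computed on network outputs with an automatic differentiation framework; plain Python lists are used here only to keep the example self-contained.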
Keywords
Visual representation, Discrimination, Frequency