Multi-Modal Domain Distribution Dilation For Text-Based Person Retrieval

2023 China Automation Congress (CAC), 2023

Abstract
The goal of text-based person retrieval is to retrieve the target person from a large gallery of person images given a text query. Many previous methods suffer from a limited domain distribution (LD2) dilemma. To address this problem, we propose a novel Multi-modal Domain Distribution Dilation (MD3) framework for text-based person retrieval. MD3 consists of two streams: an original distribution stream (ODS) and a dilated distribution stream (DDS). A Visual Distribution Dilating (VDD) module perturbs the key attributes of a raw input image, such as brightness, contrast, and saturation. A Textual Distribution Dilating (TDD) module likewise varies the textual domain distribution. To adapt to these varied domain distributions in a reasonable and effective way, we adopt a mutual learning mechanism that lets the two streams, which carry different distribution information, exchange knowledge and learn from each other. We conduct extensive experiments on the widely used CUHK-PEDES, RSTPReid, and ICFG-PEDES datasets to verify the effectiveness of MD3. Compared with existing methods, MD3 achieves state-of-the-art performance.
Keywords
text-based person retrieval, person re-identification, cross-modal retrieval, color domain distribution, mutual learning
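
The abstract names two mechanisms without giving formulas: perturbing brightness, contrast, and saturation of the input image (VDD), and mutual learning between the ODS and DDS streams. The sketch below illustrates one common way each could be realized, using only the Python standard library. It is not the authors' implementation. The function names (`dilate_pixel`, `dilate_image`, `mutual_learning_loss`), the `strength` parameter, the uniform sampling of factors, the luminance weights, and the choice of a symmetric KL divergence as the mutual learning loss are all assumptions made for illustration.

```python
import math
import random


def dilate_pixel(rgb, brightness=1.0, contrast=1.0, saturation=1.0, mean_gray=0.5):
    """Perturb one RGB pixel (channels in [0, 1]) with three colour factors.

    Hypothetical helper: the paper's VDD module is not specified at this level.
    """
    # Brightness: scale every channel by the same factor.
    r, g, b = (c * brightness for c in rgb)
    # Contrast: stretch or compress each channel around a reference grey level.
    r, g, b = (mean_gray + contrast * (c - mean_gray) for c in (r, g, b))
    # Saturation: blend between the pixel's luminance and its full colour.
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    r, g, b = (gray + saturation * (c - gray) for c in (r, g, b))
    # Clamp back into the valid range.
    return tuple(min(1.0, max(0.0, c)) for c in (r, g, b))


def dilate_image(pixels, strength=0.4, rng=random):
    """Apply one randomly sampled (assumed uniform) perturbation to all pixels."""
    brightness, contrast, saturation = (
        rng.uniform(1.0 - strength, 1.0 + strength) for _ in range(3)
    )
    # Mean luminance of the brightness-scaled image (luminance is linear in RGB).
    mean_gray = brightness * sum(
        0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels
    ) / len(pixels)
    return [
        dilate_pixel(p, brightness, contrast, saturation, mean_gray) for p in pixels
    ]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mutual_learning_loss(logits_ods, logits_dds):
    """Symmetric KL between the two streams' predictions (an assumed loss form)."""
    p, q = softmax(logits_ods), softmax(logits_dds)
    return kl_divergence(p, q) + kl_divergence(q, p)


if __name__ == "__main__":
    image = [(0.2, 0.4, 0.6), (0.9, 0.1, 0.3)]
    print(dilate_image(image, rng=random.Random(0)))
    print(mutual_learning_loss([2.0, 0.5, -1.0], [1.5, 0.7, -0.5]))
```

In this sketch the original image would feed the ODS and the output of `dilate_image` would feed the DDS. The symmetric loss is zero when both streams predict the same distribution and grows as they disagree, which is one way to make each stream learn from the other. In a real training pipeline, library routines such as torchvision's `ColorJitter` would typically replace the per-pixel loop.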