MVITP: Multi-View Image-Text Perception for Few-Shot Remote Sensing Image Classification

Chen Yang, Tongtong Liu, Didi Jiao, Wenhui Li

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024

Abstract
Few-shot learning has been widely applied to remote sensing image classification, where it enables rapid identification of new classes by leveraging prior knowledge. However, current methods rely mainly on the image modality to address low intra-class similarity and high inter-class similarity, and multimodal methods remain largely unexplored in remote sensing tasks. We therefore propose a novel framework for few-shot remote sensing image classification, named multi-view image-text perception (MVITP). Specifically, MVITP leverages maximum mutual information across multiple views to train an image encoder that generates image features, while a text encoder generates text features. A multimodal fusion encoder then captures the similarity between image features and text features. Finally, class predictions are made by computing the similarity between the support set and the query set. Experiments on three remote sensing datasets demonstrate the strong performance of MVITP.
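The final step of the abstract, predicting classes from the similarity between the support set and the query set, can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or how support features are aggregated, so the cosine similarity and the per-class mean prototypes used here are assumptions, as are the function names and the toy feature vectors. This is not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def class_prototypes(support_features, support_labels):
    """Average the support features of each class into one prototype (assumed aggregation)."""
    grouped = {}
    for feat, label in zip(support_features, support_labels):
        grouped.setdefault(label, []).append(feat)
    return {
        label: [sum(col) / len(feats) for col in zip(*feats)]
        for label, feats in grouped.items()
    }

def classify_queries(support_features, support_labels, query_features):
    """Assign each query the label of its most similar class prototype."""
    protos = class_prototypes(support_features, support_labels)
    return [
        max(protos, key=lambda label: cosine(q, protos[label]))
        for q in query_features
    ]

# Toy 2-way 2-shot episode with made-up 3-dimensional features.
support = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [0.0, 1.0, 0.2], [0.1, 0.8, 0.3]]
labels = ["airport", "airport", "forest", "forest"]
queries = [[0.8, 0.1, 0.0], [0.1, 0.9, 0.1]]
predictions = classify_queries(support, labels, queries)
print(predictions)
```

In a real pipeline the feature vectors would come from the trained encoders rather than being written by hand; only the comparison between support and query features is shown here.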
Keywords
Maximum mutual information, multimodal fusion, few-shot learning, remote sensing image classification