RingMo-Lite: A Remote Sensing Lightweight Network With CNN-Transformer Hybrid Framework

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING (2024)

Abstract
In recent years, remote sensing (RS) vision foundation models, such as RingMo, have emerged and achieved excellent performance in various downstream tasks. However, their high demand for computing resources limits the application of these models on edge devices. A more lightweight foundation model is therefore needed to support on-orbit RS image interpretation. Existing methods struggle to achieve lightweight solutions while retaining generalization in RS image interpretation. This is due to the complex high-frequency (H-F) and low-frequency (L-F) spectral components in RS images, which make a traditional convolutional neural network (CNN) or vision Transformer used on its own unsuitable for the task. Therefore, this article proposes RingMo-lite, an RS lightweight network with a CNN-Transformer hybrid framework, which exploits the frequency-domain properties of RS images to optimize the interpretation process on several tasks, including classification, object detection, semantic segmentation, and change detection. Through a dual-branch structure, it combines a Transformer module, which acts as a low-pass filter to extract global features of RS images, with a CNN module, which acts as a stacked high-pass filter to extract fine-grained details. Furthermore, a newly designed frequency-domain masked image modeling (FD-MIM) scheme is employed during the pretraining stage for self-supervised learning; it combines the H-F and L-F characteristics of each image patch and thereby captures the latent feature representation in RS data. Compared with RingMo, the proposed RingMo-lite reduces the parameters by over 60% in various RS image interpretation tasks, while the average accuracy drops by less than 2% in most scenes. It also achieves state-of-the-art (SOTA) performance compared with models of similar size. In addition, our work will be integrated into the MindSpore computing platform in the near future.
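To make the H-F/L-F terminology concrete, the following is a minimal sketch of splitting a single-channel image patch into low- and high-frequency parts with an ideal circular mask in the Fourier domain. It is an illustration only and is not the RingMo-lite architecture or its FD-MIM procedure; the function name `split_frequencies` and the `radius` cutoff are hypothetical choices made for this example.

```python
import numpy as np

def split_frequencies(patch: np.ndarray, radius: float):
    """Split a 2-D patch into low- and high-frequency components.

    Illustrative assumption: an ideal circular mask of the given radius
    (in frequency bins) separates the two bands. This is not the method
    described in the paper.
    """
    h, w = patch.shape
    # Move the zero-frequency (DC) term to the centre of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(patch))
    # Distance of every frequency bin from the centre.
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h // 2, xx - w // 2)
    low_mask = dist <= radius
    # Invert each band separately; the two masks partition the spectrum.
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high

rng = np.random.default_rng(0)
patch = rng.random((16, 16))
low, high = split_frequencies(patch, radius=4.0)
```

Because the two masks partition the spectrum, `low + high` reconstructs the original patch up to floating-point error; the low band carries the smooth, global content and the high band carries edges and fine detail.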
Keywords
Convolutional neural network (CNN)-Transformer hybrid framework, lightweight foundation model, masked image modeling (MIM), remote sensing (RS) frequency-domain features