An Explainable Spatial-Frequency Multiscale Transformer for Remote Sensing Scene Classification.

IEEE Trans. Geosci. Remote Sens. (2023)

Abstract
Deep convolutional neural networks (CNNs) play an important role in remote sensing. Owing to their strong local representation learning ability, CNNs perform well in remote sensing scene classification. However, CNNs focus on location-sensitive representations in the spatial domain and lack the capacity to mine contextual information. Meanwhile, remote sensing scene classification still faces challenges such as complex scenes and large variations in target size. To address these problems, more robust feature representation learning networks are needed. In this article, a novel and explainable spatial-frequency multiscale Transformer framework, SF-MSFormer, is proposed for remote sensing scene classification. It consists mainly of spatial-domain and frequency-domain multiscale Transformer branches, which jointly capture global multiscale representation features across the spatial and frequency domains. A texture-enhanced encoder is designed in the frequency-domain multiscale Transformer branch to adaptively capture global texture features. In addition, an adaptive feature aggregation module is designed to integrate the spatial-frequency multiscale features for final recognition. The experimental results verify the effectiveness of SF-MSFormer and show favorable convergence. It achieves state-of-the-art results on the AID, UCM, WHU-RS19, and NWPU-RESISC45 datasets [98.72%, 98.6%, 99.72%, and 94.83% overall accuracies (OAs), respectively]. Furthermore, feature visualizations demonstrate the explainability of the texture-enhanced encoder.
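To make the two-branch idea concrete, the sketch below shows a minimal spatial-frequency Transformer classifier in PyTorch. It is not the authors' SF-MSFormer: the class names (`PatchEmbed`, `BranchEncoder`, `SpatialFrequencyClassifier`), the FFT-based frequency view, the softmax-weighted fusion standing in for the adaptive feature aggregation module, and all hyperparameters are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of a two-branch spatial-frequency Transformer classifier.
# Not the paper's SF-MSFormer; names and hyperparameters are assumptions.
import torch
import torch.nn as nn


class PatchEmbed(nn.Module):
    """Split an image into non-overlapping patches and project them to tokens."""
    def __init__(self, in_ch=3, dim=128, patch=16):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)

    def forward(self, x):                        # x: (B, C, H, W)
        x = self.proj(x)                         # (B, dim, H/p, W/p)
        return x.flatten(2).transpose(1, 2)      # (B, N, dim)


class BranchEncoder(nn.Module):
    """A small Transformer encoder operating on patch tokens."""
    def __init__(self, dim=128, depth=4, heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, tokens):                   # tokens: (B, N, dim)
        return self.encoder(tokens).mean(dim=1)  # global average over tokens


class SpatialFrequencyClassifier(nn.Module):
    """Two branches: one sees the raw image (spatial domain), the other sees its
    2-D FFT log-amplitude spectrum (frequency domain). Branch features are fused
    with learned softmax weights as a simple stand-in for adaptive aggregation."""
    def __init__(self, num_classes=45, dim=128):
        super().__init__()
        self.spatial_embed = PatchEmbed(dim=dim)
        self.freq_embed = PatchEmbed(dim=dim)
        self.spatial_branch = BranchEncoder(dim=dim)
        self.freq_branch = BranchEncoder(dim=dim)
        self.fusion_logits = nn.Parameter(torch.zeros(2))  # learned fusion weights
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):                        # x: (B, 3, H, W)
        # Frequency-domain view: log-amplitude of the centered 2-D FFT.
        spectrum = torch.fft.fftshift(torch.fft.fft2(x, norm="ortho"), dim=(-2, -1))
        freq_view = torch.log1p(spectrum.abs())

        f_spa = self.spatial_branch(self.spatial_embed(x))
        f_frq = self.freq_branch(self.freq_embed(freq_view))

        w = torch.softmax(self.fusion_logits, dim=0)
        fused = w[0] * f_spa + w[1] * f_frq
        return self.head(fused)


if __name__ == "__main__":
    model = SpatialFrequencyClassifier(num_classes=45)
    logits = model(torch.randn(2, 3, 224, 224))
    print(logits.shape)  # torch.Size([2, 45])
```

The frequency branch here simply tokenizes the amplitude spectrum with the same patch embedding as the spatial branch; the paper's texture-enhanced encoder and multiscale design would replace these plain encoders.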
Keywords
remote sensing, classification, spatial-frequency, multi-scale