2DSegFormer: 2-D Transformer Model for Semantic Segmentation on Aerial Images

IEEE Transactions on Geoscience and Remote Sensing (2022)

Abstract
Two-dimensional position information of input tokens is essential for transformer-based semantic segmentation models, especially on high-resolution aerial images. However, recent transformer-based segmentation methods use position encoding to record position information, and most position encoding methods encode only the 1-D positions of tokens. Therefore, we propose a 2-D semantic transformer model (2DSegFormer) for semantic segmentation on aerial images. In 2DSegFormer, we design a novel 2-D positional attention mechanism to accurately record the 2-D position information required by the transformer. Furthermore, we design a dilated residual connection and use it in place of the skip connection in the deep stages to obtain a larger receptive field. Skip connections are used in the shallow stages of 2DSegFormer to pass details to the corresponding stages of the decoder. Experimental results on the UAVid, Vaihingen, and AeroScapes datasets demonstrate the effectiveness of 2DSegFormer. Compared with state-of-the-art methods, 2DSegFormer shows better performance and strong robustness across all three datasets. In particular, 2DSegFormer-B2 achieves first place on the public leaderboard for the UAVid test set.
Keywords
2-D positional attention, aerial images, semantic segmentation, transformer