A Contour-Aware Monocular Depth Estimation Network using Swin Transformer and Cascaded Multi-scale Fusion

IEEE Sensors Journal (2024)

Abstract
Depth estimation from a monocular vision sensor is a fundamental problem in scene perception with wide industrial applications. Previous works tend to predict scene depth from high-level features extracted by convolutional neural networks (CNNs), or rely on Transformer-based encoder-decoder frameworks. However, their results remain less satisfactory, especially around object contours. In this paper, we propose a Transformer-based Contour-Aware Depth Estimation module that recovers scene depth with the aid of enhanced perception of object contours. In addition, we develop a cascaded multi-scale fusion module that aggregates multi-level features, combining global context with local information and refining the depth map to a higher resolution from coarse to fine. Finally, we model depth estimation as a classification problem and discretize depth values adaptively to further improve the performance of our network. Extensive experiments on mainstream public datasets (KITTI and NYUv2) demonstrate the effectiveness of our network, which outperforms other state-of-the-art methods.
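The abstract does not say how the adaptive discretization is implemented. As a minimal sketch, assuming an AdaBins-style scheme (an assumption, not confirmed by the paper), the network predicts per-image bin-width logits that are normalized to span the depth range [d_min, d_max], plus per-pixel class logits over those bins. The depth for a pixel is then the probability-weighted sum of the bin centers rather than the center of the arg-max bin, which avoids hard quantization steps. The function names `adaptive_bin_centers` and `depth_from_classification` are hypothetical and shown for a single pixel only.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_bin_centers(width_logits: List[float], d_min: float, d_max: float) -> List[float]:
    # Hypothetical helper (assumed AdaBins-style): normalized widths sum to 1,
    # so scaling by the range makes the bins tile [d_min, d_max] exactly.
    widths = softmax(width_logits)
    span = d_max - d_min
    centers = []
    left_edge = d_min
    for w in widths:
        w_abs = w * span
        centers.append(left_edge + w_abs / 2.0)  # center of this bin
        left_edge += w_abs
    return centers


def depth_from_classification(class_logits: List[float], centers: List[float]) -> float:
    # Hypothetical helper: soft classification over bins; the depth is the
    # expected bin center under the per-pixel class probabilities.
    probs = softmax(class_logits)
    return sum(p * c for p, c in zip(probs, centers))


if __name__ == "__main__":
    # Equal width logits give uniform bins over [0, 10]: centers 1.25, 3.75, 6.25, 8.75.
    centers = adaptive_bin_centers([0.0, 0.0, 0.0, 0.0], 0.0, 10.0)
    # Equal class logits weight all bins equally, so the depth is their mean.
    print(depth_from_classification([0.0, 0.0, 0.0, 0.0], centers))  # → 5.0
```

In a real network these computations would run per pixel on tensors, and unequal width logits would concentrate narrow bins where the scene's depths cluster; that adaptivity is what distinguishes this from a fixed uniform or log-uniform discretization.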
Keywords
Monocular Depth Estimation, Contour Aware, Swin Transformer, Cascaded Multi-scale Fusion