CasFormer: Cascaded Transformer Based on Dynamic Voxel Pyramid for 3D Object Detection from Point Clouds

Xinglong Li, Xiaowei Zhang

PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT III (2024)

Abstract
Recently, Transformers have been widely applied in 3-D object detection, either to model global contextual relationships in point clouds or to refine proposals. However, the structural information in 3-D point clouds is often incomplete, especially for distant and small objects, which makes accurate detection difficult for these methods. To address this issue, we propose a Cascaded Transformer based on a Dynamic Voxel Pyramid (called CasFormer) for 3-D object detection from LiDAR point clouds. Specifically, we dynamically spread relevant features from the voxel pyramid based on the sparsity of each region of interest (RoI), capturing richer semantic information for structurally incomplete objects. Furthermore, a cross-stage attention mechanism is employed to cascade the refined results of the Transformer stage by stage, as well as to improve the training convergence of the Transformer. Extensive experiments demonstrate that CasFormer achieves competitive performance on the KITTI dataset and the Waymo Open Dataset. On the KITTI online 3-D object detection leaderboard, our method outperforms CT3D by 1.12% and 1.27% at the moderate and hard levels of car detection, respectively.
Keywords
3-D object detection, Point clouds, Cascaded network