Spatial and temporal consistency learning for monocular 6D pose estimation

Hong-Bo Zhang, Jia-Yu Liang, Jia-Xin Hong, Qing Lei, Jing-Hua Liu, Ji-Xiang Du

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE (2024)

Abstract
Monocular 6D pose estimation is a challenging task in the fields of computer vision and robotics. Many previous works input only the cropped image of a single object during training and inference, aiming to remove noise from non-object regions. However, most of these methods ignore the viewpoint and spatial relationships of objects in the scene, which are crucial for accurate pose estimation. To address this issue, this paper proposes a novel multi-view and multi-object learning strategy for monocular 6D pose estimation, which exploits the consistency of object coordinates for the same object at different viewpoints and the consistency of world coordinates for different objects in the same space. In the proposed method, spatial and temporal groups are generated to train the monocular 6D pose estimation network. Due to camera motion, scene images taken at different times can be regarded as images captured from different viewpoints. Therefore, a temporal consistency loss is designed to constrain the relationship of the same object across different viewpoints, while a spatial consistency loss is designed to constrain the relationship of different objects in the same space. Finally, the proposed method is verified on public datasets. Experimental results show that the proposed method is accurate, robust, and outperforms similar state-of-the-art approaches.
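The abstract does not give the exact formulation of the two consistency losses, so the sketch below is only one illustrative reading of the idea, assuming 4x4 homogeneous pose matrices, hypothetical function names (temporal_consistency_loss, spatial_consistency_loss), and a simple rotation/translation discrepancy in place of whatever metric the paper actually uses.

```python
import torch

def pose_distance(T_pred: torch.Tensor, T_ref: torch.Tensor) -> torch.Tensor:
    """Simple pose discrepancy (assumed, not from the paper): Frobenius norm on
    the rotation blocks plus L2 norm on the translation blocks, batch-averaged."""
    rot_err = (T_pred[:, :3, :3] - T_ref[:, :3, :3]).flatten(1).norm(dim=1)
    trans_err = (T_pred[:, :3, 3] - T_ref[:, :3, 3]).norm(dim=1)
    return (rot_err + trans_err).mean()

def temporal_consistency_loss(T_obj_cam_a: torch.Tensor,
                              T_obj_cam_b: torch.Tensor,
                              T_cam_a_to_b: torch.Tensor) -> torch.Tensor:
    """Same object seen from viewpoints a and b: the pose predicted in camera a,
    mapped through the relative camera motion a->b, should match the pose
    predicted directly in camera b."""
    T_obj_cam_b_from_a = T_cam_a_to_b @ T_obj_cam_a
    return pose_distance(T_obj_cam_b_from_a, T_obj_cam_b)

def spatial_consistency_loss(T_obj_cam_list, T_obj_world_list) -> torch.Tensor:
    """Different objects in the same image: each predicted object-in-camera pose,
    combined with that object's known world pose, implies a world-to-camera
    transform; all objects in one scene should imply the same one."""
    implied = [T_oc @ torch.linalg.inv(T_ow)
               for T_oc, T_ow in zip(T_obj_cam_list, T_obj_world_list)]
    anchor = implied[0]
    losses = [pose_distance(T, anchor) for T in implied[1:]]
    return torch.stack(losses).mean() if losses else anchor.new_zeros(())
```

In this reading, the two terms would simply be added (possibly with weights) to the base pose-estimation loss when training on the spatial and temporal groups described in the abstract.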
Keywords
Monocular 6D pose estimation, Spatial consistency, Temporal consistency, Spatial and temporal groups