Robust LiDAR-Camera Alignment With Modality Adapted Local-to-Global Representation

IEEE Transactions on Circuits and Systems for Video Technology (2023)

Abstract
LiDAR-Camera alignment (LCA) is an important preprocessing step for fusing LiDAR and camera data. One key issue is extracting a unified cross-modality representation that characterizes the heterogeneous LiDAR and camera data effectively and robustly. The main challenge is to resist the modality gap and visual data degradation during feature learning while maintaining strong representative power. To address this, a novel modality-adapted local-to-global representation learning method is proposed. The research effort proceeds along two main directions: modality adaptation and capturing global spatial context. First, to resist the modality gap, LiDAR and camera data are projected into the same depth-map domain for unified representation learning. In particular, LiDAR data is converted into a depth map according to pre-acquired extrinsic parameters. Thanks to recent advances in deep-learning-based monocular depth estimation, camera data is transformed into a depth map in a data-driven manner, jointly optimized with LCA. Second, to capture global spatial context, the vision transformer (ViT) is introduced to LCA. The concept of an LCA token is proposed for aggregating local spatial patterns into a global spatial representation via transformer encoding. The token is shared by all samples, so it carries global sample-level information that improves generalization ability. Experiments on the KITTI dataset verify the superiority of our proposition. Furthermore, the proposed approach is more robust to the camera data degradation (e.g., image blurring and noise) often faced in practical applications. Under some challenging test cases, the performance advantage of our method exceeds $1.9~\text{cm}$ / $4.1°$ in translation / rotation error, while our model size (8.77M parameters) is much smaller than those of existing methods (e.g., LCCNet with 66.75M). The source code will be released at https://github.com/Zaf233/RLCA upon acceptance.
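To make the first step concrete, below is a minimal sketch of the LiDAR-to-depth-map projection described in the abstract, assuming a standard pinhole camera model with known intrinsics and extrinsics; the function name and variables are illustrative, not taken from the paper's released code.

```python
import numpy as np

def lidar_to_depth_map(points, T_lidar_to_cam, K, image_size):
    """Project LiDAR points into a sparse depth map (illustrative sketch).

    points: (N, 3) LiDAR points in the sensor frame.
    T_lidar_to_cam: (4, 4) pre-acquired extrinsic matrix (LiDAR -> camera).
    K: (3, 3) camera intrinsic matrix.
    image_size: (H, W) of the target depth map.
    """
    H, W = image_size
    # Homogeneous coordinates, then transform into the camera frame.
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
    pts_cam = (T_lidar_to_cam @ pts_h.T).T[:, :3]
    # Keep only points in front of the camera.
    pts_cam = pts_cam[pts_cam[:, 2] > 0]
    # Perspective projection to pixel coordinates.
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    depth = pts_cam[:, 2]
    # Discard projections outside the image bounds.
    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    u, v, depth = u[valid], v[valid], depth[valid]
    depth_map = np.zeros((H, W), dtype=np.float32)
    # Paint far-to-near so the nearest point wins when several
    # points fall on the same pixel.
    order = np.argsort(-depth)
    depth_map[v[order], u[order]] = depth[order]
    return depth_map
```

With both modalities in this shared depth-map domain (the camera branch producing its depth map via a jointly optimized monocular estimator), a single encoder can learn a unified representation despite the heterogeneous inputs.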
Keywords
LiDAR-Camera alignment, modality adaptation, global context, vision transformer