EgoPoseFormer: A Simple Baseline for Egocentric 3D Human Pose Estimation
arXiv (2024)
Abstract
We present EgoPoseFormer, a simple yet effective transformer-based model for
stereo egocentric human pose estimation. The main challenge in egocentric pose
estimation is overcoming joint invisibility, which is caused by self-occlusion
or a limited field of view (FOV) of head-mounted cameras. Our approach
overcomes this challenge by incorporating a two-stage pose estimation paradigm:
in the first stage, our model leverages global information to estimate each
joint's coarse location; then, in the second stage, it employs a DETR-style
transformer to refine the coarse locations by exploiting fine-grained stereo
visual features. In addition, we present a deformable stereo attention
operation that enables our transformer to effectively process multi-view
features and accurately localize each joint in the 3D world. We evaluate our
method on the stereo UnrealEgo dataset and show it significantly outperforms
previous approaches while being computationally efficient: it improves MPJPE by
27.4mm (45%)
compared to the state-of-the-art. Surprisingly, with proper training
techniques, we find that even our first-stage pose proposal network can achieve
superior performance compared to prior methods. We also show that our method
can be seamlessly extended to monocular settings, which achieves
state-of-the-art performance on the SceneEgo dataset, improving MPJPE by 25.5mm
(21%) compared to the best existing method, with only 60.7% of its model
parameters and 36.4% of its FLOPs.
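
The two-stage paradigm described in the abstract can be made concrete with a short sketch. The following PyTorch code is a minimal illustration, not the authors' implementation: the module names (PoseProposalNet, PoseRefinementDecoder), feature dimensions, joint count, and layer choices are all assumptions, and a generic cross-attention decoder stands in for the paper's deformable stereo attention, which is sketched separately below.

import torch
import torch.nn as nn

class PoseProposalNet(nn.Module):
    # Stage 1: regress a coarse 3D location per joint from a globally
    # pooled image feature ("global information").
    def __init__(self, feat_dim=256, num_joints=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, num_joints * 3),  # (x, y, z) per joint
        )
        self.num_joints = num_joints

    def forward(self, feat_map):                  # feat_map: (B, C, H, W)
        g = self.pool(feat_map).flatten(1)        # (B, C)
        return self.mlp(g).view(-1, self.num_joints, 3)

class PoseRefinementDecoder(nn.Module):
    # Stage 2: a DETR-style decoder treats each joint as a query and
    # predicts a residual offset to its coarse stage-1 location.
    def __init__(self, feat_dim=256, num_joints=16, num_layers=3):
        super().__init__()
        self.query_embed = nn.Embedding(num_joints, feat_dim)
        layer = nn.TransformerDecoderLayer(feat_dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.offset_head = nn.Linear(feat_dim, 3)

    def forward(self, coarse_joints, feat_map):
        B, C, H, W = feat_map.shape
        tokens = feat_map.flatten(2).transpose(1, 2)       # (B, H*W, C)
        queries = self.query_embed.weight.unsqueeze(0).expand(B, -1, -1)
        refined = self.decoder(queries, tokens)            # cross-attention
        return coarse_joints + self.offset_head(refined)

feat = torch.randn(2, 256, 32, 32)                # fake backbone features
coarse = PoseProposalNet()(feat)                  # stage-1 proposals
final = PoseRefinementDecoder()(coarse, feat)     # stage-2 refinement
print(coarse.shape, final.shape)                  # (2, 16, 3) both

A point worth noting: stage 2 only predicts residual offsets on top of the stage-1 proposals, so the decoder can focus on fine-grained correction rather than absolute localization.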
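The deformable stereo attention operation can likewise be sketched under assumptions. Following the general deformable-attention recipe, each joint query predicts a few sampling offsets and attention weights per view, bilinearly samples image features around the joint's projected 2D location in each camera, and fuses the samples across both views. The class name, the normalized-coordinate convention, and the sampling-point count below are illustrative guesses; the paper's exact formulation may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableStereoAttention(nn.Module):
    def __init__(self, feat_dim=256, num_points=4, num_views=2):
        super().__init__()
        self.offsets = nn.Linear(feat_dim, num_views * num_points * 2)
        self.weights = nn.Linear(feat_dim, num_views * num_points)
        self.num_points, self.num_views = num_points, num_views

    def forward(self, queries, ref_xy, feat_maps):
        # queries: (B, J, C); ref_xy: (B, V, J, 2), each joint's projection
        # into each view in normalized [-1, 1] coordinates;
        # feat_maps: (B, V, C, H, W) stereo feature maps.
        B, J, C = queries.shape
        V, P = self.num_views, self.num_points
        offs = self.offsets(queries).view(B, J, V, P, 2).permute(0, 2, 1, 3, 4)
        attn = self.weights(queries).view(B, J, V * P).softmax(-1)
        attn = attn.view(B, J, V, P).permute(0, 2, 1, 3)      # (B, V, J, P)
        out = queries.new_zeros(B, J, C)
        for v in range(V):                                    # fuse both views
            grid = ref_xy[:, v, :, None, :] + offs[:, v]      # (B, J, P, 2)
            sampled = F.grid_sample(feat_maps[:, v], grid,
                                    align_corners=False)      # (B, C, J, P)
            out += (sampled * attn[:, v, None]).sum(-1).transpose(1, 2)
        return out

q = torch.randn(2, 16, 256)                       # one query per joint
xy = torch.rand(2, 2, 16, 2) * 2 - 1              # fake stereo projections
fm = torch.randn(2, 2, 256, 32, 32)
print(DeformableStereoAttention()(q, xy, fm).shape)   # (2, 16, 256)

Normalizing the attention weights jointly over views and points would let the model down-weight a view in which a joint is self-occluded or outside the FOV, which is the challenge the abstract identifies.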