A Sequential Learning-based Approach for Monocular Human Performance Capture

Jianchun Chen, Jayakorn Vongkulbhisal, Fernando De la Torre Frade

IEEE/CVF Winter Conference on Applications of Computer Vision (2024)

Abstract
Human performance capture from RGB videos in unconstrained environments has become popular for applications that require virtual avatars or digital actors. State-of-the-art methods use neural networks (NNs) to estimate shape directly from images, yielding a simplified model of the human body. While effective, NN techniques frequently fail under challenging poses and do not preserve temporal consistency. Optimization-based methods such as shape-from-silhouette, on the other hand, can produce more precise reconstructions, but they typically require a good initialization and are computationally more expensive than NNs. To address these limitations, this work proposes a learning-based approach for optimizing a fine-grained shape representation from a monocular RGB video. Our main idea is to sequentially recover different levels of shape detail (i.e., average shape, clothing, wrinkles) using separate neural networks. At each level, the network takes the sparse, noisy gradients of the body mesh vertices with respect to the shape and predicts dense gradients that update the body shape. Despite being trained on synthetic data, these networks generalize surprisingly well to real images. Experiments show that our approach outperforms NN approaches in recovering shape details while being an order of magnitude faster than optimization-based methods and robust across varied poses and novel views.
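The sequential update described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. It assumes a 1D chain of scalar "vertices" in place of a 3D body mesh, a squared-error loss in place of an image-based loss, and a hand-written neighbor-averaging function (`densify`) in place of each learned network. All function names, the `radii` schedule, and the step size are hypothetical choices made for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def sparse_gradient(shape: Vector, target: Vector, observed: Sequence[int]) -> Vector:
    """Gradient of 0.5 * (shape - target)^2, available only at observed vertices."""
    visible = set(observed)
    return [shape[i] - target[i] if i in visible else 0.0 for i in range(len(shape))]


def densify(grad: Vector, observed: Sequence[int], radius: int) -> Vector:
    """Hand-written stand-in for a learned network (an assumption, not the paper's model).

    Each vertex receives the mean gradient of the observed vertices within
    `radius` positions of it, or zero if none are that close.
    """
    dense = []
    for i in range(len(grad)):
        near = [grad[j] for j in observed if abs(i - j) <= radius]
        dense.append(sum(near) / len(near) if near else 0.0)
    return dense


def refine(shape: Vector, target: Vector, observed: Sequence[int],
           radii: Sequence[int], steps: int = 20, lr: float = 0.5) -> Vector:
    """Run one refinement stage per radius, from coarse (large) to fine (small)."""
    for radius in radii:
        for _ in range(steps):
            grad = sparse_gradient(shape, target, observed)
            update = densify(grad, observed, radius)
            shape = [s - lr * u for s, u in zip(shape, update)]
    return shape


if __name__ == "__main__":
    target = [1.0] * 8
    result = refine([0.0] * 8, target, observed=[0, 4, 7], radii=[8, 1])
    print(max(abs(r - t) for r, t in zip(result, target)))
```

In this sketch the first stage spreads the observed gradients over every vertex, which recovers the overall offset, and the later stage with a smaller radius only adjusts vertices close to the observations. This mirrors the coarse-to-fine ordering in the abstract only loosely, since the real method learns the sparse-to-dense mapping from synthetic data.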
Keywords
Algorithms, 3D computer vision, Algorithms, Biometrics, face, gesture, body pose