Gaze-guided Hand-Object Interaction Synthesis: Benchmark and Method
CoRR (2024)
Abstract
Gaze plays a crucial role in revealing human attention and intention,
shedding light on the cognitive processes behind human actions. The integration
of gaze guidance with the dynamics of hand-object interactions boosts the
accuracy of human motion prediction. However, the lack of datasets that capture
the intricate relationship and consistency among gaze, hand, and object
movements remains a substantial hurdle. In this paper, we introduce the first
Gaze-guided Hand-Object Interaction dataset, GazeHOI, and present a novel task
for synthesizing gaze-guided hand-object interactions. Our dataset, GazeHOI,
features simultaneous 3D modeling of gaze, hand, and object interactions,
comprising 479 sequences with an average duration of 19.1 seconds, 812
sub-sequences, and 33 objects of various sizes. We propose a hierarchical
framework centered on a gaze-guided hand-object interaction diffusion model,
named GHO-Diffusion. In the pre-diffusion phase, we separate gaze conditions
into spatial-temporal features and goal pose conditions at different levels of
information granularity. During the diffusion phase, two gaze-conditioned
diffusion models are stacked to simplify the complex synthesis of hand-object
motions. Here, the object motion diffusion model generates sequences of object
motions based on gaze conditions, while the hand motion diffusion model
produces hand motions based on the generated object motion. To improve
fine-grained goal pose alignment, we introduce a Spherical Gaussian constraint
to guide the denoising step. In the subsequent post-diffusion phase, we
optimize the generated hand motions using contact consistency. Our extensive
experiments highlight the uniqueness of our dataset and the effectiveness of
our approach.
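The abstract describes a three-phase pipeline: a pre-diffusion phase that splits gaze into spatial-temporal and goal-level conditions, a diffusion phase with two stacked models (object motion conditioned on gaze, then hand motion conditioned on the generated object motion), and a post-diffusion contact-consistency refinement. The following sketch illustrates that data flow only; the function names, tensor dimensions (7-DoF object pose, MANO-style 51-dim hand parameters), and placeholder "denoising" steps are all assumptions for illustration, not the authors' GHO-Diffusion implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 30                  # assumed number of frames in a sequence
D_OBJ, D_HAND = 7, 51   # assumed: object translation+quaternion; MANO-style hand params

def encode_gaze(gaze_dirs):
    """Pre-diffusion (sketch): split gaze into per-frame spatial-temporal
    features and a coarser sequence-level goal condition."""
    spatial_temporal = gaze_dirs             # (T, 3) per-frame gaze signal
    goal = gaze_dirs[-5:].mean(axis=0)       # (3,) late-gaze summary as a goal cue
    return spatial_temporal, goal

def object_diffusion(st_feats, goal, steps=10):
    """Stage 1 (toy stand-in): denoise object motion under gaze conditions."""
    x = rng.standard_normal((T, D_OBJ))
    for _ in range(steps):
        x = 0.9 * x                          # placeholder denoising update
    return x

def hand_diffusion(obj_motion, st_feats, steps=10):
    """Stage 2 (toy stand-in): denoise hand motion conditioned on the
    generated object motion, as in the stacked design described above."""
    x = rng.standard_normal((T, D_HAND))
    for _ in range(steps):
        x = 0.9 * x
    return x

def refine_contact(hand_motion, obj_motion):
    """Post-diffusion (placeholder): contact-consistency optimization."""
    return hand_motion

gaze = rng.standard_normal((T, 3))
st, goal = encode_gaze(gaze)
obj = object_diffusion(st, goal)
hand = refine_contact(hand_diffusion(obj, st), obj)
print(hand.shape)
```

The key design choice the abstract highlights is the stacking: by generating object motion first and conditioning hand motion on it, the joint hand-object synthesis problem is decomposed into two simpler conditional problems.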