Spatio-Temporal Attention and Gaussian Processes for Personalized Video Gaze Estimation
arXiv (2024)
Abstract
Gaze is an essential cue for analyzing human behavior and attention.
Recently, there has been an increasing interest in determining gaze direction
from facial videos. However, video gaze estimation faces significant
challenges, such as understanding the dynamic evolution of gaze in video
sequences, dealing with static backgrounds, and adapting to variations in
illumination. To address these challenges, we propose a simple and novel deep
learning model designed to estimate gaze from videos, incorporating a
specialized attention module. Our method employs a spatial attention mechanism
that tracks spatial dynamics within videos. This technique enables accurate
gaze direction prediction through a temporal sequence model, adeptly
transforming spatial observations into temporal insights, thereby significantly
improving gaze estimation accuracy. Additionally, our approach integrates
Gaussian processes to include individual-specific traits, facilitating the
personalization of our model with just a few labeled samples. Experimental
results confirm the efficacy of the proposed approach, demonstrating its
success in both within-dataset and cross-dataset settings. Specifically, our
proposed approach achieves state-of-the-art performance on the Gaze360 dataset,
improving by 2.5° without personalization. Further, by personalizing the
model with just three samples, we achieved an additional improvement of
0.8°. The code and pre-trained models are available at
.
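
The abstract does not spell out how the Gaussian process is used for personalization. One common reading is that a GP fitted on a handful of labeled samples from a new person predicts a correction to the base model's gaze output. The sketch below illustrates that idea only and is not the authors' implementation. The function names, the RBF kernel, the hyperparameter values, and the choice to regress (yaw, pitch) residuals on feature vectors are all assumptions.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """RBF kernel matrix between the rows of a (n, d) and b (m, d)."""
    sq = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gp_residual_correction(feat_support, resid_support, feat_query, noise=1e-2):
    """GP posterior mean of the residual at the query features.

    feat_support:  (k, d) features of the k labeled samples (hypothetical).
    resid_support: (k, 2) label minus base prediction, as (yaw, pitch).
    feat_query:    (m, d) features of the frames to be corrected.
    """
    k_ss = rbf_kernel(feat_support, feat_support)
    k_ss = k_ss + noise * np.eye(len(feat_support))
    k_qs = rbf_kernel(feat_query, feat_support)
    return k_qs @ np.linalg.solve(k_ss, resid_support)


# Hypothetical data: three labeled samples from one person, 8-dim features.
rng = np.random.default_rng(0)
support_features = rng.normal(size=(3, 8))
support_residuals = rng.normal(scale=0.05, size=(3, 2))
query_features = rng.normal(size=(5, 8))
base_predictions = rng.normal(scale=0.3, size=(5, 2))

# Personalized output: base prediction plus the GP-predicted correction.
personalized = base_predictions + gp_residual_correction(
    support_features, support_residuals, query_features
)
```

With a zero-mean prior, the correction shrinks toward zero for frames whose features are far from every labeled sample, so in this sketch the personalized output falls back to the base prediction there.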