Dimensionality reduction for efficient hand tracking, pose estimation and classification

semanticscholar(2013)

Abstract
Pose recovery and tracking of articulated objects based on visual observations is a theoretically interesting and challenging problem. One of its instances, human hand pose estimation, has diverse applications, including human-computer interaction, robot learning by demonstration and human hand motion analysis. The problem is challenging because of its high dimensionality, severe hand self-occlusions, potentially fast hand motions and the uncontrolled environments in which hands are observed. Some of these difficulties can be alleviated by employing specialized motion capture hardware or visual markers. However, such methods interfere with the observed scene and/or require a costly hardware setup.

This work builds upon an existing model-based method that tracks and recovers the 3D position, orientation and 20 DOF articulation of a human hand from markerless visual observations obtained by an RGB-D sensor. In this baseline method, hand pose estimation is formulated as an optimization problem that seeks the hand model parameters minimizing the discrepancy between the appearance of hypothesized hand configurations and the actual hand observation. The optimization problem is handled by a variant of Particle Swarm Optimization (PSO), which searches the parametric space of hand configurations. The high dimensionality of this space limits the computational performance of the method: a rate of 20 fps is achieved, but only thanks to an elaborate GPU implementation on a high-end computer.

A closer study of the problem reveals that the parametric space of hand configurations is highly redundant. For example, it can represent implausible hand poses. Additionally, when a human hand is known to be engaged in specific activities (e.g., grasping or sign language), its configurations are known to lie on a much lower-dimensional manifold.
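The optimization formulation used by the baseline can be illustrated with a minimal sketch. The code below is a textbook PSO, not the specific variant used by the baseline method, and it is not a GPU implementation. The objective `discrepancy` is a hypothetical stand-in (squared distance to a hidden target vector) for the real comparison between a rendered hand hypothesis and the RGB-D observation. The function name `pso_minimize`, the 5-dimensional toy search space, the swarm size and the constants `w`, `c1`, `c2` are all assumptions made for illustration.

```python
import random


def pso_minimize(objective, lower, upper, n_particles=32, n_generations=100, seed=0):
    """Textbook PSO over a box-bounded space (illustrative, not the baseline's variant)."""
    rng = random.Random(seed)
    dim = len(lower)
    w, c1, c2 = 0.72, 1.49, 1.49  # assumed inertia, cognitive and social weights

    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Per-particle best and global best found so far.
    p_best = [p[:] for p in pos]
    p_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=p_val.__getitem__)
    g_best, g_val = p_best[g][:], p_val[g]

    for _ in range(n_generations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (p_best[i][d] - pos[i][d])
                             + c2 * r2 * (g_best[d] - pos[i][d]))
                # Move and clamp to the allowed parameter range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lower[d]), upper[d])
            val = objective(pos[i])
            if val < p_val[i]:
                p_val[i], p_best[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_best = val, pos[i][:]
    return g_best, g_val


# Hypothetical "observed" pose parameters; a real system would not know these.
target = [0.3, -0.2, 0.5, 0.1, -0.4]


def discrepancy(params):
    # Stand-in objective: squared distance between hypothesis and target.
    return sum((p - t) ** 2 for p, t in zip(params, target))


best_params, best_score = pso_minimize(discrepancy, [-1.0] * 5, [1.0] * 5)
```

The cost of such a search grows with the number of particles, the number of generations and the dimensionality of the parameter vector, which is why a smaller search space translates directly into a smaller computational budget.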
In this work, we employ Principal Component Analysis (PCA) to create a space of reduced dimensionality that effectively describes human hand articulation by implicitly modeling the relevant constraints. As a result, one needs to solve a much simpler optimization problem, which requires less computational effort to find the optimal hand configuration. Multiple variants of the proposed methodology are formulated for the problems of hand pose recovery, pose tracking and pose classification. Extensive experiments study the relationships among the proposed method's accuracy, the dimensionality of the search space and the computational budget required to solve the pose recovery, tracking and classification problems. The results demonstrate that the proposed approach achieves better accuracy in pose recovery than the baseline method while using only 1/4 of the latter's computational budget. Moreover, the method classified hand postures into the 10 classes of Chinese number signs with an accuracy of 87% to 100%, depending on the employed computational budget.
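The idea of searching in a PCA-reduced articulation space can be sketched as follows. This is a minimal illustration using NumPy, not the authors' implementation. The training data is synthetic (20 joint angles driven by 3 latent factors plus small noise), and the choice of 3 components as well as the helper names `fit_pca`, `to_reduced` and `to_full` are assumptions. In such a scheme, an optimizer would search over the low-dimensional coefficients and map each candidate back to the full 20 DOF articulation before evaluating it.

```python
import numpy as np


def fit_pca(poses, n_components):
    """Return the mean pose and the top principal directions (one per row)."""
    mean = poses.mean(axis=0)
    # SVD of the centered data; rows of vt are orthonormal principal directions.
    _, _, vt = np.linalg.svd(poses - mean, full_matrices=False)
    return mean, vt[:n_components]


def to_reduced(pose, mean, basis):
    # Project a full articulation vector onto the reduced space.
    return (pose - mean) @ basis.T


def to_full(coeffs, mean, basis):
    # Map reduced coefficients back to a full articulation vector.
    return mean + coeffs @ basis


rng = np.random.default_rng(0)

# Synthetic stand-in for recorded hand poses: 500 samples of 20 joint angles
# that actually vary along only 3 latent directions (an assumption).
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 20))
poses = latent @ mixing + 0.01 * rng.normal(size=(500, 20))

mean, basis = fit_pca(poses, n_components=3)
coeffs = to_reduced(poses[0], mean, basis)   # 3 numbers instead of 20
recon = to_full(coeffs, mean, basis)         # back to 20 joint angles
error = float(np.linalg.norm(recon - poses[0]))
```

Because every point in the reduced space maps to a combination of poses seen in the training data, the projection implicitly encodes joint correlations, which is one way to read the abstract's remark about modeling constraints implicitly.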