Video Representations of Goals Emerge from Watching Failure

arXiv (2020)

Abstract
We introduce a video representation learning framework that models the latent goals behind observable human action. Motivated by how children learn to reason about goals and intentions by experiencing failure, we leverage unconstrained video of unintentional action to learn without direct supervision. Our approach models videos as contextual trajectories that represent both low-level motion and high-level action features. Experiments and visualizations show the model is able to predict underlying goals, detect when action switches from intentional to unintentional, and automatically correct unintentional action. Although the model is trained with minimal supervision, it is competitive with highly-supervised baselines, underscoring the role of failure examples for learning goal-oriented video representations. The project website is available at https://aha.cs.columbia.edu/
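The abstract describes modeling a video as a contextual trajectory of frame-level features and using that representation to detect when action switches from intentional to unintentional. Below is a minimal, hypothetical sketch of that general idea in PyTorch: a per-frame encoder stands in for a video backbone, a recurrent layer provides temporal context over the frame features, and a per-frame head scores whether the action still looks intentional. All module names, sizes, and design choices here are assumptions for illustration and are not the authors' architecture.

import torch
import torch.nn as nn

class ToyGoalRepresentation(nn.Module):
    """Toy sketch: contextual per-frame features plus an intent score."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Per-frame encoder (stand-in for a real video backbone).
        self.frame_encoder = nn.Sequential(
            nn.Flatten(start_dim=2),      # (B, T, C*H*W)
            nn.LazyLinear(feat_dim),
            nn.ReLU(),
        )
        # Temporal context over frame features ("contextual trajectory").
        self.temporal = nn.GRU(feat_dim, feat_dim, batch_first=True)
        # Per-frame score: how intentional the action looks at each time step.
        self.intent_head = nn.Linear(feat_dim, 1)

    def forward(self, frames: torch.Tensor):
        # frames: (batch, time, channels, height, width)
        feats = self.frame_encoder(frames)      # (B, T, feat_dim)
        traj, _ = self.temporal(feats)          # contextualized trajectory
        intent_logits = self.intent_head(traj)  # (B, T, 1)
        return traj, intent_logits.squeeze(-1)

if __name__ == "__main__":
    model = ToyGoalRepresentation()
    video = torch.randn(2, 16, 3, 64, 64)       # 2 clips of 16 frames each
    traj, intent = model(video)
    # A drop in the per-frame intent score could mark the switch from
    # intentional to unintentional action.
    print(traj.shape, intent.shape)             # (2, 16, 256) and (2, 16)

In this toy setup, the trajectory tensor plays the role of the goal-oriented representation and the intent scores the role of the failure-detection output; how the paper actually learns these from unconstrained video of unintentional action is detailed in the full text.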
Keywords
contextual trajectories, low-level motion, high-level action, trained model, underlying goals, automatically correct unintentional action, leveraging gradient signals, minimal supervision, successfully executed goals, observing unintentional action, observable human action, developmental psychology, leverage video, video representations, direct supervision