Automated textual descriptions for a wide range of video events with 48 human actions

COMPUTER VISION - ECCV 2012: WORKSHOPS AND DEMONSTRATIONS, PT I (2012)

Abstract
We present a hybrid method for generating textual descriptions of video based on actions. The method consists of an action classifier and a description generator. The action classifier detects and classifies the actions in the video so that they can serve as verbs for the description generator. The description generator (1) finds the actors (objects or persons) in the video and connects them correctly to the verbs, so that they represent the subject and the direct and indirect objects, and (2) generates a sentence from the verb, subject, and direct and indirect objects. The novelty of our method is that it combines the discriminative power of a bag-of-features action detector with the generative power of a rule-based action descriptor. We show that this approach outperforms a homogeneous setup that pairs the rule-based action detector with the rule-based action descriptor.
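The two-stage flow described in the abstract (detected actions become verbs, actors are bound to grammatical roles, and a rule produces a sentence) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Actor` type, the role tags, the score threshold, the example verbs, and the sentence template are all hypothetical, and the classifier scores are supplied by hand in place of a real bag-of-features detector.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Actor:
    """Hypothetical actor record: a label plus the grammatical role it was bound to."""
    label: str  # e.g. "person" or "ball"
    role: str   # assumed tags: "subject", "direct_object", "indirect_object"


def select_verbs(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Keep actions whose score reaches the (assumed) threshold, highest score first."""
    kept = [(score, verb) for verb, score in scores.items() if score >= threshold]
    return [verb for score, verb in sorted(kept, key=lambda p: (-p[0], p[1]))]


def third_person(verb: str) -> str:
    """Naive third-person singular inflection for regular English verbs."""
    if verb.endswith("y") and len(verb) > 1 and verb[-2] not in "aeiou":
        return verb[:-1] + "ies"
    if verb.endswith(("s", "sh", "ch", "x", "o")):
        return verb + "es"
    return verb + "s"


def describe(verb: str, actors: List[Actor]) -> Optional[str]:
    """Rule-based template: subject, verb, optional direct and indirect object."""
    by_role = {actor.role: actor.label for actor in actors}
    subject = by_role.get("subject")
    if subject is None:
        return None  # no sentence without a subject
    parts = [f"The {subject}", third_person(verb)]
    if "direct_object" in by_role:
        parts.append(f"the {by_role['direct_object']}")
    if "indirect_object" in by_role:
        parts.append(f"to the {by_role['indirect_object']}")
    return " ".join(parts) + "."


# Hand-written scores standing in for the output of an action classifier.
scores = {"give": 0.9, "walk": 0.2, "catch": 0.6}
actors = [
    Actor("person", "subject"),
    Actor("ball", "direct_object"),
    Actor("child", "indirect_object"),
]
for verb in select_verbs(scores):
    print(describe(verb, actors))
```

In this sketch the same actor bindings are reused for every selected verb, which is a simplification; the abstract states that the description generator connects actors to each verb, so a fuller version would bind roles per verb.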
Keywords
description generator, hybrid method, human action, discriminative power, textual description, rule-based action detector, automated textual description, indirect object, bag-of-features action detector, wide range, action descriptor, video event, rule-based action descriptor, action classifier, informatics