Image Understanding using vision and reasoning through Scene Description Graph.

Somak Aditya, Yezhou Yang, Chitta Baral, Yiannis Aloimonos, Cornelia Fermüller

Computer Vision and Image Understanding (2018)

Abstract

• We propose an intermediate structure that captures the semantics of an image.
• We propose an Image Understanding architecture that combines vision and reasoning modules to generate such structures, and an implementation of the architecture that combines a deep-learning-based Visual module with probabilistic reasoning on a Commonsense Knowledge Base.
• We enhance the Flickr8k dataset with the observable scene constituents (actions and properties involving objects).
• We publish the comparative human evaluations dataset for our approach, two popular neural approaches (Karpathy, Li, Vinyals, Toshev, Bengio, Erhan, 2017) and gold-truth captions for three existing Captioning Datasets (Flickr8k, Flickr30k and MS-COCO), which can be used to propose better automatic caption evaluation metrics (this dataset is used in Anderson et al., 2016 to propose SPICE).

Keywords

Image Understanding, Commonsense Reasoning, Vision, Reasoning
