Image Understanding using vision and reasoning through Scene Description Graph.

Computer Vision and Image Understanding (2018)

Abstract
• We propose an intermediate structure that captures the semantics of an image.
• We propose an Image Understanding architecture that combines vision and reasoning modules to generate such structures. We implement this architecture by combining a deep-learning-based visual module with probabilistic reasoning over a Commonsense Knowledge Base.
• We enhance the Flickr8k dataset with the observable scene constituents (actions and properties involving objects).
• We publish a dataset of comparative human evaluations covering our approach, two popular neural approaches (Karpathy and Li, 2017; Vinyals, Toshev, Bengio and Erhan, 2017), and the ground-truth captions of three existing captioning datasets (Flickr8k, Flickr30k and MS-COCO). This dataset can be used to develop better automatic caption evaluation metrics; Anderson et al. (2016) used it to propose SPICE.
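The highlights only name the components of the architecture. As a rough illustration of how visual detector confidences and commonsense priors might be combined into a small graph of entities and events, here is a toy Python sketch. Everything in it is a hypothetical assumption chosen for illustration: the `detections` and `commonsense` dictionaries, the `SceneGraph` class, the `best_event` helper, and the product scoring rule. It is not the paper's Scene Description Graph format or its probabilistic reasoning procedure.

```python
from dataclasses import dataclass, field

# Hypothetical visual-module output: object label -> detection confidence in [0, 1].
detections = {"dog": 0.92, "frisbee": 0.81, "grass": 0.77}

# Hypothetical commonsense knowledge base: plausibility of (agent, action, patient)
# triples. A real system would query an actual knowledge base instead.
commonsense = {
    ("dog", "catch", "frisbee"): 0.90,
    ("dog", "eat", "frisbee"): 0.05,
    ("dog", "run", "grass"): 0.60,
}


@dataclass
class SceneGraph:
    """Toy graph: a set of entity nodes plus (event, role, entity) edges."""
    entities: set = field(default_factory=set)
    edges: list = field(default_factory=list)

    def add_event(self, action, agent, patient):
        # Link the event node to its participants through semantic-role edges.
        self.entities.update({agent, patient})
        self.edges.append((action, "agent", agent))
        self.edges.append((action, "patient", patient))


def best_event(detections, commonsense, agent):
    """Pick the triple maximizing detector confidence x commonsense plausibility."""
    if agent not in detections:
        return None, 0.0
    best, best_score = None, 0.0
    for (a, action, patient), prior in commonsense.items():
        # Only consider triples whose agent matches and whose patient was detected.
        if a != agent or patient not in detections:
            continue
        score = detections[agent] * detections[patient] * prior
        if score > best_score:
            best, best_score = (action, agent, patient), score
    return best, best_score


graph = SceneGraph()
event, score = best_event(detections, commonsense, "dog")
if event is not None:
    graph.add_event(*event)
print(event, round(score, 3))
print(graph.edges)
```

Multiplying the scores treats the visual evidence and the commonsense prior as independent, which is the simplest possible combination; a probabilistic reasoning module would typically reason jointly over all detected entities rather than scoring one triple at a time.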
Keywords
Image Understanding, Commonsense Reasoning, Vision, Reasoning