Can deep convolutional networks explain the semantic structure that humans see in photographs?

Journal of Vision (2023)

Abstract
Deep convolutional networks (DCNs) have been proposed as useful models of the ventral visual processing stream. This study evaluates whether such models can capture the rich semantic similarities that people discern among photographs of familiar objects. We first created a new dataset that merges representative images of everyday concepts (taken from Ecoset) with the large semantic feature set collected by the Leuven group. The resulting set includes ~300,000 images depicting items in 86 different semantic categories, including 46 animate items (reptiles, insects, and mammals) and 40 inanimate items (vehicles, instruments, tools, and kitchen items). Each category is also associated with values on a set of ~2,000 semantic features, generated by human raters in a prior study. We then trained two variants of the AlexNet architecture on these items: one that learned to activate just the corresponding category label, and a second that learned to generate all of an item’s semantic features. Finally, we evaluated how accurately the learned representations in each model could predict human decisions in a triplet-judgment task conducted using photographs from the training set. Both models predicted some human triplet judgments better than chance, but the model trained to output semantic feature vectors performed better and captured more levels of semantic similarity. Neither model, however, performed as well as an embedding computed directly from the semantic feature norms themselves. The results suggest that deep convolutional image classifiers alone do a poor job of capturing the semantic similarity structure that drives human judgments, but that alterations in the training task (in particular, training on output vectors that express richer semantic structure) can go a long way toward overcoming this limitation.
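As an illustration only, the triplet evaluation described in the abstract might be sketched as follows. The abstract does not specify the triplet format or the similarity measure, so this sketch assumes that each trial shows an anchor item with two options, that the model "chooses" the option whose embedding has the higher cosine similarity to the anchor, and that accuracy is the fraction of trials where that choice matches the human one. All names (`cosine`, `predict_choice`, `triplet_accuracy`) and the toy embeddings are hypothetical and are not taken from the study.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_choice(embeddings, anchor, option_a, option_b):
    """Return the option whose embedding is more similar to the anchor.

    Assumption: similarity is cosine similarity; ties go to option_a.
    """
    sim_a = cosine(embeddings[anchor], embeddings[option_a])
    sim_b = cosine(embeddings[anchor], embeddings[option_b])
    return option_a if sim_a >= sim_b else option_b


def triplet_accuracy(embeddings, judgments):
    """Fraction of trials where the embedding-based choice matches the human choice.

    Each judgment is a tuple (anchor, option_a, option_b, human_choice).
    """
    hits = sum(
        predict_choice(embeddings, anchor, a, b) == chosen
        for anchor, a, b, chosen in judgments
    )
    return hits / len(judgments)


# Toy, made-up embeddings standing in for a model's learned representations.
toy_embeddings = {
    "dog": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.2, 0.1],
    "hammer": [0.0, 0.1, 0.9],
    "wrench": [0.1, 0.0, 0.8],
}

# Toy, made-up human judgments: (anchor, option_a, option_b, human_choice).
toy_judgments = [
    ("dog", "cat", "hammer", "cat"),
    ("hammer", "dog", "wrench", "wrench"),
    ("cat", "wrench", "dog", "dog"),
]

accuracy = triplet_accuracy(toy_embeddings, toy_judgments)
print(f"triplet accuracy: {accuracy:.2f}")
```

Under these assumptions, the same `triplet_accuracy` function could be applied to embeddings from either network variant or to an embedding derived from the feature norms, which makes the three sources directly comparable against a chance level of 0.5.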
Keywords
deep convolutional networks, photographs, semantic structure