A distributional code for value in dopamine-based reinforcement learning

Nature (2020)

Abstract
Since its introduction, the reward prediction error theory of dopamine has explained a wealth of empirical phenomena, providing a unifying framework for understanding the representation of reward and value in the brain [1-3]. According to the now canonical theory, reward predictions are represented as a single scalar quantity, which supports learning about the expectation, or mean, of stochastic outcomes. Here we propose an account of dopamine-based reinforcement learning inspired by recent artificial intelligence research on distributional reinforcement learning [4-6]. We hypothesized that the brain represents possible future rewards not as a single mean, but instead as a probability distribution, effectively representing multiple future outcomes simultaneously and in parallel. This idea implies a set of empirical predictions, which we tested using single-unit recordings from mouse ventral tegmental area. Our findings provide strong evidence for a neural realization of distributional reinforcement learning.
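As a rough illustration of the distributional idea described in the abstract, the sketch below (not the authors' code) shows how a population of value predictors, each updated with its own asymmetric gains on positive versus negative prediction errors, converges to different expectiles of a stochastic reward distribution rather than to a single mean. The predictor count, learning rates, and reward distribution are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_cells = 7                                    # hypothetical number of predictors
alpha_plus = np.linspace(0.02, 0.18, n_cells)  # gain applied to positive prediction errors
alpha_minus = alpha_plus[::-1]                 # gain applied to negative prediction errors
V = np.zeros(n_cells)                          # one value estimate per predictor

def sample_reward():
    # Hypothetical bimodal reward distribution, used only for illustration.
    return rng.normal(1.0, 0.2) if rng.random() < 0.5 else rng.normal(5.0, 0.5)

for _ in range(50_000):
    r = sample_reward()
    delta = r - V  # per-cell reward prediction error
    # Asymmetric update: each cell scales its error by alpha_plus when the
    # outcome exceeds its prediction, and by alpha_minus otherwise.
    V += np.where(delta > 0, alpha_plus, alpha_minus) * delta

# Cells with relatively large alpha_plus settle on optimistic (high) expectiles;
# cells with relatively large alpha_minus settle on pessimistic (low) ones, so
# the population jointly encodes the shape of the reward distribution.
print(np.round(V, 2))
```

Under this update rule each cell's fixed point is the expectile determined by the ratio alpha_plus / (alpha_plus + alpha_minus); a classical scalar prediction-error learner is the special case where all cells share symmetric gains.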
Keywords
Cognitive neuroscience, Learning algorithms, Learning and memory, Reward, Science, Humanities and Social Sciences, multidisciplinary