Multi-timescale reinforcement learning in the brain

Paul Masset, Pablo Tano, HyungGoo R. Kim, Athar N. Malik, Alexandre Pouget, Naoshige Uchida

bioRxiv (2023)


Abstract

To thrive in complex environments, animals and artificial agents must learn to act adaptively to maximize fitness and rewards. Such adaptive behavior can be learned through reinforcement learning [1], a class of algorithms that has been successful at training artificial agents [2]–[6] and at characterizing the firing of dopamine neurons in the midbrain [7]–[9]. In classical reinforcement learning, agents discount future rewards exponentially according to a single time scale, controlled by the discount factor. Here, we explore the presence of multiple timescales in biological reinforcement learning. We first show that reinforcement agents learning at a multitude of timescales possess distinct computational benefits. Next, we report that dopamine neurons in mice performing two behavioral tasks encode reward prediction error with a diversity of discount time constants. Our model explains the heterogeneity of temporal discounting in both cue-evoked transient responses and slower timescale fluctuations known as dopamine ramps. Crucially, the measured discount factor of individual neurons is correlated across the two tasks, suggesting that it is a cell-specific property. Together, our results provide a new paradigm to understand functional heterogeneity in dopamine neurons, a mechanistic basis for the empirical observation that humans and animals use non-exponential discounts in many situations [10]–[14], and open new avenues for the design of more efficient reinforcement learning algorithms.

Competing Interest Statement: The authors have declared no competing interest.
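
The link between multiple timescales and non-exponential discounting can be illustrated with a small numerical sketch. This is not the authors' model or analysis; it only checks a standard identity: a single discount factor gamma weights a reward t steps ahead by gamma**t, while averaging gamma**t over discount factors spread uniformly on (0, 1) gives 1/(1 + t), a hyperbolic curve. The function names, the uniform distribution, and the population size below are assumptions chosen for illustration.

```python
import numpy as np


def exponential_discount(gamma, delays):
    """Weight gamma**t given to a reward t steps ahead (single timescale)."""
    return gamma ** np.asarray(delays, dtype=float)


def averaged_discount(gammas, delays):
    """Mean of the exponential discounts over a population of discount factors."""
    gammas = np.asarray(gammas, dtype=float)[:, None]   # shape (n_units, 1)
    delays = np.asarray(delays, dtype=float)[None, :]   # shape (1, n_delays)
    return (gammas ** delays).mean(axis=0)


delays = np.arange(0, 6)

# Hypothetical population: discount factors at the midpoints of an even
# grid on (0, 1). The uniform spread is an assumption, not a measured
# distribution.
n_units = 10_000
gammas = (np.arange(n_units) + 0.5) / n_units

population = averaged_discount(gammas, delays)
hyperbolic = 1.0 / (1.0 + delays)

for t, p, h in zip(delays, population, hyperbolic):
    print(f"delay={t}  population average={p:.6f}  1/(1+t)={h:.6f}")
```

Other distributions of discount factors would give other effective discount curves, so the hyperbolic form here is specific to the uniform assumption.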
