Program-Based Strategy Induction for Reinforcement Learning
CoRR(2024)
摘要
Typical models of learning assume incremental estimation of
continuously-varying decision variables like expected rewards. However, this
class of models fails to capture more idiosyncratic, discrete heuristics and
strategies that people and animals appear to exhibit. Despite recent advances
in strategy discovery using tools like recurrent networks that generalize the
classic models, the resulting strategies are often onerous to interpret, making
connections to cognition difficult to establish. We use Bayesian program
induction to discover strategies implemented by programs, letting the
simplicity of strategies trade off against their effectiveness. Focusing on
bandit tasks, we find strategies that are difficult or unexpected with
classical incremental learning, like asymmetric learning from rewarded and
unrewarded trials, adaptive horizon-dependent random exploration, and discrete
state switching.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要