An algorithm for multi-armed bandit based on variance change sensitivity

Engineering Research Express (2024)

Abstract

The Multi-Armed Bandit problem is increasingly popular because it enables real-world sequential decision making across application domains, including clinical trials, recommender systems, and online decision making. It is a classical instance of the exploration-exploitation dilemma in reinforcement learning: the learner must decide on an optimal strategy based on the rewards observed for each arm. However, existing Multi-Armed Bandit algorithms have several shortcomings, such as blind exploration, weak generalization ability, and never-ending exploration. To address these shortcomings, this paper proposes a Multi-Armed Bandit algorithm based on variance change sensitivity. The algorithm takes the change in reward variance as a cue: it adjusts the exploration probability according to the average variance change over all actions, and when exploring it selects the action with the largest variance change. At the same time, to reduce wasted action selections and maximize cumulative reward, a parameter is introduced that records the number of consecutive selections of the same action; exploration stops once this count reaches a certain threshold. Experiments show that the algorithm ultimately obtains a higher cumulative reward and a lower regret.
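The abstract does not give the paper's exact update rules, so the following is only a minimal sketch of the described policy under stated assumptions: reward variance is tracked per arm with Welford's online algorithm, the exploration probability is assumed to scale with the average variance change across arms (the precise mapping is not specified in the abstract), exploration picks the arm with the largest variance change, and exploration stops once the same arm has been selected consecutively a threshold number of times. All parameter names and formulas here are illustrative, not the authors'.

```python
import random


class VarianceChangeBandit:
    """Illustrative sketch of a variance-change-sensitive bandit policy."""

    def __init__(self, n_arms, base_epsilon=0.1, stop_threshold=50):
        self.n_arms = n_arms
        self.base_epsilon = base_epsilon      # assumed baseline exploration rate
        self.stop_threshold = stop_threshold  # consecutive-selection cutoff
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.m2 = [0.0] * n_arms              # sum of squared deviations (Welford)
        self.var_change = [0.0] * n_arms      # |new variance - old variance| per arm
        self.last_arm = None
        self.run_length = 0                   # consecutive picks of the same arm

    def _variance(self, a):
        return self.m2[a] / self.counts[a] if self.counts[a] > 1 else 0.0

    def select(self):
        # Stopping rule: once one arm has been chosen this many times in a
        # row, commit to it and stop exploring.
        if self.last_arm is not None and self.run_length >= self.stop_threshold:
            return self.last_arm
        # Assumed form: exploration probability grows with the average
        # variance change over all arms (more instability -> more exploration).
        avg_change = sum(self.var_change) / self.n_arms
        epsilon = min(1.0, self.base_epsilon * (1.0 + avg_change))
        if random.random() < epsilon:
            # Explore the arm whose reward variance changed the most.
            return max(range(self.n_arms), key=lambda a: self.var_change[a])
        # Exploit the arm with the highest estimated mean reward.
        return max(range(self.n_arms), key=lambda a: self.means[a])

    def update(self, arm, reward):
        old_var = self._variance(arm)
        # Welford's online update of mean and variance.
        self.counts[arm] += 1
        delta = reward - self.means[arm]
        self.means[arm] += delta / self.counts[arm]
        self.m2[arm] += delta * (reward - self.means[arm])
        self.var_change[arm] = abs(self._variance(arm) - old_var)
        self.run_length = self.run_length + 1 if arm == self.last_arm else 1
        self.last_arm = arm
```

A typical loop would call `select()`, pull the chosen arm, then feed the observed reward back through `update()`; the stopping parameter trades off late-stage exploration against cumulative reward, which is the role the abstract assigns to the consecutive-selection counter.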