Fast Converging Multi-armed Bandit Optimization Using Probabilistic Graphical Model.

ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2018, PT II(2018)

Cited by 4 | Viewed 153 times
Abstract
This paper designs a strategic model to optimize click-through rates (CTR) for profitable recommendation systems. Approximating a function from samples, a vital step in data prediction, is desirable when the ground truth is not directly accessible. While interpolation algorithms such as regression and non-kernel SVMs are prevalent in modern machine learning, in many cases they are not appropriate for fitting arbitrary functions with no closed-form expression. The major contribution of this paper is a semi-parametric graphical model, complying with the properties of the Gaussian Markov random field (GMRF), that approximates general functions, which can be multivariate. Based upon inference in this model, the paper further investigates several policies commonly used in Bayesian optimization to solve the multi-armed bandit (MAB) problem. The primary objective is to locate the global optimum of an unknown function. In the case of recommendation, the proposed algorithm yields maximum user clicks from a rescheduled recommendation policy while maintaining the lowest possible cost. Comparative experiments are conducted among a set of policies. Empirical evaluation suggests that Thompson sampling is the most suitable policy for the proposed algorithm.
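The Thompson sampling policy favored by the evaluation can be illustrated with a minimal sketch for a Bernoulli-reward bandit (click/no-click per arm). The arm CTR values, round count, and Beta(1, 1) priors below are illustrative assumptions, not figures from the paper, and this standalone version omits the paper's GMRF posterior:

```python
import random

def thompson_sampling(true_ctrs, rounds=5000, seed=0):
    """Bernoulli Thompson sampling: pick the arm whose sampled
    Beta-posterior CTR is highest, then update that arm's posterior."""
    rng = random.Random(seed)
    k = len(true_ctrs)
    successes = [1] * k  # Beta(1, 1) prior for each arm
    failures = [1] * k
    total_clicks = 0
    for _ in range(rounds):
        # Draw one CTR estimate per arm from its Beta posterior.
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Simulate a click with the arm's (unknown) true CTR.
        reward = 1 if rng.random() < true_ctrs[arm] else 0
        total_clicks += reward
        successes[arm] += reward
        failures[arm] += 1 - reward
    return total_clicks, successes, failures

# Hypothetical arms with true CTRs of 2%, 5%, and 8%.
clicks, s, f = thompson_sampling([0.02, 0.05, 0.08])
pulls = [s[i] + f[i] - 2 for i in range(3)]
```

Over the rounds, exploration concentrates on the highest-CTR arm as its posterior sharpens, which is the exploration/exploitation trade-off the paper's recommendation scheduler exploits.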
Keywords
Multi-armed bandit,Markov random field,Bayesian network,Recommendation system,Machine learning