Causal Bandits for Linear Structural Equation Models

Journal of Machine Learning Research (2023)

Cited 0 | Viewed 27
Abstract
This paper studies the problem of designing an optimal sequence of interventions in a causal graphical model to minimize cumulative regret with respect to the best intervention in hindsight. This is, naturally, posed as a causal bandit problem. The focus is on causal bandits for linear structural equation models (SEMs) and soft interventions. It is assumed that the graph's structure is known and has N nodes. Two linear mechanisms, one soft intervention and one observational, are assumed for each node, giving rise to 2^N possible interventions. The majority of the existing causal bandit algorithms assume that at least the interventional distributions of the reward node's parents are fully specified. However, there are 2^N such distributions (one corresponding to each intervention), and acquiring them becomes prohibitive even in moderate-sized graphs. This paper dispenses with the assumption of knowing these distributions or their marginals. Two algorithms are proposed for the frequentist (UCB-based) and Bayesian (Thompson-sampling-based) settings. The key idea of these algorithms is to avoid directly estimating the 2^N reward distributions and instead estimate the parameters that fully specify the SEMs (linear in N) and use them to compute the rewards. In both algorithms, under boundedness assumptions on the noise and the parameter space, the cumulative regrets scale […]
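The key idea above, computing any intervention's expected reward from the SEM parameters instead of maintaining 2^N separate reward estimates, can be illustrated with a minimal sketch. The 3-node chain graph, the weight values, and all function names below are hypothetical illustrations chosen for this example, not the paper's algorithm; the sketch only shows why N weight vectors suffice to score all 2^N arms:

```python
import numpy as np

# Hypothetical 3-node chain X1 -> X2 -> X3, with X3 the reward node.
# Each node has two linear mechanisms: an observational weight row and
# a soft-intervention weight row. An "action" chooses one mechanism per
# node, so there are 2^N arms but only 2N weight rows to estimate.

N = 3
B_obs = np.array([[0.0, 0.8, 0.0],    # observational weights
                  [0.0, 0.0, 1.2],    # (upper triangular: acyclic graph)
                  [0.0, 0.0, 0.0]])
B_int = np.array([[0.0, 0.3, 0.0],    # soft-intervention weights
                  [0.0, 0.0, 0.5],
                  [0.0, 0.0, 0.0]])

def expected_reward(action, B_obs, B_int, mu_noise):
    """Expected value of the reward node X_N under the linear SEM
    X = B_a^T X + eps, where row i of B_a is taken from B_int if node i
    is intervened on (action[i] truthy) and from B_obs otherwise."""
    B_a = np.where(np.asarray(action)[:, None], B_int, B_obs)
    # Solving (I - B_a^T) E[X] = E[eps] gives the mean of every node.
    mean_x = np.linalg.solve(np.eye(len(action)) - B_a.T, mu_noise)
    return mean_x[-1]

mu = np.ones(N)  # assumed noise means, for illustration only
# Score all 2^N arms from the same two weight matrices.
best_reward, best_action = max(
    (expected_reward(a, B_obs, B_int, mu), a)
    for a in np.ndindex(*(2,) * N)
)
```

With the observational mechanism everywhere, the means propagate down the chain as E[X1] = 1, E[X2] = 0.8·1 + 1 = 1.8, E[X3] = 1.2·1.8 + 1 = 3.16, and since the intervention weights here are smaller, that is also the best arm in this toy instance.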
Keywords
Causal bandits, multi-armed bandits, causality, linear SEMs, cumulative regret