Neural Constrained Combinatorial Bandits.

INFOCOM(2023)

引用 0|浏览12
暂无评分
摘要
Constrained combinatorial contextual bandits have emerged as trending tools in intelligent systems and networks to model reward and cost signals under combinatorial decision-making. On one hand, both signals are complex functions of the context, e.g., in federated learning, training loss (negative reward) and energy consumption (cost) are nonlinear functions of edge devices’ system conditions (context). On the other hand, there are cumulative constraints on costs, e.g., the accumulated energy consumption should be budgeted by energy resources. Besides, real-time systems often require such constraints to be guaranteed anytime or in each round, e.g., ensuring anytime fairness for task assignment to maintain the credibility of crowdsourcing platforms for workers. This setting imposes a challenge on how to simultaneously achieve reward maximization while subjecting to anytime cumulative constraints. To address such challenge, we propose a primal-dual algorithm (Neural-PD) whose primal component adopts multi-layer perceptrons to estimate reward and cost functions, and its dual component estimates the Lagrange multiplier with the virtual queue. By integrating neural tangent kernel theory and Lyapunov-drift techniques, we prove Neural-PD achieves a sharp regret bound and a zero constraint violation. We also show Neural-PD outperforms existing algorithms with extensive experiments on both synthetic and real-world datasets.
更多
查看译文
关键词
anytime cumulative constraints,anytime fairness,combinatorial decision-making,cost functions,edge devices,energy consumption,federated learning,intelligent systems,Lagrange multiplier,Lyapunov-drift techniques,multilayer perceptrons,neural constrained combinatorial bandits,neural tangent kernel theory,Neural-PD,primal-dual algorithm,reward functions,reward maximization,zero constraint violation
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要