Limits of Actor-Critic Algorithms for Decision Tree Policies Learning in IBMDPs
CoRR (2023)
Abstract
Interpretability of AI models allows for user safety checks to build trust in
such AIs. In particular, Decision Trees (DTs) provide a global look at the
learned model and transparently reveal which features of the input are critical
for making a decision. However, interpretability is hindered if the DT is too
large. To learn compact trees, a recent Reinforcement Learning (RL) framework
has been proposed to explore the space of DTs using deep RL. This framework
augments a decision problem (e.g. a supervised classification task) with
additional actions that gather information about the features of an otherwise
hidden input. By appropriately penalizing these actions, the agent learns to
optimally trade off the size and performance of DTs. In practice, a reactive policy
for a partially observable Markov decision process (POMDP) needs to be learned,
which is still an open problem. We show in this paper that deep RL can fail
even on simple toy tasks of this class. However, when the underlying decision
problem is a supervised classification task, we show that finding the optimal
tree can be cast as a fully observable Markov decision problem and be solved
efficiently, giving rise to a new family of algorithms for learning DTs that go
beyond the classical greedy maximization ones.
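The augmented decision problem described above can be sketched as a toy environment: the input's features are hidden, some actions reveal a feature at a small cost, and the remaining actions predict a class and end the episode. This is a minimal illustrative sketch, not the paper's implementation; the class name `IBMDPEnv`, the `query_cost` parameter, and the action encoding are all assumptions made here for clarity.

```python
class IBMDPEnv:
    """Toy sketch of a decision problem augmented with feature-gathering
    actions, in the spirit of the framework the abstract describes.
    Hypothetical interface; not the authors' code."""

    def __init__(self, x, label, num_classes=2, query_cost=0.1):
        self.x = x                        # hidden feature vector
        self.label = label                # true class of this input
        self.num_classes = num_classes
        self.query_cost = query_cost      # penalty that keeps trees small
        self.observed = [None] * len(x)   # features revealed so far
        self.done = False

    def step(self, action):
        """Actions 0..d-1 reveal feature `action` at a cost;
        actions d..d+k-1 predict class (action - d) and terminate."""
        d = len(self.x)
        if action < d:
            # Information-gathering action: reveal one hidden feature.
            self.observed[action] = self.x[action]
            return tuple(self.observed), -self.query_cost, False
        # Prediction action: reward 1 for the correct class, then stop.
        self.done = True
        reward = 1.0 if (action - d) == self.label else 0.0
        return tuple(self.observed), reward, True
```

Under this encoding, a reactive policy that maps the partial observation `observed` to an action corresponds to a decision tree: each query action is an internal node and each prediction action is a leaf, so the query penalty directly trades tree size against accuracy.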