Sparse Oblique Decision Trees: A Tool to Interpret Natural Language Processing Datasets

IEEE International Joint Conference on Neural Networks (IJCNN), 2022

Abstract
Natural language processing datasets, for example for document classification or sentiment analysis, are characterized by sparse, high-dimensional feature vectors, often based on bag-of-words approaches. Such datasets contain a wealth of information not just about the predictive task in question, but also about the language itself, and it is of interest to do data mining on such data. While one way to do this is to use standard exploratory data analysis techniques such as clustering or dimensionality reduction, here we propose a different approach, which can be used if we have access to a labeled dataset. The idea is to use sparse oblique decision trees, a type of interpretable model having the structure of a decision tree but where the decision nodes use hyperplanes involving few input features. Such trees can be trained using the Tree Alternating Optimization (TAO) algorithm. Our approach is to train a sparse oblique tree that is as small and sparse as possible while achieving good enough predictive accuracy, and then to inspect the weights in the tree decision nodes in order to establish a relationship between input features and classes. This reveals interesting patterns about the classifier and about the data itself. For example, we determine how small, specific subsets of features are used for specific classes (say, certain words for certain document topics), both globally and for a single input instance. The hierarchical structure of the tree also explains the common theme among a group of instances. We demonstrate this using the AG news dataset.
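The core idea above, that each decision node applies a sparse hyperplane over bag-of-words features and that its nonzero weights link words to classes, can be sketched in a few lines. The following is a minimal, hand-built illustration, not the authors' TAO-trained model: the vocabulary, weights, and class labels are invented for demonstration, and in the actual method the weights would be learned by TAO with a sparsity penalty.

```python
# Hypothetical sketch of a sparse oblique decision tree (weights are
# illustrative, not TAO-learned). Each decision node routes an input by
# the sign of a sparse hyperplane w . x + b over bag-of-words counts.

VOCAB = ["goal", "match", "stock", "market", "election", "senate"]

def bow(words):
    """Bag-of-words count vector over the fixed toy vocabulary."""
    return [words.count(t) for t in VOCAB]

class Node:
    def __init__(self, weights, bias, left, right):
        self.weights, self.bias = weights, bias  # sparse: mostly zeros
        self.left, self.right = left, right      # Node or class label (str)

    def route(self, x):
        side = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return self.left if side > 0 else self.right

def predict(node, x):
    while isinstance(node, Node):
        node = node.route(x)
    return node

def active_features(node):
    """Inspect which words a node actually uses (its nonzero weights)."""
    return [t for t, w in zip(VOCAB, node.weights) if w != 0]

# Root separates sports vocabulary from everything else; the child node
# separates business words from politics words.
child = Node([0, 0, 1.0, 1.0, -1.0, -1.0], 0.0, "Business", "Politics")
root = Node([1.0, 1.0, 0, 0, 0, 0], -0.5, "Sports", child)

doc = "the match ended with a late goal".split()
print(predict(root, bow(doc)))   # -> Sports
print(active_features(root))     # -> ['goal', 'match']
```

Inspecting `active_features` at each node is the kind of weight examination the paper describes: a small subset of words per node explains both the routing of a single instance and, via the hierarchy, the common theme of all instances reaching a subtree.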
Keywords
interpretability, text classification, text mining, decision trees