Tree ensembles for learning to rank

Tree ensembles for learning to rank (2011)

Abstract
Given the role that the Web plays in modern society, Web information retrieval has become more important than ever. The Web is an extremely large collection of documents created in a decentralized and uncontrolled manner, and a single user query commonly matches millions of documents. Finding the few documents that are most relevant to the user is therefore critical, and it is the central challenge for modern search engines. The process of ordering the matches by their relevance to the user is called ranking. Beyond search engines, ranking is equally important in many other retrieval tasks, such as question answering, collaborative filtering, document summarization, and machine translation. My thesis focuses on the ranking problem and, in particular, on machine learning approaches that solve it efficiently and effectively. Many machine learning methods have been used to train ranking models, and a new research area named learning to rank has gradually emerged. Approaches based on tree ensembles arguably define the current state of the art in learning to rank. In this dissertation, I study the effectiveness of tree ensembles for learning to rank and propose different approaches for improving their accuracy, reducing their variance, and increasing their runtime efficiency. I present a framework for distributed tuning of ranking algorithms. I then discuss how bagging the ranking ensembles results in more accurate ranking models that also have lower variance. In some applications, the large ranking ensembles obtained by bagging boosted models may be too expensive at runtime, so I present the results of different approaches for compressing a tree ensemble after it is constructed. I use a case study to discuss some of the limitations of the axis-parallel trees that are typically used as the building blocks of tree-based ranking ensembles. I discuss the importance of meta features in improving the quality of ranking models and reducing their complexity. Finally, I discuss the optimization techniques I used in jforests, the open source Java library that I built for experimenting with tree-based machine learning algorithms.
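As a rough illustration of what "bagging boosted models" built from axis-parallel trees can look like, the sketch below trains several small boosted ensembles of one-split trees (stumps) on bootstrap samples and averages their scores to rank documents. It is a minimal sketch under stated assumptions, not the thesis's method and not the jforests API: every function name here is hypothetical, the trees are depth-one stumps, and the boosting objective is plain pointwise squared error on relevance labels rather than a ranking-specific loss.

```python
import random


def fit_stump(X, residuals):
    """Fit one axis-parallel split: a single feature compared to a threshold."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[f] <= t]
            right = [r for row, r in zip(X, residuals) if row[f] > t]
            if not left or not right:
                continue
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, f, t, lm, rm)
    if best is None:
        # No valid split (all rows identical): fall back to a constant prediction.
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def stump_predict(stump, row):
    f, t, left_value, right_value = stump
    return left_value if row[f] <= t else right_value


def fit_boosted(X, y, n_trees=20, lr=0.3):
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    stumps = []
    preds = [0.0] * len(X)
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + lr * stump_predict(stump, row) for p, row in zip(preds, X)]
    return (lr, stumps)


def boosted_score(model, row):
    lr, stumps = model
    return sum(lr * stump_predict(s, row) for s in stumps)


def fit_bagged(X, y, n_bags=5, seed=0, **boost_args):
    """Bagging: train one boosted ensemble per bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_boosted([X[i] for i in idx], [y[i] for i in idx], **boost_args))
    return models


def bagged_score(models, row):
    """Average the scores of all boosted ensembles in the bag."""
    return sum(boosted_score(m, row) for m in models) / len(models)


def rank(models, docs):
    """Return document indices ordered from highest to lowest bagged score."""
    return sorted(range(len(docs)), key=lambda i: bagged_score(models, docs[i]), reverse=True)
```

Averaging over bootstrap replicas is the standard way bagging reduces variance, and the resulting model is `n_bags` times larger than a single boosted ensemble, which is the runtime cost that motivates compressing the ensemble afterwards.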
Keywords
axis-parallel tree, accurate ranking model, large ranking ensemble, ranking ensemble, ranking problem, ranking ensembles result, different approach, ranking algorithm, tree ensemble, ranking model