A Random Walk Model For Optimization Of Search Impact In Web Frontier Ranking

SIGIR '15: The 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile, August 2015

Abstract
Large-scale web search engines need to crawl the Web continuously to discover and download newly created web content. The speed at which the new content is discovered and the quality of the discovered content can have a big impact on the coverage and quality of the results provided by the search engine. In this paper, we propose a search-centric solution to the problem of prioritizing the pages in the frontier of a crawler for download. Our approach essentially orders the web pages in the frontier through a random walk model that takes into account the pages' potential impact on user-perceived search quality. In addition, we propose a link graph enrichment technique that extends this solution. Finally, we explore a machine learning approach that combines different frontier prioritization approaches. We conduct experiments using two very large, real-life web datasets to observe various search quality metrics. Comparisons with several baseline techniques indicate that the proposed approaches have the potential to improve the user-perceived quality of web search results considerably.
Keywords
Web search engine, web crawling, URL prioritization, web frontier, discovery, frontier ranking, result relevance, random walks