DirichletRank: Ranking Web Pages Against Link Spams

MSRA (2005)

Abstract
Anti-spamming has become one of the most important challenges for web search engines and has recently attracted increasing attention in both industry and academia. Since most search engines now use link-based ranking algorithms, link-based spamming has become a major threat. In this paper, we show that the popular link-based ranking algorithm PageRank, while successfully used in the Google search engine, has a "zero-one gap" flaw, which can potentially be exploited to spam PageRank results easily. The "zero-one gap" problem arises from the current ad hoc way of computing the transition probabilities in the random surfing model. We propose a novel DirichletRank algorithm that computes these probabilities in a more principled way, based on Bayesian estimation with a Dirichlet prior. DirichletRank is a variant of PageRank, but it does not have the "zero-one gap" problem and is analytically shown to be substantially more resistant to link farm spam than PageRank. Simulation experiments using real web data show that, compared with the original PageRank, DirichletRank is significantly more robust against several typical link spams and is, in general, more stable under link perturbations. Moreover, experimental results also show that DirichletRank ...
Keywords
computer science
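
The abstract describes the "zero-one gap" and its Dirichlet-prior fix only in words. A minimal sketch of the idea might look like the following, under assumptions that are not stated in the text above: the random-jump probability for a page with n out-links is taken to be mu / (n + mu), the usual form of a Dirichlet-smoothed estimate, and the names `rank`, `pagerank_jump`, `dirichlet_jump`, as well as the values `lam=0.15` and `mu=20.0`, are hypothetical choices made for illustration. It is not presented as the authors' implementation.

```python
def pagerank_jump(n_out, lam=0.15):
    # Classic PageRank: a page with no out-links always jumps (probability 1),
    # while a page with even one out-link jumps with a small constant lam.
    # The difference between the two cases is the "zero-one gap".
    return 1.0 if n_out == 0 else lam


def dirichlet_jump(n_out, mu=20.0):
    # ASSUMPTION: Dirichlet-smoothed jump probability mu / (n_out + mu).
    # It equals 1 when n_out == 0 and decreases smoothly as out-links are added.
    return mu / (n_out + mu)


def rank(out_links, n_pages, jump_prob, iters=100):
    """Power iteration for a random-surfer model.

    out_links: dict mapping a page index to a list of target page indices
    jump_prob: function mapping an out-degree to the random-jump probability
               (it should return 1.0 for out-degree 0 so no score mass is lost)
    """
    scores = [1.0 / n_pages] * n_pages
    for _ in range(iters):
        nxt = [0.0] * n_pages
        jump_mass = 0.0
        for page in range(n_pages):
            targets = out_links.get(page, [])
            w = jump_prob(len(targets))
            jump_mass += w * scores[page]
            if targets:
                # The remaining mass follows the page's out-links uniformly.
                share = (1.0 - w) * scores[page] / len(targets)
                for t in targets:
                    nxt[t] += share
        # Mass from random jumps is spread uniformly over all pages.
        for page in range(n_pages):
            nxt[page] += jump_mass / n_pages
        scores = nxt
    return scores


# Hypothetical toy graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, and page 3 is dangling.
toy_graph = {0: [1, 2], 1: [2], 2: [0]}
pagerank_scores = rank(toy_graph, 4, pagerank_jump)
dirichlet_scores = rank(toy_graph, 4, dirichlet_jump)
```

With these assumed values, adding a single out-link to a dangling page changes its jump probability from 1 to 0.15 under `pagerank_jump`, but only from 1 to 20/21 under `dirichlet_jump`, which is the kind of discontinuity the abstract attributes to PageRank.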