Design Alternatives for Large-Scale Web Search: Alexander was Great, Aeneas a Pioneer, and Anakin has the Force

MSRA (2007)

Abstract
Indexing the Web and meeting the throughput, response-time, and failure-resilience requirements of a search engine requires massive storage and computational resources and a careful system design for scalability. This is exemplified by the big data centers of the leading commercial search engines. Various proposals and debates have appeared in the literature as to whether Web indexes can be implemented in a fully distributed or even peer-to-peer manner without impeding scalability, and different partitioning strategies have been worked out. In this paper, we resume this ongoing discussion by analyzing the design space for distributed Web indexing, considering the influence of partitioning strategies as well as different storage technologies including Flash-RAM. We outline and discuss the pros and cons of three fundamental alternatives, and characterize their total costs for meeting all performance and availability requirements. We give arguments in favor of a system design based on term partitioning over a DHT-based peer-to-peer network with modern top-k query processing and a judiciously designed combination of disk and Flash-RAM storage, and we show that this design has intriguing properties and a very attractive cost/performance ratio.
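To make "term partitioning over a DHT" concrete, the following is a minimal single-process sketch, not the authors' design. It assumes a Chord-style identifier ring built with SHA-1 consistent hashing, where each term's entire posting list is stored on the successor peer of hash(term). The class `TermPartitionedIndex`, its methods, and the peer names are hypothetical. Query processing here is a naive full merge that sums per-term scores; the abstract's "modern top-k query processing" would instead terminate early (for example, with threshold-style algorithms), which this sketch does not attempt.

```python
import bisect
import hashlib


def ring_position(key: str, bits: int = 32) -> int:
    """Map a string to a position on a 2**bits identifier ring via SHA-1."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class TermPartitionedIndex:
    """Hypothetical term-partitioned index on a Chord-style ring (assumption).

    Each term's full posting list lives on exactly one peer: the first peer
    whose ring position is at or after hash(term), wrapping around the ring.
    Peers are simulated as dictionary keys in a single process.
    """

    def __init__(self, peers: list[str]) -> None:
        self.ring = sorted((ring_position(p), p) for p in peers)
        self.positions = [pos for pos, _ in self.ring]
        # peer -> term -> {doc_id: score}
        self.postings: dict[str, dict[str, dict[str, float]]] = {p: {} for p in peers}

    def peer_for(self, term: str) -> str:
        """Return the successor peer responsible for this term."""
        i = bisect.bisect_left(self.positions, ring_position(term))
        return self.ring[i % len(self.ring)][1]  # wrap past the last peer

    def add(self, term: str, doc: str, score: float) -> None:
        """Store one posting on the single peer that owns the term."""
        peer = self.peer_for(term)
        self.postings[peer].setdefault(term, {})[doc] = score

    def top_k(self, query_terms: list[str], k: int) -> list[tuple[str, float]]:
        """Contact only the peers owning the query terms and sum scores.

        This is a naive full merge of every posting list, not an
        early-terminating top-k algorithm.
        """
        totals: dict[str, float] = {}
        for term in query_terms:
            plist = self.postings[self.peer_for(term)].get(term, {})
            for doc, score in plist.items():
                totals[doc] = totals.get(doc, 0.0) + score
        # Highest score first; ties broken by document id for determinism.
        return sorted(totals.items(), key=lambda x: (-x[1], x[0]))[:k]


index = TermPartitionedIndex(["peer-a", "peer-b", "peer-c", "peer-d"])
index.add("flash", "doc1", 0.5)
index.add("flash", "doc2", 0.25)
index.add("ram", "doc2", 0.75)
index.add("ram", "doc3", 0.125)
print(index.top_k(["flash", "ram"], k=2))  # [('doc2', 1.0), ('doc1', 0.5)]
```

The design point the sketch illustrates is that a two-term query touches at most two peers, whatever the number of peers in the network. The cost is that whole posting lists, which can be very long for frequent terms, sit on single peers and may have to be shipped across the network.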