
Discovering obscure looking glass sites on the web to facilitate internet measurement research

International Conference on Emerging Networking Experiments and Technologies (2021)

Abstract
Although researchers have recognized that Looking Glass (LG) vantage points (VPs) are valuable for Internet measurement research, they can only exploit VPs from well-known LG sites published on a few LG portal pages. Many LG sites are likely not published on these portal pages; we call them obscure LG sites, and they are hard for researchers to find and exploit. In this paper, we design an efficient focused crawler that discovers as many LG sites as possible while avoiding unnecessary resource consumption on analyzing irrelevant pages. The crawler performs a similarity-guided search that exploits well-developed search engines and comprehensively mines the common features shared by known LG sites to discover more LG pages. Moreover, it uses a two-step positive-unlabeled (PU) learning classifier, based on carefully selected LG features, to efficiently discard irrelevant URLs. To the best of our knowledge, we are the first to develop a method for discovering obscure LG sites on the web. Experimental results show the effectiveness of our focused crawler. To facilitate practical applications, we further develop an automation tool that successfully retrieves 910 obscure automatable LG VPs from the relevant pages obtained by our focused crawler. These 910 LG VPs significantly increase the geographic and network coverage of available VPs, and a simple case study shows their potential value in improving the completeness of the AS-level Internet topology. Our method and the final VP list are beneficial to the measurement community.
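
The general idea of two-step PU learning mentioned in the abstract can be illustrated with a small sketch. This is not the authors' classifier: the abstract does not give its features or model, so the four binary features, the cosine-similarity ranking, the nearest-centroid rule, and the `negative_fraction` parameter below are all assumptions chosen for illustration. Step 1 picks "reliable negatives" from the unlabeled pages (those least similar to the known LG pages); step 2 trains a simple classifier on the positives versus those reliable negatives.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(rows):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def two_step_pu(positives, unlabeled, negative_fraction=0.5):
    """Return a function that labels a feature vector as relevant or not.

    negative_fraction is a hypothetical knob: the share of unlabeled
    examples treated as reliable negatives in step 1.
    """
    # Step 1: the unlabeled examples least similar to the positive
    # centroid are taken as reliable negatives.
    pos_centroid = centroid(positives)
    ranked = sorted(unlabeled, key=lambda v: cosine(v, pos_centroid))
    k = max(1, int(len(ranked) * negative_fraction))
    reliable_negatives = ranked[:k]

    # Step 2: a nearest-centroid classifier trained on the positives
    # versus the reliable negatives.
    neg_centroid = centroid(reliable_negatives)

    def is_relevant(v):
        return cosine(v, pos_centroid) > cosine(v, neg_centroid)

    return is_relevant


# Hypothetical binary features per page:
# [has input form, mentions "traceroute", mentions "bgp", has shopping terms]
known_lg_pages = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 1, 0]]
unlabeled_pages = [[1, 1, 1, 0], [0, 0, 0, 1], [0, 0, 1, 1], [0, 0, 0, 1]]

classify = two_step_pu(known_lg_pages, unlabeled_pages)
print(classify([1, 1, 0, 0]))  # a page that looks like an LG page
print(classify([0, 0, 0, 1]))  # a page that looks irrelevant
```

In a crawler, a filter of this kind would be applied to candidate URLs before fetching and analyzing them in depth, which is where the resource savings described in the abstract would come from.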
Keywords
Web Crawling, URL Filtering, Hidden Web, Web Data Extraction, Page Segmentation