Cluster-preserving sampling algorithm for large-scale graphs

SCIENCE CHINA-INFORMATION SCIENCES (2023)

Abstract
Graph sampling is an effective way to address scalability issues when analyzing large-scale graphs. Many sampling algorithms have been proposed, and sampling quality has been quantified using explicit properties of the sample (e.g., degree distribution). However, existing sampling techniques are inadequate for one task in particular: sampling the clustering structure, which is a crucial property of today's networks. In this paper, two novel top-leader sampling methods, TLS-e and TLS-i, are proposed. They differ in their expansion strategies, and both obtain representative samples that effectively preserve the clustering structure. The rationale behind them is to first select the top-leader nodes of most clusters into the sample and then heuristically incorporate peripheral nodes using a specific expansion strategy. Extensive experiments are conducted to investigate how well sampling techniques preserve the clustering structure of graphs. The empirical results show that the proposed sampling algorithms preserve the population's clustering structure well and provide feasible solutions for sampling the clustering structure of large-scale graphs.
Keywords
graph sampling, clustering structure, top-leader nodes, expansion strategies, large-scale graphs
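
The two-phase rationale stated in the abstract (pick leader nodes first, then expand around them) can be illustrated with a minimal sketch. This is not the paper's TLS-e or TLS-i algorithm: the abstract does not specify how top-leaders are ranked or how expansion works, so the sketch assumes degree as a stand-in for "leadership", skips candidates adjacent to an already chosen leader, and expands breadth-first. The names `build_adjacency` and `top_leader_sample` and all parameters are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def top_leader_sample(adj, num_leaders, sample_size):
    """Return (leaders, sample) for an undirected graph.

    Assumption: "leaders" are approximated by high-degree nodes that are not
    adjacent to an earlier leader. The paper's actual criterion may differ.
    """
    # Phase 1: rank nodes by degree (ties broken by node id) and pick leaders.
    ranked = sorted(adj, key=lambda n: (-len(adj[n]), n))
    leaders = []
    for node in ranked:
        if len(leaders) == num_leaders:
            break
        # Heuristic: avoid picking two leaders that are direct neighbors,
        # so that leaders tend to fall in different clusters.
        if all(node not in adj[leader] for leader in leaders):
            leaders.append(node)

    # Phase 2: expand breadth-first from the leaders until the sample is full.
    # Assumption: plain BFS stands in for the paper's expansion strategies.
    sample = set(leaders)
    frontier = deque(leaders)
    while frontier and len(sample) < sample_size:
        current = frontier.popleft()
        for neighbor in sorted(adj[current]):
            if len(sample) >= sample_size:
                break
            if neighbor not in sample:
                sample.add(neighbor)
                frontier.append(neighbor)
    return leaders, sample
```

Calling `top_leader_sample(build_adjacency(edges), num_leaders=2, sample_size=6)` on a small graph with two hubs returns the chosen leaders and the sampled node set. Node ids must be mutually comparable (for example, all integers) because ties are broken by id. If `sample_size` is smaller than the number of leaders, the sample still contains every leader.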