Dataset Announcement: MS-BioGraphs, Trillion-Scale Public Real-World Sequence Similarity Graphs

2023 IEEE International Symposium on Workload Characterization (IISWC), 2023

Abstract
Progress in High-Performance Computing in general, and High-Performance Graph Processing in particular, depends heavily on the availability of publicly accessible, relevant, and realistic datasets. In this paper, we announce the publication of MS-BioGraphs, a new family of publicly available, real-world, edge-weighted graph datasets with up to 2.5 trillion edges, that is, 6.6 times greater than the largest graph published recently. We briefly review the two main challenges we faced in generating large graph datasets, together with our solutions: (i) optimizing data structures and algorithms for this multi-step process and (ii) the WebGraph parallel compression technique. We also study some characteristics of MS-BioGraphs. The datasets and the complete version of this paper are available at https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs.
Keywords
Graph Datasets, High-Performance Graph Processing, High-Performance Computing, Biological Networks, Sequence Similarity Graph