Efficient Multi-GPU Graph Processing with Remote Work Stealing.

ICDE (2023)

Abstract
Graph algorithms support a broad spectrum of big data applications. A typical approach to scaling graph algorithms is to run them in a distributed and parallel setting with multiple processing devices. This approach requires balanced and effective utilization of computation, memory, and communication resources across devices. Many studies have addressed this problem with techniques such as graph partitioning and asynchronous computation. However, many issues remain open. For example, even with state-of-the-art graph partitioners, workloads can be skewed differently across devices and between iterations. Because graph partitions are typically static, they fail to capture dynamic characteristics that vary with the algorithm, the input, and the progress of the computation, which leads to poor resource utilization. Recently, GPUs have been increasingly used to accelerate various graph algorithms. Their highly efficient interconnection technologies, such as NVLink, open new opportunities to achieve better resource utilization. In this paper, we analyze the dynamic load-imbalance (DLB) problem and the long-tail (LT) problem in multi-GPU systems and solve them with adaptive, on-the-fly remote work stealing. We first introduce a frontier stealing algorithm to solve the DLB problem, then an ownership stealing algorithm to solve the LT problem. Based on these two algorithms, we developed Gum, a multi-GPU graph processing system with high device utilization. We evaluated Gum on four typical graph algorithms (BFS, WCC, PR, SSSP). The results show that Gum can run up to an order of magnitude faster than Gunrock and Groute, with fewer stragglers and less synchronization overhead.
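The abstract names frontier stealing without giving its details, so the following is only a minimal sketch of the general idea of stealing frontier work between devices, not Gum's actual algorithm. It is a sequential Python simulation in which each list stands for one device's frontier. The function names `steal_frontier` and `bfs_with_stealing`, the `owner` map, and the "halve the gap" stealing policy are all assumptions made for illustration.

```python
def steal_frontier(frontiers):
    """Rebalance per-device frontier lists in place; return how many vertices moved.

    Hypothetical policy: the least-loaded device repeatedly steals half of the
    size gap from the most-loaded device until sizes differ by at most one.
    """
    moved = 0
    devices = range(len(frontiers))
    while True:
        victim = max(devices, key=lambda d: len(frontiers[d]))
        thief = min(devices, key=lambda d: len(frontiers[d]))
        gap = len(frontiers[victim]) - len(frontiers[thief])
        if gap <= 1:
            return moved
        k = gap // 2
        # The thief takes the tail of the victim's frontier.
        frontiers[thief].extend(frontiers[victim][-k:])
        del frontiers[victim][-k:]
        moved += k


def bfs_with_stealing(adj, owner, source, num_devices):
    """Level-synchronous BFS over simulated devices with per-level stealing.

    adj maps a vertex to its neighbors; owner maps a vertex to the device
    that holds it under a static partition (both are illustrative inputs).
    """
    dist = {source: 0}
    frontiers = [[] for _ in range(num_devices)]
    frontiers[owner[source]].append(source)
    while any(frontiers):
        # Even out the work before the level is processed.
        steal_frontier(frontiers)
        next_frontiers = [[] for _ in range(num_devices)]
        for work in frontiers:  # each list is one simulated device's work
            for u in work:
                for v in adj[u]:
                    if v not in dist:
                        dist[v] = dist[u] + 1
                        # Newly discovered vertices start on their owner.
                        next_frontiers[owner[v]].append(v)
        frontiers = next_frontiers
    return dist
```

In this sketch a skewed static partition (for example, every vertex owned by device 0) still spreads each level's work across all simulated devices. A real multi-GPU implementation would also have to account for the cost of moving work over the interconnect, which this simulation ignores.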
Keywords
Graph processing, Multi-GPU, Load balance