Optimizing Communication for a 2D-Partitioned Scalable BFS

2016 IEEE High Performance Extreme Computing Conference (HPEC), 2016

Abstract
Recent research projects have investigated partitioning, acceleration, and data reduction techniques for improving the performance of Breadth-First Search (BFS) and the related HPC benchmark, Graph500. However, few implementations have focused on cloud-based systems like Amazon Web Services, which differ from HPC systems in several ways, most importantly in their network interconnect. This work looks at optimizations to reduce the communication overhead of an accelerated, distributed BFS on an HPC system and on a smaller cloud-like system that contains GPUs. We demonstrate the effects of an efficient 2D partitioning scheme and allreduce implementation, as well as different CPU-based compression schemes for reducing the overall amount of data shared between nodes. Timing and Score-P profiling results demonstrate a dramatic reduction in row and column frontier queue data (up to 91%) and show how compression can improve performance for a bandwidth-limited cluster.
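To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a block 2D partitioning of the adjacency matrix over a `rows` x `cols` process grid, and it uses delta plus variable-byte encoding as a stand-in for the CPU-based compression schemes, which the abstract does not name. The function names `owner`, `compress_frontier`, and `decompress_frontier` are hypothetical.

```python
def owner(u, v, n, rows, cols):
    """Grid coordinates of the process that stores edge (u, v).

    Assumes an n x n adjacency matrix cut into rows x cols equal blocks.
    """
    block_r = -(-n // rows)  # ceiling division: vertices per grid row
    block_c = -(-n // cols)  # ceiling division: vertices per grid column
    return u // block_r, v // block_c


def compress_frontier(sorted_ids):
    """Delta + variable-byte encode a sorted list of frontier vertex IDs.

    Illustrative scheme only; small gaps between IDs cost one byte each.
    """
    out = bytearray()
    prev = 0
    for vid in sorted_ids:
        gap = vid - prev
        prev = vid
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # low 7 bits, continuation flag set
            gap >>= 7
        out.append(gap)  # final byte, continuation flag clear
    return bytes(out)


def decompress_frontier(data):
    """Inverse of compress_frontier."""
    ids, current, gap, shift = [], 0, 0, 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            current += gap
            ids.append(current)
            gap, shift = 0, 0
    return ids


if __name__ == "__main__":
    frontier = [3, 4, 10, 300, 100000]
    packed = compress_frontier(frontier)
    print(owner(5, 12, n=16, rows=2, cols=4))
    print(len(packed), "bytes vs", 4 * len(frontier), "bytes as 32-bit IDs")
    print(decompress_frontier(packed) == frontier)
```

In a 2D-partitioned BFS, frontier data is exchanged only among processes that share a grid row or a grid column, so a payload like `packed` would be what travels over the network in place of raw vertex IDs.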
Keywords
2D-partitioned scalable BFS, partitioning technique, acceleration technique, data reduction technique, performance improvement, breadth-first search, HPC benchmark, Graph500, cloud-based systems, Amazon Web Services, network interconnect, communication overhead reduction, accelerated BFS, distributed BFS, cloud-like system, GPUs, 2D partitioning scheme, allreduce implementation, CPU-based compression schemes, timing, Score-P profiling, bandwidth-limited cluster, optimization