Scalable Collectives for Distributed Asynchronous Many-Task Runtimes

Matthew Whitlock,Hemanth Kolla,Sean Treichler,Philippe P. Pébay,Janine C. Bennett

2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)（2018）

引用 2|浏览61

暂无评分

摘要

Global collectives (reductions/aggregations) are ubiquitous and feature in nearly every application of distributed high-performance computing (HPC). While it is advisable to devise algorithms by placing collectives off the critical path of execution, they are sometimes unavoidable for correctness, numerical convergence and analyses purposes. Scalable algorithms for distributed collectives are well studied and have become an integral part of MPI, but new and emerging distributed computing frameworks and paradigms such as Asynchronous Many-Task (AMT) models lack the same sophistication for distributed collectives. Since the central promise of AMT runtimes is that they automatically discover, and expose, task dependencies in the underlying program and can schedule work optimally to minimize idle time and hide data movement, a naively designed collectives protocol can completely offset any gains made from asynchronous execution. In this study we demonstrate that scalable distributed collectives are indispensable for performance in AMT models. We design, implement and test the performance of a scalable collective algorithm in Legion, an exemplar data-centric AMT programming model. Our results show that AMT systems contain the necessary primitives that allow for fully scalable collectives without breaking the transparent data movement abstractions. Scalability tests of an integrated Legion 1D stencil mini-application show the clear benefit of implementing scalable collectives and the performance degradation when a naïve collectives alternative is used instead.

查看译文

关键词

Programming Models,Asynchronous Many-Task,Parallel Computing,Distributed Collectives,Computational Statistics

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要