Improving MPI Multi-Threaded RMA Communication Performance

Proceedings of the 47th International Conference on Parallel Processing (2018)

Abstract
One-sided communication is crucial to enabling communication concurrency. As core counts have increased, particularly with manycore architectures, one-sided (RMA) communication has been proposed to address the ever-increasing contention at the network interface. The difficulty in using one-sided (RMA) communication with MPI is that the performance of MPI implementations using RMA with multiple concurrent threads is not well understood. Past studies have combined MPI RMA with multithreading (RMA-MT), but they were performed on older MPI implementations lacking RMA-MT optimizations. In addition, prior work has only been done at smaller scale (<=512 cores). In this paper, we describe a new RMA implementation for Open MPI. The implementation targets scalability and multi-threaded performance. We describe the design and implementation of our RMA improvements and offer an evaluation that demonstrates scaling to 524,288 cores, the full size of a leading supercomputer installation. In contrast, the previous implementation failed to scale past approximately 4,096 cores. To evaluate this approach, we then compare against a vendor-optimized MPI RMA-MT implementation with microbenchmarks, a mini-application, and a full astrophysics code at large scale on a many-core architecture. This is the first evaluation of MPI RMA-MT at large scale on many-core architectures (524,288 cores) and the first large-scale application performance comparison between two different RMA-MT optimized MPI implementations. The results show an 8.6% benefit to our optimized open-source MPI for a full application code running on 512K cores.
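For readers unfamiliar with the RMA-MT pattern the abstract refers to, the sketch below illustrates it: several threads of one MPI process concurrently issue one-sided puts into a remote memory window inside a passive-target epoch. This example is not taken from the paper; the thread count, buffer sizes, and ring-style target choice are illustrative assumptions, and it simply requires an MPI library that provides MPI_THREAD_MULTIPLE.

/* Minimal sketch of multi-threaded MPI RMA (RMA-MT): each OpenMP thread
 * issues MPI_Put into a remote window concurrently. Thread count, buffer
 * sizes, and the ring target pattern are illustrative assumptions. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define NTHREADS 4
#define COUNT    1024   /* doubles per thread */

int main(int argc, char **argv)
{
    int provided, rank, nprocs;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not supported\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each process exposes a window large enough for all threads' data. */
    double *base;
    MPI_Win win;
    MPI_Win_allocate((MPI_Aint)NTHREADS * COUNT * sizeof(double),
                     sizeof(double), MPI_INFO_NULL, MPI_COMM_WORLD,
                     &base, &win);

    int target = (rank + 1) % nprocs;   /* simple ring communication pattern */
    MPI_Win_lock_all(0, win);           /* open a passive-target epoch */

    #pragma omp parallel num_threads(NTHREADS)
    {
        int tid = omp_get_thread_num();
        double buf[COUNT];
        for (int i = 0; i < COUNT; i++)
            buf[i] = rank + tid * 0.01;

        /* Concurrent one-sided puts from multiple threads into the same
         * window: the access pattern whose performance the paper studies. */
        MPI_Put(buf, COUNT, MPI_DOUBLE, target,
                (MPI_Aint)tid * COUNT, COUNT, MPI_DOUBLE, win);
        MPI_Win_flush(target, win);     /* complete this thread's puts */
    }

    MPI_Win_unlock_all(win);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}

Whether such per-thread puts and flushes proceed without serializing inside the MPI library is exactly the kind of implementation behavior the paper's microbenchmarks and application runs evaluate.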