Space-Performance Tradeoffs in Compressing MPI Group Data Structures

EuroMPI 2016

Abstract
MPI is a popular programming paradigm on today's parallel machines. MPI libraries sometimes use O(N) data structures to implement MPI functionality. The IBM Blue Gene/Q machine has 16 GB of memory per node; if each node runs 32 MPI processes, only 512 MB is available per process, requiring the MPI library to be space efficient. This constraint will become more severe on future exascale machines with tens of millions of cores and MPI endpoints. We explore techniques to compress the dense O(N) mapping data structures that translate a logical process ID to its global rank. Our techniques minimize the mapping state of topological communicators by replacing table lookups with a mapping function. We also explore caching schemes that reduce the overhead of the mapping functions for recently translated ranks, and present performance results on multiple MPI micro-benchmarks and on the 3D FFT and Algebraic Multigrid application benchmarks.
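To illustrate the idea of replacing an O(N) rank-translation table with a mapping function plus a cache, here is a minimal C sketch based only on the abstract, not the paper's actual implementation. It assumes a communicator whose members form an arithmetic progression of global ranks (as can result from MPI_Comm_split on a regular topology); the names strided_map_t, rank_cache_t, and all parameters are hypothetical.

    /* Hypothetical compressed mapping for a communicator whose member
     * ranks form an arithmetic progression in MPI_COMM_WORLD.
     * Instead of an O(N) lookup table, logical-to-global translation
     * is a closed-form function of the logical rank: O(1) space. */
    typedef struct {
        int offset;   /* global rank of logical rank 0 */
        int stride;   /* distance between consecutive members */
    } strided_map_t;

    static inline int map_to_global(const strided_map_t *m, int logical)
    {
        return m->offset + m->stride * logical;
    }

    /* Small direct-mapped cache for recently translated ranks, useful
     * when the mapping function is costlier than a multiply-add. */
    #define CACHE_SLOTS 64
    typedef struct {
        int logical[CACHE_SLOTS];
        int global[CACHE_SLOTS];
    } rank_cache_t;

    static void cache_init(rank_cache_t *c)
    {
        for (int i = 0; i < CACHE_SLOTS; i++)
            c->logical[i] = -1;  /* -1 marks an empty slot */
    }

    static int cached_translate(rank_cache_t *c, const strided_map_t *m,
                                int logical)
    {
        int slot = logical & (CACHE_SLOTS - 1);
        if (c->logical[slot] != logical) {   /* miss: recompute and fill */
            c->logical[slot] = logical;
            c->global[slot]  = map_to_global(m, logical);
        }
        return c->global[slot];              /* hit: reuse translation */
    }

The tradeoff is the one named in the title: the function-plus-cache form uses constant space per communicator but pays a (cacheable) compute cost per translation, whereas a dense table pays O(N) space for constant-time lookups.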
Keywords
MPI, MPI collective communication, MPI 3.0, MPI communicators, MPI_Comm_split