Architectural support for efficient message passing on shared memory multi-cores.

J. Parallel Distrib. Comput. (2016)

Abstract
Thanks to programming approaches like actor-based models, message passing is regaining popularity outside large-scale scientific computing for building scalable distributed applications on multi-core processors. Unfortunately, the mismatch between message passing models and today's shared-memory hardware provided by commercial vendors results in suboptimal performance and a waste of energy. This paper presents a set of architectural extensions to reduce the overheads incurred by message passing workloads running on shared-memory multi-core architectures. It describes the instruction set extensions and their hardware implementation. To facilitate programmability, the proposed extensions are used by a message passing library, allowing programs to take advantage of them transparently. As a proof of concept, we use modified MPI libraries and unmodified MPI programs to evaluate the proposal. Experimental results show that a best-effort design can eliminate over 60% of the cache accesses caused by message data transmission and reduce the cycles spent on this task by 75%, while the addition of a simple coprocessor can completely offload data movement from the CPU, avoiding up to 92% of cache accesses and reducing network traffic by 12% on average. The design achieves an improvement of 11%-12% in the energy-delay product of on-chip caches.

Highlights
- We present hardware support to reduce the overheads incurred by message passing (MP).
- We modified an MPI library to add support for our ISA extensions.
- Our design eliminates 60%-92% of cache accesses during data transfers.
- Adding simple MP support to shared-memory multicores improves energy efficiency.
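The abstract notes that the ISA extensions are hidden behind a modified MPI library so that unmodified MPI programs benefit transparently. The C sketch below illustrates that idea only in outline, under assumptions not taken from the paper: HAVE_HW_MSG_EXT and hw_msg_copy() are hypothetical placeholders standing in for whatever instructions or coprocessor interface the hardware provides, not the paper's actual extensions.

    /* Hypothetical sketch: dispatching a message copy inside an MPI-like
     * library, so applications calling the standard API need no changes.
     * HAVE_HW_MSG_EXT and hw_msg_copy() are invented placeholders. */
    #include <string.h>
    #include <stdio.h>

    /* Fallback path: an ordinary cache-mediated copy through shared memory. */
    static void sw_msg_copy(void *dst, const void *src, size_t len) {
        memcpy(dst, src, len);
    }

    #ifdef HAVE_HW_MSG_EXT
    /* On hardware with the extensions, this wrapper would issue the new
     * instructions (or drive the coprocessor) instead of going through
     * the caches; its implementation is platform-specific. */
    void hw_msg_copy(void *dst, const void *src, size_t len);
    #endif

    /* Library-internal transfer routine: programs keep calling MPI_Send /
     * MPI_Recv unchanged; only this path chooses the copy mechanism. */
    static void msg_transfer(void *dst, const void *src, size_t len) {
    #ifdef HAVE_HW_MSG_EXT
        hw_msg_copy(dst, src, len);   /* offload data movement from the CPU */
    #else
        sw_msg_copy(dst, src, len);   /* software fallback on plain hardware */
    #endif
    }

    int main(void) {
        char payload[] = "hello";
        char mailbox[sizeof payload];
        msg_transfer(mailbox, payload, sizeof payload);
        printf("received: %s\n", mailbox);
        return 0;
    }

Because the dispatch decision lives entirely inside the library's internal transfer path, unmodified applications can take advantage of the hardware support, which matches the transparency goal stated in the abstract.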
Keywords
Message passing, Shared memory, Multicore