Architectural support for efficient message passing on shared memory multi-cores.

J. Parallel Distrib. Comput. (2016)

Abstract
Thanks to programming approaches like actor-based models, message passing is regaining popularity outside large-scale scientific computing for building scalable distributed applications on multi-core processors. Unfortunately, the mismatch between message passing models and today's shared-memory hardware provided by commercial vendors results in suboptimal performance and a waste of energy. This paper presents a set of architectural extensions to reduce the overheads incurred by message passing workloads running on shared-memory multi-core architectures. It describes the instruction set extensions and their hardware implementation. To facilitate programmability, the proposed extensions are used by a message passing library, allowing programs to take advantage of them transparently. As a proof of concept, we use modified MPI libraries and unmodified MPI programs to evaluate the proposal. Experimental results show that a best-effort design can eliminate over 60% of the cache accesses caused by message data transmission and reduce the cycles spent on this task by 75%, while the addition of a simple coprocessor can completely offload data movement from the CPU, avoiding up to 92% of cache accesses and reducing network traffic by 12% on average. The design achieves an improvement of 11%-12% in the energy-delay product of on-chip caches.

Highlights
- We present hardware support to reduce the overheads incurred by message passing (MP).
- We modified an MPI library to add support for our ISA extensions.
- Our design eliminates 60%-92% of cache accesses during data transfers.
- Adding simple MP support to shared-memory multicores improves energy efficiency.
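The abstract notes that the ISA extensions are hidden behind a modified MPI library so that unmodified MPI programs benefit transparently. The C sketch below illustrates that idea only in outline, under assumptions not taken from the paper: HAVE_HW_MSG_EXT and hw_msg_copy() are hypothetical placeholders standing in for whatever instructions or coprocessor interface the hardware provides, not the paper's actual extensions.

    /* Hypothetical sketch: dispatching a message copy inside an MPI-like
     * library, so applications calling the standard API need no changes.
     * HAVE_HW_MSG_EXT and hw_msg_copy() are invented placeholders. */
    #include <string.h>
    #include <stdio.h>

    /* Fallback path: an ordinary cache-mediated copy through shared memory. */
    static void sw_msg_copy(void *dst, const void *src, size_t len) {
        memcpy(dst, src, len);
    }

    #ifdef HAVE_HW_MSG_EXT
    /* On hardware with the extensions, this wrapper would issue the new
     * instructions (or drive the coprocessor) instead of going through
     * the caches; its implementation is platform-specific. */
    void hw_msg_copy(void *dst, const void *src, size_t len);
    #endif

    /* Library-internal transfer routine: programs keep calling MPI_Send /
     * MPI_Recv unchanged; only this path chooses the copy mechanism. */
    static void msg_transfer(void *dst, const void *src, size_t len) {
    #ifdef HAVE_HW_MSG_EXT
        hw_msg_copy(dst, src, len);   /* offload data movement from the CPU */
    #else
        sw_msg_copy(dst, src, len);   /* software fallback on plain hardware */
    #endif
    }

    int main(void) {
        char payload[] = "hello";
        char mailbox[sizeof payload];
        msg_transfer(mailbox, payload, sizeof payload);
        printf("received: %s\n", mailbox);
        return 0;
    }

Because the dispatch decision lives entirely inside the library's internal transfer path, unmodified applications can take advantage of the hardware support, which matches the transparency goal stated in the abstract.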
Keywords
Message passing, Shared memory, Multicore