High Performance MPI on IBM 12x InfiniBand Architecture
IEEE International Parallel and Distributed Processing Symposium (2007)
Abstract
InfiniBand is becoming increasingly popular in the area of cluster computing due to its open standard and high performance. I/O interfaces such as PCI-Express and GX+ are being introduced as next-generation technologies to drive InfiniBand at very high throughput. HCAs with 8x throughput on PCI-Express are already available, and support for HCAs with 12x throughput on GX+ has recently been announced. In this paper, we design a message passing interface (MPI) for IBM 12x dual-port HCAs, which consist of multiple send/recv engines per port. We propose and study the impact of several communication scheduling policies (binding, striping, and round robin). Based on this study, we present a new policy, EPC (enhanced point-to-point and collective), which incorporates different kinds of communication patterns, point-to-point (blocking and non-blocking) and collective communication, for data transfer. We implement our design and evaluate it with micro-benchmarks, collective communication, and the NAS parallel benchmarks. Using EPC on a 12x InfiniBand cluster with one HCA and one port, we improve performance by 41% in the ping-pong latency test and by 63-65% in the unidirectional and bidirectional bandwidth tests, compared with the default single-rail MPI implementation. Our evaluation on the NAS parallel benchmarks shows an improvement of 7-13% in execution time for integer sort and Fourier transform.
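To make the three baseline scheduling policies concrete, the following is a minimal illustrative sketch (not taken from the paper) of how message chunks might be assigned to an HCA's multiple send/recv paths. All names, the chunk size, and the work-item representation are hypothetical.

```python
def schedule(message_size, num_paths, policy, chunk_size=4096):
    """Return a list of (path_id, offset, length) work items.

    Hypothetical model of the three policies discussed in the paper:
    - binding: the whole message is pinned to a single path;
    - striping: the message is split evenly across all paths;
    - round_robin: successive fixed-size chunks cycle through the paths.
    """
    items = []
    if policy == "binding":
        # Entire message on one fixed path.
        items.append((0, 0, message_size))
    elif policy == "striping":
        # One contiguous stripe per path, sized to cover the message.
        stripe = (message_size + num_paths - 1) // num_paths
        for p in range(num_paths):
            off = p * stripe
            if off < message_size:
                items.append((p, off, min(stripe, message_size - off)))
    elif policy == "round_robin":
        # Each successive chunk goes to the next path in turn.
        off, p = 0, 0
        while off < message_size:
            length = min(chunk_size, message_size - off)
            items.append((p, off, length))
            off += length
            p = (p + 1) % num_paths
    else:
        raise ValueError(f"unknown policy: {policy!r}")
    return items
```

Under this model, striping maximizes parallelism for a single large message, while round robin balances load across many messages; EPC, as described in the abstract, goes further by selecting behavior based on the communication pattern (blocking vs. non-blocking point-to-point, and collectives).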
Keywords
Fourier transforms,application program interfaces,computer architecture,message passing,peripheral interfaces,Fourier transform,HCA,IBM 12x InfiniBand architecture,PCI-express,application program interface,cluster computing,communication scheduling policy,data transfer,high performance MPI,message passing,peripheral interface