Evaluating HPC networks via simulation of parallel workloads.

Nikhil Jain, Abhinav Bhatele, Sam White, Todd Gamblin, Laxmikant V. Kalé

SC (2016)

Abstract

This paper presents an evaluation and comparison of three topologies that are popular for building interconnection networks in large-scale supercomputers: torus, fat-tree, and dragonfly. To perform this evaluation, we propose a comprehensive methodology and present a scalable packet-level network simulator, TraceR. Our methodology includes design of prototype systems that are being evaluated, use of proxy applications to determine computation and communication load, simulating individual proxy applications and multi-job workloads, and computing aggregated performance metrics. Using the proposed methodology, prototype systems based on torus, fat-tree, and dragonfly networks with up to 730K endpoints (MPI processes) executed on 46K nodes are compared in the context of multi-job workloads from capability and capacity systems. For the 180 Petaflop/s prototype systems simulated in this paper, we show that different topologies are superior in different scenarios, i.e., there is no single best topology, and the characteristics of parallel workloads determine the optimal choice.

Keywords

Multiprocessor interconnection networks, Network topology, Computer simulation, Performance evaluation, High performance computing
