Many-Core Acceleration Of A Discrete Ordinates Transport Mini-App At Extreme Scale

HIGH PERFORMANCE COMPUTING(2016)

引用 9|浏览28
暂无评分
摘要
Time-dependent deterministic discrete ordinates transport codes are an important class of application which provide significant challenges for large, many-core systems. One such challenge is the large memory capacity needed by the solve step, which requires us to have a scalable solution in order to have enough node-level memory to store all the data. In our previous work, we demonstrated the first implementation which showed a significant performance benefit for single node solves using GPUs. In this paper we extend our work to large problems and demonstrate the scalability of our solution on two Petascale GPU-based supercomputers: Titan at Oak Ridge and Piz Daint at CSCS. Our results show that our improved node-level parallelism scheme scales just as well across large systems as previous approaches when using the tried and tested KBA domain decomposition technique. We validate our results against an improved performance model which predicts the runtime of the main 'sweep' routine when running on different hardware, including CPUs or GPUs.
更多
查看译文
关键词
Memory Bandwidth, Transport Code, Scalar Flux, Chunk Size, Discrete Ordinate
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要