Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers.

Parallel Computing (2017)

Cited 21 | Views 108

Abstract
Highlights:
- Generate parallel CUDA code from sequential C input code using a compiler-based tool for key operators in geometric multigrid.
- Demonstrate that the generated code matches or improves on the performance of expert-written CUDA code.
- Compare the performance of the generated code against an analytic performance bound derived from the Roofline model.
- Show weak-scaling results of the generated code on the Titan supercomputer.
- Compare the performance of geometric multigrid on GPU-based machines to CPU-based machines.

GPUs, with their high bandwidths and computational capabilities, are an increasingly popular target for scientific computing. Unfortunately, to date, harnessing the power of the GPU has required the use of a GPU-specific programming model like CUDA, OpenCL, or OpenACC. As such, in order to deliver portability across CPU-based and GPU-accelerated supercomputers, programmers are forced to write and maintain two versions of their applications or frameworks. In this paper, we explore the use of a compiler-based autotuning framework based on CUDA-CHiLL to deliver not only portability, but also performance portability across CPU- and GPU-accelerated platforms for the geometric multigrid linear solvers found in many scientific applications. We show that with autotuning we can attain near-Roofline (a performance bound for a computation and target architecture) performance across the key operations in the miniGMG benchmark for both CPU- and GPU-based architectures, as well as for multiple stencil discretizations and smoothers. We show that our technology is readily interoperable with MPI, resulting in performance at scale equal to that obtained via a hand-optimized MPI+CUDA implementation.
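The Roofline bound the abstract refers to can be sketched in a few lines: attainable throughput is the minimum of a machine's peak compute rate and its peak memory bandwidth multiplied by the kernel's arithmetic intensity. The function and all numeric values below are illustrative assumptions, not figures from the paper.

```python
# Minimal sketch of the Roofline performance bound used as a comparison
# target: attainable GFLOP/s = min(peak compute, bandwidth * intensity).
# All hardware numbers here are assumed for illustration only.

def roofline_bound(peak_gflops, peak_gb_per_s, arithmetic_intensity):
    """Attainable GFLOP/s for a kernel with the given arithmetic
    intensity (FLOPs per byte of DRAM traffic)."""
    return min(peak_gflops, peak_gb_per_s * arithmetic_intensity)

# Multigrid stencil smoothers are typically bandwidth-bound: few FLOPs
# per byte moved, so the bandwidth term dominates the bound.
stencil_ai = 0.20  # illustrative FLOP/byte for a stencil smoother
bound = roofline_bound(peak_gflops=1310.0,   # assumed GPU peak compute
                       peak_gb_per_s=250.0,  # assumed DRAM bandwidth
                       arithmetic_intensity=stencil_ai)
print(f"Roofline bound: {bound:.0f} GFLOP/s")  # prints "Roofline bound: 50 GFLOP/s"
```

With an intensity this low, the bound is set entirely by memory bandwidth, which is why the paper evaluates its generated stencil code against the bandwidth ceiling rather than peak FLOP/s.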
Keywords
GPU, Compiler, Autotuning, Multigrid