Modeling the Performance of Geometric Multigrid Stencils on Multicore Computer Architectures

SIAM JOURNAL ON SCIENTIFIC COMPUTING(2015)

引用 23|浏览55
暂无评分
摘要
The basic building blocks of the classic geometric multigrid algorithm all have a low ratio of executed floating point operations per byte fetched from memory. On modern computer architectures, such computational kernels are typically bound by memory traffic and achieve only a small percentage of the theoretical peak floating point performance of the underlying hardware. We suggest the use of state-of-the-art (stencil) compiler techniques to improve the flop per byte ratio, also called the arithmetic intensity, of the steps in the algorithm. Our focus will be on the smoother which is a repeated stencil application. With a tiling approach based on the polyhedral loop optimization framework, data reuse in the smoother can be improved, leading to a higher effective arithmetic intensity. For an academic constant coefficient Poisson problem, we present a performance model for the multigrid V -cycle solver based on the tiled smoother. For increasing numbers of smoothing steps, there is a trade-off between the improved efficiency due to better data reuse and the additional flops required for extra smoothing steps. Our performance model predicts time to solution by linking convergence rate to arithmetic intensity via the roofline model. We show results for two-dimensional (2D) and three-dimensional (3D) simulations on Intel Sandy Bridge and for 2D simulations on Intel Xeon Phi architectures. The actual performance is compared with the theoretical predictions.
更多
查看译文
关键词
multigrid,performance model,multicore,bandwidth
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要