Towards Accelerating High-Order Stencils on Modern GPUs and Emerging Architectures with a Portable Framework

CoRR(2023)

引用 0|浏览5
暂无评分
摘要
PDE discretization schemes yielding stencil-like computing patterns are commonly used for seismic modeling, weather forecast, and other scientific applications. Achieving HPC-level stencil computations on one architecture is challenging, porting to other architectures without sacrificing performance requires significant effort, especially in this golden age of many distinctive architectures. To help developers achieve performance, portability, and productivity with stencil computations, we developed StencilPy. With StencilPy, developers write stencil computations in a high-level domain-specific language, which promotes productivity, while its backends generate efficient code for existing and emerging architectures, including NVIDIA, AMD, and Intel GPUs, A64FX, and STX. StencilPy demonstrates promising performance results on par with hand-written code, maintains cross-architectural performance portability, and enhances productivity. Its modular design enables easy configuration, customization, and extension. A 25-point star-shaped stencil written in StencilPy is one-quarter of the length of a hand-crafted CUDA code and achieves similar performance on an NVIDIA H100 GPU.
更多
查看译文
关键词
modern gpus,portable framework,high-order
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要