Compact Update Algorithm for Numerical Schemes with Cross Stencil for Data Access Locality.

Andrey V. Zakirov, Boris A. Korneev, Anastasia Y. Perepelkina

High Performance Computing and Cluster Technologies Conference (HPCCT) (2022)

Abstract

Accurate fluid simulations carry a high computational cost. 3D modelling of the evolution of fluid dynamic fields on a discrete mesh requires a large amount of data storage, and data access becomes the performance bottleneck. Our work is concerned with mitigating the limitations caused by finite memory throughput in parallel simulations. We address this issue with LRnLA algorithms, in which localized tasks combine updates on several time layers. In this paper, the compact update for the DiamondTorre LRnLA algorithm is constructed. It further improves the localization of the DiamondTorre algorithm, which increases the arithmetic intensity for cross-stencil schemes. The ratio of loaded data to fully updated data approaches 1. The compact update is implemented in CUDA C++ for a numerical scheme for the advection-diffusion equation. A performance of 50 GLU/sec (billion lattice updates per second) is obtained on an Nvidia RTX3090, and the maximal performance of almost 300 GLU/sec is obtained on an 8-GPU workstation. Note that the main data storage is in CPU RAM, but the host-device data exchange is concealed by temporal blocking: under appropriate conditions, the data transfers are hidden behind the computing operations and do not affect the performance.
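For orientation, the sketch below shows what a cross-stencil update for an advection-diffusion equation can look like in its simplest form. It is not the authors' scheme and not the DiamondTorre compact update: it is a plain 2D explicit step with central differences and periodic boundaries, chosen here as an assumption because the abstract does not specify the discretization. The names `cross_stencil_step`, `run`, and `total`, as well as all parameter values, are hypothetical. The loop in `run` is the naive stepwise traversal, in which every time step rereads the whole mesh from memory; this is the access pattern that temporal blocking is meant to improve on.

```python
def cross_stencil_step(u, vx, vy, diff, dt, h):
    """One explicit step of du/dt + v . grad(u) = diff * laplace(u).

    Assumed discretization (not taken from the paper): 2D periodic grid,
    central differences, forward Euler in time. Each cell reads only
    itself and its four axis neighbours, i.e. a cross stencil.
    """
    nx, ny = len(u), len(u[0])
    new = [[0.0] * ny for _ in range(nx)]
    for i in range(nx):
        ip, im = (i + 1) % nx, (i - 1) % nx
        for j in range(ny):
            jp, jm = (j + 1) % ny, (j - 1) % ny
            # Five-point Laplacian for the diffusion term.
            lap = (u[ip][j] + u[im][j] + u[i][jp] + u[i][jm]
                   - 4.0 * u[i][j]) / (h * h)
            # Central-difference advection term v . grad(u).
            adv = (vx * (u[ip][j] - u[im][j])
                   + vy * (u[i][jp] - u[i][jm])) / (2.0 * h)
            new[i][j] = u[i][j] + dt * (diff * lap - adv)
    return new


def run(u, steps, **params):
    """Naive stepwise traversal: the full mesh is swept once per time step."""
    for _ in range(steps):
        u = cross_stencil_step(u, **params)
    return u


def total(u):
    """Sum of the field over the mesh (conserved on a periodic grid)."""
    return sum(sum(row) for row in u)


if __name__ == "__main__":
    field = [[0.0] * 8 for _ in range(8)]
    field[4][4] = 1.0  # single initial pulse
    field = run(field, 10, vx=0.1, vy=0.05, diff=0.1, dt=0.1, h=1.0)
    print(total(field))
```

In this naive form each lattice update loads five values and stores one per time step, so performance is bound by memory throughput rather than arithmetic; combining updates on several time layers within a localized task is what raises the arithmetic intensity.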
