A highly efficient I/O-based out-of-core stencil algorithm with globally optimized temporal blocking

2017 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM) (2017)

Abstract

This paper proposes the most efficient I/O-based out-of-core stencil algorithm for large-capacity non-volatile memory (NVM) such as flash, and evaluates the performance of various out-of-core stencil algorithms and implementations designed for flash. Algorithms for flash differ substantially from existing algorithms designed for memory and cache, host and GPU, or local and remote nodes: they differ in their overall schemes, in the data structures used for the stencil computation, and in how they apply blocking to increase data-access locality and thereby accelerate performance. Even when the available main memory is limited to 6.3% of the data size required by the stencil problem, the proposed algorithm achieves 80% of the performance of in-core computation on a system with sufficient main memory. In other words, for a stencil problem that requires 2 TiB of data, the algorithm stays within 20% of in-core performance while using only 128 GiB of main memory (128/2048 ≈ 6.3% of the data) together with flash SSDs, whose access latency is much larger than that of DRAM.
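The abstract does not spell out the authors' algorithm, so the following is only a minimal sketch of the general idea behind temporal blocking, assuming a generic overlapped ("ghost zone") scheme on a 1D 3-point averaging stencil with fixed endpoints. Each chunk is loaded together with a halo of `depth` cells per side, advanced `depth` time steps locally, and only its interior is written back. The list slice stands in for reading a chunk from slow storage such as flash. All names (`step`, `naive`, `temporally_blocked`, `block`, `depth`) are hypothetical illustrations, not the paper's code.

```python
import random
from typing import List


def step(u: List[float]) -> List[float]:
    """One Jacobi sweep of a 3-point averaging stencil; both endpoints stay fixed."""
    v = u[:]
    for i in range(1, len(u) - 1):
        v[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return v


def naive(u: List[float], steps: int) -> List[float]:
    """Reference: sweep the whole array once per time step."""
    for _ in range(steps):
        u = step(u)
    return u


def temporally_blocked(u: List[float], steps: int, block: int, depth: int) -> List[float]:
    """Advance `steps` time steps, fusing up to `depth` steps per pass over the data.

    Each chunk of `block` cells is read with a halo of `t` cells on each side
    (clamped at the array ends, which are global fixed boundaries). Treating the
    chunk ends as fixed corrupts one more cell per step from each cut edge, so
    after `t` local steps the `t`-cell halo absorbs all errors and the interior
    matches the naive result.
    """
    n = len(u)
    done = 0
    while done < steps:
        t = min(depth, steps - done)
        out = u[:]
        for start in range(0, n, block):
            end = min(start + block, n)
            lo = max(0, start - t)
            hi = min(n, end + t)
            chunk = u[lo:hi]  # hypothetical stand-in for reading a chunk from flash
            for _ in range(t):
                chunk = step(chunk)
            out[start:end] = chunk[start - lo:end - lo]  # write back the interior only
        u = out
        done += t
    return u


if __name__ == "__main__":
    random.seed(0)
    u0 = [random.random() for _ in range(1000)]
    ref = naive(u0, 12)
    got = temporally_blocked(u0, 12, block=128, depth=4)
    print(max(abs(a - b) for a, b in zip(ref, got)))
```

With `depth=4`, the 12 time steps need only 3 passes over the data instead of 12, at the cost of recomputing the halo cells. That trade of redundant computation for fewer slow reads is the kind of locality gain blocking aims for when the data lives on flash rather than in DRAM.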

Keywords

Non-volatile memory, flash memory, temporal blocking, stencil, algorithm, out-of-core, asynchronous I/O, access locality, auto-tuning